Data transformation

Certain data transformations that are unavailable in TensorFlow/Keras are implemented as custom Keras layers in Koogu.

class koogu.data.tf_transformations.Audio2Spectral(*args: Any, **kwargs: Any)

Layer for converting waveforms into time-frequency representations.

Parameters:
  • fs – sampling frequency of the data in the last dimension of inputs.

  • spec_settings

    A Python dictionary describing the settings to be used for producing spectrograms. Supported keys in the dictionary include:

    • win_len: (required) Length of the analysis window (in seconds)

    • win_overlap_prc: (required) Fraction of the analysis window that overlaps with the next analysis window. A 50% overlap (0.50) is a common choice.

    • nfft_equals_win_len: (optional; boolean) If True (default), NFFT will equal the number of samples resulting from win_len. If False, NFFT will be set to the next power of 2 that is ≥ the number of samples resulting from win_len.

    • tf_rep_type: (optional) A string specifying the transformation output. ‘spec’ results in a linear scale spectrogram. ‘spec_db’ (default) results in a logarithmic scale (dB) spectrogram.

    • eps: (default: 1e-10) A small positive quantity added to avoid computing log(0.0).

    • bandwidth_clip: (optional; 2-element list/tuple) If specified, the generated spectrogram will be clipped along the frequency axis to only include components in the specified bandwidth.

  • eps – (optional) If specified, will override the eps value in spec_settings.

  • name – (optional; string) Name for the layer.
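
The sketch below shows how the spec_settings keys described above translate into sample-domain quantities. It is not Koogu's implementation: derive_stft_params is a hypothetical helper, the rounding of win_len to whole samples is an assumption, and bandwidth_clip is assumed to be given in Hz with both limits inclusive.

```python
import math

# Example settings using the documented keys (values are illustrative only).
spec_settings = {
    "win_len": 0.025,              # analysis window length, in seconds
    "win_overlap_prc": 0.50,       # fraction of the window shared with the next one
    "nfft_equals_win_len": False,  # pad NFFT up to the next power of 2
    "tf_rep_type": "spec_db",
    "eps": 1e-10,
    "bandwidth_clip": [500, 4000],  # assumed to be in Hz
}


def derive_stft_params(fs, settings):
    """Hypothetical helper: derive sample counts and bin limits from settings."""
    # Assumption: the window length is rounded to the nearest whole sample.
    win_samples = int(round(settings["win_len"] * fs))
    overlap_samples = int(round(win_samples * settings["win_overlap_prc"]))

    if settings.get("nfft_equals_win_len", True):
        nfft = win_samples
    else:
        # Next power of 2 that is >= win_samples.
        nfft = 1 << (win_samples - 1).bit_length()

    params = {
        "win_samples": win_samples,
        "overlap_samples": overlap_samples,
        "hop_samples": win_samples - overlap_samples,
        "nfft": nfft,
    }

    band = settings.get("bandwidth_clip")
    if band is not None:
        bin_hz = fs / nfft  # frequency spacing between adjacent FFT bins
        lo_bin = math.ceil(band[0] / bin_hz)
        hi_bin = min(math.floor(band[1] / bin_hz), nfft // 2)
        params["clip_bins"] = (lo_bin, hi_bin)
    return params


params = derive_stft_params(16000, spec_settings)
print(params)
```

With a 16 kHz sampling frequency, the 25 ms window spans 400 samples, so setting nfft_equals_win_len to False pads NFFT to 512.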

class koogu.data.tf_transformations.GaussianBlur(*args: Any, **kwargs: Any)

Layer for applying Gaussian blur to time-frequency (tf) representations.

Parameters:
  • sigma – Scalar value defining the Gaussian kernel.

  • apply_2d – (boolean; default: True) If True, will apply smoothing along both the time and frequency axes. Otherwise, smoothing is only applied along the frequency axis.
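
To illustrate what the two parameters control, here is a minimal pure-Python sketch of separable Gaussian smoothing. It is not the layer's code: the helper names are hypothetical, sigma is assumed to be the standard deviation in bins, the kernel is assumed to span three standard deviations on each side, edges are handled by replication, and rows of the input are assumed to be frequency bins.

```python
import math


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Hypothetical helper: normalized 1-D Gaussian kernel."""
    # Assumption: the kernel covers `truncate` standard deviations per side.
    radius = max(1, math.ceil(truncate * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Filter a 1-D sequence with the kernel, replicating edge values."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index to valid range
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(spec, sigma, apply_2d=True):
    """Blur an H x W spectrogram (rows assumed to be frequency bins)."""
    kernel = gaussian_kernel_1d(sigma)
    # Smooth along the frequency axis: filter each column (one time frame).
    cols = [blur_1d(list(col), kernel) for col in zip(*spec)]
    out = [list(row) for row in zip(*cols)]
    if apply_2d:
        # Also smooth along the time axis: filter each row.
        out = [blur_1d(row, kernel) for row in out]
    return out


spec = [[0.0, 1.0, 0.0]] * 3           # varies along time only
freq_only = gaussian_blur(spec, sigma=1.0, apply_2d=False)
both_axes = gaussian_blur(spec, sigma=1.0, apply_2d=True)
```

Because the example input is constant along frequency, frequency-only smoothing leaves it unchanged, while 2-D smoothing spreads the peak across neighbouring time frames.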

class koogu.data.tf_transformations.Linear2dB(*args: Any, **kwargs: Any)

Layer for converting time-frequency (tf) representations from linear to decibel scale.

Parameters:
  • eps – Epsilon value to add, for avoiding computing log(0.0).

  • full_scale – (boolean) Whether to convert to dB full-scale.

  • name – (optional; string) Name for the layer.
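
The role of eps can be seen in a small sketch of a linear-to-decibel conversion. This is an illustration, not the layer's code: linear_to_db is a hypothetical helper, and the 10·log10 factor assumes power-like input values. The full_scale option is not modelled here because the text above does not define its reference level.

```python
import math


def linear_to_db(spec, eps=1e-10):
    """Hypothetical helper: convert an H x W linear-scale array to decibels."""
    # Assumption: values are power-like, hence the factor of 10.
    # Adding eps keeps the argument of log10 strictly positive.
    return [[10.0 * math.log10(v + eps) for v in row] for row in spec]


db = linear_to_db([[1.0, 0.0]])
```

A zero-valued input maps to 10·log10(eps), which is about -100 dB for the default eps shown here, instead of raising a math domain error.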

class koogu.data.tf_transformations.Spec2Img(*args: Any, **kwargs: Any)

Layer for converting time-frequency representations into images. The layer’s inputs can be either a single spectrogram (shape: H x W) or a batch of B spectrograms (shape: B x H x W).

Parameters:
  • cmap – An Nx3 array of RGB color values. Typically, N is 256. If cmap also contains alpha values (Nx4 instead of Nx3), the last channel will be discarded. For example, to specify a ‘jet’ colorscale, you could use matplotlib.cm.jet(range(256)).

  • vmin – (optional; default: None) If specified along with vmax, spectrogram values will be scaled to the range [vmin, vmax].

  • vmax – (optional; default: None) If specified along with vmin, spectrogram values will be scaled to the range [vmin, vmax].

  • img_size – (optional; default: None) If not None, must be a 2-element tuple (new H, new W) giving the size to which the output image is resized.

  • resize_method – (optional; default: ‘bilinear’) If resizing of spectrogram(s) is enabled (via img_size), this parameter will define the method used for resizing. For available options, see TensorFlow’s tf.image.resize().

If only one of vmin or vmax is specified, it will be ignored and spectrogram values will be scaled relative to the minimum and maximum values within each spectrogram. If both vmin and vmax are specified, vmin must be < vmax.

Returns:

If img_size is None, returns a tensor of shape [H x W x 3] or [B x H x W x 3]. If img_size is specified, H and W are replaced by img_size[0] and img_size[1], respectively.
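
The scaling and color-mapping rules described above can be sketched in pure Python for a single H x W spectrogram. This is an illustration under stated assumptions, not the layer's code: spec_to_rgb is a hypothetical helper, values outside [vmin, vmax] are assumed to be clipped, and resizing via img_size is not covered.

```python
def spec_to_rgb(spec, cmap, vmin=None, vmax=None):
    """Hypothetical helper: map an H x W spectrogram to H x W x 3 colors."""
    colors = [tuple(c[:3]) for c in cmap]  # discard alpha if cmap is N x 4

    if vmin is None or vmax is None:
        # A lone vmin or vmax is ignored: scale to this spectrogram's range.
        flat = [v for row in spec for v in row]
        vmin, vmax = min(flat), max(flat)
    elif not vmin < vmax:
        raise ValueError("vmin must be < vmax")

    span = (vmax - vmin) or 1.0  # guard against a constant-valued input
    top = len(colors) - 1
    image = []
    for row in spec:
        out_row = []
        for v in row:
            # Assumption: out-of-range values are clipped to [0, 1].
            unit = min(max((v - vmin) / span, 0.0), 1.0)
            out_row.append(colors[int(round(unit * top))])
        image.append(out_row)
    return image


# A 256-entry grayscale colormap with an alpha channel (N x 4).
gray = [(i / 255.0, i / 255.0, i / 255.0, 1.0) for i in range(256)]

spec = [[-80.0, -40.0], [0.0, -20.0]]
auto_scaled = spec_to_rgb(spec, gray)
fixed_scaled = spec_to_rgb(spec, gray, vmin=-40.0, vmax=0.0)
```

In the fixed-range call, the -80 value falls below vmin and takes the first colormap entry, just as the -40 value does.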