Data transformation

Certain data transformations that are unavailable in TensorFlow/Keras are implemented as custom Keras layers in Koogu.

class koogu.data.tf_transformations.Audio2Spectral(*args: Any, **kwargs: Any)

Layer for converting waveforms into time-frequency representations.

Parameters:
  • fs – Sampling frequency of the data in the last dimension of inputs.

  • spec_settings

    A Python dictionary describing the settings to be used for producing spectrograms. Supported keys in the dictionary include:

    • win_len: (required) Length of the analysis window (in seconds)

    • win_overlap_prc: (required) Overlap between successive analysis windows, expressed as a fraction of the window length. A 50% overlap (0.50) is a common choice.

    • nfft_equals_win_len: (optional; boolean) If True (default), NFFT will equal the number of samples resulting from win_len. If False, NFFT will be set to the next power of 2 that is ≥ the number of samples resulting from win_len.

    • tf_rep_type: (optional) A string specifying the transformation output. ‘spec’ results in a linear scale spectrogram. ‘spec_db’ (default) results in a logarithmic scale (dB) spectrogram.

    • eps: (optional; default: 1e-10) A small positive quantity added to avoid computing log(0.0).

    • bandwidth_clip: (optional; 2-element list/tuple) If specified, the generated spectrogram will be clipped along the frequency axis to only include components in the specified bandwidth.

  • eps – (optional) If specified, will override the eps value in spec_settings.

  • name – (optional; string) Name for the layer.
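As an illustration of how the spec_settings keys relate to one another, the following is a minimal sketch in plain Python, not Koogu's own code. The helper name derive_stft_params is hypothetical, and rounding the sample counts to the nearest integer is an assumption; the library may round differently.

```python
def derive_stft_params(fs, spec_settings):
    """Hypothetical helper: translate spec_settings into sample counts."""
    # Window length in samples (rounding to nearest is an assumption).
    win_samples = int(round(spec_settings['win_len'] * fs))

    # Overlap in samples, as a fraction of the window length.
    overlap_samples = int(round(win_samples * spec_settings['win_overlap_prc']))

    if spec_settings.get('nfft_equals_win_len', True):
        nfft = win_samples
    else:
        # Next power of 2 that is >= win_samples.
        nfft = 1 << (win_samples - 1).bit_length()

    return win_samples, overlap_samples, nfft


# Example settings dictionary using the keys documented above.
spec_settings = {
    'win_len': 0.1,                # seconds
    'win_overlap_prc': 0.50,       # 50% overlap
    'nfft_equals_win_len': False,  # use next power of 2 for NFFT
    'tf_rep_type': 'spec_db',
    'eps': 1e-10,
}

win_samples, overlap_samples, nfft = derive_stft_params(1000, spec_settings)
```

With fs = 1000 and a 0.1 s window, the window spans 100 samples, so setting nfft_equals_win_len to False selects an NFFT of 128 in this sketch. A dictionary of this shape is what would be passed as spec_settings.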

class koogu.data.tf_transformations.GaussianBlur(*args: Any, **kwargs: Any)

Layer for applying Gaussian blur to time-frequency (tf) representations.

Parameters:
  • sigma – Scalar value defining the Gaussian kernel.

  • apply_2d – (boolean; default: True) If True, will apply smoothing along both the time and frequency axes. Otherwise, smoothing is only applied along the frequency axis.
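The effect of sigma and apply_2d can be sketched with a separable Gaussian filter in plain Python. This is an illustrative approximation, not the layer's implementation: it assumes sigma is the standard deviation of the kernel, truncates the kernel at 3 sigma, replicates edge values at the borders, and treats rows as frequency bins and columns as time frames. All function names here are hypothetical.

```python
import math


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3*sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Convolve a sequence with the kernel, replicating edge values."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp to valid range
            acc += w * values[idx]
        out.append(acc)
    return out


def gaussian_blur(tf_rep, sigma, apply_2d=True):
    """Blur a 2-D list laid out as [frequency][time] (assumed layout)."""
    kernel = gaussian_kernel_1d(sigma)
    n_freq, n_time = len(tf_rep), len(tf_rep[0])

    # Smooth along the frequency axis, one time frame (column) at a time.
    cols = [smooth_1d([tf_rep[f][t] for f in range(n_freq)], kernel)
            for t in range(n_time)]
    out = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]

    if apply_2d:
        # Additionally smooth along the time axis, one row at a time.
        out = [smooth_1d(row, kernel) for row in out]
    return out


# Input that is constant along frequency but varies along time.
tf_rep = [[1.0, 5.0, 9.0] for _ in range(4)]
freq_only = gaussian_blur(tf_rep, sigma=1.0, apply_2d=False)
both_axes = gaussian_blur(tf_rep, sigma=1.0, apply_2d=True)
```

Because the example input does not vary along frequency, frequency-only smoothing leaves it essentially unchanged, while the 2-D variant also blends neighbouring time frames.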

class koogu.data.tf_transformations.Linear2dB(*args: Any, **kwargs: Any)

Layer for converting time-frequency (tf) representations from linear to decibel scale.

Parameters:
  • eps – Small positive value added to the input to avoid computing log(0.0).

  • full_scale – (boolean) Whether to convert to dB full-scale.

  • name – (optional; string) Name for the layer.
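The role of eps can be seen in a minimal plain-Python sketch of a linear-to-decibel conversion. It assumes the input holds power values, so that the conversion is 10·log10(x + eps); that scaling factor is an assumption, not something stated above. The function name linear_to_db is hypothetical, and the full_scale option is not modelled here.

```python
import math


def linear_to_db(tf_rep, eps=1e-10):
    """Convert a 2-D list of linear (power) values to decibels.

    eps is added before taking the logarithm so that zero-valued
    inputs do not result in log(0.0).
    """
    return [[10.0 * math.log10(v + eps) for v in row] for row in tf_rep]


# A zero-valued bin stays finite thanks to eps.
db = linear_to_db([[1.0, 0.0]])
```

With eps = 1e-10, a zero-valued input maps to roughly -100 dB rather than negative infinity, which sets an effective floor on the output.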