Data augmentation
Koogu supports applying randomized on-the-fly augmentations to input samples during training/validation.
Time-domain augmentations
- class koogu.data.augmentations.Temporal.AddEcho(delay_range, fs, level_range=None)
Add echo. Produces an echo effect by adding a dampened, delayed copy of the input to the input itself. The copy is attenuated by a randomly chosen factor, and its phase is randomly inverted.
- Parameters:
delay_range – A 2-element list/tuple, values specified in seconds. The delay amount will be randomly chosen from this range.
fs – Sampling frequency of the input. The chosen delay amount will be converted to number of samples using this value.
level_range – A 2-element list/tuple or None (default). The attenuation factor is derived from this range. If None, it will default to [-18 dB, -12 dB].
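The echo operation described above can be sketched in plain Python to make the roles of the three parameters concrete. This is only an illustration, not Koogu's implementation: the function name `add_echo`, the `rng` argument, the 20·log10 amplitude convention for dB, and the 50% chance of phase inversion are all assumptions made for the example.

```python
import random


def add_echo(samples, fs, delay_range, level_range=(-18.0, -12.0), rng=random):
    """Illustrative echo: input plus a delayed, attenuated, maybe-inverted copy."""
    # Pick a delay (seconds) and convert it to a whole number of samples.
    delay_s = rng.uniform(delay_range[0], delay_range[1])
    delay_n = int(round(delay_s * fs))

    # Pick an attenuation level (dB) and convert it to a linear gain
    # (assumed amplitude convention: gain = 10 ** (dB / 20)).
    level_db = rng.uniform(level_range[0], level_range[1])
    gain = 10.0 ** (level_db / 20.0)

    # Randomly invert the phase of the echoed copy (assumed 50% chance).
    if rng.random() < 0.5:
        gain = -gain

    # Add the delayed copy onto the original signal.
    out = list(samples)
    for i in range(delay_n, len(samples)):
        out[i] += gain * samples[i - delay_n]
    return out
```

With a single impulse as input, the output contains the impulse followed by one smaller (possibly negated) impulse `delay_n` samples later, which is the simplest way to check the delay and level choices by eye.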
- class koogu.data.augmentations.Temporal.AddGaussianNoise(val_range)
Add Gaussian noise.
- Parameters:
val_range – A 2-element list/tuple. The level of the added noise will be randomly chosen from the range val_range[0] dB to val_range[1] dB (both must be non-positive). The peak level of the noise will be approximately that many dB below the peak level of the input signal.
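To illustrate how a dB range relates to the level of the added noise, here is a minimal sketch in plain Python. It is not Koogu's code: the name `add_gaussian_noise`, the `rng` argument, the peak-normalization step, and the 20·log10 amplitude convention are assumptions chosen so that the noise peak lands the chosen number of dB below the signal peak.

```python
import random


def add_gaussian_noise(samples, val_range, rng=random):
    """Illustrative noise addition at a randomly chosen level below the signal peak."""
    # Pick the noise level (dB, non-positive) relative to the signal peak.
    level_db = rng.uniform(val_range[0], val_range[1])

    signal_peak = max(abs(s) for s in samples)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    noise_peak = max(abs(n) for n in noise) or 1.0

    # Scale the noise so that its peak sits level_db below the signal peak
    # (assumed amplitude convention: ratio = 10 ** (dB / 20)).
    scale = signal_peak * 10.0 ** (level_db / 20.0) / noise_peak
    return [s + scale * n for s, n in zip(samples, noise)]
```

For example, with `val_range=(-30, -20)` the added noise peaks at roughly 3% to 10% of the signal's peak amplitude.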
- class koogu.data.augmentations.Temporal.RampVolume(val_range)
Alter the volume of the signal by ramping its amplitude up or down linearly across the duration of the signal. This loosely simulates the effect of the source moving away from or towards the receiver.
- Parameters:
val_range – A 2-element list/tuple. The ramp factor will be randomly chosen from the range val_range[0] dB to val_range[1] dB. If the chosen factor (val) is non-negative, the amplitude ramps up from approximately -val dB. If the chosen factor is negative, the amplitude ramps down to approximately -abs(val) dB.
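The sign convention of the ramp factor is easier to see in code. The following is a rough sketch under stated assumptions rather than Koogu's implementation: the name `ramp_volume` and the `rng` argument are hypothetical, the loud end of the ramp is assumed to be unity gain (0 dB), and the ramp is taken to be linear in amplitude.

```python
import random


def ramp_volume(samples, val_range, rng=random):
    """Illustrative linear amplitude ramp across the whole signal."""
    val_db = rng.uniform(val_range[0], val_range[1])

    # Linear gain at the quiet end of the ramp (abs(val_db) dB below unity).
    low = 10.0 ** (-abs(val_db) / 20.0)

    # Non-negative factor: ramp up from `low` to 1.0.
    # Negative factor: ramp down from 1.0 to `low`.
    start, end = (low, 1.0) if val_db >= 0 else (1.0, low)

    n = len(samples)
    step = (end - start) / (n - 1) if n > 1 else 0.0
    return [s * (start + step * i) for i, s in enumerate(samples)]
```

A range that straddles zero, such as `(-6, 6)`, would therefore produce ramps in both directions across different samples.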
- class koogu.data.augmentations.Temporal.ShiftPitch(val_range)
Shift the pitch of the contained sound(s) up or down.
- Parameters:
val_range – A 2-element list/tuple. The factor by which the pitch will be shifted will be chosen randomly from the range val_range[0] to val_range[1]. Set the range around 1.0. If the chosen value is above 1.0, pitch will be shifted upwards. If the chosen value is below 1.0, pitch will be shifted downwards. If the chosen value equals 1.0, there will be no change.
Spectro-temporal augmentations
- class koogu.data.augmentations.SpectroTemporal.AlterDistance(val_range)
Mimic the effect of increasing/reducing the distance between a source and receiver by attenuating/amplifying higher frequencies while keeping lower frequencies relatively unchanged.
- Parameters:
val_range – A 2-element list/tuple. The attenuation/amplification factor will be randomly chosen from the range val_range[0] dB to val_range[1] dB. A negative chosen value results in attenuation, while a positive one results in amplification.
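One way to picture a frequency-dependent gain of this kind is a tilt that leaves the lowest bin untouched and applies the full chosen factor at the highest bin. The sketch below is an assumption-laden illustration, not Koogu's algorithm: the name `alter_distance`, the `rng` argument, the linear-in-dB tilt, the linear-magnitude spectrogram, and the frequency-first layout (one row per bin, low to high) are all choices made for the example.

```python
import random


def alter_distance(spec, val_range, rng=random):
    """Illustrative high-frequency tilt.

    spec: list of rows, one per frequency bin (low to high), each row holding
    linear magnitudes over time.
    """
    val_db = rng.uniform(val_range[0], val_range[1])
    n_bins = len(spec)

    out = []
    for k, row in enumerate(spec):
        # Gain grows linearly (in dB) from 0 at the lowest bin
        # to val_db at the highest bin.
        frac = k / (n_bins - 1) if n_bins > 1 else 0.0
        gain = 10.0 ** (frac * val_db / 20.0)
        out.append([gain * v for v in row])
    return out
```

With a negative chosen value the upper bins are progressively attenuated, as they would be for a more distant source; with a positive value they are progressively amplified.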
- class koogu.data.augmentations.SpectroTemporal.SmearFrequency(val_range)
Smear the spectrogram along the frequency axis. Can have the effect of shifting the pitch of the contained sounds.
- Parameters:
val_range – A 2-element integer list/tuple. The amount to smear is derived from a value chosen in the integer range val_range[0] to val_range[1]. Specify the range to reflect the number of frequency bins that will be involved in the smearing operation. If a positive value is chosen, the smearing occurs upwards. If a negative value is chosen, the smearing occurs downwards.
- class koogu.data.augmentations.SpectroTemporal.SmearTime(val_range)
Smear the spectrogram along the time axis. Can have the effect of elongating the duration of the contained sounds.
- Parameters:
val_range – A 2-element integer list/tuple. The amount to smear is derived from a value chosen in the integer range val_range[0] to val_range[1]. Specify the range to reflect the number of time windows that will be involved in the smearing operation. If a positive value is chosen, the smearing occurs forwards. If a negative value is chosen, the smearing occurs backwards.
- class koogu.data.augmentations.SpectroTemporal.SquishFrequency(val_range)
Squish the spectrogram along the frequency axis. Can have the effect of shifting the pitch of the contained sounds.
- Parameters:
val_range – A 2-element integer list/tuple. The amount to squish is derived from a value chosen in the integer range val_range[0] to val_range[1]. Specify the range to reflect the number of frequency bins that will be involved in the squishing operation. If a positive value is chosen, the squishing occurs upwards. If a negative value is chosen, the squishing occurs downwards.
- class koogu.data.augmentations.SpectroTemporal.SquishTime(val_range)
Squish the spectrogram along the time axis. Can have the effect of compressing the duration of the contained sounds.
- Parameters:
val_range – A 2-element integer list/tuple. The amount to squish is derived from a value chosen in the integer range val_range[0] to val_range[1]. Specify the range to reflect the number of time windows that will be involved in the squishing operation. If a positive value is chosen, the squishing occurs forwards. If a negative value is chosen, the squishing occurs backwards.
Convenience interface
- static Temporal.apply_chain(clip, augmentations, probabilities, t_axis=-1)
Apply a chain of Temporal augmentations.
- Parameters:
clip – The audio clip to apply the augmentations to.
augmentations – List of the Temporal augmentations to apply.
probabilities – List of probabilities (each in the range 0-1), one per augmentation listed in augmentations.
t_axis – (Defaults to -1, the last dimension) Index of the axis in clip corresponding to its time axis.
- Returns:
The tensor clip after applying the specified augmentations.
- static SpectroTemporal.apply_chain(spec, augmentations, probabilities, f_axis=0, t_axis=1)
Apply a chain of SpectroTemporal augmentations.
- Parameters:
spec – The spectrogram to apply the augmentations to.
augmentations – List of the SpectroTemporal augmentations to apply.
probabilities – List of probabilities (each in the range 0-1), one per augmentation listed in augmentations.
f_axis – (Defaults to 0) Index of the axis in spec corresponding to its frequency axis.
t_axis – (Defaults to 1) Index of the axis in spec corresponding to its time axis.
- Returns:
The tensor spec after applying the specified augmentations.
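The pairing of augmentations with probabilities in both apply_chain methods can be sketched as follows. This is a simplified stand-in, not Koogu's code: it assumes each augmentation is a callable applied in the listed order, each gated independently by its own probability, and it ignores the axis arguments. The `rng` argument is hypothetical and exists only to make the example reproducible.

```python
import random


def apply_chain(data, augmentations, probabilities, rng=random):
    """Illustrative chain: apply each augmentation with its own probability."""
    if len(augmentations) != len(probabilities):
        raise ValueError("need one probability per augmentation")

    for augment, prob in zip(augmentations, probabilities):
        # rng.random() is in [0, 1), so prob=1.0 always applies
        # and prob=0.0 never applies.
        if rng.random() < prob:
            data = augment(data)
    return data
```

In Koogu itself, a call built from the signatures listed above would presumably take a form such as `Temporal.apply_chain(clip, [Temporal.AddGaussianNoise((-30, -20))], [0.5])`, where the value range and probability are illustrative only.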