Data augmentation
On-the-fly data augmentations can be applied during training/validation by implementing the desired augmentation operations in the pre_transform() and post_transform() methods of the classes derived from
. Because the CNN models used in bioacoustics typically operate on inputs that have been transformed into 2-dimensional spectrograms, augmentations applicable to time-domain waveforms can be implemented in pre_transform(), and augmentations applicable to spectrograms can be implemented in post_transform(). This requires writing code that uses the TensorFlow API directly.
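To make the division of labour between the two hooks concrete, here is a minimal, library-independent sketch of where they might sit relative to the waveform-to-spectrogram conversion. Everything in it is a hypothetical illustration written in plain Python: the ToyFeeder class, its process() method, the power_spectrogram() helper and the two toy augmentations are assumptions, not the library's classes, and a real feeder operates on TensorFlow tensors rather than lists.

```python
import cmath


def power_spectrogram(clip, frame_len):
    """Toy power spectrogram: non-overlapping frames, plain DFT per frame."""
    spec = []
    for start in range(0, len(clip) - frame_len + 1, frame_len):
        frame = clip[start:start + frame_len]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            row.append(abs(coeff) ** 2)
        spec.append(row)
    return spec  # shape: [num_frames][frame_len // 2 + 1]


class ToyFeeder:
    """Hypothetical stand-in for a feeder; NOT the library's class."""

    frame_len = 8

    def pre_transform(self, clip, label, is_training, **kwargs):
        return clip, label  # hook for time-domain augmentations

    def post_transform(self, spec, label, is_training, **kwargs):
        return spec, label  # hook for spectrogram augmentations

    def process(self, clip, label, is_training):
        # Assumed order: waveform hook, then conversion, then spectrogram hook.
        clip, label = self.pre_transform(clip, label, is_training)
        spec = power_spectrogram(clip, self.frame_len)
        return self.post_transform(spec, label, is_training)


class HalvedAndFloored(ToyFeeder):
    def pre_transform(self, clip, label, is_training, **kwargs):
        # Waveform-level change: halve the amplitude.
        return [0.5 * x for x in clip], label

    def post_transform(self, spec, label, is_training, **kwargs):
        # Spectrogram-level change: clamp very small powers to a floor.
        return [[max(p, 1e-6) for p in row] for row in spec], label
```

In this sketch, halving the waveform in pre_transform() scales every power value by one quarter before post_transform() sees it, which is why the choice of hook matters for any augmentation expressed in amplitude or power terms.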
The example below extends
by adding two augmentation operations in the time domain and one in the spectro-temporal domain. It also demonstrates the use of a few pre-defined, customizable augmentations. You may also add code in these methods to implement your own types of augmentation.
import tensorflow as tf
from import Temporal, SpectroTemporal
class MySpectralDataFeeder(
    def pre_transform(self, clip, label, is_training, **kwargs):
        """Applying augmentations to waveform."""

        output = clip

        # Added noise will have a level between -30 dB and -18 dB relative
        # to the peak amplitude of the input.
        gauss_noise = Temporal.AddGaussianNoise((-30, -18))
        # Add Gaussian noise to 25% of inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 4,
                         lambda: gauss_noise(output),
                         lambda: output)

        # The volume of the input will be linearly lowered/increased over its
        # duration, by at most 3 dB.
        vol_ramp = Temporal.RampVolume((-3, 3))
        # Alter volume for 10% of the inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 10,
                         lambda: vol_ramp(output),
                         lambda: output)

        return output, label

    def post_transform(self, spec, label, is_training, **kwargs):
        """Applying augmentations to power spectral density spectrogram."""

        output = spec

        # Smear energies along the time axis while keeping the frequency
        # content intact.
        smear_time = SpectroTemporal.SmearTime((-2, 2))
        # Apply to one in three inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 3,
                         lambda: smear_time(output),
                         lambda: output)

        return output, label
The example above shows the finer control available when implementing augmentations this way: you can use branching or looping constructs to combine different augmentations as desired.
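For readers who want to see what the decibel range and the probability gate in that example amount to numerically, the following plain-Python sketch may help. It is not the library's implementation: add_gaussian_noise() and maybe_apply() are hypothetical helpers, and the sketch assumes that the noise "level" refers to the standard deviation of the noise relative to the clip's peak amplitude.

```python
import math
import random


def add_gaussian_noise(clip, db_range, rng):
    """Add white Gaussian noise at a random level, in dB relative to peak.

    Assumption: "level" means the standard deviation of the noise.
    """
    peak = max(abs(x) for x in clip)
    level_db = rng.uniform(db_range[0], db_range[1])
    sigma = peak * 10.0 ** (level_db / 20.0)  # dB -> linear amplitude
    return [x + rng.gauss(0.0, sigma) for x in clip]


def maybe_apply(clip, augmentation, probability, rng):
    """Apply `augmentation` with the given probability, else pass through."""
    if rng.random() < probability:
        return augmentation(clip)
    return clip


rng = random.Random(0)
# One second of a 5 Hz sine sampled at 1000 Hz, as a stand-in for a clip.
clip = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
noisy = maybe_apply(
    clip, lambda c: add_gaussian_noise(c, (-30, -18), rng), 0.25, rng)
```

Under this assumption, -30 dB corresponds to a noise standard deviation of roughly 0.032 times the peak and -18 dB to roughly 0.126 times the peak. The maybe_apply() helper plays the role that tf.cond plays in the TensorFlow version.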
Convenience interface
Sometimes you may simply want to apply a series of augmentations in a particular order, each with its own chosen probability. The code snippet below demonstrates the use of the convenience interface to apply chained augmentations. You need not use any TensorFlow API here 😀.
    def pre_transform(self, clip, label, is_training, **kwargs):
        """Applying augmentations to waveform."""

        # List of time-domain augmentations
        augmentations = [
            Temporal.AddGaussianNoise((-30, -18)),
            Temporal.RampVolume((-3, 3))
        ]
        # At what rates should each be applied (same ordering as above)
        probabilities = [
            0.25,   # apply to 1 in 4 clips
            0.10    # apply to 1 in 10 clips
        ]

        output = Temporal.apply_chain(clip, augmentations, probabilities)
        return output, label

    def post_transform(self, spec, label, is_training, **kwargs):
        """Applying augmentations to power spectral density spectrogram."""

        # List of spectrogram augmentations
        augmentations = [
            SpectroTemporal.SmearTime((-2, 2)),
            SpectroTemporal.SquishFrequency((-1, 1))
        ]
        # At what rates should each be applied (same ordering as above)
        probabilities = [
            0.33,   # apply to 1 in 3 input spectrograms
            0.20    # apply to 1 in 5 input spectrograms
        ]

        output = SpectroTemporal.apply_chain(spec, augmentations, probabilities)
        return output, label
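As a rough mental model of what a chained interface of this kind does, the sketch below applies each augmentation in order, gated by its own probability. It is a plain-Python illustration under stated assumptions, not the library's apply_chain: the name apply_chain_sketch, the independent random draw per augmentation and the toy augmentations double() and add_one() are all hypothetical.

```python
import random


def apply_chain_sketch(data, augmentations, probabilities, rng=None):
    """Apply each augmentation in order, each gated by its own probability."""
    if len(augmentations) != len(probabilities):
        raise ValueError("need exactly one probability per augmentation")
    rng = rng if rng is not None else random.Random()
    for augment, probability in zip(augmentations, probabilities):
        # Assumption: one independent draw per augmentation.
        if rng.random() < probability:
            data = augment(data)
    return data


def double(clip):
    """Toy augmentation: scale every sample by 2."""
    return [2 * x for x in clip]


def add_one(clip):
    """Toy augmentation: add a constant offset of 1."""
    return [x + 1 for x in clip]


# Probability 1.0 always applies and 0.0 never does, so these are deterministic.
always_both = apply_chain_sketch([1, 2], [double, add_one], [1.0, 1.0])
never = apply_chain_sketch([1, 2], [double, add_one], [0.0, 0.0])
```

Because the output of one augmentation feeds the next, the order of the list matters. If the gates are independent, as assumed here, rates of 0.25 and 0.10 would leave about 2.5% of clips with both augmentations applied and about 32.5% with at least one.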