Data augmentation

On-the-fly data augmentations can be applied during training/validation by implementing the desired augmentation operations in the pre_transform() and post_transform() methods of classes derived from koogu.data.feeder.BaseFeeder. Because the CNN models used in bioacoustics typically operate on inputs that have been transformed into 2-dimensional spectrograms, implement augmentations that apply to time-domain waveforms in pre_transform() and augmentations that apply to spectrograms in post_transform().

Note

Implementing these methods requires writing code that uses the TensorFlow API directly.
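As a rough mental model of where the two hooks sit, the plain-Python sketch below mimics the order of operations: waveform hook first, then the conversion to a spectrogram, then the spectrogram hook. It uses neither Koogu nor TensorFlow. ToyFeeder, to_spectrogram(), process() and the toy "spectrogram" (frame-wise energies) are hypothetical stand-ins, and the internals of the real feeder classes may differ.

```python
class ToyFeeder:
    """Hypothetical stand-in that mimics the hook order; not Koogu code."""

    def pre_transform(self, clip, label, is_training):
        # Time-domain hook: the default is a no-op.
        return clip, label

    def to_spectrogram(self, clip, frame_len=4):
        # Toy "spectrogram": energy of each non-overlapping frame.
        return [sum(s * s for s in clip[i:i + frame_len])
                for i in range(0, len(clip) - frame_len + 1, frame_len)]

    def post_transform(self, spec, label, is_training):
        # Spectrogram-domain hook: the default is a no-op.
        return spec, label

    def process(self, clip, label, is_training):
        # Waveform hook -> spectrogram conversion -> spectrogram hook.
        clip, label = self.pre_transform(clip, label, is_training)
        spec = self.to_spectrogram(clip)
        return self.post_transform(spec, label, is_training)


class HalfVolumeFeeder(ToyFeeder):

    def pre_transform(self, clip, label, is_training):
        # Waveform augmentation: halve the amplitude of every sample.
        return [0.5 * s for s in clip], label


clip = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0]
plain_spec, _ = ToyFeeder().process(clip, label=0, is_training=True)
quiet_spec, _ = HalfVolumeFeeder().process(clip, label=0, is_training=True)
```

Because the waveform hook runs before the conversion, halving the amplitude in pre_transform() shows up as a four-fold drop in the frame energies that post_transform() receives.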

The example below extends koogu.data.feeder.SpectralDataFeeder by adding two augmentation operations in the time domain and one in the spectro-temporal domain. It also demonstrates a few of the pre-defined, customizable augmentations. You may also add code in these methods to implement your own types of augmentation.

import tensorflow as tf
from koogu.data.feeder import SpectralDataFeeder
from koogu.data.augmentations import Temporal, SpectroTemporal


class MySpectralDataFeeder(SpectralDataFeeder):

    def pre_transform(self, clip, label, is_training, **kwargs):
        """
        Apply augmentations to the waveform.
        """

        output = clip

        # Added noise will have a level of -30 dB to -18 dB relative to
        # the peak amplitude of the input (i.e., 18 to 30 dB below peak).
        gauss_noise = Temporal.AddGaussianNoise((-30, -18))

        # Add Gaussian noise to 25% of inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 4,
                         lambda: gauss_noise(output),
                         lambda: output)

        # The volume of the input will be linearly lowered/increased over its
        # duration, by up to 3 dB.
        vol_ramp = Temporal.RampVolume((-3, 3))

        # Alter volume for 10% of the inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 10,
                         lambda: vol_ramp(output),
                         lambda: output)

        return output, label

    def post_transform(self, spec, label, is_training, **kwargs):
        """
        Apply augmentations to the power spectral density spectrogram.
        """

        output = spec

        # Smear energies along the time-axis while retaining the frequency
        # content intact.
        smear_time = SpectroTemporal.SmearTime((-2, 2))

        # Apply to one in three inputs.
        output = tf.cond(tf.random.uniform([], 0, 1) <= 1 / 3,
                         lambda: smear_time(output),
                         lambda: output)

        return output, label

The example above offers fine-grained control over how augmentations are implemented: you may use branching and looping constructs to combine different augmentations as desired.
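The dB ranges passed to the augmentations above are easier to reason about as linear amplitude factors. The plain-Python sketch below works through that arithmetic, assuming the usual amplitude convention (factor = 10^(dB/20)). The helper names db_to_amplitude() and add_noise() are hypothetical, and treating the chosen level as the noise's standard deviation relative to the peak is an assumption made for illustration, not a description of how Temporal.AddGaussianNoise is implemented.

```python
import random


def db_to_amplitude(db):
    """Convert a level in dB to a linear amplitude factor (20*log10 rule)."""
    return 10.0 ** (db / 20.0)


def add_noise(clip, db_range, rng):
    """Add Gaussian noise at a level drawn uniformly (in dB) from db_range.

    Assumption: the level is relative to the clip's peak amplitude and is
    used as the standard deviation of the noise.
    """
    peak = max(abs(s) for s in clip)
    level_db = rng.uniform(*db_range)
    sigma = peak * db_to_amplitude(level_db)
    return [s + rng.gauss(0.0, sigma) for s in clip]


# Linear factors at the two ends of the (-30, -18) dB range.
low = db_to_amplitude(-30)
high = db_to_amplitude(-18)

# A +/- 3 dB volume change expressed as gain factors.
quieter = db_to_amplitude(-3)
louder = db_to_amplitude(3)

noisy = add_noise([0.0, 1.0, -0.5], (-30, -18), random.Random(0))
```

Under this convention, -30 dB and -18 dB correspond to roughly 3% and 13% of the peak amplitude, and a change of ±3 dB scales the amplitude by roughly 0.71 to 1.41.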

Convenience interface

Sometimes you may simply want to apply a series of augmentations in a particular order, each with its own probability. The code snippet below uses the convenience interface to apply chained augmentations. You need not use any TensorFlow API here 😀.

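from koogu.data.feeder import SpectralDataFeeder
from koogu.data.augmentations import Temporal, SpectroTemporal


class MySpectralDataFeeder(SpectralDataFeeder):

    def pre_transform(self, clip, label, is_training, **kwargs):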
        """
        Apply augmentations to the waveform.
        """

        # List of time-domain augmentations
        augmentations = [
            Temporal.AddGaussianNoise((-30, -18)),
            Temporal.RampVolume((-3, 3))
        ]

        # At what rates should each be applied (same ordering as above)
        probabilities = [
            0.25,       # apply to 1 in 4 clips
            0.10        # apply to 1 in 10 clips
        ]

        output = Temporal.apply_chain(clip, augmentations, probabilities)

        return output, label

    def post_transform(self, spec, label, is_training, **kwargs):
        """
        Apply augmentations to the power spectral density spectrogram.
        """

        # List of spectrogram augmentations
        augmentations = [
            SpectroTemporal.SmearTime((-2, 2)),
            SpectroTemporal.SquishFrequency((-1, 1))
        ]

        # At what rates should each be applied (same ordering as above)
        probabilities = [
            0.33,       # apply to 1 in 3 input spectrograms
            0.20        # apply to 1 in 5 input spectrograms
        ]

        output = SpectroTemporal.apply_chain(spec, augmentations, probabilities)

        return output, label
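Conceptually, a chained application of this kind walks through the augmentations in order and applies each one independently with its own probability. The plain-Python sketch below illustrates that idea only. apply_chain_sketch() and the toy augmentations are hypothetical, and the sketch is not the implementation behind Temporal.apply_chain or SpectroTemporal.apply_chain, which operate on TensorFlow tensors.

```python
import random


def apply_chain_sketch(x, augmentations, probabilities, rng=random):
    """Apply each augmentation in order, each with its own probability."""
    if len(augmentations) != len(probabilities):
        raise ValueError("need exactly one probability per augmentation")
    for augment, prob in zip(augmentations, probabilities):
        # rng.random() lies in [0, 1): prob=1.0 always fires, 0.0 never.
        if rng.random() < prob:
            x = augment(x)
    return x


# Toy augmentations (hypothetical stand-ins for augmentation callables).
def double(clip):
    return [2 * s for s in clip]


def negate(clip):
    return [-s for s in clip]


always = apply_chain_sketch([1, 2], [double, negate], [1.0, 1.0])
never = apply_chain_sketch([1, 2], [double, negate], [0.0, 0.0])
```

Each augmentation is gated separately, so with probabilities of 0.25 and 0.10 a given clip may receive both augmentations, only one of them, or neither.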