Transfer learning

Bioacoustics researchers often employ transfer learning approaches, using models that were pre-trained on other data (e.g., images). Such approaches can be especially useful when training datasets are small. Using transfer learning in a Koogu workflow simply involves ‘plugging in’ a pre-trained model in an implementation of the abstract class koogu.model.architectures.BaseArchitecture.

The example below implements BaseArchitecture by defining build_network() to incorporate a publicly available MobileNetV2 model whose weights were pre-trained on the large ImageNet dataset. You may choose a different pre-trained model from tf.keras.applications or from other sources. Note that these pre-trained models often have specific expectations with respect to input shapes, value ranges, etc.
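
For instance, a candidate model’s input expectations can be checked by instantiating it and inspecting its input specification. The snippet below is only an illustrative sketch (the name `candidate` and the chosen size are our own picks, not part of the workflow):

import tensorflow as tf

# Instantiate a candidate pre-trained model and inspect its input spec.
candidate = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights='imagenet')
print(candidate.input_shape)    # -> (None, 160, 160, 3)

# Keras' "applications" models also provide a matching preprocess_input()
# (for MobileNetV2, it maps pixel values from [0, 255] to [-1.0, 1.0]).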

Assuming a data pipeline similar to the example in the Quick-start guide (which uses SpectralDataFeeder), the 1-channel spectrograms generated by the feeder must be converted to 3-channel RGB images using a suitable colormap (and resized) before they can be fed to the pre-trained model. We use Spec2Img for this purpose. The pre-trained model will be used as a “feature extractor”, and a classification layer will be added on top. “Training” the full model will then only update the weights of the added classification layer.

import tensorflow as tf
from koogu.data.tf_transformations import Spec2Img
from koogu.model.architectures import BaseArchitecture
from matplotlib import colormaps


class MyTransferModel(BaseArchitecture):

    def build_network(self, input_tensor, is_training, **kwargs):

        # Many of the available pre-trained models expect inputs to be of a
        # particular size. The `input_tensor` may not already be in that shape,
        # depending on the chosen data preparation parameters (e.g., with
        # koogu.data.feeder.SpectralDataFeeder). We need to resize the images to
        # match the input shape of the pre-trained model.
        # MobileNetV2 defaults to an input size of 224x224, and also supports a
        # few other sizes. 160x160 is a supported size, and we use that in this
        # example.
        target_img_size = (160, 160)

        # Choose your favourite colormap from matplotlib or other sources.
        my_cmap = colormaps['jet'](range(256))
        # `my_cmap` will be a 256-element array of RGBA color values from the
        # "Jet" colormap.

        # First, we need to convert the input spectrograms to equivalent RGB
        # images.
        # Spec2Img will convert 1-channel spectrograms to 3-channel RGB images
        # (with values in the range [0.0, 1.0]) and resize them as desired.
        to_image = Spec2Img(my_cmap, img_size=target_img_size)

        # The pre-trained MobileNetV2 expects RGB values to be scaled to the
        # range [-1.0, 1.0].
        rescale = tf.keras.layers.Rescaling(2.0, offset=-1.0)
        # NOTE: `Rescaling` was added in TensorFlow v2.6.0. If you use an older
        #       version, you can implement this operation by simply multiplying
        #       the output of to_image() by 2.0 and then subtracting 1.0.
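        #       For example, a minimal sketch using the long-available
        #       Lambda layer:
        #           rescale = tf.keras.layers.Lambda(lambda x: (x * 2.0) - 1.0)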

        # Load the pre-trained MobileNetV2 model with ImageNet weights, and
        # without the trailing fully-connected layer.
        pretrained_cnn = tf.keras.applications.MobileNetV2(
            input_shape=target_img_size + (3, ),    # Include RGB dimension
            include_top=False,
            weights='imagenet')
        pretrained_cnn.trainable = False            # Freeze CNN weights

        # Pooling layer
        global_average_layer = tf.keras.layers.GlobalAveragePooling2D()

        # Put them all together now.
        # NOTE: The "training=False" parameter to `pretrained_cnn` is required
        #       to ensure that BatchNorm layers in the model operate in
        #       inference mode (for more details, see TensorFlow's webpage on
        #       transfer learning).
        output = to_image(input_tensor)
        output = rescale(output)
        output = pretrained_cnn(output, training=False)
        output = global_average_layer(output)

        # NOTE: Do not add the classification layer. It will be added by Koogu's
        #       internal code.

        return output
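
If you later want to fine-tune the feature extractor itself (rather than keeping it fully frozen), a common variation is to unfreeze only its topmost layers inside build_network(). The below is only a hedged sketch (the number of layers to unfreeze is an arbitrary illustrative choice; training=False should still be passed when calling the model, to keep its BatchNorm layers in inference mode):

# Variation (sketch): fine-tune only the topmost layers of the CNN.
pretrained_cnn.trainable = True
for layer in pretrained_cnn.layers[:-20]:   # freeze all but the last ~20 layers
    layer.trainable = False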

With an architecture defined this way, you can simply replace the following code block in step 2 of the Quick-start guide

model = architectures.DenseNet(
    [4, 4, 4],                                 # 3 dense-blocks, 4 layers each
    preproc=[ ('Conv2D', {'filters': 16}) ],   # Add a 16-filter pre-conv layer
    dense_layers=[32]                          # End with a 32-node dense layer
)

… with this:

model = MyTransferModel()

That’s it! The remainder of the Koogu workflow described in the Quick-start guide will now employ transfer learning.