Prepare training inputs

Prepared data (clips and their associated label/class information) are written to the filesystem for later consumption during model training. In addition to extracting clips from raw audio, the interfaces below also support the following audio pre-processing operations:

  • standardizing the sampling frequencies of all recordings,

  • applying low-pass, high-pass or band-pass filters, and

  • normalizing clip waveforms.

The parameters for data preparation are specified using a Python dictionary object that is passed as the audio_settings parameter to the functions below; an example dictionary is shown after this list. The following keys are supported:

  • desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling rate adds no new spectral content; the band above the original recording's Nyquist frequency will remain empty.

  • clip_length (required) The duration of each audio segment (in seconds).

  • clip_advance (required) The amount (in seconds) by which successive segments are advanced; the overlap between successive segments equals clip_length minus clip_advance. If clip_advance equals clip_length, the overlap will be zero.

  • filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying:

    • filter order (integer)

    • cutoff frequency(ies) (a 1-element or 2-element list/tuple)

    • filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)

    If the filter type is ‘bandpass’, the cutoff frequency must be a 2-element list/tuple.

  • normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
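
For illustration, the snippet below assembles an audio_settings dictionary using the keys described above; the specific values (sampling rate, clip sizes, filter design) are arbitrary placeholders rather than recommended settings.

audio_settings = {
    'desired_fs': 48000,       # resample every recording to 48 kHz
    'clip_length': 2.0,        # extract 2-second clips
    'clip_advance': 0.5,       # advance 0.5 s between clips (i.e., 1.5 s overlap)
    'filterspec': (8, (500, 8000), 'bandpass'),   # 8th-order band-pass, 500 Hz to 8 kHz
    'normalize_clips': True    # scale each clip's waveform to [-1.0, 1.0]
}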

koogu.prepare.from_selection_table_map(audio_settings, audio_seltab_list, audio_root, seltab_root, output_root, annotation_reader=None, desired_labels=None, remap_labels_dict=None, negative_class_label=None, **kwargs)

Pre-process training data using information contained in audio_seltab_list.

Parameters:
  • audio_settings – A dictionary specifying the parameters for processing audio from files.

  • audio_seltab_list – A list containing pairs (tuples or sub-lists) of relative paths to audio files and the corresponding annotation (selection table) files. Alternatively, you may specify the path to a 2-column csv file containing these pairs of entries (in the same order). Only use the csv option if the paths are simple (i.e., the filenames do not contain commas or other special characters).

  • audio_root – The full paths of audio files listed in audio_seltab_list are resolved using this as the base directory.

  • seltab_root – The full paths of annotation files listed in audio_seltab_list are resolved using this as the base directory.

  • output_root – “Prepared” data will be written to this directory.

  • annotation_reader – If not None, must be an annotation reader instance from annotations. If None (the default), a Raven selection table reader is used.

  • desired_labels – The target set of class labels. If not None, must be a list of class labels. Any selections (read from the selection tables) having labels that are not in this list will be discarded. This list will be used to populate classes_list.json that will define the classes for the project. If None, then the list of classes will be populated with the annotation labels read from all selection tables.

  • remap_labels_dict

    If not None, must be a Python dictionary describing mapping of class labels. For details, see similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.

    Note

    If desired_labels is not None, mappings for which targets are not listed in desired_labels will be ignored.

  • negative_class_label – A string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If None (default), saving of negative class clips will be disabled.

Other parameters specific to koogu.utils.detections.assess_annotations_and_clips_match() can also be specified, and will be passed as-is to the function.

Returns:

A dictionary whose keys are annotation tags (either discovered from the set of annotations, or the same as desired_labels if it was specified) and whose values are the number of clips produced for the corresponding class.
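
A minimal usage sketch for this function is shown below; the file paths, label names and selection-table filenames are hypothetical, and audio_settings is assumed to be a dictionary of the form described earlier.

from koogu import prepare

# Hypothetical (audio file, selection table) pairs; entries are resolved
# against audio_root and seltab_root, respectively.
audio_seltab_list = [
    ('day1/rec_001.wav', 'day1/rec_001.selections.txt'),
    ('day2/rec_007.wav', 'day2/rec_007.selections.txt'),
]

clip_counts = prepare.from_selection_table_map(
    audio_settings, audio_seltab_list,
    audio_root='/data/project/audio',
    seltab_root='/data/project/annotations',
    output_root='/data/project/prepared',
    desired_labels=['SpeciesA', 'SpeciesB'],
    negative_class_label='Other')

# clip_counts maps each class label (including 'Other') to the number of
# clips written for it.
print(clip_counts)
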

koogu.prepare.from_top_level_dirs(audio_settings, class_dirs, audio_root, output_root, remap_labels_dict=None, **kwargs)

Pre-process training data available as audio files in class_dirs.

Parameters:
  • audio_settings – A dictionary specifying the parameters for processing audio from files.

  • class_dirs – A list containing relative paths to class-specific directories containing audio files. Each directory’s contents will be recursively searched for audio files.

  • audio_root – The full paths of the class-specific directories listed in class_dirs are resolved using this as the base directory.

  • output_root – “Prepared” data will be written to this directory.

  • remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.

  • filetypes – (optional) Restrict listing to files matching the extensions specified in this parameter. If unspecified, a default set of common audio file extensions is used.

Returns:

A dictionary whose keys are class labels (derived from the names of the class-specific directories) and whose values are the number of clips produced for the corresponding class.
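
Similarly, a minimal usage sketch for this function; the directory names and paths are hypothetical, and the list-of-extensions form shown for filetypes is an assumption, since the expected format is not detailed above.

from koogu import prepare

# Hypothetical class-specific directories under audio_root; each directory
# holds the audio files for one class.
class_dirs = ['SpeciesA', 'SpeciesB', 'Noise']

clip_counts = prepare.from_top_level_dirs(
    audio_settings, class_dirs,
    audio_root='/data/project/audio_by_class',
    output_root='/data/project/prepared',
    filetypes=['.wav', '.flac'])   # assumed form: a list of file extensions

print(clip_counts)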