Data pre-processing

Pre-processed data (clips and their associated label/class information) are written to the filesystem for later consumption during model training. In addition to extracting clips from raw audio, the interfaces below support the following audio pre-processing operations:

  • standardizing the sampling frequencies of all recordings,

  • application of low-pass, high-pass or band-pass filters, and

  • waveform normalization.

The parameters for pre-processing data are specified using a Python dictionary that is passed as a parameter (named audio_settings) to the functions below. The following keys are supported:

  • desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling rate introduces frequency banding in the resulting audio.

  • clip_length (required) The duration of each audio segment (in seconds).

  • clip_advance (required) The time offset (in seconds) between the starts of successive segments. This controls the overlap between successive segments, which equals clip_length minus clip_advance. If clip_advance equals clip_length, the overlap between successive segments will be zero.

  • filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying, in order:

    • filter order (integer)

    • cutoff frequency(ies) (a 1-element or 2-element list/tuple)

    • filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)

    If filter type is ‘bandpass’, the cutoff frequency must be a 2-element list/tuple.

  • normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
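As an illustration of the keys above, a settings dictionary might look like the following sketch. The values are arbitrary examples, not recommendations. The functions clip_overlap and clip_start_times are hypothetical helpers (not part of koogu) that only work through the arithmetic relating clip_length and clip_advance; how the library itself treats a trailing partial clip is not covered here and may differ.

```python
# Example values only; choose settings appropriate for your recordings.
audio_settings = {
    'desired_fs': 16000,       # target sampling frequency (Hz)
    'clip_length': 2.0,        # duration of each clip (s)
    'clip_advance': 0.5,       # offset between starts of successive clips (s)
    # filter order, cutoff frequencies (Hz), filter type
    'filterspec': [6, [100, 4000], 'bandpass'],
    'normalize_clips': True,   # scale each clip to the range [-1.0, 1.0]
}


def clip_overlap(settings):
    """Overlap (s) between successive clips. Hypothetical helper."""
    return settings['clip_length'] - settings['clip_advance']


def clip_start_times(duration, settings):
    """Start times (s) of the full-length clips that fit in `duration` seconds.

    Hypothetical helper; assumes trailing partial clips are dropped.
    """
    fs = settings['desired_fs']
    clip_len = int(round(settings['clip_length'] * fs))   # in samples
    clip_adv = int(round(settings['clip_advance'] * fs))  # in samples
    total = int(round(duration * fs))                     # in samples
    return [start / fs for start in range(0, total - clip_len + 1, clip_adv)]
```

With these example values, successive clips overlap by 1.5 seconds, and a 4-second recording yields five full-length clips starting every 0.5 seconds.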

koogu.data.preprocess.from_selection_table_map(audio_settings, audio_seltab_list, audio_root, seltab_root, output_root, desired_labels=None, remap_labels_dict=None, negative_class_label=None, **kwargs)

Pre-process training data using info contained in audio_seltab_list.

Parameters:
  • audio_settings – A dictionary specifying the parameters for processing audio from files.

  • audio_seltab_list – A list containing pairs (tuples or sub-lists) of relative paths to audio files and the corresponding annotation (selection table) files.

  • audio_root – The full paths of audio files listed in audio_seltab_list are resolved using this as the base directory.

  • seltab_root – The full paths of annotation files listed in audio_seltab_list are resolved using this as the base directory.

  • output_root – “Prepared” data will be written to this directory.

  • desired_labels – The target set of class labels. If not None, must be a list of class labels. Any selections (read from the selection tables) having labels that are not in this list will be discarded. This list will be used to populate classes_list.json that will define the classes for the project. If None, then the list of classes will be populated with the annotation labels read from all selection tables.

  • remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see the similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.

    Note: If desired_labels is not None, mappings for which targets are not listed in desired_labels will be ignored.

  • negative_class_label – A string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If None (default), saving of negative class clips will be disabled.

Other parameters specific to koogu.utils.detections.assess_annotations_and_clips_match() can also be specified, and will be passed as-is to the function.

Returns:

A dictionary whose keys are annotation tags (either discovered from the set of annotations, or same as desired_labels if not None) and the values are the number of clips produced for the corresponding class.
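A call to this function might be put together as in the following sketch. All file names, directory paths and class labels are hypothetical placeholders, and the wrapper function prepare_from_selection_tables is introduced here only for illustration. The call itself uses only the parameters listed in the signature above.

```python
# Hypothetical project layout; replace with your own paths and labels.
audio_settings = {
    'desired_fs': 16000,    # Hz
    'clip_length': 2.0,     # seconds
    'clip_advance': 0.5,    # seconds
}

# Pairs of (audio file, selection table file), relative to the roots below.
audio_seltab_list = [
    ['day1/rec_01.wav', 'day1/rec_01.selections.txt'],
    ['day2/rec_02.wav', 'day2/rec_02.selections.txt'],
]


def prepare_from_selection_tables():
    """Run pre-processing and report per-class clip counts."""
    # Imported here so the definitions above can be inspected without koogu.
    from koogu.data import preprocess

    clip_counts = preprocess.from_selection_table_map(
        audio_settings,
        audio_seltab_list,
        audio_root='/data/project/audio',
        seltab_root='/data/project/annotations',
        output_root='/data/project/prepared_clips',
        desired_labels=['whale', 'dolphin'],   # discard other annotations
        negative_class_label='Other')          # also save unmatched clips

    # Keys are class labels; values are the number of clips per class.
    for label, count in sorted(clip_counts.items()):
        print(f'{label}: {count} clips')
    return clip_counts
```

Calling prepare_from_selection_tables() requires koogu to be installed and the placeholder paths to exist.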

koogu.data.preprocess.from_top_level_dirs(audio_settings, class_dirs, audio_root, output_root, remap_labels_dict=None, **kwargs)

Pre-process training data available as audio files in class_dirs.

Parameters:
  • audio_settings – A dictionary specifying the parameters for processing audio from files.

  • class_dirs – A list containing relative paths to class-specific directories containing audio files. Each directory’s contents will be recursively searched for audio files.

  • audio_root – The full paths of the class-specific directories listed in class_dirs are resolved using this as the base directory.

  • output_root – “Prepared” data will be written to this directory.

  • remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see the similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.

  • filetypes – (optional) Restrict listing to files matching extensions specified in this parameter. Has defaults if unspecified.

Returns:

A dictionary whose keys are annotation tags (discovered from the set of annotations) and the values are the number of clips produced for the corresponding class.
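For data organized as one directory per class, a call might look like the following sketch. The directory names and paths are hypothetical placeholders, and prepare_from_class_dirs and classes_without_clips are illustrative helpers that are not part of koogu.

```python
# Hypothetical project layout; replace with your own paths.
audio_settings = {
    'desired_fs': 16000,    # Hz
    'clip_length': 2.0,     # seconds
    'clip_advance': 1.0,    # seconds
}

# One directory per class, relative to audio_root.
class_dirs = ['whale', 'dolphin', 'noise']


def classes_without_clips(clip_counts):
    """Return the labels for which no clips were produced (illustrative)."""
    return sorted(label for label, count in clip_counts.items() if count == 0)


def prepare_from_class_dirs():
    """Run pre-processing and warn about classes that produced no clips."""
    # Imported here so the definitions above can be inspected without koogu.
    from koogu.data import preprocess

    clip_counts = preprocess.from_top_level_dirs(
        audio_settings,
        class_dirs,
        audio_root='/data/project/audio',
        output_root='/data/project/prepared_clips')

    empty = classes_without_clips(clip_counts)
    if empty:
        print('No clips were produced for:', ', '.join(empty))
    return clip_counts
```

Checking the returned counts right after pre-processing is a cheap way to catch a misnamed directory before training starts.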