Prepare training inputs

Prepared data (clips and their associated label/class information) are written to the filesystem for later use during model training. In addition to extracting clips from raw audio, the interfaces below also support the following audio pre-processing operations:

standardizing the sampling frequencies of all recordings,
applying low-pass, high-pass or band-pass filters, and
normalizing waveforms.
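Waveform normalization scales each clip so that its samples fit within [-1.0, 1.0]. The sketch below shows one common way to do this, peak normalization, on a plain Python list. The helper name peak_normalize is hypothetical; Koogu works on NumPy arrays internally, and its exact scaling may differ.

```python
def peak_normalize(clip):
    """Scale samples so that the largest absolute value becomes 1.0.

    Hypothetical helper for illustration only (not part of Koogu).
    A silent (all-zero) clip is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in clip), default=0.0)
    if peak == 0.0:
        return list(clip)
    return [s / peak for s in clip]


print(peak_normalize([0.1, -0.4, 0.2]))  # [0.25, -1.0, 0.5]
```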

The parameters for data preparation are specified in a Python dictionary that is passed (as the parameter named audio_settings) to the functions below. The following keys are supported:

desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling rate introduces frequency banding in the resulting audio.
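Resampling between two integer sampling frequencies can be expressed as a rational factor up/down, which is the form taken by polyphase resamplers such as scipy.signal.resample_poly. The helpers below are hypothetical illustrations of that arithmetic; this section does not say which resampling routine Koogu uses internally.

```python
from math import gcd


def resampling_factors(original_fs, desired_fs):
    """Return (up, down) such that desired_fs / original_fs == up / down.

    Hypothetical helper; sampling frequencies are integers in Hz.
    """
    g = gcd(original_fs, desired_fs)
    return desired_fs // g, original_fs // g


def resampled_length(num_samples, original_fs, desired_fs):
    """Number of samples after resampling, rounded up to a whole sample."""
    up, down = resampling_factors(original_fs, desired_fs)
    return -(-num_samples * up // down)  # ceiling division on integers


# 10 s of 44.1 kHz audio resampled to 48 kHz
print(resampling_factors(44100, 48000))        # (160, 147)
print(resampled_length(441000, 44100, 48000))  # 480000
```

When up > down (upsampling), the resampled signal contains no content above the original Nyquist frequency (original_fs / 2), so the upper part of its spectrum is left empty.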

clip_length (required) The duration of each audio segment (in seconds).

clip_advance (required) The amount (in seconds) by which the start of each segment is advanced relative to the start of the previous one. This controls the overlap between successive segments, which equals clip_length - clip_advance; if clip_advance equals clip_length, successive segments do not overlap.
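The relationship between clip_length and clip_advance can be illustrated by listing segment start times over a recording. This is a hypothetical sketch; it assumes segments are only taken where they fit entirely within the recording, which may not match how Koogu handles a trailing partial segment.

```python
def clip_start_times(duration, clip_length, clip_advance):
    """Start times (s) of successive clips within a recording of `duration` seconds.

    Hypothetical helper: successive starts are `clip_advance` seconds apart, so
    consecutive clips overlap by clip_length - clip_advance seconds.
    """
    if clip_advance <= 0:
        raise ValueError("clip_advance must be positive")
    starts = []
    k = 0
    # Multiply instead of accumulating, to avoid floating-point drift.
    while k * clip_advance + clip_length <= duration:
        starts.append(k * clip_advance)
        k += 1
    return starts


print(clip_start_times(10.0, 2.0, 2.0))  # no overlap: [0.0, 2.0, 4.0, 6.0, 8.0]
print(clip_start_times(5.0, 2.0, 1.5))   # 0.5 s overlap: [0.0, 1.5, 3.0]
```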

filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying:

filter order (integer)

cutoff frequency(ies) (a 1-element or 2-element list/tuple)

filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)

If the filter type is ‘bandpass’, the cutoff frequency must be a 2-element list/tuple.
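A filterspec value can be checked against the rules above before it is placed in audio_settings. The validator below is hypothetical (not part of Koogu). Besides the documented format, it applies the general digital-filter requirement that every cutoff lie strictly between 0 and the Nyquist frequency (desired_fs / 2), and assumes band-pass cutoffs are given in increasing order.

```python
def check_filterspec(filterspec, desired_fs):
    """Raise ValueError if `filterspec` breaks the documented format.

    Hypothetical validator: expects (order, cutoffs, filter_type), where
    cutoffs has 2 elements for 'bandpass' and 1 element otherwise.
    """
    if len(filterspec) != 3:
        raise ValueError("filterspec must have exactly 3 elements")
    order, cutoffs, filter_type = filterspec
    if not isinstance(order, int) or order < 1:
        raise ValueError("filter order must be a positive integer")
    if filter_type not in ("lowpass", "highpass", "bandpass"):
        raise ValueError("filter type must be 'lowpass', 'highpass' or 'bandpass'")
    expected = 2 if filter_type == "bandpass" else 1
    if len(cutoffs) != expected:
        raise ValueError(f"'{filter_type}' needs {expected} cutoff value(s)")
    # General digital-filter requirement: 0 < cutoff < Nyquist frequency.
    nyquist = desired_fs / 2
    if not all(0 < f < nyquist for f in cutoffs):
        raise ValueError("cutoffs must lie strictly between 0 and desired_fs / 2")
    if expected == 2 and cutoffs[0] >= cutoffs[1]:
        raise ValueError("bandpass cutoffs must be in increasing order")


check_filterspec([5, [50.0, 3000.0], "bandpass"], desired_fs=8000)  # passes silently
```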

normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
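Putting the keys together, an audio_settings dictionary might look like the following. The values are illustrative assumptions only and should be chosen to suit the target sounds; the check at the end simply restates which keys are marked as required above.

```python
# Illustrative values only; choose settings appropriate for your data.
audio_settings = {
    "desired_fs": 8000,       # resample every file to 8 kHz
    "clip_length": 2.0,       # 2-second clips
    "clip_advance": 0.5,      # start a new clip every 0.5 s
    "filterspec": [5, [50.0, 3000.0], "bandpass"],  # 5th-order band-pass, 50 to 3000 Hz
    "normalize_clips": True,  # the default; scale each clip to [-1.0, 1.0]
}

required = {"desired_fs", "clip_length", "clip_advance"}
missing = required - set(audio_settings)
assert not missing, f"missing required keys: {missing}"

overlap = audio_settings["clip_length"] - audio_settings["clip_advance"]
print(f"overlap between successive clips: {overlap} s")  # 1.5 s
```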

koogu.prepare.from_annotations(audio_settings, audio_annot_list, audio_root, annot_root, output_root, annotation_reader=None, desired_labels=None, remap_labels_dict=None, negative_class_label=None, **kwargs)

Pre-process training data using the information contained in audio_annot_list.

Parameters:

audio_settings – A dictionary specifying the parameters for processing audio from files.
audio_annot_list – A list of pairs (list-like), each containing the relative path to an audio file and the relative path(s) to the corresponding annotation file(s). The annotation entry can be a single path string or a nested list of path strings. Alternatively, you can specify the path to a CSV file containing these pairs of entries (in the same order; add 3rd, 4th, … columns to list additional annotation files for an audio file). Use the CSV option only if the paths are simple (i.e., the filenames do not contain commas or other special characters).
audio_root – The full paths of audio files listed in audio_annot_list are resolved using this as the base directory.
annot_root – The full paths of annotation files listed in audio_annot_list are resolved using this as the base directory.
output_root – “Prepared” data will be written to this directory.
annotation_reader – If not None, must be an annotation reader instance from the annotations module. If None (default), a Raven Reader is used.
desired_labels – The target set of class labels. If not None, must be a list of class labels. Any annotations (read from the annotation files) having labels that are not in this list will be discarded. This list will be used to populate classes_list.json that will define the classes for the project. If None, then the list of classes will be populated with the annotation labels read from all annotation files.
remap_labels_dict – If not None, must be a Python dictionary describing a mapping of class labels. For details, see the similarly named parameter of the koogu.utils.detections.LabelHelper constructor.

Note

If desired_labels is not None, mappings for which targets are not listed in desired_labels will be ignored.
negative_class_label – A string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If None (default), saving of negative class clips will be disabled.
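The interaction between remap_labels_dict and desired_labels described in the note can be pictured with plain dictionaries. This is a hypothetical illustration of the documented behaviour (mappings whose targets are not in desired_labels are ignored, and labels outside desired_labels are discarded), not the actual logic of koogu.utils.detections.LabelHelper. It ASSUMES that remap_labels_dict maps an original label to a new (target) label; confirm the expected format in the LabelHelper documentation.

```python
def apply_label_remapping(annotation_labels, remap_labels_dict, desired_labels=None):
    """Hypothetical sketch: remap labels, then keep only the desired ones.

    ASSUMPTION: remap_labels_dict maps an original label to a target label.
    Mappings whose target is not in desired_labels are ignored.
    """
    if desired_labels is not None:
        remap_labels_dict = {old: new for old, new in remap_labels_dict.items()
                             if new in desired_labels}
    remapped = [remap_labels_dict.get(label, label) for label in annotation_labels]
    if desired_labels is None:
        return remapped
    # Annotations whose (possibly remapped) label is not desired are discarded.
    return [label for label in remapped if label in desired_labels]


labels = ["Hump", "Humpback", "Blue", "Fin"]
mapping = {"Hump": "Humpback", "Fin": "Baleen"}  # "Baleen" is not desired, so ignored
print(apply_label_remapping(labels, mapping, desired_labels=["Humpback", "Blue"]))
# ['Humpback', 'Humpback', 'Blue']
```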

Additional keyword arguments (**kwargs) accepted by koogu.utils.detections.assess_annotations_and_clips_match() can also be specified; they will be passed as-is to that function.
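The two accepted forms of audio_annot_list can be sketched as follows. All file names are hypothetical; the paths are relative to audio_root and annot_root respectively, and the second entry pairs one audio file with two annotation files. The CSV variant is written with Python's csv module and ASSUMES no header row is expected, which this section does not specify.

```python
import csv
import os
import tempfile

# Hypothetical relative paths: (audio file, annotation file or list of annotation files)
audio_annot_list = [
    ["site1/rec_001.wav", "site1/rec_001.selections.txt"],
    ["site2/rec_002.wav", ["site2/rec_002_a.txt", "site2/rec_002_b.txt"]],
]

# Equivalent CSV: audio path in column 1, annotation path(s) in columns 2, 3, ...
# (Assumes no header row; keep paths free of commas and other special characters.)
csv_path = os.path.join(tempfile.mkdtemp(), "audio_annot_pairs.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for audio_path, annots in audio_annot_list:
        annot_paths = [annots] if isinstance(annots, str) else list(annots)
        writer.writerow([audio_path, *annot_paths])

with open(csv_path, newline="") as f:
    print(f.read())
```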

Returns: A dictionary whose keys are annotation tags (either discovered from the set of annotations, or the same as desired_labels if not None) and whose values are the numbers of clips produced for the corresponding classes.
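A minimal end-to-end call might look like the sketch below. The directory names and labels are hypothetical placeholders, and only parameters listed in the signature above are used. The import of Koogu is deferred into a function so the snippet can be loaded without Koogu installed; call prepare_training_data() once the paths point at real data.

```python
def prepare_training_data():
    """Run koogu.prepare.from_annotations with illustrative settings.

    All paths and labels below are hypothetical placeholders.
    """
    from koogu import prepare  # deferred import; requires Koogu to be installed

    audio_settings = {
        "desired_fs": 8000,
        "clip_length": 2.0,
        "clip_advance": 0.5,
    }
    audio_annot_list = [
        ["site1/rec_001.wav", "site1/rec_001.selections.txt"],
    ]

    clip_counts = prepare.from_annotations(
        audio_settings,
        audio_annot_list,
        audio_root="/data/audio",
        annot_root="/data/annotations",
        output_root="/data/prepared",
        desired_labels=["Humpback", "Blue"],
        negative_class_label="Other",
    )
    # Return value: {class label: number of clips produced for that class}
    for label, count in clip_counts.items():
        print(f"{label}: {count} clips")
    return clip_counts
```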