Data pre-processing
Pre-processed data (clips + associated label/class information) are written to the filesystem for later consumption during model training. In addition to extracting clips from raw audio, the interfaces below also support the following audio pre-processing operations:
standardizing the sampling frequencies of all recordings,
application of low-pass, high-pass or band-pass filters, and
waveform normalization.
The parameters for pre-processing data are specified using a Python dictionary object that is passed as a parameter (named audio_settings) to the functions below. The following keys are supported:
desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling rate introduces frequency banding in the resulting audio.
clip_length (required) The duration of each audio segment (in seconds).
clip_advance (required) Controls the amount of overlap between successive segments by setting the advance (in seconds) from the start of one segment to the start of the next. If clip_advance equals clip_length, the overlap between successive segments will be zero.
filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying:
filter order (integer)
cutoff frequency(ies) (a 1-element or 2-element list/tuple)
filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)
If filter type is ‘bandpass’, then the cutoff frequency must be a 2-element list/tuple.
normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
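As an illustration of the keys described above, the following sketch builds an audio_settings dictionary and shows how clip_length and clip_advance relate to overlap. All values are arbitrary examples, and clip_start_times is a hypothetical helper that is not part of Koogu. It assumes that clip_advance acts as the step between clip starts (consistent with the zero-overlap case noted above) and that only full-length clips are kept, which may differ from what the library does at the end of a recording.

```python
# Example settings; the values are illustrative, not recommendations.
audio_settings = {
    'desired_fs': 16000,      # target sampling frequency (Hz)
    'clip_length': 2.0,       # duration of each clip (s)
    'clip_advance': 0.5,      # advance between successive clips (s)
    'filterspec': [4, [100, 4000], 'bandpass'],  # order, cutoffs (Hz), type
    'normalize_clips': True,  # scale each clip to [-1.0, 1.0]
}


def clip_start_times(duration, clip_length, clip_advance):
    """Hypothetical helper (not a Koogu function): start times of the
    full-length clips that fit in a recording of `duration` seconds,
    assuming each clip starts `clip_advance` seconds after the previous."""
    starts = []
    k = 0
    while k * clip_advance + clip_length <= duration:
        starts.append(k * clip_advance)
        k += 1
    return starts


# Overlap implied by the settings, under the assumption stated above.
overlap = audio_settings['clip_length'] - audio_settings['clip_advance']

starts = clip_start_times(5.0,
                          audio_settings['clip_length'],
                          audio_settings['clip_advance'])
print(overlap)      # 1.5
print(len(starts))  # 7
```

With these values, successive clips share 1.5 s of audio. Setting clip_advance to 2.0 would give non-overlapping clips.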
- koogu.data.preprocess.from_selection_table_map(audio_settings, audio_seltab_list, audio_root, seltab_root, output_root, desired_labels=None, remap_labels_dict=None, negative_class_label=None, **kwargs)
Pre-process training data using info contained in audio_seltab_list.
- Parameters:
audio_settings – A dictionary specifying the parameters for processing audio from files.
audio_seltab_list – A list containing pairs (tuples or sub-lists) of relative paths to audio files and the corresponding annotation (selection table) files.
audio_root – The full paths of audio files listed in audio_seltab_list are resolved using this as the base directory.
seltab_root – The full paths of annotation files listed in audio_seltab_list are resolved using this as the base directory.
output_root – “Prepared” data will be written to this directory.
desired_labels – The target set of class labels. If not None, must be a list of class labels. Any selections (read from the selection tables) having labels that are not in this list will be discarded. This list will be used to populate classes_list.json that will define the classes for the project. If None, then the list of classes will be populated with the annotation labels read from all selection tables.
remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see the similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.
Note: If desired_labels is not None, mappings for which targets are not listed in desired_labels will be ignored.
negative_class_label – A string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If None (default), saving of negative class clips will be disabled.
Other parameters specific to koogu.utils.detections.assess_annotations_and_clips_match() can also be specified, and will be passed as-is to the function.
- Returns:
A dictionary whose keys are annotation tags (either discovered from the set of annotations, or same as desired_labels if not None) and the values are the number of clips produced for the corresponding class.
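A minimal sketch of how a call to from_selection_table_map might be assembled, following the signature documented above. The file names, directory paths and class labels are hypothetical placeholders, and prepare_from_selection_tables is a wrapper introduced only for this illustration. Koogu is imported inside the wrapper so that the argument layout can be inspected without the library installed.

```python
audio_settings = {
    'desired_fs': 16000,   # Hz
    'clip_length': 2.0,    # seconds
    'clip_advance': 0.5,   # seconds
}

# Pairs of (audio file, selection table file). The first path in each pair is
# relative to audio_root, the second to seltab_root. Tuples and sub-lists are
# both accepted per the parameter description. Names are hypothetical.
audio_seltab_list = [
    ('site_a/rec_001.wav', 'site_a/rec_001.selections.txt'),
    ['site_b/rec_002.wav', 'site_b/rec_002.selections.txt'],
]


def prepare_from_selection_tables(audio_root, seltab_root, output_root):
    """Illustrative wrapper (not part of Koogu) around the documented call."""
    # Deferred import: only needed when pre-processing is actually run.
    from koogu.data import preprocess

    return preprocess.from_selection_table_map(
        audio_settings, audio_seltab_list,
        audio_root, seltab_root, output_root,
        desired_labels=['whale', 'dolphin'],  # hypothetical class labels
        negative_class_label='Other')         # enable saving of negative clips


# Example invocation (paths are placeholders):
# clip_counts = prepare_from_selection_tables(
#     '/data/audio', '/data/annotations', '/data/prepared')
# for label, count in clip_counts.items():
#     print(label, count)
```

The returned dictionary can be used to check class balance before training, since it reports the number of clips produced per class.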
- koogu.data.preprocess.from_top_level_dirs(audio_settings, class_dirs, audio_root, output_root, remap_labels_dict=None, **kwargs)
Pre-process training data available as audio files in class_dirs.
- Parameters:
audio_settings – A dictionary specifying the parameters for processing audio from files.
class_dirs – A list containing relative paths to class-specific directories containing audio files. Each directory’s contents will be recursively searched for audio files.
audio_root – The full paths of the class-specific directories listed in class_dirs are resolved using this as the base directory.
output_root – “Prepared” data will be written to this directory.
remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see the similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.
filetypes – (optional) Restrict listing to files matching extensions specified in this parameter. Has defaults if unspecified.
- Returns:
A dictionary whose keys are annotation tags (discovered from the set of annotations) and the values are the number of clips produced for the corresponding class.
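For data organized as one directory per class, a call to from_top_level_dirs might be sketched as follows. The directory names and paths are hypothetical, and resolved_class_dirs and prepare_from_class_dirs are helpers introduced only for this illustration. The optional filetypes argument is left at its default here because the expected format of its value is not described above.

```python
from pathlib import Path

audio_settings = {
    'desired_fs': 16000,   # Hz
    'clip_length': 2.0,    # seconds
    'clip_advance': 1.0,   # seconds
}

# Class-specific directories, relative to audio_root (hypothetical names).
class_dirs = ['whale', 'dolphin', 'noise']


def resolved_class_dirs(audio_root):
    """Illustrative helper (not part of Koogu): the full directory paths
    obtained by using audio_root as the base directory."""
    return [Path(audio_root) / d for d in class_dirs]


def prepare_from_class_dirs(audio_root, output_root):
    """Illustrative wrapper (not part of Koogu) around the documented call."""
    # Deferred import: only needed when pre-processing is actually run.
    from koogu.data import preprocess

    return preprocess.from_top_level_dirs(
        audio_settings, class_dirs, audio_root, output_root)


# Example invocation (paths are placeholders):
# clip_counts = prepare_from_class_dirs('/data/audio', '/data/prepared')
```

Checking that every path from resolved_class_dirs exists before starting is one way to catch a mistyped directory name early.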