Prepare training inputs
Prepared data (clips + associated label/class information) are written to the filesystem for later consumption during model training. In addition to extracting clips from raw audio, the interfaces below also support the following audio pre-processing operations:
standardizing the sampling frequencies of all recordings,
application of low-pass, high-pass or band-pass filters, and
waveform normalization.
The parameters for data preparation are specified using a Python dictionary object that is passed as a parameter (named audio_settings) to the functions below. The following keys are supported:
desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling rate introduces frequency banding in the resulting audio.
clip_length (required) The duration of each audio segment (in seconds).
clip_advance (required) The amount (in seconds) by which the start of each successive segment advances from the start of the previous one. This controls the overlap between successive segments: if clip_advance equals clip_length, then the overlap between successive segments will be zero.
filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying -
filter order (integer)
cutoff frequency(ies) (a 1-element or 2-element list/tuple)
filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)
If filter type is ‘bandpass’, then the cutoff frequency must be a 2-element list/tuple.
normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
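As a minimal sketch of the keys described above, the following builds an audio_settings dictionary and works out the overlap and clip start times that the chosen clip_length and clip_advance imply. All values are hypothetical, and the helper clip_start_times is an illustration only, not part of Koogu; it assumes that only full-length clips are kept, which the text above does not specify.

```python
# Hypothetical values, chosen only to illustrate the supported keys.
audio_settings = {
    'desired_fs': 16000,                         # target sampling frequency (Hz)
    'clip_length': 2.0,                          # duration of each clip (s)
    'clip_advance': 0.5,                         # step between successive clip starts (s)
    'filterspec': [9, [400, 6000], 'bandpass'],  # order, cutoff(s), filter type
    'normalize_clips': True,                     # scale each clip to [-1.0, 1.0]
}


def clip_start_times(duration, clip_length, clip_advance):
    """Start times (s) of the full-length clips that fit in a recording.

    Illustrative helper, not Koogu's code: it assumes a trailing partial
    clip is dropped.
    """
    starts = []
    n = 0
    while n * clip_advance + clip_length <= duration:
        starts.append(n * clip_advance)
        n += 1
    return starts


# Overlap between successive clips follows from the two settings.
overlap = audio_settings['clip_length'] - audio_settings['clip_advance']

# Clip start times for a hypothetical 10-second recording.
starts = clip_start_times(10.0,
                          audio_settings['clip_length'],
                          audio_settings['clip_advance'])
```

With these values successive clips share 1.5 seconds of audio. Setting clip_advance to 2.0 would make the clips abut with zero overlap.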
- koogu.prepare.from_selection_table_map(audio_settings, audio_seltab_list, audio_root, seltab_root, output_root, annotation_reader=None, desired_labels=None, remap_labels_dict=None, negative_class_label=None, **kwargs)
Pre-process training data using info contained in audio_seltab_list.
- Parameters:
audio_settings – A dictionary specifying the parameters for processing audio from files.
audio_seltab_list – A list containing pairs (tuples or sub-lists) of relative paths to audio files and the corresponding annotation (selection table) files. Alternatively, you could also specify (path to) a 2-column csv file containing these pairs of entries (in the same order). Only use the csv option if the paths are simple (i.e., the filenames do not contain commas or other special characters).
audio_root – The full paths of audio files listed in audio_seltab_list are resolved using this as the base directory.
seltab_root – The full paths of annotation files listed in audio_seltab_list are resolved using this as the base directory.
output_root – “Prepared” data will be written to this directory.
annotation_reader – If not None, must be an annotation reader instance from annotations. Defaults to RavenReader.
desired_labels – The target set of class labels. If not None, must be a list of class labels. Any selections (read from the selection tables) having labels that are not in this list will be discarded. This list will be used to populate classes_list.json that will define the classes for the project. If None, then the list of classes will be populated with the annotation labels read from all selection tables.
remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.
Note
If desired_labels is not None, mappings for which targets are not listed in desired_labels will be ignored.
negative_class_label – A string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If None (default), saving of negative class clips will be disabled.
Other parameters specific to koogu.utils.detections.assess_annotations_and_clips_match() can also be specified, and will be passed as-is to the function.
- Returns:
A dictionary whose keys are annotation tags (either discovered from the set of annotations, or same as desired_labels if not None) and the values are the number of clips produced for the corresponding class.
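A call to from_selection_table_map might be set up as in the sketch below. The directory layout, file names and class labels are all hypothetical. The Koogu import is placed inside a wrapper function (prepare_from_selection_tables, a name introduced here only for illustration) so that the inputs can be inspected without running the preparation.

```python
import os

# Hypothetical settings; see the supported keys described earlier.
audio_settings = {
    'desired_fs': 16000,
    'clip_length': 2.0,
    'clip_advance': 0.5,
}

# Hypothetical project layout.
audio_root = os.path.join('projects', 'whales', 'audio')
seltab_root = os.path.join('projects', 'whales', 'annotations')
output_root = os.path.join('projects', 'whales', 'prepared')

# Pairs of (audio file, selection table file), given relative to
# audio_root and seltab_root respectively. File names are made up.
audio_seltab_list = [
    [os.path.join('site1', 'rec_001.wav'),
     os.path.join('site1', 'rec_001.selections.txt')],
    [os.path.join('site2', 'rec_002.wav'),
     os.path.join('site2', 'rec_002.selections.txt')],
]


def prepare_from_selection_tables():
    """Run the preparation and return the per-class clip counts."""
    from koogu import prepare  # imported here so the inputs above load without Koogu

    return prepare.from_selection_table_map(
        audio_settings,
        audio_seltab_list,
        audio_root,
        seltab_root,
        output_root,
        desired_labels=['humpback', 'right_whale'],  # hypothetical labels
        negative_class_label='Other',                # also save unannotated clips
    )
```

Calling prepare_from_selection_tables() would return the dictionary described under Returns, which can be iterated with .items() to report how many clips were written for each class.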
- koogu.prepare.from_top_level_dirs(audio_settings, class_dirs, audio_root, output_root, remap_labels_dict=None, **kwargs)
Pre-process training data available as audio files in class_dirs.
- Parameters:
audio_settings – A dictionary specifying the parameters for processing audio from files.
class_dirs – A list containing relative paths to class-specific directories containing audio files. Each directory’s contents will be recursively searched for audio files.
audio_root – The full paths of the class-specific directories listed in class_dirs are resolved using this as the base directory.
output_root – “Prepared” data will be written to this directory.
remap_labels_dict – If not None, must be a Python dictionary describing mapping of class labels. For details, see similarly named parameter to the constructor of koogu.utils.detections.LabelHelper.
filetypes – (optional) Restrict listing to files matching extensions specified in this parameter. Has defaults if unspecified.
- Returns:
A dictionary whose keys are annotation tags (discovered from the set of annotations) and the values are the number of clips produced for the corresponding class.
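When recordings are already organized into one directory per class, from_top_level_dirs could be invoked along the lines of the sketch below. The directory names and paths are hypothetical, and the wrapper function prepare_from_class_dirs is introduced only for illustration. The optional filetypes parameter is left at its default here.

```python
import os

# Hypothetical settings; see the supported keys described earlier.
audio_settings = {
    'desired_fs': 16000,
    'clip_length': 2.0,
    'clip_advance': 1.0,
}

# Hypothetical project layout.
audio_root = os.path.join('projects', 'birds', 'audio')
output_root = os.path.join('projects', 'birds', 'prepared')

# Class-specific directories, relative to audio_root. Each one is
# searched recursively for audio files.
class_dirs = ['robin', 'sparrow', 'noise']


def prepare_from_class_dirs():
    """Run the preparation and return the per-class clip counts."""
    from koogu import prepare  # imported here so the inputs above load without Koogu

    return prepare.from_top_level_dirs(
        audio_settings,
        class_dirs,
        audio_root,
        output_root,
    )
```

Calling prepare_from_class_dirs() would return the per-class clip counts described under Returns.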