Data pre-processing

Pre-processed data (clips + associated label/class information) are written to the filesystem for later consumption during model training. In addition to extracting clips from raw audio, the interfaces below also support the following audio pre-processing operations:

standardizing the sampling frequencies of all recordings,
application of low-pass, high-pass or band-pass filters, and
waveform normalization.
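
As a rough illustration of what these three operations amount to, the sketch below uses NumPy and SciPy. It is not the library's own implementation: the function names are hypothetical, and the use of polyphase resampling and a zero-phase Butterworth filter are assumptions made only for the example.

```python
from fractions import Fraction

import numpy as np
from scipy import signal


def resample_to(audio, fs, target_fs):
    """Resample `audio` from `fs` Hz to `target_fs` Hz (assumed: polyphase method)."""
    if fs == target_fs:
        return audio
    ratio = Fraction(int(target_fs), int(fs))  # reduced up/down factors
    return signal.resample_poly(audio, ratio.numerator, ratio.denominator)


def apply_filter(audio, fs, order, cutoffs, btype):
    """Apply a zero-phase filter (assumed: Butterworth design).

    `btype` is one of 'lowpass', 'highpass' or 'bandpass';
    `cutoffs` holds one or two frequencies in Hz.
    """
    wn = cutoffs[0] if len(cutoffs) == 1 else list(cutoffs)
    sos = signal.butter(order, wn, btype=btype, fs=fs, output='sos')
    return signal.sosfiltfilt(sos, audio)


def normalize(clip):
    """Scale `clip` so that its peak absolute amplitude is 1.0."""
    peak = np.max(np.abs(clip))
    return clip if peak == 0 else clip / peak
```

Chaining the three calls in this order (resample, filter, normalize) mirrors the order in which the operations are listed above; whether the actual interfaces apply them in the same order is not stated here.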

Pre-processing parameters are specified in a Python dictionary that is passed as the audio_settings parameter to the functions below. The following keys are supported:

desired_fs (required) The target sampling frequency (in Hz). Audio files having other sampling frequencies will be resampled to this value. Note that upsampling from a lower sampling frequency cannot add content above the original Nyquist frequency, so it leaves an empty band at the top of the resulting audio's spectrum.

clip_length (required) The duration of each audio segment (in seconds).

clip_advance (required) The amount (in seconds) by which the start of each segment is advanced relative to the start of the previous segment. This controls the overlap between successive segments, which equals clip_length minus clip_advance. If clip_advance equals clip_length, the overlap will be zero.

filterspec (optional) If specified, must be a 3-element ordered list/tuple specifying, in order:

filter order (integer)

cutoff frequency(ies) (a 1-element or 2-element list/tuple)

filter type (string; one of ‘lowpass’, ‘highpass’ or ‘bandpass’)

If filter type is ‘bandpass’, then the cutoff frequencies must be given as a 2-element list/tuple.

normalize_clips (optional; default: True) If True, will scale the waveform within each resulting clip to be in the range [-1.0, 1.0].
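
To make the keys above concrete, here is a minimal sketch of an audio_settings dictionary with illustrative values, together with two hypothetical helpers (check_filterspec and clip_offsets, neither of which is part of the documented interface). The first checks a filterspec against the shape described above; the second shows how clip_length and clip_advance translate into clip boundaries, under the assumption that a trailing partial clip is discarded.

```python
# Example settings; the values are illustrative only.
audio_settings = {
    'desired_fs': 16000,                         # Hz
    'clip_length': 2.0,                          # seconds
    'clip_advance': 0.5,                         # seconds; overlap = 2.0 - 0.5 = 1.5 s
    'filterspec': [4, [200, 4000], 'bandpass'],  # order, cutoffs (Hz), type
    'normalize_clips': True,
}


def check_filterspec(filterspec):
    """Hypothetical helper: raise ValueError if `filterspec` is malformed."""
    if len(filterspec) != 3:
        raise ValueError('filterspec must have exactly 3 elements')
    order, cutoffs, ftype = filterspec
    if not isinstance(order, int):
        raise ValueError('filter order must be an integer')
    if len(cutoffs) not in (1, 2):
        raise ValueError('cutoffs must have 1 or 2 elements')
    if ftype not in ('lowpass', 'highpass', 'bandpass'):
        raise ValueError(f'unsupported filter type: {ftype!r}')
    if ftype == 'bandpass' and len(cutoffs) != 2:
        raise ValueError('bandpass requires two cutoff frequencies')


def clip_offsets(num_samples, settings):
    """Hypothetical helper: (start, end) sample indices of each full-length clip.

    Assumes a trailing segment shorter than clip_length is dropped.
    """
    fs = settings['desired_fs']
    length = int(round(settings['clip_length'] * fs))
    advance = int(round(settings['clip_advance'] * fs))
    if num_samples < length:
        return []
    count = (num_samples - length) // advance + 1
    return [(i * advance, i * advance + length) for i in range(count)]
```

With these settings, a 5-second recording at 16 kHz (80000 samples) yields seven clips of 32000 samples each, with every clip starting 8000 samples after the previous one.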