Inferencing
- koogu.inference.analyze_clips(trained_model, clips, batch_size=1, audio_filepath=None)
Apply a trained model to one or more audio clips and obtain scores.
- Parameters:
trained_model – A koogu.model.TrainedModel instance.
clips – An [N x ?] numpy array of N input waveforms.
batch_size – (default: 1) Control how many clips are processed in a single batch. Increasing this helps improve throughput, but requires more RAM.
audio_filepath – (default: None) If not None, will display a progress bar.
- Returns:
A 2-element tuple consisting of -
detection/classification scores ([N x M] numpy array corresponding to the N clips and M target classes), and
the total time taken to process all the clips.
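For example, a minimal sketch of calling analyze_clips is shown below. The way the TrainedModel instance is constructed, and the paths and clip dimensions used, are assumptions for illustration only; clips must already match the sampling rate and input length that the model expects.

```python
import numpy as np
from koogu.model import TrainedModel
from koogu.inference import analyze_clips

# Assumption: a TrainedModel instance can be created from the directory
# containing the trained model's files (the path is a placeholder).
trained_model = TrainedModel('/path/to/trained_model')

# Hypothetical inputs: 8 clips, each 2 s long at 16 kHz, already prepared to
# match the model's expected input length and sampling rate.
clips = np.random.rand(8, 2 * 16000).astype(np.float32)

# Returns per-clip, per-class scores and the total processing time.
scores, time_taken = analyze_clips(trained_model, clips, batch_size=4)
print(scores.shape)   # (8, M) for M target classes
```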
- koogu.inference.recognize(model_dir, audio_root, output_dir=None, raw_detections_dir=None, **kwargs)
Batch-process audio files using a trained model.
- Parameters:
model_dir – Path to the directory where the trained model to be used for making inferences is available.
audio_root – Path to the directory from which to load audio files for inferences. Can also be set to a single audio file instead of a directory. See the optional parameters recursive and combine_outputs that may be used when audio_root points to a directory.
output_dir – If not None, processed recognition results (Raven selection tables) will be written out into this directory. At least one of output_dir or raw_detections_dir must be specified.
raw_detections_dir – If not None, raw outputs from the model will be written out into this directory. At least one of output_dir or raw_detections_dir must be specified.
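A bare-bones invocation with only the required arguments might look like the sketch below; the paths are placeholders.

```python
from koogu.inference import recognize

# Paths are placeholders; point these at your own trained model, audio
# collection and desired output location.
recognize(
    '/path/to/trained_model',          # model_dir
    '/path/to/audio_root',             # directory of audio files, or a single file
    output_dir='/path/to/detections'   # Raven selection tables will be written here
)
```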
Optional parameters
- Parameters:
clip_advance – If specified, overrides the value that was read from the model’s files. The value defines the amount by which successive clips are advanced when preparing audio.
threshold – (float, 0-1) Suppress writing of detections with scores below this value. Defaults to 0.
recursive – (bool) If set, the contents of audio_root will be processed recursively.
filetypes – Audio file types to restrict processing to. Option is ignored if processing a single file. Can specify multiple types, as a list. Defaults to ['.wav', '.WAV', '.flac', '.aif', '.mp3'].
combine_outputs – (bool) When processing audio files from entire directories, enabling this option combines recognition results of processing every file within a directory and writes them to a single output file. When enabled, outputs will contain 2 additional fields describing offsets of detections in the corresponding audio files.
channels – (int or list of ints) When audio files have multiple channels, set which channels to restrict processing to. If unspecified, all available channels will be processed. E.g., setting to 0 processes only the first channel; setting to [0, 2] processes the first and third channels.
scale_scores – (bool) Enabling this will scale the raw scores before they are written out. Use of this setting is recommended only when the output of a model is based on softmax (not multi-label) and the model was trained with training data where each input corresponded to a single class.
frequency_extents – A dictionary of per-class frequency bounds of each label class. Will be used when producing the output selection table files. If unspecified, the “Low Frequency (Hz)” and “High Frequency (Hz)” fields in the output table will be the same for all classes and will be set equal to the bandwidth used in preparing model inputs.
reject_class – Name (case sensitive) of the class (e.g., 'Noise' or 'Other') to exclude from the recognition results. The corresponding detections will not be written to the output selection tables. Can specify multiple classes for rejection, as a list.
batch_size – (int; default: 1) Number of an audio file’s clips to process in a single batch. Increasing this may improve speed on computers with more RAM.
num_fetch_threads – (int; default: 1) Number of background threads that will fetch audio from files in parallel.
show_progress – (bool) If enabled, messages indicating progress of processing will be shown on console output.
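The sketch below combines several of the optional parameters described above; the values shown are illustrative choices, not defaults, and the paths are placeholders.

```python
from koogu.inference import recognize

recognize(
    '/path/to/trained_model',
    '/path/to/audio_root',
    output_dir='/path/to/detections',
    threshold=0.75,          # suppress detections scoring below 0.75
    recursive=True,          # descend into sub-directories of audio_root
    combine_outputs=True,    # combine results per directory into a single output file
    channels=0,              # restrict processing to the first channel
    reject_class='Noise',    # omit 'Noise' detections from the selection tables
    batch_size=16,
    num_fetch_threads=2,
    show_progress=True
)
```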