Annotations and Detections

class koogu.utils.detections.LabelHelper(classes_list, remap_labels_dict=None, negative_class_label=None, fixed_labels=True, assessment_mode=False)

Provides functionality for manipulating and managing class labels in a problem space, without resorting to altering selection tables.

Parameters:

classes_list – List of class labels. When used during data preparation, the list may be generated from available classes or be provided as a pre-defined list. When used during performance assessments, it is typically populated from the classes_list.json file that is saved alongside raw detections.
remap_labels_dict – (default: None) If not None, must be a dictionary describing mapping of class labels. Use this to
- update existing class labels (e.g. {'c1': 'new_c1'}),
- merge together existing classes (e.g. {'c4': 'c1'}), or
- combine existing classes into new ones (e.g. {'c4': 'new_c2', 'c23': 'new_c2'}).
Avoid chaining of mappings (e.g. {'c1': 'c2', 'c2': 'c3'}).
negative_class_label – (default: None) If not None, must be a string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If specified, will be used in conjunction with remap_labels_dict.
fixed_labels – (bool; default: True) When True, classes_list will remain unchanged - any new mapping targets specified in remap_labels_dict will not be added and any mapped-out class labels will not be omitted. Typically, it should be set to True when classes_list is a pre-defined list during data preparation, and always during performance assessments.
assessment_mode – (bool; default: False) Set to True when invoked during performance assessments.

property classes_list: The final list of class names in the problem space, after performing manipulations based on remap_labels_dict (if specified).

property labels_to_indices: A Python dictionary mapping class names (string) to zero-based indices.

property negative_class_index: Index (zero-based) of the negative class (if specified) in classes_list.
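
The effect of remap_labels_dict and fixed_labels on the final class list can be illustrated with a small standalone sketch. This is not Koogu's implementation: the function remap_classes_sketch is hypothetical, and sorting the resulting labels when fixed_labels is False is an assumption made only to get a deterministic order.

```python
def remap_classes_sketch(classes_list, remap_labels_dict=None, fixed_labels=True):
    """Hypothetical helper that mimics the documented remapping behaviour."""
    remap = remap_labels_dict or {}

    # Every original label points to its mapping target, or to itself if unmapped.
    mapped = {label: remap.get(label, label) for label in classes_list}

    if fixed_labels:
        # The class list stays as given: no targets added, no labels dropped.
        final_classes = list(classes_list)
    else:
        # New targets appear and mapped-out labels disappear.
        # Sorting is an assumption of this sketch, not documented behaviour.
        final_classes = sorted(set(mapped.values()))

    labels_to_indices = {label: idx for idx, label in enumerate(final_classes)}
    return final_classes, labels_to_indices, mapped


# Merge 'c4' into 'c1' and rename 'c2' to 'new_c2'.
classes, indices, mapped = remap_classes_sketch(
    ['c1', 'c2', 'c4'], {'c4': 'c1', 'c2': 'new_c2'}, fixed_labels=False)
print(classes)   # ['c1', 'new_c2']
print(indices)   # {'c1': 0, 'new_c2': 1}
```

With fixed_labels=True the same call would return the original three-element list unchanged, which is why that setting suits pre-defined class lists.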

koogu.utils.detections.assess_annotations_and_clips_match(clip_offsets, clip_len, num_classes, annots_times, annots_class_idxs, min_annot_overlap_fraction=1.0, keep_only_centralized_annots=False, negative_class_idx=None, max_nonmatch_overlap_fraction=0.0)

Match clips to annotations and return “coverage scores” and a mask of ‘matched annotations’. Coverage score is a value between 0.0 and 1.0 and describes how much of a particular class’ annotation(s) is/are covered by each clip.

Parameters:

clip_offsets – M-length array of start samples (offset from the start of the audio file) of M clips.
clip_len – Number of waveform samples in each clip.
num_classes – Number of classes in the given application.
annots_times – A numpy array (shape Nx2) of start-end pairs defining annotations’ temporal extents, in terms of sample indices.
annots_class_idxs – An N-length list of zero-based indices to the class corresponding to each annotation.
min_annot_overlap_fraction – (default: 1.0) Lower threshold on how much coverage a clip must have with an annotation for the annotation to be considered “matched”.
keep_only_centralized_annots – If enabled (default: False), very short annotations (< half of clip_len) will generate full coverage (1.0) only if they occur within the central 50% extents of the clip or if the annotation cuts across the center of the clip. For short annotations that do not satisfy these conditions, their normally-computed coverage value will be scaled down based on the annotation’s distance from the center of the clip.
negative_class_idx – If not None, clips that have no (or small) overlap with any annotation will be marked as clips of the non-target class whose index this parameter specifies. See max_nonmatch_overlap_fraction for further control.
max_nonmatch_overlap_fraction – A clip without enough overlap with any annotations will be marked as non-target class only if its overlap with any annotation is less than this amount (default 0.0). This parameter is only used when negative_class_idx is set.

Returns:

A 2-element tuple containing -

- MxP “coverage” matrix corresponding to the M clips and P classes. The values in the matrix will be:
  1.0 - if either the m-th clip fully contained an annotation from the p-th class or vice versa (possible when annotation is longer than clip_len);
  <1.0 - if there was partial coverage (the number of overlapping samples is divided by the shorter of clip_len or annotation length);
  0.0 - if the m-th clip had no overlap with any annotations from the p-th class.
- N-length boolean mask of annotations that were matched with at least one clip under the condition of min_annot_overlap_fraction.
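
The coverage rule described above (overlapping samples divided by the shorter of clip_len or annotation length) can be sketched with NumPy as follows. This is an illustration only, not the library's code: coverage_sketch is a hypothetical name, extents are treated as half-open [start, end) sample ranges, and taking the per-class maximum over annotations is an assumption. The centralization and negative-class options are left out.

```python
import numpy as np


def coverage_sketch(clip_offsets, clip_len, num_classes,
                    annots_times, annots_class_idxs,
                    min_annot_overlap_fraction=1.0):
    """Hypothetical helper computing an MxP coverage matrix and an N-length mask."""
    clip_starts = np.asarray(clip_offsets)[:, None]        # shape (M, 1)
    clip_ends = clip_starts + clip_len
    annots_times = np.asarray(annots_times)
    annot_starts = annots_times[None, :, 0]                # shape (1, N)
    annot_ends = annots_times[None, :, 1]

    # Number of overlapping samples for every clip/annotation pair, shape (M, N).
    overlap = np.minimum(clip_ends, annot_ends) - np.maximum(clip_starts, annot_starts)
    overlap = np.clip(overlap, 0, None)

    # Normalize by the shorter of the clip length and the annotation length.
    denom = np.minimum(clip_len, annot_ends - annot_starts)
    fraction = overlap / denom

    # Assumption: a class' coverage is the best coverage among its annotations.
    coverage = np.zeros((fraction.shape[0], num_classes))
    for annot_idx, class_idx in enumerate(annots_class_idxs):
        coverage[:, class_idx] = np.maximum(coverage[:, class_idx],
                                            fraction[:, annot_idx])

    # An annotation is "matched" if at least one clip covers enough of it.
    matched = (fraction >= min_annot_overlap_fraction).any(axis=0)
    return coverage, matched


coverage, matched = coverage_sketch(
    clip_offsets=[0, 500, 1000], clip_len=1000, num_classes=2,
    annots_times=[[200, 600], [1500, 3000]], annots_class_idxs=[0, 1])
print(coverage.tolist())  # [[1.0, 0.0], [0.25, 0.0], [0.0, 0.5]]
print(matched.tolist())   # [True, False]
```

In this example the first annotation lies entirely inside the first clip (coverage 1.0), while the second, longer annotation is only half-covered by the last clip and therefore stays unmatched at the default threshold.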

koogu.utils.detections.assess_annotations_and_detections_match(num_classes, gt_times, gt_labels, det_times, det_labels, min_gt_coverage=0.5, min_det_usage=0.5)

Match elements describing time-spans from two collections. Typically, one collection corresponds to ground-truth (gt) temporal extents and the other collection corresponds to detection (det) temporal extents.

Parameters:

num_classes – Number of classes of the various time-events.
gt_times – Mx2 numpy array representing the start-end times of M ground-truth events.
gt_labels – M-length integer array indicating the class of each of the M ground-truth events.
det_times – Nx2 numpy array representing the start-end times of N detection events.
det_labels – N-length integer array indicating the class of each of the N detection events.
min_gt_coverage – A floating point value (in the range 0-1) indicating the minimum fraction of a ground-truth event that must be covered by one or more detections for it to be considered “recalled”.
min_det_usage – A floating point value (in the range 0-1) indicating the minimum fraction of a detection event that must have covered parts of one or more ground-truth events for it to be considered a “true positive”.

Returns:

A 5-element tuple containing -

per-class counts of true positives
per-class counts of detections (true + false positives)
numerator for computing recall (note that given our definition of ‘true positive’ and ‘recall’, this value may not be the same as the per-class counts of true positives).
mask of ground-truth events that were “recalled”
mask of detections that were true positives
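
To show why the recall numerator can differ from the true positive count, here is a minimal NumPy sketch of the two thresholds. It is not the library's algorithm: match_sketch is a hypothetical name, and summing pairwise overlaps assumes that events within each collection do not overlap one another.

```python
import numpy as np


def match_sketch(num_classes, gt_times, gt_labels, det_times, det_labels,
                 min_gt_coverage=0.5, min_det_usage=0.5):
    """Hypothetical helper illustrating min_gt_coverage and min_det_usage."""
    gt_times = np.asarray(gt_times, dtype=float).reshape(-1, 2)
    det_times = np.asarray(det_times, dtype=float).reshape(-1, 2)
    gt_labels = np.asarray(gt_labels, dtype=int)
    det_labels = np.asarray(det_labels, dtype=int)

    # Pairwise overlap durations between ground-truth and detections, shape (M, N).
    overlap = (np.minimum(gt_times[:, None, 1], det_times[None, :, 1]) -
               np.maximum(gt_times[:, None, 0], det_times[None, :, 0]))
    overlap = np.clip(overlap, 0.0, None)

    # Only overlaps between events of the same class count.
    overlap = overlap * (gt_labels[:, None] == det_labels[None, :])

    # Fraction of each ground-truth event covered by detections, and
    # fraction of each detection spent on ground-truth events.
    gt_coverage = overlap.sum(axis=1) / (gt_times[:, 1] - gt_times[:, 0])
    det_usage = overlap.sum(axis=0) / (det_times[:, 1] - det_times[:, 0])

    gt_recalled = gt_coverage >= min_gt_coverage
    det_is_tp = det_usage >= min_det_usage

    tp_counts = np.bincount(det_labels[det_is_tp], minlength=num_classes)
    det_counts = np.bincount(det_labels, minlength=num_classes)
    recall_numerator = np.bincount(gt_labels[gt_recalled], minlength=num_classes)
    return tp_counts, det_counts, recall_numerator, gt_recalled, det_is_tp


tp, dets, recalled, gt_mask, det_mask = match_sketch(
    num_classes=2,
    gt_times=[[0, 10], [20, 30]], gt_labels=[0, 1],
    det_times=[[2, 6], [6, 9], [25, 40], [50, 60]], det_labels=[0, 0, 1, 1])
print(tp.tolist())        # [2, 0]
print(dets.tolist())      # [2, 2]
print(recalled.tolist())  # [1, 1]
```

Two short class-0 detections jointly recall a single ground-truth event, so that class has two true positives but a recall numerator of one. The class-1 detection covers half of its ground-truth event (recalled) yet spends only a third of its own duration on it (not a true positive).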

koogu.utils.detections.postprocess_detections(clip_scores, clip_offsets, clip_length, threshold=None, suppress_nonmax=False, squeeze_min_samps=None)

Post-process detections to group together successive detections from each class.

Parameters:

clip_scores – An [N x M] array containing M per-class scores for each of the N clips.
clip_offsets – An N-length integer array containing indices of the first sample in each clip.
clip_length – Number of waveform samples in each clip.
threshold – (default: None) If not None, scores below this value will be ignored.
suppress_nonmax – (bool; default: False) If True, will apply non-max suppression to only consider the top-scoring class for each clip.
squeeze_min_samps – (default: None) If not None, will run the algorithm to squish contiguous detections of the same class. Squeezing will be limited to produce detections that are at least this many samples long.

Returns:

A 3-element or 4-element tuple containing -

sample indices (array of start and end pairs),
aggregated scores,
class IDs, and
if requested, start-end indices making up each combined streak.
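
The grouping of successive detections can be sketched as below. This is a simplified illustration, not the function's actual implementation: group_streaks_sketch is a hypothetical name, the end of a streak is taken as the last sample of its last clip, and using the maximum score as the aggregate is an assumption. Non-max suppression and squeezing are omitted.

```python
import numpy as np


def group_streaks_sketch(clip_scores, clip_offsets, clip_length, threshold):
    """Hypothetical helper merging runs of successive above-threshold clips."""
    clip_scores = np.asarray(clip_scores, dtype=float)
    clip_offsets = np.asarray(clip_offsets)

    extents, scores, class_ids = [], [], []
    for class_idx in range(clip_scores.shape[1]):
        above = clip_scores[:, class_idx] >= threshold

        # Pad with False so every streak has both a rising and a falling edge.
        padded = np.concatenate(([False], above, [False])).astype(int)
        edges = np.diff(padded)
        first_clips = np.flatnonzero(edges == 1)
        last_clips = np.flatnonzero(edges == -1) - 1   # inclusive clip index

        for first, last in zip(first_clips, last_clips):
            extents.append((clip_offsets[first],
                            clip_offsets[last] + clip_length - 1))
            # Assumption: aggregate a streak's scores by taking the maximum.
            scores.append(clip_scores[first:last + 1, class_idx].max())
            class_ids.append(class_idx)

    return (np.asarray(extents).reshape(-1, 2),
            np.asarray(scores),
            np.asarray(class_ids))


extents, scores, class_ids = group_streaks_sketch(
    clip_scores=[[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.6, 0.9]],
    clip_offsets=[0, 500, 1000, 1500], clip_length=1000, threshold=0.5)
print(extents.tolist())    # [[0, 1499], [1500, 2499], [1000, 2499]]
print(class_ids.tolist())  # [0, 0, 1]
```

Class 0 yields two separate detections because the third clip falls below the threshold and breaks the streak, while class 1 yields one detection spanning the last two clips.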