Annotations and Detections

class koogu.utils.detections.LabelHelper(classes_list, remap_labels_dict=None, negative_class_label=None, fixed_labels=True, assessment_mode=False)

Provides functionality for manipulating and managing class labels in a problem space, without resorting to altering original annotation files.

Parameters:
  • classes_list – List of class labels. When used during data preparation, the list may be generated from available classes or be provided as a pre-defined list. When used during performance assessments, it is typically populated from the classes_list.json file that is saved alongside raw detections.

  • remap_labels_dict – (default: None) If not None, must be a dictionary describing mapping of class labels. Use this to

    • update existing class’ labels
      (e.g. {'c1': 'new_c1'}),
    • merge together existing classes
      (e.g. {'c4': 'c1'}), or
    • combine existing classes into new ones
      (e.g. {'c4': 'new_c2', 'c23': 'new_c2'}).

    Avoid chaining of mappings (e.g. {'c1': 'c2', 'c2': 'c3'}).

  • negative_class_label – (default: None) If not None, must be a string (e.g. ‘Other’, ‘Noise’) which will be used as a label to identify the negative class clips (those that did not match any annotations). If specified, will be used in conjunction with remap_labels_dict.

  • fixed_labels – (bool; default: True) When True, classes_list will remain unchanged - any new mapping targets specified in remap_labels_dict will not be added and any mapped-out class labels will not be omitted. Typically, it should be set to True when classes_list is a pre-defined list during data preparation, and always during performance assessments.

  • assessment_mode – (bool; default: False) Set to True when invoked during performance assessments.

property classes_list

The final list of class names in the problem space, after performing manipulations based on remap_labels_dict (if specified).

property labels_to_indices

A Python dictionary mapping class names (string) to zero-based indices.

property negative_class_index

Index (zero-based) of the negative class (if specified) in classes_list.
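The remapping described above can be illustrated with a small standalone sketch. It does not use or reproduce koogu's implementation: remap_classes is a hypothetical helper, it only covers the case where labels are not fixed (new mapping targets are added and mapped-out labels are dropped), and it assumes that class order is preserved and that the negative class label, if given, is appended last.

```python
def remap_classes(classes_list, remap_labels_dict=None, negative_class_label=None):
    """Hypothetical helper (not part of koogu) sketching label remapping."""
    remap = dict(remap_labels_dict or {})

    # Chained mappings such as {'c1': 'c2', 'c2': 'c3'} are ambiguous, so reject them.
    chained = [src for src, dst in remap.items() if dst != src and dst in remap]
    if chained:
        raise ValueError(f"chained mappings found for: {chained}")

    # Replace each label with its mapping target, keeping first-seen order
    # and skipping duplicates created by merging classes.
    final = []
    for label in classes_list:
        target = remap.get(label, label)
        if target not in final:
            final.append(target)

    # Assumption: the negative class, when requested, goes at the end.
    if negative_class_label is not None and negative_class_label not in final:
        final.append(negative_class_label)

    labels_to_indices = {label: idx for idx, label in enumerate(final)}
    negative_class_index = labels_to_indices.get(negative_class_label)
    return final, labels_to_indices, negative_class_index


classes, to_index, neg_index = remap_classes(
    ['c1', 'c4', 'c23'],
    remap_labels_dict={'c4': 'new_c2', 'c23': 'new_c2'},
    negative_class_label='Other')
print(classes)    # ['c1', 'new_c2', 'Other']
print(to_index)   # {'c1': 0, 'new_c2': 1, 'Other': 2}
print(neg_index)  # 2
```

Here 'c4' and 'c23' are combined into the new class 'new_c2', which mirrors the third use case listed for remap_labels_dict.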

koogu.utils.detections.assess_annotations_and_clips_match(clip_offsets, clip_len, num_classes, annots_times, annots_class_idxs, min_annot_overlap_fraction=1.0, keep_only_centralized_annots=False, negative_class_idx=None, max_nonmatch_overlap_fraction=0.0)

Match clips to annotations and return “coverage scores” and a mask of ‘matched annotations’. Coverage score is a value between 0.0 and 1.0 and describes how much of a particular class’ annotation(s) is/are covered by each clip.

Parameters:
  • clip_offsets – M-length array of start samples (offset from the start of the audio file) of M clips.

  • clip_len – Number of waveform samples in each clip.

  • num_classes – Number of classes in the given application.

  • annots_times – A numpy array (shape Nx2) of start-end pairs defining annotations’ temporal extents, in terms of sample indices.

  • annots_class_idxs – An N-length list of zero-based indices to the class corresponding to each annotation.

  • min_annot_overlap_fraction – Lower threshold on how much coverage a clip must have with an annotation for the annotation to be considered “matched”.

  • keep_only_centralized_annots – If enabled (default: False), very short annotations (< half of clip_len) will generate full coverage (1.0) only if they occur within the central 50% extents of the clip or if the annotation cuts across the center of the clip. For short annotations that do not satisfy these conditions, their normally-computed coverage value will be scaled down based on the annotation’s distance from the center of the clip.

  • negative_class_idx – If not None, clips that have no (or small) overlap with any annotation will be marked as clips of the non-target class whose index this parameter specifies. See max_nonmatch_overlap_fraction for further control.

  • max_nonmatch_overlap_fraction – A clip without enough overlap with any annotations will be marked as non-target class only if its overlap with any annotation is less than this amount (default 0.0). This parameter is only used when negative_class_idx is set.

Returns:

A 2-element tuple containing -

  • MxP “coverage” matrix corresponding to the M clips and P classes. The values in the matrix will be:

    • 1.0 - if either the m-th clip fully contained an annotation from the p-th class or vice versa (possible when annotation is longer than clip_len);
    • <1.0 - if there was partial coverage (the number of overlapping samples is divided by the shorter of clip_len or annotation length);
    • 0.0 - if the m-th clip had no overlap with any annotations from the p-th class.
  • N-length boolean mask of annotations that were matched with at least one clip under the condition of min_annot_overlap_fraction.
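The coverage values described above can be reproduced with a short pure-Python sketch. This is an illustration under stated assumptions, not koogu's code: the helper names are hypothetical, clips and annotations are treated as half-open sample ranges [start, end), the per-class score of a clip is taken as the largest coverage over that class' annotations, and keep_only_centralized_annots and the negative-class handling are left out.

```python
def coverage(clip_start, clip_len, annot_start, annot_end):
    """Overlapping samples divided by the shorter of clip and annotation length."""
    clip_end = clip_start + clip_len  # assumption: half-open range [start, end)
    overlap = min(clip_end, annot_end) - max(clip_start, annot_start)
    if overlap <= 0:
        return 0.0
    return overlap / min(clip_len, annot_end - annot_start)


def coverage_matrix(clip_offsets, clip_len, num_classes, annots_times,
                    annots_class_idxs, min_annot_overlap_fraction=1.0):
    """Return an M x P list of coverage scores and an N-length matched mask."""
    scores = [[0.0] * num_classes for _ in clip_offsets]
    matched = [False] * len(annots_times)
    for m, offset in enumerate(clip_offsets):
        for n, ((start, end), cls) in enumerate(zip(annots_times, annots_class_idxs)):
            value = coverage(offset, clip_len, start, end)
            # Assumption: keep the best coverage per class for each clip.
            scores[m][cls] = max(scores[m][cls], value)
            if value > 0.0 and value >= min_annot_overlap_fraction:
                matched[n] = True
    return scores, matched


scores, matched = coverage_matrix(
    clip_offsets=[0, 500, 1000], clip_len=1000, num_classes=2,
    annots_times=[(200, 600), (1200, 2400)], annots_class_idxs=[0, 1])
print(scores)   # rows: clips, columns: classes
print(matched)  # [True, False]
```

The first annotation (400 samples) lies entirely inside the first clip, so it scores 1.0 and is matched. The second annotation is longer than a clip and is never fully spanned by one, so with the default threshold of 1.0 it stays unmatched.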

koogu.utils.detections.assess_annotations_and_detections_match(num_classes, gt_times, gt_labels, det_times, det_labels, min_gt_coverage=0.5, min_det_usage=0.5)

Match elements describing time-spans from two collections. Typically, one collection corresponds to ground-truth (gt) temporal extents and the other collection corresponds to detection (det) temporal extents.

Parameters:
  • num_classes – Number of classes of the various time-events.

  • gt_times – Mx2 numpy array representing the start-end times of M ground-truth events.

  • gt_labels – M-length integer array indicating the class of each of the M ground-truth events.

  • det_times – Nx2 numpy array representing the start-end times of N detection events.

  • det_labels – N-length integer array indicating the class of each of the N detection events.

  • min_gt_coverage – A floating point value (in the range 0-1) indicating the minimum fraction of a ground-truth event that must be covered by one or more detections for it to be considered “recalled”.

  • min_det_usage – A floating point value (in the range 0-1) indicating the minimum fraction of a detection event that must have covered parts of one or more ground-truth events for it to be considered a “true positive”.
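Both thresholds rest on the same quantity: the fraction of one time-span that is covered by the union of spans from the other collection. A minimal sketch of that computation follows. The function covered_fraction is hypothetical and is not koogu's implementation, and it assumes all spans passed in belong to the same class.

```python
def covered_fraction(event, others):
    """Fraction of `event` (start, end) covered by the union of `others`."""
    start, end = event
    # Clip every other span to the event's extents; drop non-overlapping ones.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in others
        if min(e, end) > max(s, start))

    covered = 0.0
    cursor = start  # everything before `cursor` has already been counted
    for s, e in clipped:
        s = max(s, cursor)  # avoid double-counting overlapping spans
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


gt_event = (10.0, 20.0)
detections = [(8.0, 14.0), (12.0, 16.0), (30.0, 35.0)]

# Ground-truth coverage: compare against min_gt_coverage.
print(covered_fraction(gt_event, detections))        # 0.6
# Detection usage: compare against min_det_usage.
print(covered_fraction(detections[0], [gt_event]))   # 4 of 6 seconds
print(covered_fraction(detections[2], [gt_event]))   # 0.0
```

With the default thresholds of 0.5, the ground-truth event in this example would count as recalled, the first detection as a true positive, and the third detection as a false positive.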

Returns:

A 5-element tuple containing -

  • per-class counts of true positives

  • per-class counts of detections (true + false positives)

  • numerator for computing recall (note that given our definition of ‘true positive’ and ‘recall’, this value may not be the same as the per-class counts of true positives).

  • mask of ground-truth events that were “recalled”

  • mask of detections that were true positives
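The first three returned items are the ingredients for per-class precision and recall. The sketch below shows one way to combine them. It is not part of koogu: precision_recall is a hypothetical helper, the counts are made-up illustration values, and the recall denominator is assumed to be the per-class number of ground-truth events (which the caller would derive from gt_labels).

```python
def precision_recall(tp_counts, det_counts, recall_numerators, gt_counts):
    """Per-class precision and recall; NaN where the denominator is zero."""
    nan = float("nan")
    precision = [tp / n if n > 0 else nan
                 for tp, n in zip(tp_counts, det_counts)]
    # Assumption: recall is normalized by the number of ground-truth events.
    recall = [num / n if n > 0 else nan
              for num, n in zip(recall_numerators, gt_counts)]
    return precision, recall


# Made-up counts for a two-class problem.
precision, recall = precision_recall(
    tp_counts=[8, 3],
    det_counts=[10, 4],
    recall_numerators=[9, 3],
    gt_counts=[12, 6])
print(precision)  # [0.8, 0.75]
print(recall)     # [0.75, 0.5]
```

Note that the recall numerator for the first class (9) differs from its true positive count (8), which is the situation the third returned item accounts for.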

koogu.utils.detections.postprocess_detections(clip_scores, clip_offsets, clip_length, threshold=None, suppress_nonmax=False, squeeze_min_samps=None)

Post-process detections to group together successive detections from each class.

Parameters:
  • clip_scores – An [N x M] array containing M per-class scores for each of the N clips.

  • clip_offsets – An N-length integer array containing indices of the first sample in each clip.

  • clip_length – Number of waveform samples in each clip.

  • threshold – (default: None) If not None, scores below this value will be ignored.

  • suppress_nonmax – (bool; default: False) If True, will apply non-max suppression to only consider the top-scoring class for each clip.

  • squeeze_min_samps – (default: None) If not None, will run the algorithm to squeeze contiguous detections of the same class. Squeezing will be limited to produce detections that are at least this many samples long.

Returns:

A 3-element or 4-element tuple containing -

  • sample indices (array of start and end pairs),

  • aggregated scores,

  • class IDs, and

  • if requested, start-end indices making up each combined streak.
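The grouping of successive detections can be sketched in plain Python as follows. This is only an illustration and not koogu's algorithm: group_detections is a hypothetical helper, successive is taken to mean adjacent clip indices, the end sample is assumed to be inclusive, and each streak's aggregated score is assumed to be the maximum of its clip scores. Non-max suppression and squeezing are not covered.

```python
def group_detections(clip_scores, clip_offsets, clip_length, threshold):
    """Group runs of above-threshold clips, per class, into detections.

    Returns a list of (start_sample, end_sample, score, class_id) tuples.
    """
    num_clips = len(clip_scores)
    num_classes = len(clip_scores[0]) if num_clips else 0
    detections = []
    for cls in range(num_classes):
        streak = []  # indices of the clips in the current run
        # Iterate one step past the last clip so that a trailing run is closed.
        for idx in range(num_clips + 1):
            active = idx < num_clips and clip_scores[idx][cls] >= threshold
            if active:
                streak.append(idx)
            elif streak:
                start = clip_offsets[streak[0]]
                # Assumption: the end index is the last sample of the last clip.
                end = clip_offsets[streak[-1]] + clip_length - 1
                # Assumption: aggregate a streak's scores with max().
                score = max(clip_scores[i][cls] for i in streak)
                detections.append((start, end, score, cls))
                streak = []
    return detections


scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.95, 0.6]]
offsets = [0, 500, 1000, 1500]
for det in group_detections(scores, offsets, clip_length=1000, threshold=0.5):
    print(det)
# (0, 1499, 0.9, 0)
# (1500, 2499, 0.95, 0)
# (1000, 2499, 0.7, 1)
```

The third clip scores below the threshold for class 0, so it splits that class' detections into two streaks, while class 1 yields a single streak spanning the last two clips.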