visualization package

Submodules

visualization.plot_utils module

plot_utils.py

Utility functions for plotting, particularly for plotting confusion matrices and precision-recall curves.

megadetector.visualization.plot_utils.calibration_ece(true_scores, pred_scores, num_bins)[source]

Expected calibration error (ECE) as defined in equation (3) of Guo et al. “On Calibration of Modern Neural Networks.” (2017).

Implementation modified from sklearn.calibration.calibration_curve() in order to implement ECE calculation. See:

https://github.com/scikit-learn/scikit-learn/issues/18268

Parameters:
  • true_scores (list of int) – true values, length N, binary-valued (0 = neg, 1 = pos)

  • pred_scores (list of float) – predicted confidence values, length N, pred_scores[i] is the predicted confidence that example i is positive

  • num_bins (int) – number of bins to use (M in eq. (3) of Guo 2017)

Returns:

a length-three tuple containing:
  • accs: np.ndarray, shape [M], type float64, accuracy in each bin, M <= num_bins because bins with no samples are not returned

  • confs: np.ndarray, shape [M], type float64, mean model confidence in each bin

  • ece: float, expected calibration error

Return type:

tuple
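
As an illustration of the quantity described above, here is a minimal pure-Python sketch of ECE following equation (3) of Guo et al.: the sum over non-empty bins of (bin size / N) * |accuracy - mean confidence|. This is not the library's implementation. The function name ece_sketch and the equal-width binning rule are assumptions, and "accuracy" is taken to be the fraction of positive examples in each bin.

```python
def ece_sketch(true_scores, pred_scores, num_bins):
    """Illustrative ECE; returns (accs, confs, ece) for non-empty bins only."""
    n = len(true_scores)
    counts = [0] * num_bins       # number of samples per bin
    positives = [0] * num_bins    # number of positive samples per bin
    conf_sums = [0.0] * num_bins  # sum of predicted confidences per bin

    for t, p in zip(true_scores, pred_scores):
        # Assumed binning rule: equal-width bins on [0, 1], with 1.0 in the last bin
        b = min(int(p * num_bins), num_bins - 1)
        counts[b] += 1
        positives[b] += t
        conf_sums[b] += p

    accs, confs, ece = [], [], 0.0
    for count, pos, conf_sum in zip(counts, positives, conf_sums):
        if count == 0:
            continue  # empty bins are not returned
        acc = pos / count
        conf = conf_sum / count
        accs.append(acc)
        confs.append(conf)
        ece += (count / n) * abs(acc - conf)
    return accs, confs, ece


accs, confs, ece = ece_sketch([0, 0, 1, 1], [0.1, 0.3, 0.7, 0.9], num_bins=2)
```

With this toy input, each of the two bins holds half of the samples and is off by about 0.2, so the weighted sum is about 0.2.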

megadetector.visualization.plot_utils.plot_calibration_curve(true_scores, pred_scores, num_bins, name='calibration', plot_perf=True, plot_hist=True, ax=None, **fig_kwargs)[source]

Plots a calibration curve.

Parameters:
  • true_scores (list of int) – true values, length N, binary-valued (0 = neg, 1 = pos)

  • pred_scores (list of float) – predicted confidence values, length N, pred_scores[i] is the predicted confidence that example i is positive

  • num_bins (int) – number of bins to use (M in eq. (3) of Guo 2017)

  • name (str, optional) – label in legend for the calibration curve

  • plot_perf (bool, optional) – whether to plot y=x line indicating perfect calibration

  • plot_hist (bool, optional) – whether to plot histogram of counts

  • ax (Axes, optional) – if given then no legend is drawn, and fig_kwargs are ignored

  • fig_kwargs (dict) – only used if [ax] is None

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure

megadetector.visualization.plot_utils.plot_confusion_matrix(matrix, classes, normalize=False, title='Confusion matrix', cmap=<matplotlib.colors.LinearSegmentedColormap object>, vmax=None, use_colorbar=True, y_label=True, fmt='{:.0f}', fig=None)[source]

Plots a confusion matrix.

Parameters:
  • matrix (np.ndarray) – shape [num_classes, num_classes], confusion matrix where rows are ground-truth classes and columns are predicted classes

  • classes (list of str) – class names for each row/column

  • normalize (bool, optional) – whether to perform row-wise normalization; by default, assumes values in the confusion matrix are percentages

  • title (str, optional) – figure title

  • cmap (matplotlib.colors.colormap, optional) – colormap for cell backgrounds

  • vmax (float, optional) – value corresponding to the largest value of the colormap; if None, the maximum value in [matrix] will be used

  • use_colorbar (bool, optional) – whether to show colorbar

  • y_label (bool, optional) – whether to show class names on the y axis

  • fmt (str, optional) – format string for rendering numeric values

  • fig (Figure, optional) – existing figure to which we should render, otherwise creates a new figure

Returns:

the figure we rendered to or created

Return type:

matplotlib.figure.Figure

megadetector.visualization.plot_utils.plot_precision_recall_curve(precisions, recalls, title='Precision/recall curve', xlim=(0.0, 1.05), ylim=(0.0, 1.05))[source]

Plots a precision/recall curve given lists of (ordered) precision and recall values.

Parameters:
  • precisions (list of float) – precision for corresponding recall values, should have same length as [recalls].

  • recalls (list of float) – recall for corresponding precision values, should have same length as [precisions].

  • title (str, optional) – plot title

  • xlim (tuple, optional) – x-axis limits as a length-2 tuple

  • ylim (tuple, optional) – y-axis limits as a length-2 tuple

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure

megadetector.visualization.plot_utils.plot_stacked_bar_chart(data, series_labels=None, col_labels=None, x_label=None, y_label=None, log_scale=False)[source]

Plot a stacked bar chart, for plotting e.g. species distribution across locations.

Reference: https://stackoverflow.com/q/44309507

Parameters:
  • data (np.ndarray or list of list) – data to plot; rows (series) are species, columns are locations

  • series_labels (list of str, optional) – series labels, typically species names

  • col_labels (list of str, optional) – column labels, typically location names

  • x_label (str, optional) – x-axis label

  • y_label (str, optional) – y-axis label

  • log_scale (bool, optional) – whether to plot the y axis in log-scale

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure
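
The stacking itself can be sketched without any plotting library: each series (row) is drawn starting at the running total of the rows before it. The helper name stacked_bar_bottoms is hypothetical, and this is only a sketch of the general stacked-bar technique, not the function's actual code.

```python
def stacked_bar_bottoms(data):
    """Return (bottoms, totals) for a row-major table of bar heights.

    bottoms[i][j] is the y offset at which series i starts in column j;
    totals[j] is the full height of the stacked bar in column j.
    """
    num_cols = len(data[0])
    running = [0.0] * num_cols
    bottoms = []
    for row in data:
        bottoms.append(list(running))  # this series starts where the previous ones ended
        running = [r + v for r, v in zip(running, row)]
    return bottoms, running


# Rows are species, columns are locations (as in the [data] parameter)
bottoms, totals = stacked_bar_bottoms([[1, 2], [3, 4], [5, 6]])
```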

visualization.render_images_with_thumbnails module

render_images_with_thumbnails.py

Renders an output image with one primary image and crops from many secondary images, used primarily to check whether candidate repeat detections are actually false positives or not.

megadetector.visualization.render_images_with_thumbnails.crop_image_with_normalized_coordinates(image, bounding_box)[source]
Parameters:
  • image (PIL.Image) – image to crop

  • bounding_box (tuple) – tuple formatted as (x,y,w,h), where (0,0) is the upper-left of the image, and coordinates are normalized (so (0,0,1,1) is a box containing the entire image).

Returns:

cropped image

Return type:

PIL.Image
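
To make the coordinate convention concrete, the following sketch converts a normalized (x,y,w,h) box into a pixel-space (left, top, right, bottom) box, the form that a crop call such as PIL's Image.crop() expects. The helper name and the rounding and clamping choices are assumptions, not taken from the library.

```python
def normalized_box_to_pixels(bounding_box, image_width, image_height):
    """Convert normalized (x, y, w, h) to pixel (left, top, right, bottom)."""
    x, y, w, h = bounding_box
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    # Clamp to the image bounds (an assumption about edge handling)
    left = max(0, min(left, image_width))
    right = max(0, min(right, image_width))
    top = max(0, min(top, image_height))
    bottom = max(0, min(bottom, image_height))
    return (left, top, right, bottom)


# (0, 0, 1, 1) covers the whole image
full = normalized_box_to_pixels((0, 0, 1, 1), 640, 480)
```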

megadetector.visualization.render_images_with_thumbnails.render_images_with_thumbnails(primary_image_filename, primary_image_width, secondary_image_filename_list, secondary_image_bounding_box_list, cropped_grid_width, output_image_filename, primary_image_location='right')[source]

Given a primary image filename and a list of secondary images, writes to the provided output_image_filename an image in which one side is the primary image and the other side is a grid of the secondary images, cropped according to the provided list of bounding boxes.

The output file will be primary_image_width + cropped_grid_width pixels wide.

The height of the output image will be determined by the original aspect ratio of the primary image.

Parameters:
  • primary_image_filename (str) – filename of the primary image to load

  • primary_image_width (int) – width at which to render the primary image; if this is None, will render at the original image width

  • secondary_image_filename_list (list) – list of filenames of the secondary images

  • secondary_image_bounding_box_list (list) – list of tuples, one per secondary image. Each tuple is a bounding box of the secondary image, formatted as (x,y,w,h), where (0,0) is the upper-left of the image, and coordinates are normalized (so (0,0,1,1) is a box containing the entire image).

  • cropped_grid_width (int) – width of the cropped-image area

  • output_image_filename (str) – filename to write the output image

  • primary_image_location (str, optional) – ‘right’ or ‘left’; reserving ‘top’, ‘bottom’, etc. for future use
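
The output dimensions described above can be worked through with a short sketch. The helper name composite_output_size is hypothetical, and rounding the height to the nearest pixel is an assumption.

```python
def composite_output_size(primary_size, primary_image_width, cropped_grid_width):
    """Return (width, height) of the output image.

    primary_size is the (width, height) of the primary image as loaded.
    """
    orig_w, orig_h = primary_size
    if primary_image_width is None:
        # None means "render the primary image at its original width"
        primary_image_width = orig_w
    # Height follows from the primary image's original aspect ratio
    output_height = round(orig_h * primary_image_width / orig_w)
    return (primary_image_width + cropped_grid_width, output_height)


size = composite_output_size((1600, 1200), 800, 400)
```

For a 1600x1200 primary image rendered at 800 pixels wide next to a 400-pixel grid, this gives a 1200x600 output.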

visualization.visualization_utils module

visualization_utils.py

Rendering functions shared across visualization scripts

class megadetector.visualization.visualization_utils.TestVisualizationUtils[source]

Bases: object

Tests for visualization_utils.py.

set_up()[source]

Download (if necessary) and locate the shared md-tests image data, and create a scratch folder for test-specific outputs.

tear_down()[source]

Remove test-specific output directories. Leaves the shared md-tests image data in place for other tests to use.

test_check_image_integrity()[source]

Test check_image_integrity on known-good and deliberately-corrupted images.

test_parallel_check_image_integrity()[source]

Test parallel_check_image_integrity on a mix of good and corrupt images.

test_parallel_get_image_sizes()[source]

Test parallel_get_image_sizes on good and corrupt images.

test_resize_image_folder()[source]

Test resize_image_folder, including the overwrite=False skip path.

test_resize_images()[source]

Test resize_images: write resized copies and confirm output sizes.

megadetector.visualization.visualization_utils.blur_detections(image, detections, blur_radius=40)[source]

Blur the regions in [image] corresponding to the MD-formatted list [detections]. [image] is modified in place.

Parameters:
  • image (PIL.Image.Image) – image in which we should blur specific regions

  • detections (list) – list of detections in the MD output format, see render_detection_bounding_boxes() for more detail.

  • blur_radius (int, optional) – radius of blur kernel in pixels

megadetector.visualization.visualization_utils.check_image_integrity(filename, modes=None)[source]

Check whether we can successfully load an image via OpenCV and/or PIL.

Parameters:
  • filename (str) – the filename to evaluate

  • modes (list, optional) –

    a list containing one or more of:

    • ’cv’

    • ’pil’

    • ’skimage’

    • ’jpeg_trailer’

    ’jpeg_trailer’ checks that the binary data ends with ffd9. It does not check whether the image is actually a JPEG, and even if it is, there are lots of reasons the image might not end with ffd9. It’s also true that JPEGs that cause “premature end of jpeg segment” issues don’t end with ffd9, so this may be a useful diagnostic. High precision, very low recall for corrupt JPEGs.

    Set to None to use all modes.

Returns:

a dict with a key called ‘file’ (the value of [filename]), plus one key for each string in [modes] (a success indicator for that mode, specifically a string starting with either ‘success’ or ‘error’).

Return type:

dict
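
The ‘jpeg_trailer’ idea can be sketched as follows: read the last two bytes and compare them with the JPEG end-of-image marker (ff d9). The function names are hypothetical, and the library's real check may differ, for example in how it treats trailing padding.

```python
import os

JPEG_EOI = b'\xff\xd9'  # JPEG end-of-image marker


def data_ends_with_jpeg_trailer(data):
    """True if a bytes object ends with ff d9."""
    return len(data) >= 2 and data[-2:] == JPEG_EOI


def file_ends_with_jpeg_trailer(filename):
    """True if the file's last two bytes are ff d9; reads only those bytes."""
    if os.path.getsize(filename) < 2:
        return False
    with open(filename, 'rb') as f:
        f.seek(-2, os.SEEK_END)
        return f.read(2) == JPEG_EOI
```

As noted above, a file passing this check is not necessarily a valid JPEG, and a valid JPEG may fail it.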

megadetector.visualization.visualization_utils.crop_image(detections, image, confidence_threshold=0.15, expansion=0)[source]

Crops detections above [confidence_threshold] from the PIL image [image], returning a list of PIL Images, preserving the order of [detections].

Parameters:
  • detections (list) – a list of dictionaries with keys ‘conf’ and ‘bbox’; boxes are length-four arrays formatted as [x,y,w,h], normalized, upper-left origin (this is the standard MD detection format)

  • image (Image or str) – the PIL Image object from which we should crop detections, or an image filename

  • confidence_threshold (float, optional) – only crop detections above this threshold

  • expansion (int, optional) – a number of pixels to include on each side of a cropped detection

Returns:

a possibly-empty list of PIL Image objects

Return type:

list
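
A rough sketch of the selection and box arithmetic described above, without touching any image data: keep detections that meet the threshold, convert each normalized [x,y,w,h] box to pixels, and grow it by [expansion] pixels on each side. The helper name crop_boxes, the treatment of a confidence exactly equal to the threshold, and the clamping to the image bounds are assumptions.

```python
def crop_boxes(detections, image_size, confidence_threshold=0.15, expansion=0):
    """Return pixel (left, top, right, bottom) boxes, in detection order."""
    img_w, img_h = image_size
    boxes = []
    for det in detections:
        if det['conf'] < confidence_threshold:
            continue  # skip low-confidence detections
        x, y, w, h = det['bbox']
        # Expand by [expansion] pixels per side, staying inside the image
        left = max(0, round(x * img_w) - expansion)
        top = max(0, round(y * img_h) - expansion)
        right = min(img_w, round((x + w) * img_w) + expansion)
        bottom = min(img_h, round((y + h) * img_h) + expansion)
        boxes.append((left, top, right, bottom))
    return boxes


detections = [
    {'conf': 0.9, 'bbox': [0.5, 0.5, 0.25, 0.25]},
    {'conf': 0.05, 'bbox': [0.0, 0.0, 1.0, 1.0]},
]
boxes = crop_boxes(detections, (200, 100), expansion=10)
```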

megadetector.visualization.visualization_utils.draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, clss=None, thickness=4, expansion=0, display_str_list=None, use_normalized_coordinates=True, label_font_size=16, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font='arial.ttf')[source]

Adds a bounding box to an image. Modifies the image in place.

Bounding box coordinates can be specified in either absolute (pixel) or normalized coordinates by setting the use_normalized_coordinates argument.

Each string in display_str_list is displayed on a separate line above the bounding box in black text on a rectangle filled with the input ‘color’. If the top of the bounding box extends to the edge of the image, the strings are displayed below the bounding box.

Adapted from:

https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py

Parameters:
  • image (PIL.Image.Image) – the image on which we should draw a box

  • ymin (float) – ymin of bounding box

  • xmin (float) – xmin of bounding box

  • ymax (float) – ymax of bounding box

  • xmax (float) – xmax of bounding box

  • clss (int, optional) – the class index of the object in this bounding box, used for choosing a color; should be either an integer or a string-formatted integer

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • display_str_list (list, optional) – list of strings to display above the box (each to be shown on its own line)

  • use_normalized_coordinates (bool, optional) – if True (default), treat coordinates ymin, xmin, ymax, xmax as relative to the image, otherwise coordinates as absolute pixel values

  • label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.

  • colormap (list, optional) – list of color names, used to choose a color for the box by indexing with the value of [clss]; defaults to a reasonable set of colors

  • textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT

  • vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM

  • text_rotation (float, optional) – rotation to apply to text

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’); falls back to the PIL default font if the specified font is not found
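
The "float less than 1.0 is a fraction of the image width" convention used by thickness, expansion, and label_font_size can be sketched like this. The helper name resolve_pixel_size and the rounding are assumptions, not the library's code.

```python
def resolve_pixel_size(value, image_width):
    """Interpret [value] as pixels, or as a fraction of image width if it is a float < 1.0."""
    if isinstance(value, float) and value < 1.0:
        return round(value * image_width)
    return int(value)


# thickness=4 stays 4 pixels; thickness=0.005 on a 2000-pixel-wide image becomes 10 pixels
absolute = resolve_pixel_size(4, 2000)
relative = resolve_pixel_size(0.005, 2000)
```

Specifying sizes as fractions keeps boxes and labels visually consistent across images of very different resolutions.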

megadetector.visualization.visualization_utils.draw_bounding_boxes_on_file(input_file, output_file, detections, confidence_threshold=0.0, detector_label_map={'0': 'empty', '1': 'animal', '2': 'person', '3': 'vehicle'}, thickness=4, expansion=0, colormap=None, label_font_size=16, custom_strings=None, target_size=None, ignore_exif_rotation=False, quality=None, label_font='arial.ttf')[source]

Renders detection bounding boxes on an image loaded from file, optionally writing the results to a new image file.

Parameters:
  • input_file (str) – filename or URL to load

  • output_file (str) – filename to which we should write the rendered image

  • detections (list) – a list of dictionaries with keys ‘conf’, ‘bbox’, and ‘category’; boxes are length-four arrays formatted as [x,y,w,h], normalized, upper-left origin (this is the standard MD detection format). ‘category’ is a string-int.

  • confidence_threshold (float, optional) – only render detections with confidence above this threshold

  • detector_label_map (dict, optional) – a dict mapping category IDs to strings. If this is None, no confidence values or identifiers are shown. If this is {}, just category indices and confidence values are shown.

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors

  • label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.

  • custom_strings (list, optional) – set of strings to append to detection labels, should have the same length as [detections]. Appended before any classification labels.

  • target_size (tuple, optional) – tuple of (target_width,target_height). Either or both can be -1, see resize_image() for documentation. If None or (-1,-1), uses the original image size.

  • ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated.

  • quality (int, optional) – jpeg quality to use for output (None to use PIL default)

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

Returns:

loaded and modified image

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.draw_bounding_boxes_on_image(image, boxes, classes, thickness=4, expansion=0, display_strs=None, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font_size=16, label_font='arial.ttf')[source]

Draws bounding boxes on an image. Modifies the image in place.

Parameters:
  • image (PIL.Image) – the image on which we should draw boxes

  • boxes (np.array) – a two-dimensional numpy array of size [N, 4], where N is the number of boxes, and each row is (ymin, xmin, ymax, xmax). Coordinates should be normalized to image height/width.

  • classes (list) – a list of ints or string-formatted ints corresponding to the class labels of the boxes. This is only used for color selection. Should have the same length as [boxes].

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • display_strs (list, optional) – list of list of strings (the outer list should have the same length as [boxes]). Typically this is used to show (possibly multiple) detection or classification categories and/or confidence values.

  • colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors

  • textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT

  • vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM

  • text_rotation (float, optional) – rotation to apply to text

  • label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

megadetector.visualization.visualization_utils.draw_db_boxes_on_file(input_file, output_file, boxes, classes=None, label_map=None, thickness=4, expansion=0, ignore_exif_rotation=False, quality=None)[source]

Render COCO-formatted bounding boxes (in absolute coordinates) on an image loaded from file, writing the results to a new image file.

Parameters:
  • input_file (str) – image file to read

  • output_file (str) – image file to write

  • boxes (list) – list of length-4 tuples, formatted as (x,y,w,h) (in pixels)

  • classes (list, optional) – list of ints (or string-formatted ints), used to choose labels (either by literally rendering the class labels, or by indexing into [label_map])

  • label_map (dict, optional) – int –> str dictionary, typically mapping category IDs to species labels; if None, category labels are rendered verbatim (typically as numbers)

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated

  • quality (int, optional) – jpeg quality to use for output (None to use PIL default)

Returns:

the loaded and modified image

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.exif_preserving_save(pil_image, output_file, quality='keep', default_quality=85, verbose=False, tags_to_exclude=None)[source]

Saves [pil_image] to [output_file], making a moderate attempt to preserve EXIF data and JPEG quality. Neither is guaranteed.

Also see:

https://discuss.dizzycoding.com/determining-jpg-quality-in-python-pil/

…for more ways to preserve jpeg quality if quality=’keep’ doesn’t do the trick.

Parameters:
  • pil_image (Image) – the PIL Image object to save

  • output_file (str) – the destination file

  • quality (str or int, optional) – can be “keep” (default), or an integer from 0 to 100. This is only used if PIL thinks the source image is a JPEG. If you load a JPEG and resize it in memory, for example, it’s no longer a JPEG.

  • default_quality (int, optional) – determines output quality when quality == ‘keep’ and we are saving a non-JPEG source to a JPEG file

  • verbose (bool, optional) – enable additional debug console output

  • tags_to_exclude (list, optional) – tags to exclude from the output file

megadetector.visualization.visualization_utils.get_image_size(im, verbose=False)[source]

Retrieve the size of an image. Returns None if the image fails to load.

Parameters:
  • im (str or PIL.Image) – filename or PIL image

  • verbose (bool, optional) – enable additional debug output

Returns:

tuple (w,h), or None if the image fails to load.

megadetector.visualization.visualization_utils.get_text_size(font, s)[source]

Get the expected width and height when rendering the string [s] in the font [font].

Parameters:
  • font (PIL.ImageFont) – the font whose size we should query

  • s (str) – the string whose size we should query

Returns:

(w,h), both floats in pixel coordinates

Return type:

tuple

megadetector.visualization.visualization_utils.gray_scale_fraction(image, crop_size=(0.1, 0.1))[source]

Computes the fraction of the pixels in [image] that appear to be grayscale (R==G==B), useful for approximating whether this is a night-time image when flash information is not available in EXIF data (or for video frames, where this information is often not available in structured metadata at all).

Parameters:
  • image (str or PIL.Image.Image) – Image, filename, or URL to analyze

  • crop_size (tuple of floats, optional) – a 2-element list/tuple, representing the fraction of the image to crop at the top and bottom, respectively, before analyzing (to minimize the possibility of including color elements in the image overlay)

Returns:

the fraction of pixels in [image] that appear to be grayscale (R==G==B)

Return type:

float
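
The computation can be sketched on a plain list of pixel rows, without any image library. The function name, the row-based input format, and the way the crop fractions are turned into row counts are assumptions made for illustration.

```python
def gray_scale_fraction_sketch(pixels, crop_size=(0.1, 0.1)):
    """Fraction of pixels with R == G == B after cropping top and bottom rows.

    pixels is a list of rows; each row is a list of (r, g, b) tuples.
    """
    height = len(pixels)
    top = int(height * crop_size[0])              # rows dropped from the top
    bottom = height - int(height * crop_size[1])  # first row dropped at the bottom
    rows = pixels[top:bottom]
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    gray = sum(1 for row in rows for (r, g, b) in row if r == g == b)
    return gray / total


color_row = [(200, 10, 10), (10, 200, 10)]
gray_row = [(80, 80, 80), (120, 120, 120)]
# A colored overlay row at the top and bottom, with a half-gray body in between
image = [color_row] + [gray_row] * 4 + [color_row] * 4 + [color_row]
fraction = gray_scale_fraction_sketch(image)
```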

megadetector.visualization.visualization_utils.load_image(input_file, ignore_exif_rotation=False)[source]

Loads an image file. This is the non-lazy version of open_image(); i.e., it forces image decoding before returning.

Parameters:
  • input_file (str or BytesIO) – can be a path to an image file (anything that PIL can open), a URL, or an image as a stream of bytes

  • ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated

Returns:

a PIL Image object in RGB mode

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.open_image(input_file, ignore_exif_rotation=False)[source]

Opens an image in binary format using PIL.Image and converts to RGB mode.

Supports local files or URLs.

This operation is lazy; image will not be actually loaded until the first operation that needs to load it (for example, resizing), so file opening errors can show up later. load_image() is the non-lazy version of this function.

Parameters:
  • input_file (str or BytesIO) – can be a path to an image file (anything that PIL can open), a URL, or an image as a stream of bytes

  • ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated

Returns:

A PIL Image object in RGB mode

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.parallel_check_image_integrity(filenames, modes=None, max_workers=16, use_threads=True, recursive=True, verbose=False)[source]

Check whether we can successfully load a list of images via OpenCV and/or PIL.

Parameters:
  • filenames (list or str) – a list of image filenames or a folder

  • modes (list, optional) – see check_image_integrity() for documentation on the [modes] parameter

  • max_workers (int, optional) – the number of parallel workers to use; set to <=1 to disable parallelization

  • use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization

  • recursive (bool, optional) – if [filenames] is a folder, whether to search recursively for images. Ignored if [filenames] is a list.

  • verbose (bool, optional) – enable additional debug output

Returns:

a list of dicts, each with a key called ‘file’ (the filename that was checked), plus one key for each string in [modes] (a success indicator for that mode, specifically a string starting with either ‘success’ or ‘error’).

Return type:

list

megadetector.visualization.visualization_utils.parallel_get_image_sizes(filenames, max_workers=16, use_threads=True, recursive=True, verbose=False)[source]

Retrieve image sizes for a list or folder of images

Parameters:
  • filenames (list or str) – a list of image filenames or a folder. Non-image files and unreadable images will be returned with a size of None.

  • max_workers (int, optional) – the number of parallel workers to use; set to <=1 to disable parallelization

  • use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization

  • recursive (bool, optional) – if [filenames] is a folder, whether to search recursively for images. Ignored if [filenames] is a list.

  • verbose (bool, optional) – enable additional debug output

Returns:

a dict mapping filenames to (w,h) tuples; the value will be None for images that fail to load. Filenames will always be absolute.

Return type:

dict
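
The max_workers and use_threads parameters shared by the parallel functions above follow a common dispatch pattern, sketched here with the standard library's concurrent.futures. The helper name parallel_map is hypothetical, and the library may use a different pool implementation.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parallel_map(fn, items, max_workers=16, use_threads=True):
    """Apply fn to each item, in order, optionally in parallel."""
    if max_workers <= 1:
        # Parallelization disabled: run serially in the calling thread
        return [fn(item) for item in items]
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map preserves input order in its results
        return list(executor.map(fn, items))


lengths = parallel_map(len, ['a', 'bb', 'ccc'], max_workers=2)
```

Threads are usually adequate for I/O-bound work such as reading image headers; processes avoid the interpreter lock for CPU-heavy decoding, but require fn and its arguments to be picklable.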

megadetector.visualization.visualization_utils.render_db_bounding_boxes(boxes, classes, image, original_size=None, label_map=None, thickness=4, expansion=0, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font_size=16, tags=None, boxes_are_normalized=False, label_font='arial.ttf')[source]

Render bounding boxes (with class labels) on an image. This is a wrapper for draw_bounding_boxes_on_image, allowing the caller to operate on a resized image by providing the original size of the image; boxes will be scaled accordingly.

This function assumes that bounding boxes are in absolute coordinates, typically because they come from COCO camera traps .json files, unless boxes_are_normalized is True.

Parameters:
  • boxes (list) – list of length-4 tuples, formatted as (x,y,w,h) (in pixels)

  • classes (list) – list of ints (or string-formatted ints), used to choose labels (either by literally rendering the class labels, or by indexing into [label_map])

  • image (PIL.Image.Image) – image object to modify

  • original_size (tuple, optional) – if this is not None, and the size is different than the size of [image], we assume that [boxes] refer to the original size, and we scale them accordingly before rendering

  • label_map (dict, optional) – int –> str dictionary, typically mapping category IDs to species labels; if None, category labels are rendered verbatim (typically as numbers)

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors

  • textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT

  • vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM

  • text_rotation (float, optional) – rotation to apply to text

  • label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.

  • tags (list, optional) – list of strings of length len(boxes) that should be appended after each class name (e.g. to show scores)

  • boxes_are_normalized (bool, optional) – whether boxes have already been normalized

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)
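
The coordinate handling described above can be sketched as a conversion from (x,y,w,h) boxes to the normalized (ymin, xmin, ymax, xmax) rows that draw_bounding_boxes_on_image documents. Normalizing against the original size is what makes boxes land correctly on a resized image. The helper name db_boxes_to_normalized is hypothetical and this is not the library's code.

```python
def db_boxes_to_normalized(boxes, image_size, original_size=None,
                           boxes_are_normalized=False):
    """Convert (x, y, w, h) boxes to normalized (ymin, xmin, ymax, xmax) tuples."""
    # Absolute boxes refer to the original image size if one was supplied
    ref_w, ref_h = original_size if original_size is not None else image_size
    converted = []
    for x, y, w, h in boxes:
        if boxes_are_normalized:
            xmin, ymin, xmax, ymax = x, y, x + w, y + h
        else:
            xmin, ymin = x / ref_w, y / ref_h
            xmax, ymax = (x + w) / ref_w, (y + h) / ref_h
        converted.append((ymin, xmin, ymax, xmax))
    return converted


# A box annotated on an 800x400 original, rendered on a 400x200 resized copy
rows = db_boxes_to_normalized([(100, 50, 200, 100)], (400, 200), original_size=(800, 400))
```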

megadetector.visualization.visualization_utils.render_detection_bounding_boxes(detections, image, label_map='show_categories', classification_label_map='show_categories', confidence_threshold=0.0, thickness=4, expansion=0, classification_confidence_threshold=0.3, max_classifications=3, colormap=None, textalign=0, vtextalign=0, label_font_size=16, custom_strings=None, box_sort_order='confidence', verbose=False, label_font='arial.ttf')[source]

Renders bounding boxes (with labels and confidence values) on an image for all detections above a threshold.

Renders classification labels if present.

[image] is modified in place.

Parameters:
  • detections (list) – list of detections in the MD output format. Classification results, in the standard format, are also supported.

  • image (PIL.Image.Image) – image on which we should render detections

  • label_map (dict, optional) – optional, mapping the numeric label to a string name. The type of the numeric label (typically strings) needs to be consistent with the keys in label_map; no casting is carried out. If [label_map] is None, no labels are shown (not even numbers and confidence values). If you want category numbers and confidence values without class labels, use the default value, the string ‘show_categories’.

  • classification_label_map (dict, optional) – optional, mapping of the string class labels to the actual class names. The type of the numeric label (typically strings) needs to be consistent with the keys in label_map; no casting is carried out. If [label_map] is None, no labels are shown (not even numbers and confidence values). If you want category numbers and confidence values without class labels, use the default value, the string ‘show_categories’.

  • confidence_threshold (float or dict, optional) – threshold above which boxes are rendered. Can also be a dictionary mapping category IDs to thresholds.

  • thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • classification_confidence_threshold (float, optional) – confidence above which classification results are displayed

  • max_classifications (int, optional) – maximum number of classification results rendered for one image

  • colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors

  • textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT

  • vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM

  • label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.

  • custom_strings (list of str, optional) – optional set of strings to append to detection labels, should have the same length as [detections]. Appended before any classification labels.

  • box_sort_order (str, optional) – sorting scheme for detection boxes, can be None, “confidence”, or “reverse_confidence”. “confidence” puts the highest-confidence boxes on top.

  • verbose (bool, optional) – enable additional debug output

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)
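
A sketch of how the confidence_threshold and box_sort_order parameters might select and order detections before drawing. The detection dicts use the ‘category’, ‘conf’, and ‘bbox’ keys described for draw_bounding_boxes_on_file; the function name, the handling of a confidence exactly equal to the threshold, and the behavior for categories missing from a threshold dict are assumptions.

```python
def select_detections(detections, confidence_threshold=0.0,
                      box_sort_order='confidence'):
    """Return the detections to render, in drawing order (last drawn is on top)."""
    selected = []
    for det in detections:
        if isinstance(confidence_threshold, dict):
            # Per-category thresholds; a missing category raises KeyError in this sketch
            threshold = confidence_threshold[det['category']]
        else:
            threshold = confidence_threshold
        if det['conf'] >= threshold:
            selected.append(det)
    if box_sort_order == 'confidence':
        # Draw low-confidence boxes first so high-confidence boxes end up on top
        selected.sort(key=lambda d: d['conf'])
    elif box_sort_order == 'reverse_confidence':
        selected.sort(key=lambda d: d['conf'], reverse=True)
    return selected


detections = [
    {'category': '1', 'conf': 0.9, 'bbox': [0.1, 0.1, 0.2, 0.2]},
    {'category': '2', 'conf': 0.4, 'bbox': [0.5, 0.5, 0.1, 0.3]},
    {'category': '1', 'conf': 0.2, 'bbox': [0.3, 0.3, 0.1, 0.1]},
]
to_draw = select_detections(detections, {'1': 0.5, '2': 0.3})
```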

megadetector.visualization.visualization_utils.resize_image(image, target_width=-1, target_height=-1, output_file=None, no_enlarge_width=False, verbose=False, quality='keep')[source]

Resizes a PIL Image object to the specified width and height; does not resize in place. If either width or height is -1, resizes with aspect ratio preservation.

If target_width and target_height are both -1, does not modify the image, but will write to output_file if supplied.

If no resizing is required, and an Image object is supplied, returns the original Image object (i.e., does not copy).

Parameters:
  • image (Image or str) – PIL Image object or a filename (local file or URL)

  • target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size

  • target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size

  • output_file (str, optional) – file to which we should save this image; if None, just returns the image without saving

  • no_enlarge_width (bool, optional) – if True, and [target_width] is larger than the original image width, does not modify the image, but will still write to output_file if supplied

  • verbose (bool, optional) – enable additional debug output

  • quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail

Returns:

the resized image, which may be the original image if no resizing is required

Return type:

PIL.Image.Image
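
The size rules described above can be sketched without any image library. compute_target_size is a hypothetical helper for illustration only; the library's actual rounding behavior is not documented here and is assumed.

```python
def compute_target_size(orig_width, orig_height,
                        target_width=-1, target_height=-1,
                        no_enlarge_width=False):
    """
    Hypothetical helper (not part of the library): return the (width, height)
    an image would end up with under the documented rules.
    """
    # Both dimensions unspecified: the image is left unmodified
    if target_width == -1 and target_height == -1:
        return (orig_width, orig_height)
    # Optionally refuse to make the image wider than the original
    if no_enlarge_width and target_width != -1 and target_width > orig_width:
        return (orig_width, orig_height)
    # One dimension unspecified: derive it from the aspect ratio
    if target_height == -1:
        target_height = round(orig_height * target_width / orig_width)
    elif target_width == -1:
        target_width = round(orig_width * target_height / orig_height)
    return (target_width, target_height)


print(compute_target_size(4000, 3000, target_width=1000))
```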

megadetector.visualization.visualization_utils.resize_image_folder(input_folder, output_folder=None, target_width=-1, target_height=-1, no_enlarge_width=False, verbose=False, quality='keep', pool_type='process', n_workers=10, recursive=True, image_files_relative=None, overwrite=True)[source]

Resize all images in a folder (defaults to recursive).

Defaults to in-place resizing (output_folder is optional).

Parameters:
  • input_folder (str) – folder in which we should find images to resize

  • output_folder (str, optional) – folder in which we should write resized images. If None, resizes images in place. Otherwise, maintains relative paths in the target folder.

  • target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size

  • target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size

  • no_enlarge_width (bool, optional) – if True, and [target_width] is larger than the original image width, does not modify the image, but still writes it to the output location

  • verbose (bool, optional) – enable additional debug output

  • quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail

  • pool_type (str, optional) – whether to use processes (‘process’) or threads (‘thread’) for parallelization; ignored if n_workers <= 1

  • n_workers (int, optional) – number of workers to use for parallel resizing; set to <=1 to disable parallelization

  • recursive (bool, optional) – whether to search [input_folder] recursively for images.

  • image_files_relative (list, optional) – if not None, skips any relative paths not in this list

  • overwrite (bool, optional) – whether to overwrite existing target images

Returns:

a list of dicts with keys ‘input_fn’, ‘output_fn’, ‘status’, and ‘error’. ‘status’ will be ‘success’, ‘skipped’, or ‘error’; ‘error’ will be None for successful cases, otherwise will contain the image-specific error.

Return type:

list
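
Because the return value is a list of per-image dicts, a short summary step is often useful after a batch resize. The sketch below operates on made-up sample data that follows the documented keys; summarize_resize_results and the file names are hypothetical.

```python
from collections import Counter


def summarize_resize_results(results):
    """
    Hypothetical helper: count results by status and collect
    (input file, error) pairs for the failures.
    """
    status_counts = Counter(r['status'] for r in results)
    failures = [(r['input_fn'], r['error'])
                for r in results if r['status'] == 'error']
    return status_counts, failures


# Made-up results in the documented format, for illustration only
example_results = [
    {'input_fn': 'a.jpg', 'output_fn': 'out/a.jpg', 'status': 'success', 'error': None},
    {'input_fn': 'b.jpg', 'output_fn': 'out/b.jpg', 'status': 'skipped', 'error': None},
    {'input_fn': 'c.jpg', 'output_fn': 'out/c.jpg', 'status': 'error', 'error': 'cannot identify image file'},
]

status_counts, failures = summarize_resize_results(example_results)
print(dict(status_counts))
print(failures)
```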

megadetector.visualization.visualization_utils.resize_images(input_file_to_output_file, target_width=-1, target_height=-1, no_enlarge_width=False, verbose=False, quality='keep', pool_type='process', n_workers=10, overwrite=True)[source]

Resizes all images in the dictionary [input_file_to_output_file].

Parameters:
  • input_file_to_output_file (dict) – dict mapping images that exist to the locations where the resized versions should be written

  • target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size

  • target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size

  • no_enlarge_width (bool, optional) – if True, and [target_width] is larger than the original image width, does not modify the image, but still writes it to the output location

  • verbose (bool, optional) – enable additional debug output

  • quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail

  • pool_type (str, optional) – whether to use processes (‘process’) or threads (‘thread’) for parallelization; ignored if n_workers <= 1

  • n_workers (int, optional) – number of workers to use for parallel resizing; set to <=1 to disable parallelization

  • overwrite (bool, optional) – whether to overwrite existing target images

Returns:

a list of dicts with keys ‘input_fn’, ‘output_fn’, ‘status’, and ‘error’. ‘status’ will be ‘success’ or ‘error’; ‘error’ will be None for successful cases, otherwise will contain the image-specific error.

Return type:

list
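
One way to build the [input_file_to_output_file] dictionary is to mirror relative paths from an input folder into an output folder. This is a minimal sketch; build_resize_mapping and the folder names are hypothetical, and the files are assumed to exist already.

```python
import os


def build_resize_mapping(input_folder, output_folder, relative_paths):
    """
    Hypothetical helper: map each existing input image to the location
    where its resized copy should be written, preserving relative paths.
    """
    mapping = {}
    for relative_path in relative_paths:
        input_file = os.path.join(input_folder, relative_path)
        output_file = os.path.join(output_folder, relative_path)
        mapping[input_file] = output_file
    return mapping


mapping = build_resize_mapping('images', 'images_resized',
                               ['cam01/img_0001.jpg', 'cam02/img_0002.jpg'])
for input_file, output_file in mapping.items():
    print(input_file, '->', output_file)
```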

megadetector.visualization.visualization_utils.test_visualization_utils()[source]

Runs all tests in the TestVisualizationUtils class.

visualization.visualize_db module

visualize_db.py

Outputs an HTML page visualizing annotations (class labels and/or bounding boxes) on a sample of images in a database in the COCO Camera Traps format.

class megadetector.visualization.visualize_db.DbVizOptions[source]

Bases: object

Parameters controlling the behavior of visualize_db().

Should we include Web search links for each category name?

box_expansion

Number of pixels to expand each bounding box

box_thickness

Line width in pixels

classes_to_exclude

Exclude images that contain annotations with these class names (not IDs) (list)

Mutually exclusive with classes_to_include

classes_to_include

Only include images that contain annotations with these class names (not IDs) (list)

Mutually exclusive with classes_to_exclude

colormap

List of PIL color names, which will be indexed by category IDs, or None to use the default color map.

For example: [‘AliceBlue’, ‘Red’, ‘RoyalBlue’, ‘Gold’, ‘Chartreuse’]

confidence_field_name

COCO files used for evaluation may contain confidence scores; this determines the field name used for those scores.

confidence_threshold

Optionally apply a confidence threshold; this requires that [confidence_field_name] be present in all detections.

create_category_pages

Should we create separate pages for each category (within the sampled set)?

Images with multiple categories will be included in all relevant pages.

custom_category_mapping

Custom mapping from category IDs to labels, replacing what’s in the .json file

extra_annotation_fields_to_print

List of additional fields in the annotation struct that we should print in image headers

extra_image_fields_to_print

List of additional fields in the image struct that we should print in image headers

force_rendering

Set to False to skip existing images

html_options

HTML rendering options; see write_html_image_list for details

The most relevant option one might want to set here is:

html_options[‘maxFiguresPerHtmlFile’]

…which can be used to paginate previews to a number of images that will load well in a browser (1000 is a reasonable limit).

Should there be a text link back to each original image?

Should each thumbnail image link back to the original image?

max_sequence_length

If this is None, we just sample images and show them. If this is not None, we sample images, but we also show the other images in the sequences containing our sampled images. If this is <=0, there is no limit on the number of images we’ll show per sequence. If this is >0, we will cap the number of images shown per sequence; no guarantee is made about which images will be selected in that case. This only impacts the number of images added as “sequence friends” of images that get sampled.

num_to_visualize

Number of images to sample from the database, or None to visualize all images

parallelize_rendering

Parallelize rendering across multiple workers

parallelize_rendering_n_cores

Number of workers to use for parallelization; ignored if parallelize_rendering is False

parallelize_rendering_with_threads

In theory, whether to parallelize with threads (True) or processes (False), but process-based parallelization in this function is currently unsupported

quality

JPEG quality to use for saving images (None for Pillow default)

random_seed

Random seed to use for sampling images

show_full_paths

Should we show absolute (True) or relative (False) paths for each image?

sort_by_filename

Whether to sort images by filename (True) or randomly (False)

This is ignored if max_sequence_length is not None, in which case we always sort by sequence ID, then frame number.

trim_to_images_with_bboxes

Only show images that contain bounding boxes

verbose

Enable additional debug console output

viz_size

Target size for rendering; set either dimension to -1 to preserve aspect ratio.

If viz_size is None or (-1,-1), the original image size is used.

megadetector.visualization.visualize_db.visualize_db(db_path, output_dir, image_base_dir, options=None)[source]

Writes images and html to output_dir to visualize the images and annotations in a COCO-formatted .json file.

Parameters:
  • db_path (str or dict) – the .json filename to load, or a previously-loaded database

  • output_dir (str) – the folder to which we should write annotated images

  • image_base_dir (str) – the folder where the images live; filenames in [db_path] should be relative to this folder.

  • options (DbVizOptions, optional) – See DbVizOptions for details

Returns:

A length-two tuple containing (the html filename) and (the loaded database).

Return type:

tuple
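
A typical call sets a handful of options before rendering. The sketch below configures a stand-in object so that it runs without the library installed; configure_db_viz_options is a hypothetical helper, the (width, height) order of viz_size is an assumption, and the specific values are illustrative.

```python
from types import SimpleNamespace


def configure_db_viz_options(options):
    """
    Hypothetical helper: set a few of the documented DbVizOptions attributes
    on [options] and return it.
    """
    options.num_to_visualize = 200
    options.random_seed = 0
    options.trim_to_images_with_bboxes = True
    # Assumed to be (width, height); -1 preserves the aspect ratio
    options.viz_size = (800, -1)
    # Paginate the preview so each HTML page loads well in a browser
    if options.html_options is None:
        options.html_options = {}
    options.html_options['maxFiguresPerHtmlFile'] = 1000
    return options


# Stand-in for a DbVizOptions instance, so this sketch runs on its own
configured = configure_db_viz_options(SimpleNamespace(html_options=None))
print(configured.num_to_visualize, configured.viz_size)
```

With the library installed, one would presumably pass a DbVizOptions() instance instead of the stand-in, then call visualize_db(db_path, output_dir, image_base_dir, options=configured) and unpack the returned (html filename, database) tuple.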

visualize_db - CLI interface

visualize_db [-h] [--num_to_visualize NUM_TO_VISUALIZE] [--random_sort]
             [--trim_to_images_with_bboxes] [--random_seed RANDOM_SEED]
             db_path output_dir image_base_dir

visualize_db positional arguments

visualize_db options

visualization.visualize_detector_output module

visualize_detector_output.py

Render images with bounding boxes annotated on them to a folder, based on a detector output result file (.json), optionally writing an HTML index file.

megadetector.visualization.visualize_detector_output.visualize_detector_output(detector_output_path, out_dir, images_dir=None, confidence_threshold=0.15, sample=-1, output_image_width=1000, random_seed=0, render_detections_only=False, classification_confidence_threshold=0.1, html_output_file=None, html_output_options=None, preserve_path_structure=False, parallelize_rendering=True, parallelize_rendering_n_cores=10, parallelize_rendering_with_threads=True, box_sort_order='confidence', category_names_to_blur=None, link_images_to_originals=False, detector_label_map=None, box_thickness=4, box_expansion=0, label_font_size=16, label_font='arial.ttf')[source]

Draws bounding boxes on images given the output of a detector.

Parameters:
  • detector_output_path (str or dict) – path to detector output .json file, or a loaded MD results dict

  • out_dir (str) – path to directory for saving annotated images

  • images_dir (str, optional) – folder where the images live; filenames in [detector_output_path] should be relative to [images_dir]. Can be None if paths are absolute.

  • confidence_threshold (float, optional) – threshold above which detections will be rendered

  • sample (int, optional) – maximum number of images to render, -1 for all

  • output_image_width (int, optional) – width in pixels to resize images for display, preserving aspect ratio; set to -1 to use original image width

  • random_seed (int, optional) – seed to use for choosing images when sample != -1, use None to avoid forcing a seed

  • render_detections_only (bool, optional) – only render images with above-threshold detections. Empty images are discarded after sampling, so if you want to see, e.g., 1000 non-empty images, you can set [render_detections_only], but you need to sample more than 1000 images.

  • classification_confidence_threshold (float, optional) – only show classifications above this threshold; does not impact whether images are rendered, only whether classification labels (not detection categories) are displayed

  • html_output_file (str, optional) – output path for an HTML index file (not written if None)

  • html_output_options (dict, optional) – HTML formatting options; see write_html_image_list for details. The most common option you may want to supply here is ‘maxFiguresPerHtmlFile’.

  • preserve_path_structure (bool, optional) – if False (default), writes images to unique names in a flat structure in the output folder; if True, preserves relative paths within the output folder

  • parallelize_rendering (bool, optional) – whether to use concurrent workers for rendering

  • parallelize_rendering_n_cores (int, optional) – number of concurrent workers to use (ignored if parallelize_rendering is False)

  • parallelize_rendering_with_threads (bool, optional) – determines whether we use threads (True) or processes (False) for parallelization (ignored if parallelize_rendering is False)

  • box_sort_order (str, optional) – sorting scheme for detection boxes, can be None, “confidence”, or “reverse_confidence”

  • category_names_to_blur (list of str, optional) – category names for which we should blur detections, most commonly [‘person’]

  • link_images_to_originals (bool, optional) – include a link from every rendered image back to the corresponding original image

  • detector_label_map (dict or str, optional) – mapping from category IDs to labels; by default (None) uses the values in the detector file. If this is the string ‘no_detection_labels’, hides labels.

  • box_thickness (int or float, optional) – box thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • box_expansion (int or float, optional) – box expansion in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • label_font_size (float, optional) – label font size in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.

  • label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

Returns:

list of paths to annotated images

Return type:

list
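
The interaction between confidence_threshold and box_sort_order can be pictured as a filter followed by a sort that fixes the drawing order. This is a sketch under stated assumptions: select_detections_for_rendering is a hypothetical helper, each detection is assumed to be a dict with a 'conf' key, the threshold comparison is assumed to be inclusive, and boxes drawn later are assumed to appear on top.

```python
def select_detections_for_rendering(detections,
                                    confidence_threshold=0.15,
                                    box_sort_order='confidence'):
    """
    Hypothetical helper: keep detections at or above the threshold and
    return them in drawing order (the last box drawn ends up on top).
    """
    kept = [d for d in detections if d['conf'] >= confidence_threshold]
    if box_sort_order == 'confidence':
        # Ascending order: the highest-confidence box is drawn last
        kept.sort(key=lambda d: d['conf'])
    elif box_sort_order == 'reverse_confidence':
        # Descending order: the lowest-confidence box is drawn last
        kept.sort(key=lambda d: d['conf'], reverse=True)
    elif box_sort_order is not None:
        raise ValueError('Unknown box_sort_order: {}'.format(box_sort_order))
    return kept


# Made-up detections, for illustration only
example_detections = [
    {'category': '1', 'conf': 0.92},
    {'category': '2', 'conf': 0.08},
    {'category': '1', 'conf': 0.40},
]

ordered = select_detections_for_rendering(example_detections)
print([d['conf'] for d in ordered])
```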

visualize_detector_output - CLI interface

Annotate the bounding boxes predicted by a detector above some confidence threshold, and save the annotated images.

visualize_detector_output [-h] [--confidence CONFIDENCE] [--images_dir IMAGES_DIR]
                          [--sample SAMPLE] [--output_image_width OUTPUT_IMAGE_WIDTH]
                          [--random_seed RANDOM_SEED] [--html_output_file HTML_OUTPUT_FILE]
                          [--open_html_output_file] [--detections_only]
                          [--preserve_path_structure]
                          [--category_names_to_blur CATEGORY_NAMES_TO_BLUR]
                          [--classification_confidence CLASSIFICATION_CONFIDENCE]
                          [--box_thickness BOX_THICKNESS] [--box_expansion BOX_EXPANSION]
                          [--label_font_size LABEL_FONT_SIZE] [--label_font LABEL_FONT]
                          detector_output_path out_dir

visualize_detector_output positional arguments

  • detector_output_path - Path to json output file of the detector

  • out_dir - Path to directory where the annotated images will be saved. The directory will be created if it does not exist.

visualize_detector_output options

  • -h, --help - show this help message and exit

  • --confidence CONFIDENCE - Value between 0 and 1, indicating the confidence threshold above which to visualize bounding boxes

  • --images_dir IMAGES_DIR - Path to a local directory where images are stored. This serves as the root directory for image paths in detector_output_path. Omit if image paths are absolute.

  • --sample SAMPLE - Number of images to be annotated and rendered. Set to -1 (default) to annotate all images in the detector output file. There may be fewer images if some are not found in images_dir.

  • --output_image_width OUTPUT_IMAGE_WIDTH - Integer, desired width in pixels of the output annotated images. Use -1 to not resize. Default: 1000.

  • --random_seed RANDOM_SEED - Integer, for deterministic order of image sampling

  • --html_output_file HTML_OUTPUT_FILE - Filename to which we should write an HTML image index (off by default)

  • --open_html_output_file - Open the .html output file when done

  • --detections_only - Only render images with above-threshold detections (by default, both empty and non-empty images are rendered).

  • --preserve_path_structure - Preserve relative image paths (otherwise flattens and assigns unique file names)

  • --category_names_to_blur CATEGORY_NAMES_TO_BLUR - Comma-separated list of category names to blur (or a single category name, typically "person")

  • --classification_confidence CLASSIFICATION_CONFIDENCE - If classification results are present, render results above this threshold

  • --box_thickness BOX_THICKNESS - Line thickness in pixels for box rendering. If this is less than 1.0, it is treated as a fraction of the image width.

  • --box_expansion BOX_EXPANSION - Number of pixels to expand bounding boxes on each side. If this is less than 1.0, it is treated as a fraction of the image width.

  • --label_font_size LABEL_FONT_SIZE - Font size in pixels for detection labels. If this is less than 1.0, it is treated as a fraction of the image width.

  • --label_font LABEL_FONT - Font filename to use for label text (default arial.ttf).
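
When driving the command-line interface from a script, it can help to assemble the argument list programmatically. The sketch below only builds the command and does not run it; build_visualize_command and the file names are hypothetical, and invoking the module with "python -m" is an assumption about how the package is installed.

```python
import sys


def build_visualize_command(detector_output_path, out_dir, confidence,
                            images_dir=None, sample=-1, html_output_file=None):
    """
    Hypothetical helper: assemble an argument list using the documented
    positional arguments and a few of the documented options.
    """
    cmd = [sys.executable, '-m',
           'megadetector.visualization.visualize_detector_output',
           detector_output_path, out_dir,
           '--confidence', str(confidence),
           '--sample', str(sample)]
    # Optional arguments are appended only when supplied
    if images_dir is not None:
        cmd += ['--images_dir', images_dir]
    if html_output_file is not None:
        cmd += ['--html_output_file', html_output_file]
    return cmd


command = build_visualize_command('md_results.json', 'annotated',
                                  confidence=0.2, images_dir='images')
print(' '.join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) once the package is installed.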

visualization.visualize_video_output module

visualize_video_output.py

Render a folder of videos with bounding boxes to a new folder, based on a detector output file.

class megadetector.visualization.visualize_video_output.VideoVisualizationOptions[source]

Bases: object

Options controlling the behavior of visualize_video_output()

classification_confidence_threshold

Confidence threshold for including classifications

confidence_threshold

Confidence threshold for including detections

exclude_category_name_strings

List of category name strings to skip (e.g. “none”, “bear_moose”), or None

This tests against combined name strings (“bear_moose”), not individual category names (“bear”, “moose”).

exclude_category_names

List of individual category names to skip (e.g. “bear”, “moose”) or None

This tests against individual category names (“bear”, “moose”), not combined name strings (“bear_moose”).

flatten_output

By default, relative paths are preserved in the output folder; this flattens the output structure.

fourcc

Fourcc codec specification for video encoding

include_category_names

List of individual category names to include (e.g. “bear”, “moose”), or None. At least one of these categories must be present for a video to be included.

This tests against individual category names (“bear”, “moose”), not combined name strings (“bear_moose”).

include_category_names_in_filenames

Should we include classification category names in the output filenames?

Helps for finding showcase videos. Can be “start”, “end”, or None.

min_output_length_seconds

Don’t render videos below this length

output_extension

By default, output videos use the same extension as input videos; use this to force a particular extension.

parallelize_rendering

Enable parallel processing of videos

parallelize_rendering_n_cores

Number of concurrent workers (None = default based on CPU count)

parallelize_rendering_with_threads

Use threads (True) vs processes (False) for parallelization

path_separator_replacement

When flatten_output is True, path separators will be replaced with this string.

random_seed

Random seed for sampling

rendering_fs

Frame rate for output videos. Either a float (fps) or ‘auto’ to calculate based on detection frame intervals

sample

Sample N videos to process (-1 for all videos)

trim_to_detections

Skip frames before first and after last above-threshold detection
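
The difference between exclude_category_name_strings (matched against the combined string) and include_category_names / exclude_category_names (matched against individual names) is easy to misread. The sketch below is one plausible interpretation; video_passes_category_filters is a hypothetical helper, and building the combined string by joining sorted names with underscores (with “none” for videos without categories) is an assumption based on the examples above.

```python
def video_passes_category_filters(category_names,
                                  include_category_names=None,
                                  exclude_category_names=None,
                                  exclude_category_name_strings=None):
    """
    Hypothetical helper: apply the three documented category filters to the
    set of category names found in one video.
    """
    names = set(category_names)
    # Assumed convention for the combined string, e.g. "bear_moose" or "none"
    combined = '_'.join(sorted(names)) if names else 'none'
    # Combined-string test: only an exact match on the whole string counts
    if exclude_category_name_strings and combined in exclude_category_name_strings:
        return False
    # Individual-name tests: any overlap counts
    if exclude_category_names and names & set(exclude_category_names):
        return False
    if include_category_names and not (names & set(include_category_names)):
        return False
    return True


print(video_passes_category_filters(['bear', 'moose'], exclude_category_name_strings=['bear']))
print(video_passes_category_filters(['bear', 'moose'], exclude_category_names=['bear']))
```

Under these assumptions the first call keeps the video (the combined string is “bear_moose”, not “bear”), while the second call skips it.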

megadetector.visualization.visualize_video_output.visualize_video_output(detector_output_path, out_dir, video_dir, options=None)[source]

Renders videos with bounding boxes based on detector output.

Parameters:
  • detector_output_path (str) – path to .json file containing detection results

  • out_dir (str) – output directory for rendered videos

  • video_dir (str) – input video directory

  • options (VideoVisualizationOptions, optional) – processing options

Returns:

list of processing results for each video

Return type:

list
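
The trim_to_detections option in VideoVisualizationOptions skips frames before the first and after the last above-threshold detection. A minimal sketch of that idea follows; trim_frame_range is a hypothetical helper, the per-frame confidence list is made up, and the inclusive threshold comparison is an assumption.

```python
def trim_frame_range(frame_confidences, confidence_threshold):
    """
    Hypothetical helper: given the highest detection confidence in each frame
    (0.0 for frames without detections), return the inclusive (first, last)
    frame indices to render, or None if no frame reaches the threshold.
    """
    hits = [i for i, conf in enumerate(frame_confidences)
            if conf >= confidence_threshold]
    if not hits:
        return None
    return (hits[0], hits[-1])


# Frames between the first and last hit are kept, even if below threshold
print(trim_frame_range([0.0, 0.05, 0.6, 0.1, 0.8, 0.0], 0.15))
```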

visualize_video_output - CLI interface

Render videos with bounding boxes predicted by a detector above a confidence threshold, and save the rendered videos.

visualize_video_output [-h] [--confidence_threshold CONFIDENCE_THRESHOLD] [--sample SAMPLE]
                       [--random_seed RANDOM_SEED]
                       [--classification_confidence_threshold CLASSIFICATION_CONFIDENCE_THRESHOLD]
                       [--rendering_fs RENDERING_FS] [--fourcc FOURCC] [--trim_to_detections]
                       detector_output_path out_dir video_dir

visualize_video_output positional arguments

  • detector_output_path - Path to json output file of the detector

  • out_dir - Path to directory where the rendered videos will be saved. The directory will be created if it does not exist.

  • video_dir - Path to directory containing the input videos

visualize_video_output options

  • -h, --help - show this help message and exit

  • --confidence_threshold CONFIDENCE_THRESHOLD - Confidence threshold above which detections will be rendered

  • --sample SAMPLE - Number of videos to randomly sample for processing. Set to -1 to process all videos

  • --random_seed RANDOM_SEED - Random seed for reproducible sampling

  • --classification_confidence_threshold CLASSIFICATION_CONFIDENCE_THRESHOLD - Value between 0 and 1, indicating the confidence threshold above which classifications will be rendered

  • --rendering_fs RENDERING_FS - Frame rate for output videos. Use "auto" to calculate based on detection frame intervals, positive float for explicit fps, or negative float for speedup factor (e.g. -2.0 = 2x faster)

  • --fourcc FOURCC - Fourcc codec specification for video encoding

  • --trim_to_detections - Skip frames before first and after last above-threshold detection
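
The three forms of --rendering_fs ("auto", a positive frame rate, or a negative speedup factor) can be sketched as follows. resolve_output_fps is a hypothetical helper; it assumes that "auto" means the source frame rate divided by the interval between processed frames, and that the speedup factor is applied relative to that rate. Neither assumption is confirmed by the text above.

```python
def resolve_output_fps(rendering_fs, source_fps, frame_interval):
    """
    Hypothetical helper: turn a rendering_fs value into an output frame rate.
    [frame_interval] is the number of source frames between processed frames.
    """
    # Rate at which the processed frames play back in real time (assumed)
    real_time_fps = source_fps / frame_interval
    if rendering_fs == 'auto':
        return real_time_fps
    rendering_fs = float(rendering_fs)
    if rendering_fs > 0:
        # Explicit frame rate
        return rendering_fs
    if rendering_fs < 0:
        # Speedup factor, e.g. -2.0 plays twice as fast as real time
        return real_time_fps * abs(rendering_fs)
    raise ValueError('rendering_fs must be "auto" or a nonzero number')


print(resolve_output_fps('auto', 30.0, 10))
print(resolve_output_fps(-2.0, 30.0, 10))
```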