visualization package

Submodules

visualization.plot_utils module

plot_utils.py

Utility functions for plotting, particularly for plotting confusion matrices and precision-recall curves.

megadetector.visualization.plot_utils.calibration_ece(true_scores, pred_scores, num_bins)[source]

Expected calibration error (ECE) as defined in equation (3) of Guo et al. “On Calibration of Modern Neural Networks.” (2017).

Implementation modified from sklearn.calibration.calibration_curve() in order to implement ECE calculation. See:

https://github.com/scikit-learn/scikit-learn/issues/18268

Parameters:

true_scores (list of int) – true values, length N, binary-valued (0 = neg, 1 = pos)
pred_scores (list of float) – predicted confidence values, length N, pred_scores[i] is the predicted confidence that example i is positive
num_bins (int) – number of bins to use (M in eq. (3) of Guo 2017)

Returns:

a length-three tuple containing:

accs: np.ndarray, shape [M], type float64, accuracy in each bin, M <= num_bins because bins with no samples are not returned
confs: np.ndarray, shape [M], type float64, mean model confidence in each bin
ece: float, expected calibration error

Return type:

tuple
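As a rough illustration of equation (3) of Guo et al., the following is a minimal pure-Python sketch, not the library's implementation. It assumes uniform-width bins on [0, 1] and takes the "accuracy" of a bin to be the fraction of positives in it, following the calibration_curve() convention. The function name ece_sketch is hypothetical.

```python
def ece_sketch(true_scores, pred_scores, num_bins):
    """Illustrative ECE: sum over bins of (bin_size / N) * |acc - conf|."""
    n = len(true_scores)

    # Collect (label, confidence) pairs per uniform-width bin on [0, 1]
    bins = [[] for _ in range(num_bins)]
    for label, conf in zip(true_scores, pred_scores):
        # A confidence of exactly 1.0 is placed in the last bin
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((label, conf))

    accs, confs, ece = [], [], 0.0
    for members in bins:
        if not members:
            continue  # empty bins are skipped, so len(accs) <= num_bins
        acc = sum(label for label, _ in members) / len(members)
        mean_conf = sum(conf for _, conf in members) / len(members)
        accs.append(acc)
        confs.append(mean_conf)
        # Weight each bin's calibration gap by its share of the samples
        ece += (len(members) / n) * abs(acc - mean_conf)
    return accs, confs, ece
```

A perfectly calibrated model would have acc equal to mean_conf in every bin, giving an ECE of zero.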

megadetector.visualization.plot_utils.plot_calibration_curve(true_scores, pred_scores, num_bins, name='calibration', plot_perf=True, plot_hist=True, ax=None, **fig_kwargs)[source]

Plots a calibration curve.

Parameters:

true_scores (list of int) – true values, length N, binary-valued (0 = neg, 1 = pos)
pred_scores (list of float) – predicted confidence values, length N, pred_scores[i] is the predicted confidence that example i is positive
num_bins (int) – number of bins to use (M in eq. (3) of Guo 2017)
name (str, optional) – label in legend for the calibration curve
plot_perf (bool, optional) – whether to plot y=x line indicating perfect calibration
plot_hist (bool, optional) – whether to plot histogram of counts
ax (Axes, optional) – if given then no legend is drawn, and fig_kwargs are ignored
fig_kwargs (dict) – only used if [ax] is None

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure

megadetector.visualization.plot_utils.plot_confusion_matrix(matrix, classes, normalize=False, title='Confusion matrix', cmap=<matplotlib.colors.LinearSegmentedColormap object>, vmax=None, use_colorbar=True, y_label=True, fmt='{:.0f}', fig=None)[source]

Plots a confusion matrix.

Parameters:

matrix (np.ndarray) – shape [num_classes, num_classes], confusion matrix where rows are ground-truth classes and columns are predicted classes
classes (list of str) – class names for each row/column
normalize (bool, optional) – whether to perform row-wise normalization; by default, assumes values in the confusion matrix are percentages
title (str, optional) – figure title
cmap (matplotlib.colors.colormap, optional) – colormap for cell backgrounds
vmax (float, optional) – value corresponding to the largest value of the colormap; if None, the maximum value in [matrix] will be used
use_colorbar (bool, optional) – whether to show colorbar
y_label (bool, optional) – whether to show class names on the y axis
fmt (str, optional) – format string for rendering numeric values
fig (Figure, optional) – existing figure to which we should render, otherwise creates a new figure

Returns:

the figure we rendered to or created

Return type:

matplotlib.figure.Figure
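For readers unfamiliar with row-wise normalization, here is a minimal sketch of the idea, assuming rows are ground-truth classes as described above. Whether the library scales rows to fractions or to percentages is not stated here; this sketch uses fractions, and normalize_rows is a hypothetical helper name.

```python
def normalize_rows(matrix):
    """Scale each row (ground-truth class) of a confusion matrix to sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Rows with no samples become all zeros, avoiding division by zero
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


counts = [[8, 2],
          [1, 3]]
fractions = normalize_rows(counts)
```

After normalization, each diagonal entry is the per-class recall.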

megadetector.visualization.plot_utils.plot_precision_recall_curve(precisions, recalls, title='Precision/recall curve', xlim=(0.0, 1.05), ylim=(0.0, 1.05))[source]

Plots a precision/recall curve given lists of (ordered) precision and recall values.

Parameters:

precisions (list of float) – precision for corresponding recall values, should have same length as [recalls].
recalls (list of float) – recall for corresponding precision values, should have same length as [precisions].
title (str, optional) – plot title
xlim (tuple, optional) – x-axis limits as a length-2 tuple
ylim (tuple, optional) – y-axis limits as a length-2 tuple

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure

megadetector.visualization.plot_utils.plot_stacked_bar_chart(data, series_labels=None, col_labels=None, x_label=None, y_label=None, log_scale=False)[source]

Plot a stacked bar chart, for plotting e.g. species distribution across locations.

Reference: https://stackoverflow.com/q/44309507

Parameters:

data (np.ndarray or list of list) – data to plot; rows (series) are species, columns are locations
series_labels (list of str, optional) – series labels, typically species names
col_labels (list of str, optional) – column labels, typically location names
x_label (str, optional) – x-axis label
y_label (str, optional) – y-axis label
log_scale (bool, optional) – whether to plot the y axis in log-scale

Returns:

the (new) figure

Return type:

matplotlib.figure.Figure

visualization.render_images_with_thumbnails module

render_images_with_thumbnails.py

Renders an output image with one primary image and crops from many secondary images, used primarily to check whether candidate repeat detections are actually false positives or not.

megadetector.visualization.render_images_with_thumbnails.crop_image_with_normalized_coordinates(image, bounding_box)[source]

Parameters:

image (PIL.Image) – image to crop
bounding_box (tuple) – tuple formatted as (x,y,w,h), where (0,0) is the upper-left of the image, and coordinates are normalized (so (0,0,1,1) is a box containing the entire image).

Returns:

cropped image

Return type:

PIL.Image
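The normalized (x,y,w,h) convention can be converted to the (left, upper, right, lower) pixel box that PIL's Image.crop() expects. The sketch below shows only that arithmetic; the helper name normalized_box_to_pixels is hypothetical, and the library's own rounding and clamping behavior may differ.

```python
def normalized_box_to_pixels(bounding_box, image_width, image_height):
    """Convert a normalized (x, y, w, h) box to (left, upper, right, lower)."""
    x, y, w, h = bounding_box
    left = x * image_width
    upper = y * image_height
    right = (x + w) * image_width
    lower = (y + h) * image_height
    return (left, upper, right, lower)


# (0, 0, 1, 1) covers the entire image
full_image_box = normalized_box_to_pixels((0, 0, 1, 1), 640, 480)
```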

megadetector.visualization.render_images_with_thumbnails.render_images_with_thumbnails(primary_image_filename, primary_image_width, secondary_image_filename_list, secondary_image_bounding_box_list, cropped_grid_width, output_image_filename, primary_image_location='right')[source]

Given a primary image filename and a list of secondary images, writes to the provided output_image_filename an image where one side is the primary image and the other side is a grid of the secondary images, cropped according to the provided list of bounding boxes.

The output file will be primary_image_width + cropped_grid_width pixels wide.

The height of the output image will be determined by the original aspect ratio of the primary image.

Parameters:

primary_image_filename (str) – filename of the primary image to load
primary_image_width (int) – width at which to render the primary image; if this is None, will render at the original image width
secondary_image_filename_list (list) – list of filenames of the secondary images
secondary_image_bounding_box_list (list) – list of tuples, one per secondary image. Each tuple is a bounding box of the secondary image, formatted as (x,y,w,h), where (0,0) is the upper-left of the image, and coordinates are normalized (so (0,0,1,1) is a box containing the entire image).
cropped_grid_width (int) – width of the cropped-image area
output_image_filename (str) – filename to write the output image
primary_image_location (str, optional) – ‘right’ or ‘left’; reserving ‘top’, ‘bottom’, etc. for future use
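The output dimensions described above can be worked out as follows. This is a sketch of the stated rule only (width is the sum of the two widths, height follows the primary image's aspect ratio); the helper name output_canvas_size and the use of round() are assumptions.

```python
def output_canvas_size(original_size, primary_image_width, cropped_grid_width):
    """Return (width, height) of the composite image, per the rule above."""
    original_width, original_height = original_size
    if primary_image_width is None:
        # None means "render the primary image at its original width"
        primary_image_width = original_width
    scale = primary_image_width / original_width
    output_height = round(original_height * scale)
    return (primary_image_width + cropped_grid_width, output_height)


# A 1920x1080 primary image rendered 960 pixels wide, next to a 400-pixel grid
size = output_canvas_size((1920, 1080), 960, 400)
```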

visualization.visualization_utils module

visualization_utils.py

Rendering functions shared across visualization scripts

class megadetector.visualization.visualization_utils.TestVisualizationUtils[source]

Bases: object

Tests for visualization_utils.py.

set_up()[source]: Download (if necessary) and locate the shared md-tests image data, and create a scratch folder for test-specific outputs.

tear_down()[source]: Remove test-specific output directories. Leaves the shared md-tests image data in place for other tests to use.

test_check_image_integrity()[source]: Test check_image_integrity on known-good and deliberately-corrupted images.

test_parallel_check_image_integrity()[source]: Test parallel_check_image_integrity on a mix of good and corrupt images.

test_parallel_get_image_sizes()[source]: Test parallel_get_image_sizes on good and corrupt images.

test_resize_image_folder()[source]: Test resize_image_folder, including the overwrite=False skip path.

test_resize_images()[source]: Test resize_images: write resized copies and confirm output sizes.

megadetector.visualization.visualization_utils.blur_detections(image, detections, blur_radius=40)[source]

Blur the regions in [image] corresponding to the MD-formatted list [detections]. [image] is modified in place.

Parameters:

image (PIL.Image.Image) – image in which we should blur specific regions
detections (list) – list of detections in the MD output format, see render_detection_bounding_boxes() for more detail.
blur_radius (int, optional) – radius of blur kernel in pixels

megadetector.visualization.visualization_utils.check_image_integrity(filename, modes=None)[source]

Check whether we can successfully load an image via OpenCV and/or PIL.

Parameters:

filename (str) – the filename to evaluate
modes (list, optional) –
a list containing one or more of:
- ’cv’
- ’pil’
- ’skimage’
- ’jpeg_trailer’
‘jpeg_trailer’ checks that the binary data ends with ffd9. It does not check whether the image is actually a JPEG, and even if it is, there are many reasons the image might not end with ffd9. However, the JPEGs that cause “premature end of jpeg segment” issues don’t end with ffd9, so this may be a useful diagnostic. High precision, very low recall for corrupt JPEGs.

Set to None to use all modes.

Returns:

a dict with a key called ‘file’ (the value of [filename]) and one key for each string in [modes] (a success indicator for that mode, specifically a string starting with either ‘success’ or ‘error’).

Return type:

dict
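The ‘jpeg_trailer’ idea, checking for the ffd9 end-of-image marker, can be sketched with the standard library alone. This is an illustration rather than the library's code; both helper names are hypothetical, and the sketch requires the marker to be the very last two bytes.

```python
import os


def ends_with_jpeg_trailer(data):
    """Return True if a bytes object ends with the JPEG end-of-image marker."""
    return data[-2:] == b'\xff\xd9'


def file_ends_with_jpeg_trailer(filename):
    """Read only the last two bytes of a file and check for ffd9."""
    with open(filename, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 2:
            return False  # too short to contain the marker
        f.seek(-2, os.SEEK_END)
        return f.read(2) == b'\xff\xd9'
```

As noted above, a missing trailer suggests truncation, but a present trailer says nothing about whether the rest of the file decodes correctly.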

megadetector.visualization.visualization_utils.crop_image(detections, image, confidence_threshold=0.15, expansion=0)[source]

Crops detections above [confidence_threshold] from the PIL image [image], returning a list of PIL Images, preserving the order of [detections].

Parameters:

detections (list) – a list of dictionaries with keys ‘conf’ and ‘bbox’; boxes are length-four arrays formatted as [x,y,w,h], normalized, upper-left origin (this is the standard MD detection format)
image (Image or str) – the PIL Image object from which we should crop detections, or an image filename
confidence_threshold (float, optional) – only crop detections above this threshold
expansion (int, optional) – a number of pixels to include on each side of a cropped detection

Returns:

a possibly-empty list of PIL Image objects

Return type:

list

megadetector.visualization.visualization_utils.draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, clss=None, thickness=4, expansion=0, display_str_list=None, use_normalized_coordinates=True, label_font_size=16, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font='arial.ttf')[source]

Adds a bounding box to an image. Modifies the image in place.

Bounding box coordinates can be specified in either absolute (pixel) or normalized coordinates by setting the use_normalized_coordinates argument.

Each string in display_str_list is displayed on a separate line above the bounding box in black text on a rectangle filled with the input ‘color’. If the top of the bounding box extends to the edge of the image, the strings are displayed below the bounding box.

Adapted from:

https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py

Parameters:

image (PIL.Image.Image) – the image on which we should draw a box
ymin (float) – ymin of bounding box
xmin (float) – xmin of bounding box
ymax (float) – ymax of bounding box
xmax (float) – xmax of bounding box
clss (int, optional) – the class index of the object in this bounding box, used for choosing a color; should be either an integer or a string-formatted integer
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
display_str_list (list, optional) – list of strings to display above the box (each to be shown on its own line)
use_normalized_coordinates (bool, optional) – if True (default), treat coordinates ymin, xmin, ymax, xmax as relative to the image, otherwise coordinates as absolute pixel values
label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.
colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the value of [clss]; defaults to a reasonable set of colors
textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT
vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM
text_rotation (float, optional) – rotation to apply to text
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’); falls back to the PIL default font if the specified font is not found
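Several parameters above (thickness, expansion, label_font_size) share the convention that a float below 1.0 is a fraction of the image width. A minimal sketch of that convention follows; the helper name resolve_pixel_value is hypothetical, and rounding to the nearest pixel is an assumption about behavior the text does not specify.

```python
def resolve_pixel_value(value, image_width):
    """Interpret a float below 1.0 as a fraction of the image width."""
    if isinstance(value, float) and value < 1.0:
        # Fractional values scale with the image (rounding is an assumption)
        return round(value * image_width)
    # Integers and floats >= 1.0 are already pixel values
    return value


thickness_px = resolve_pixel_value(0.01, 1600)
```

Using fractions keeps line widths and label sizes visually consistent across images of very different resolutions.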

megadetector.visualization.visualization_utils.draw_bounding_boxes_on_file(input_file, output_file, detections, confidence_threshold=0.0, detector_label_map={'0': 'empty', '1': 'animal', '2': 'person', '3': 'vehicle'}, thickness=4, expansion=0, colormap=None, label_font_size=16, custom_strings=None, target_size=None, ignore_exif_rotation=False, quality=None, label_font='arial.ttf')[source]

Renders detection bounding boxes on an image loaded from file, optionally writing the results to a new image file.

Parameters:

input_file (str) – filename or URL to load
output_file (str) – filename to which we should write the rendered image
detections (list) – a list of dictionaries with keys ‘conf’, ‘bbox’, and ‘category’; boxes are length-four arrays formatted as [x,y,w,h], normalized, upper-left origin (this is the standard MD detection format). ‘category’ is a string-int.
confidence_threshold (float, optional) – only render detections with confidence above this threshold
detector_label_map (dict, optional) – a dict mapping category IDs to strings. If this is None, no confidence values or identifiers are shown. If this is {}, just category indices and confidence values are shown.
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors
label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.
custom_strings (list, optional) – set of strings to append to detection labels, should have the same length as [detections]. Appended before any classification labels.
target_size (tuple, optional) – tuple of (target_width,target_height). Either or both can be -1, see resize_image() for documentation. If None or (-1,-1), uses the original image size.
ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated.
quality (int, optional) – jpeg quality to use for output (None to use PIL default)
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

Returns:

loaded and modified image

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.draw_bounding_boxes_on_image(image, boxes, classes, thickness=4, expansion=0, display_strs=None, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font_size=16, label_font='arial.ttf')[source]

Draws bounding boxes on an image. Modifies the image in place.

Parameters:

image (PIL.Image) – the image on which we should draw boxes
boxes (np.array) – a two-dimensional numpy array of size [N, 4], where N is the number of boxes, and each row is (ymin, xmin, ymax, xmax). Coordinates should be normalized to image height/width.
classes (list) – a list of ints or string-formatted ints corresponding to the class labels of the boxes. This is only used for color selection. Should have the same length as [boxes].
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
display_strs (list, optional) – list of list of strings (the outer list should have the same length as [boxes]). Typically this is used to show (possibly multiple) detection or classification categories and/or confidence values.
colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors
textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT
vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM
text_rotation (float, optional) – rotation to apply to text
label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

megadetector.visualization.visualization_utils.draw_db_boxes_on_file(input_file, output_file, boxes, classes=None, label_map=None, thickness=4, expansion=0, ignore_exif_rotation=False, quality=None)[source]

Render COCO-formatted bounding boxes (in absolute coordinates) on an image loaded from file, writing the results to a new image file.

Parameters:

input_file (str) – image file to read
output_file (str) – image file to write
boxes (list) – list of length-4 tuples, formatted as (x,y,w,h) (in pixels)
classes (list, optional) – list of ints (or string-formatted ints), used to choose labels (either by literally rendering the class labels, or by indexing into [label_map])
label_map (dict, optional) – int –> str dictionary, typically mapping category IDs to species labels; if None, category labels are rendered verbatim (typically as numbers)
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated
quality (int, optional) – jpeg quality to use for output (None to use PIL default)

Returns:

the loaded and modified image

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.exif_preserving_save(pil_image, output_file, quality='keep', default_quality=85, verbose=False, tags_to_exclude=None)[source]

Saves [pil_image] to [output_file], making a moderate attempt to preserve EXIF data and JPEG quality. Neither is guaranteed.

Also see:

https://discuss.dizzycoding.com/determining-jpg-quality-in-python-pil/

…for more ways to preserve jpeg quality if quality=’keep’ doesn’t do the trick.

Parameters:

pil_image (Image) – the PIL Image object to save
output_file (str) – the destination file
quality (str or int, optional) – can be “keep” (default), or an integer from 0 to 100. This is only used if PIL thinks the source image is a JPEG. If you load a JPEG and resize it in memory, for example, it’s no longer a JPEG.
default_quality (int, optional) – determines output quality when quality == ‘keep’ and we are saving a non-JPEG source to a JPEG file
verbose (bool, optional) – enable additional debug console output
tags_to_exclude (list, optional) – tags to exclude from the output file

megadetector.visualization.visualization_utils.get_image_size(im, verbose=False)[source]

Retrieve the size of an image. Returns None if the image fails to load.

Parameters:

im (str or PIL.Image) – filename or PIL image
verbose (bool, optional) – enable additional debug output

Returns:

tuple (w,h), or None if the image fails to load.

megadetector.visualization.visualization_utils.get_text_size(font, s)[source]

Get the expected width and height when rendering the string [s] in the font [font].

Parameters:

font (PIL.ImageFont) – the font whose size we should query
s (str) – the string whose size we should query

Returns:

(w,h), both floats in pixel coordinates

Return type:

tuple

megadetector.visualization.visualization_utils.gray_scale_fraction(image, crop_size=(0.1, 0.1))[source]

Computes the fraction of the pixels in [image] that appear to be grayscale (R==G==B), useful for approximating whether this is a night-time image when flash information is not available in EXIF data (or for video frames, where this information is often not available in structured metadata at all).

Parameters:

image (str or PIL.Image.Image) – Image, filename, or URL to analyze
crop_size (tuple of floats, optional) – a 2-element list/tuple, representing the fraction of the image to crop at the top and bottom, respectively, before analyzing (to minimize the possibility of including color elements in the image overlay)

Returns:

the fraction of pixels in [image] that appear to be grayscale (R==G==B)

Return type:

float
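The grayscale test (R==G==B) and the top/bottom crop can be illustrated without any imaging library. The sketch below operates on a nested list of (r, g, b) tuples rather than a PIL image, so it shows the logic only; the function name gray_fraction_sketch and the handling of an empty crop are assumptions.

```python
def gray_fraction_sketch(rows, crop_size=(0.1, 0.1)):
    """rows is a list of image rows, each a list of (r, g, b) tuples."""
    height = len(rows)
    # Drop a fraction of the rows at the top and bottom, where overlays
    # with colored elements tend to appear
    top = int(height * crop_size[0])
    bottom = height - int(height * crop_size[1])
    kept = [pixel for row in rows[top:bottom] for pixel in row]
    if not kept:
        return 0.0
    gray = sum(1 for r, g, b in kept if r == g == b)
    return gray / len(kept)


# Ten rows: a colored overlay row at the top and bottom, gray in between
rows = [[(255, 0, 0)] * 2] + [[(10, 10, 10)] * 2] * 8 + [[(0, 0, 255)] * 2]
```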

megadetector.visualization.visualization_utils.load_image(input_file, ignore_exif_rotation=False)[source]

Loads an image file. This is the non-lazy version of open_image(); i.e., it forces image decoding before returning.

Parameters:

input_file (str or BytesIO) – can be a path to an image file (anything that PIL can open), a URL, or an image as a stream of bytes
ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated

Returns:

a PIL Image object in RGB mode

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.open_image(input_file, ignore_exif_rotation=False)[source]

Opens an image in binary format using PIL.Image and converts to RGB mode.

Supports local files or URLs.

This operation is lazy; the image is not actually loaded until the first operation that needs its pixels (for example, resizing), so file opening errors can show up later. load_image() is the non-lazy version of this function.

Parameters:

input_file (str or BytesIO) – can be a path to an image file (anything that PIL can open), a URL, or an image as a stream of bytes
ignore_exif_rotation (bool, optional) – don’t rotate the loaded pixels, even if we are loading a JPEG and that JPEG says it should be rotated

Returns:

A PIL Image object in RGB mode

Return type:

PIL.Image.Image

megadetector.visualization.visualization_utils.parallel_check_image_integrity(filenames, modes=None, max_workers=16, use_threads=True, recursive=True, verbose=False)[source]

Check whether we can successfully load a list of images via OpenCV and/or PIL.

Parameters:

filenames (list or str) – a list of image filenames or a folder
modes (list, optional) – see check_image_integrity() for documentation on the [modes] parameter
max_workers (int, optional) – the number of parallel workers to use; set to <=1 to disable parallelization
use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization
recursive (bool, optional) – if [filenames] is a folder, whether to search recursively for images. Ignored if [filenames] is a list.
verbose (bool, optional) – enable additional debug output

Returns:

a list of dicts, each with a key called ‘file’ (the corresponding filename) and one key for each string in [modes] (a success indicator for that mode, specifically a string starting with either ‘success’ or ‘error’).

Return type:

list

megadetector.visualization.visualization_utils.parallel_get_image_sizes(filenames, max_workers=16, use_threads=True, recursive=True, verbose=False)[source]

Retrieve image sizes for a list or folder of images.

Parameters:

filenames (list or str) – a list of image filenames or a folder. Non-image files and unreadable images will be returned with a size of None.
max_workers (int, optional) – the number of parallel workers to use; set to <=1 to disable parallelization
use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization
recursive (bool, optional) – if [filenames] is a folder, whether to search recursively for images. Ignored if [filenames] is a list.
verbose (bool, optional) – enable additional debug output

Returns:

a dict mapping filenames to (w,h) tuples; the value will be None for images that fail to load. Filenames will always be absolute.

Return type:

dict

megadetector.visualization.visualization_utils.render_db_bounding_boxes(boxes, classes, image, original_size=None, label_map=None, thickness=4, expansion=0, colormap=None, textalign=0, vtextalign=0, text_rotation=None, label_font_size=16, tags=None, boxes_are_normalized=False, label_font='arial.ttf')[source]

Render bounding boxes (with class labels) on an image. This is a wrapper for draw_bounding_boxes_on_image, allowing the caller to operate on a resized image by providing the original size of the image; boxes will be scaled accordingly.

This function assumes that bounding boxes are in absolute coordinates, typically because they come from COCO camera traps .json files, unless boxes_are_normalized is True.

Parameters:

boxes (list) – list of length-4 tuples, formatted as (x,y,w,h) (in pixels)
classes (list) – list of ints (or string-formatted ints), used to choose labels (either by literally rendering the class labels, or by indexing into [label_map])
image (PIL.Image.Image) – image object to modify
original_size (tuple, optional) – if this is not None, and the size is different than the size of [image], we assume that [boxes] refer to the original size, and we scale them accordingly before rendering
label_map (dict, optional) – int –> str dictionary, typically mapping category IDs to species labels; if None, category labels are rendered verbatim (typically as numbers)
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors
textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT
vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM
text_rotation (float, optional) – rotation to apply to text
label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.
tags (list, optional) – list of strings of length len(boxes) that should be appended after each class name (e.g. to show scores)
boxes_are_normalized (bool, optional) – whether boxes have already been normalized
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)
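To see why original_size matters, consider converting an absolute COCO-style (x,y,w,h) box into the normalized (ymin, xmin, ymax, xmax) layout that draw_bounding_boxes_on_image expects. The sketch below shows that conversion under the assumption that boxes refer to the original image size; the helper name coco_box_to_normalized_yxyx is hypothetical and this is not the library's code.

```python
def coco_box_to_normalized_yxyx(box, original_size):
    """Convert an absolute (x, y, w, h) box to normalized (ymin, xmin, ymax, xmax)."""
    x, y, w, h = box
    width, height = original_size
    # Dividing by the original size makes the box independent of any
    # later resizing of the image
    return (y / height, x / width, (y + h) / height, (x + w) / width)


normalized = coco_box_to_normalized_yxyx((100, 50, 200, 100), (400, 200))
```

Once a box is normalized against the original size, it can be drawn on a resized copy of the image without further scaling.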

megadetector.visualization.visualization_utils.render_detection_bounding_boxes(detections, image, label_map='show_categories', classification_label_map='show_categories', confidence_threshold=0.0, thickness=4, expansion=0, classification_confidence_threshold=0.3, max_classifications=3, colormap=None, textalign=0, vtextalign=0, label_font_size=16, custom_strings=None, box_sort_order='confidence', verbose=False, label_font='arial.ttf')[source]

Renders bounding boxes (with labels and confidence values) on an image for all detections above a threshold.

Renders classification labels if present.

[image] is modified in place.

Parameters:

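detections (list) – list of detections in the MD output format: dictionaries with keys ‘category’, ‘conf’, and ‘bbox’, where boxes are length-four arrays formatted as [x,y,w,h], normalized, upper-left origin. Also supports classification results in the standard format.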
image (PIL.Image.Image) – image on which we should render detections
label_map (dict, optional) – optional, mapping the numeric label to a string name. The type of the numeric label (typically strings) needs to be consistent with the keys in label_map; no casting is carried out. If [label_map] is None, no labels are shown (not even numbers and confidence values). If you want category numbers and confidence values without class labels, use the default value, the string ‘show_categories’.
classification_label_map (dict, optional) – optional, mapping of the string class labels to the actual class names. The type of the numeric label (typically strings) needs to be consistent with the keys in label_map; no casting is carried out. If [label_map] is None, no labels are shown (not even numbers and confidence values). If you want category numbers and confidence values without class labels, use the default value, the string ‘show_categories’.
confidence_threshold (float or dict, optional) – threshold above which boxes are rendered. Can also be a dictionary mapping category IDs to thresholds.
thickness (int or float, optional) – line thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
expansion (int or float, optional) – number of pixels to expand bounding boxes on each side. If this is a float less than 1.0, it’s treated as a fraction of the image width.
classification_confidence_threshold (float, optional) – confidence above which classification results are displayed
max_classifications (int, optional) – maximum number of classification results rendered for one image
colormap (list, optional) – list of color names, used to choose colors for categories by indexing with the values in [classes]; defaults to a reasonable set of colors
textalign (int, optional) – TEXTALIGN_LEFT, TEXTALIGN_CENTER, or TEXTALIGN_RIGHT
vtextalign (int, optional) – VTEXTALIGN_TOP or VTEXTALIGN_BOTTOM
label_font_size (float, optional) – font size in pixels. If this is less than one, it’s treated as a fraction of the image width.
custom_strings (list of str, optional) – optional set of strings to append to detection labels, should have the same length as [detections]. Appended before any classification labels.
box_sort_order (str, optional) – sorting scheme for detection boxes, can be None, “confidence”, or “reverse_confidence”. “confidence” puts the highest-confidence boxes on top.
verbose (bool, optional) – enable additional debug output
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)
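The interaction between confidence_threshold (a float or a per-category dict) and box_sort_order can be sketched as below. The detection values are made up for illustration, select_detections is a hypothetical helper, and two details are assumptions: detections exactly at the threshold are kept, and every category has an entry when a dict is supplied.

```python
# Illustrative detections in the MD format (keys 'category', 'conf', 'bbox')
detections = [
    {'category': '1', 'conf': 0.92, 'bbox': [0.10, 0.20, 0.30, 0.40]},
    {'category': '2', 'conf': 0.45, 'bbox': [0.50, 0.50, 0.20, 0.25]},
    {'category': '1', 'conf': 0.08, 'bbox': [0.00, 0.00, 0.10, 0.10]},
]


def select_detections(detections, confidence_threshold=0.0,
                      box_sort_order='confidence'):
    """Filter detections by threshold and return them in drawing order."""
    selected = []
    for det in detections:
        if isinstance(confidence_threshold, dict):
            # Per-category thresholds, keyed by category ID
            threshold = confidence_threshold[det['category']]
        else:
            threshold = confidence_threshold
        if det['conf'] >= threshold:
            selected.append(det)
    if box_sort_order == 'confidence':
        # Draw low-confidence boxes first so high-confidence boxes end on top
        selected.sort(key=lambda d: d['conf'])
    elif box_sort_order == 'reverse_confidence':
        selected.sort(key=lambda d: d['conf'], reverse=True)
    return selected


to_draw = select_detections(detections, {'1': 0.5, '2': 0.2})
```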

megadetector.visualization.visualization_utils.resize_image(image, target_width=-1, target_height=-1, output_file=None, no_enlarge_width=False, verbose=False, quality='keep')[source]

Resizes a PIL Image object to the specified width and height; does not resize in place. If either width or height is -1, resizes with aspect ratio preservation.

If target_width and target_height are both -1, does not modify the image, but will write to output_file if supplied.

If no resizing is required, and an Image object is supplied, returns the original Image object (i.e., does not copy).

Parameters:

image (Image or str) – PIL Image object or a filename (local file or URL)
target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size
target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size
output_file (str, optional) – file to which we should save this image; if None, just returns the image without saving
no_enlarge_width (bool, optional) – if [no_enlarge_width] is True, and [target_width] is larger than the original image width, does not modify the image, but will write to output_file if supplied
verbose (bool, optional) – enable additional debug output
quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail

Returns:

the resized image, which may be the original image if no resizing is required

Return type:

PIL.Image.Image
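
The sizing rules described above can be sketched as a small pure-Python function. This is an illustration of the documented behavior, not the library's code; the helper name and the use of round() for the derived dimension are assumptions.

```python
def compute_target_size(original_size, target_width=-1, target_height=-1,
                        no_enlarge_width=False):
    """Return the (width, height) an image would be resized to (hypothetical helper)."""
    orig_w, orig_h = original_size

    # Both dimensions are -1: no resizing is requested
    if target_width == -1 and target_height == -1:
        return (orig_w, orig_h)

    # no_enlarge_width: leave the image alone rather than upsampling it
    if no_enlarge_width and target_width != -1 and target_width > orig_w:
        return (orig_w, orig_h)

    # One dimension is -1: derive it from the other, preserving aspect ratio
    if target_height == -1:
        target_height = round(orig_h * target_width / orig_w)
    elif target_width == -1:
        target_width = round(orig_w * target_height / orig_h)

    return (target_width, target_height)
```

When the returned size equals the original size, no resizing is required, which corresponds to the case where the original Image object is returned without copying.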

megadetector.visualization.visualization_utils.resize_image_folder(input_folder, output_folder=None, target_width=-1, target_height=-1, no_enlarge_width=False, verbose=False, quality='keep', pool_type='process', n_workers=10, recursive=True, image_files_relative=None, overwrite=True)[source]

Resize all images in a folder (defaults to recursive).

Defaults to in-place resizing (output_folder is optional).

Parameters:

input_folder (str) – folder in which we should find images to resize
output_folder (str, optional) – folder in which we should write resized images. If None, resizes images in place. Otherwise, maintains relative paths in the target folder.
target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size
target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size
no_enlarge_width (bool, optional) – if [no_enlarge_width] is True, and [target_width] is larger than the original image width, does not modify the image, but will write to output_file if supplied
verbose (bool, optional) – enable additional debug output
quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail
pool_type (str, optional) – whether to use processes (‘process’) or threads (‘thread’) for parallelization; ignored if n_workers <= 1
n_workers (int, optional) – number of workers to use for parallel resizing; set to <=1 to disable parallelization
recursive (bool, optional) – whether to search [input_folder] recursively for images.
image_files_relative (list, optional) – if not None, skips any relative paths not in this list
overwrite (bool, optional) – whether to overwrite existing target images

Returns:

a list of dicts with keys ‘input_fn’, ‘output_fn’, ‘status’, and ‘error’. ‘status’ will be ‘success’, ‘skipped’, or ‘error’; ‘error’ will be None for successful cases, otherwise will contain the image-specific error.

Return type:

list
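
A returned list of this shape can be summarized with a few lines of Python. The sketch below relies only on the keys described above; the helper name and the sample records (including the error text) are hypothetical.

```python
from collections import Counter


def summarize_resize_results(results):
    """Count results by status and collect failures (hypothetical helper)."""
    status_counts = Counter(r['status'] for r in results)
    failures = [(r['input_fn'], r['error']) for r in results
                if r['status'] == 'error']
    return status_counts, failures


# Hypothetical records, shaped like the documented return value
results = [
    {'input_fn': 'a.jpg', 'output_fn': 'out/a.jpg', 'status': 'success', 'error': None},
    {'input_fn': 'b.jpg', 'output_fn': 'out/b.jpg', 'status': 'success', 'error': None},
    {'input_fn': 'c.jpg', 'output_fn': 'out/c.jpg', 'status': 'error',
     'error': 'example error message'},
]
status_counts, failures = summarize_resize_results(results)
```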

megadetector.visualization.visualization_utils.resize_images(input_file_to_output_file, target_width=-1, target_height=-1, no_enlarge_width=False, verbose=False, quality='keep', pool_type='process', n_workers=10, overwrite=True)[source]

Resizes all images in the dictionary [input_file_to_output_file].

Parameters:

input_file_to_output_file (dict) – dict mapping images that exist to the locations where the resized versions should be written
target_width (int, optional) – width to which we should resize this image, or -1 to let target_height determine the size
target_height (int, optional) – height to which we should resize this image, or -1 to let target_width determine the size
no_enlarge_width (bool, optional) – if [no_enlarge_width] is True, and [target_width] is larger than the original image width, does not modify the image, but will write to output_file if supplied
verbose (bool, optional) – enable additional debug output
quality (str or int, optional) – passed to exif_preserving_save, see docs for more detail
pool_type (str, optional) – whether to use processes (‘process’) or threads (‘thread’) for parallelization; ignored if n_workers <= 1
n_workers (int, optional) – number of workers to use for parallel resizing; set to <=1 to disable parallelization
overwrite (bool, optional) – whether to overwrite existing target images

Returns:

a list of dicts with keys ‘input_fn’, ‘output_fn’, ‘status’, and ‘error’. ‘status’ will be ‘success’ or ‘error’; ‘error’ will be None for successful cases, otherwise will contain the image-specific error.

Return type:

list
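
One way to build the [input_file_to_output_file] dictionary is to mirror relative paths from an input folder into an output folder. This is a minimal sketch; the helper name and folder names are hypothetical, and the input files are assumed to exist.

```python
import os


def build_resize_mapping(relative_paths, input_folder, output_folder):
    """Map each input image path to its output path (hypothetical helper)."""
    mapping = {}
    for relative_path in relative_paths:
        input_file = os.path.join(input_folder, relative_path)
        # Preserve the relative path inside the output folder
        output_file = os.path.join(output_folder, relative_path)
        mapping[input_file] = output_file
    return mapping


relative_paths = ['site1/img001.jpg', 'site2/img002.jpg']
input_file_to_output_file = build_resize_mapping(relative_paths, 'images', 'resized')
```

The resulting dict has the form expected by the [input_file_to_output_file] argument.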

megadetector.visualization.visualization_utils.test_visualization_utils()[source]: Runs all tests in the TestVisualizationUtils class.

visualization.visualize_db module

visualize_db.py

Outputs an HTML page visualizing annotations (class labels and/or bounding boxes) on a sample of images in a database in the COCO Camera Traps format.

class megadetector.visualization.visualize_db.DbVizOptions[source]

Bases: object

Parameters controlling the behavior of visualize_db().

add_search_links: Should we include Web search links for each category name?

box_expansion: Number of pixels to expand each bounding box

box_thickness: Line width in pixels

classes_to_exclude

Exclude images that contain annotations with these class names (not IDs) (list)

Mutually exclusive with classes_to_include

classes_to_include

Only include images that contain annotations with these class names (not IDs) (list)

Mutually exclusive with classes_to_exclude

colormap

List of PIL color names, which will be indexed by category IDs, or None to use the default color map.

For example: [‘AliceBlue’, ‘Red’, ‘RoyalBlue’, ‘Gold’, ‘Chartreuse’]

confidence_field_name: COCO files used for evaluation may contain confidence scores, this determines the field name used for confidence scores

confidence_threshold: Optionally apply a confidence threshold; this requires that [confidence_field_name] be present in all detections.

create_category_pages

Should we create separate pages for each category (within the sampled set)?

Images with multiple categories will be included in all relevant pages.

custom_category_mapping: Custom mapping from category IDs to labels, replacing what’s in the .json file

extra_annotation_fields_to_print: List of additional fields in the annotation struct that we should print in image headers

extra_image_fields_to_print: List of additional fields in the image struct that we should print in image headers

force_rendering: Set to False to skip existing images

html_options

HTML rendering options; see write_html_image_list for details

The most relevant option one might want to set here is:

html_options[‘maxFiguresPerHtmlFile’]

…which can be used to paginate previews to a number of images that will load well in a browser (1000 is a reasonable limit).

include_filename_links: Should there be a text link back to each original image?

include_image_links: Should each thumbnail image link back to the original image?

max_sequence_length: If this is None, we just sample images, and show images. If this is not None, we sample images, but we also show the other images in the sequences containing our sampled images. If this is <=0, there is no limit on the number of images we’ll show per sequences. If this is >0, we will cap the number of images shown per sequence; no guarantee is made about which images will be selected in that case. This only impacts the number of images added as “sequence friends” of images that get sampled.

num_to_visualize: Number of images to sample from the database, or None to visualize all images

parallelize_rendering: Parallelize rendering across multiple workers

parallelize_rendering_n_cores: Number of workers to use for parallelization; ignored if parallelize_rendering is False

parallelize_rendering_with_threads: In theory, whether to parallelize with threads (True) or processes (False), but process-based parallelization in this function is currently unsupported

quality: JPEG quality to use for saving images (None for Pillow default)

random_seed: Random seed to use for sampling images

show_full_paths: Should we show absolute (True) or relative (False) paths for each image?

sort_by_filename

Whether to sort images by filename (True) or randomly (False)

This is ignored if max_sequence_length is not None, in which case we always sort by sequence ID, then frame number.

trim_to_images_with_bboxes: Only show images that contain bounding boxes

verbose: Enable additionald debug console output

viz_size

Target size for rendering; set either dimension to -1 to preserve aspect ratio.

If viz_size is None or (-1,-1), the original image size is used.
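
The interaction between num_to_visualize and max_sequence_length can be sketched as follows. This is one reading of the description above, not the library's implementation: it assumes image records carry 'id', 'seq_id', and 'frame_num' fields, and it applies the cap to the number of "sequence friends" added per sequence. The helper name is hypothetical.

```python
import random
from collections import defaultdict


def sample_with_sequence_friends(images, num_to_visualize=None,
                                 max_sequence_length=None, random_seed=0):
    """Sample images, optionally adding other images from the same sequences."""
    rng = random.Random(random_seed)
    if num_to_visualize is None or num_to_visualize >= len(images):
        sampled = list(images)
    else:
        sampled = rng.sample(images, num_to_visualize)

    # None: just return the sampled images
    if max_sequence_length is None:
        return sampled

    seq_to_images = defaultdict(list)
    for im in images:
        seq_to_images[im['seq_id']].append(im)

    selected = {im['id']: im for im in sampled}
    for seq_id in {im['seq_id'] for im in sampled}:
        friends = [im for im in seq_to_images[seq_id] if im['id'] not in selected]
        # >0 caps the friends added per sequence; <=0 means no limit
        if max_sequence_length > 0:
            friends = friends[:max_sequence_length]
        for im in friends:
            selected[im['id']] = im

    # With sequences enabled, sort by sequence ID, then frame number
    return sorted(selected.values(), key=lambda im: (im['seq_id'], im['frame_num']))
```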

megadetector.visualization.visualize_db.visualize_db(db_path, output_dir, image_base_dir, options=None)[source]

Writes images and html to output_dir to visualize the images and annotations in a COCO-formatted .json file.

Parameters:

db_path (str or dict) – the .json filename to load, or a previously-loaded database
output_dir (str) – the folder to which we should write annotated images
image_base_dir (str) – the folder where the images live; filenames in [db_path] should be relative to this folder.
options (DbVizOptions, optional) – See DbVizOptions for details

Returns:

A length-two tuple containing (the html filename) and (the loaded database).

Return type:

tuple
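
A typical call might look like the sketch below. It assumes that DbVizOptions can be constructed without arguments and configured by setting the attributes documented above; the wrapper function, its arguments, and the chosen option values are hypothetical.

```python
def preview_database(db_path, output_dir, image_base_dir, num_images=200):
    """Render an HTML preview for a sample of a COCO Camera Traps database."""
    # Imported inside the function so this sketch can be read (and defined)
    # without the megadetector package installed
    from megadetector.visualization.visualize_db import DbVizOptions, visualize_db

    options = DbVizOptions()
    options.num_to_visualize = num_images        # sample this many images
    options.trim_to_images_with_bboxes = True    # only images with boxes
    options.sort_by_filename = False             # random order in the output page
    options.random_seed = 0                      # reproducible sampling

    # Returns (html filename, loaded database)
    html_file, db = visualize_db(db_path, output_dir, image_base_dir, options)
    return html_file, db
```

Passing a previously-loaded database as db_path avoids re-reading the .json file when generating several previews.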

visualize_db - CLI interface

visualize_db [-h] [--num_to_visualize NUM_TO_VISUALIZE] [--random_sort]
             [--trim_to_images_with_bboxes] [--random_seed RANDOM_SEED]
             db_path output_dir image_base_dir

visualize_db positional arguments

db_path - .json file to visualize
output_dir - Output directory for html and rendered images
image_base_dir - Base directory (or URL) for input images

visualize_db options

-h, --help - show this help message and exit
--num_to_visualize NUM_TO_VISUALIZE - Number of images to visualize (randomly drawn) (defaults to all)
--random_sort - Sort randomly (rather than by filename) in output html
--trim_to_images_with_bboxes - Only include images with bounding boxes (defaults to false)
--random_seed RANDOM_SEED - Random seed for image selection

visualization.visualize_detector_output module

visualize_detector_output.py

Render images with bounding boxes annotated on them to a folder, based on a detector output result file (.json), optionally writing an HTML index file.

megadetector.visualization.visualize_detector_output.visualize_detector_output(detector_output_path, out_dir, images_dir=None, confidence_threshold=0.15, sample=-1, output_image_width=1000, random_seed=0, render_detections_only=False, classification_confidence_threshold=0.1, html_output_file=None, html_output_options=None, preserve_path_structure=False, parallelize_rendering=True, parallelize_rendering_n_cores=10, parallelize_rendering_with_threads=True, box_sort_order='confidence', category_names_to_blur=None, link_images_to_originals=False, detector_label_map=None, box_thickness=4, box_expansion=0, label_font_size=16, label_font='arial.ttf')[source]

Draws bounding boxes on images given the output of a detector.

Parameters:

detector_output_path (str) – path to detector output .json file, or a loaded MD results dict
out_dir (str) – path to directory for saving annotated images
images_dir (str, optional) – folder where the images live; filenames in [detector_output_path] should be relative to [images_dir]. Can be None if paths are absolute.
confidence_threshold (float, optional) – threshold above which detections will be rendered
sample (int, optional) – maximum number of images to render, -1 for all
output_image_width (int, optional) – width in pixels to resize images for display, preserving aspect ratio; set to -1 to use original image width
random_seed (int, optional) – seed to use for choosing images when sample != -1, use None to avoid forcing a seed
render_detections_only (bool, optional) – only render images with above-threshold detections. Empty images are discarded after sampling, so if you want to see, e.g., 1000 non-empty images, you can set [render_detections_only], but you need to sample more than 1000 images.
classification_confidence_threshold (float, optional) – only show classifications above this threshold; does not impact whether images are rendered, only whether classification labels (not detection categories) are displayed
html_output_file (str, optional) – output path for an HTML index file (not written if None)
html_output_options (dict, optional) – HTML formatting options; see write_html_image_list for details. The most common option you may want to supply here is ‘maxFiguresPerHtmlFile’.
preserve_path_structure (bool, optional) – if False (default), writes images to unique names in a flat structure in the output folder; if True, preserves relative paths within the output folder
parallelize_rendering (bool, optional) – whether to use concurrent workers for rendering
parallelize_rendering_n_cores (int, optional) – number of concurrent workers to use (ignored if parallelize_rendering is False)
parallelize_rendering_with_threads (bool, optional) – determines whether we use threads (True) or processes (False) for parallelization (ignored if parallelize_rendering is False)
box_sort_order (str, optional) – sorting scheme for detection boxes, can be None, “confidence”, or “reverse_confidence”
category_names_to_blur (list of str, optional) – category names for which we should blur detections, most commonly [‘person’]
link_images_to_originals (bool, optional) – include a link from every rendered image back to the corresponding original image
detector_label_map (dict, optional) – mapping from category IDs to labels; by default (None) uses the values in the detector file. If this is the string ‘no_detection_labels’, hides labels.
box_thickness (int or float, optional) – box thickness in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
box_expansion (int or float, optional) – box expansion in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
label_font_size (float, optional) – label font size in pixels. If this is a float less than 1.0, it’s treated as a fraction of the image width.
label_font (str, optional) – font filename to use for label text (default ‘arial.ttf’)

Returns:

list of paths to annotated images

Return type:

list
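
The note on render_detections_only (empty images are discarded after sampling) can be illustrated with a sketch. It assumes each image record has a 'detections' list whose entries carry a 'conf' value, and that a detection at exactly the threshold counts as above-threshold; the helper name is hypothetical and this is not the library's implementation.

```python
import random


def choose_images_to_render(images, sample=-1, confidence_threshold=0.15,
                            render_detections_only=False, random_seed=0):
    """Sample first, then drop empty images (hypothetical helper)."""
    images = list(images)

    # Step 1: sample (random_seed=None avoids forcing a seed)
    if sample != -1 and sample < len(images):
        rng = random.Random(random_seed)
        images = rng.sample(images, sample)

    # Step 2: discard images with no above-threshold detections
    if render_detections_only:
        images = [im for im in images
                  if any(d['conf'] >= confidence_threshold
                         for d in (im.get('detections') or []))]
    return images
```

Because filtering happens after sampling, the number of rendered images can be smaller than [sample]; to end up with roughly N non-empty images, [sample] needs to be larger than N.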

visualize_detector_output - CLI interface

Annotate the bounding boxes predicted by a detector above some confidence threshold, and save the annotated images.

visualize_detector_output [-h] [--confidence CONFIDENCE] [--images_dir IMAGES_DIR]
                          [--sample SAMPLE] [--output_image_width OUTPUT_IMAGE_WIDTH]
                          [--random_seed RANDOM_SEED] [--html_output_file HTML_OUTPUT_FILE]
                          [--open_html_output_file] [--detections_only]
                          [--preserve_path_structure]
                          [--category_names_to_blur CATEGORY_NAMES_TO_BLUR]
                          [--classification_confidence CLASSIFICATION_CONFIDENCE]
                          [--box_thickness BOX_THICKNESS] [--box_expansion BOX_EXPANSION]
                          [--label_font_size LABEL_FONT_SIZE] [--label_font LABEL_FONT]
                          detector_output_path out_dir

visualize_detector_output positional arguments

detector_output_path - Path to json output file of the detector
out_dir - Path to directory where the annotated images will be saved. The directory will be created if it does not exist.

visualize_detector_output options

-h, --help - show this help message and exit
--confidence CONFIDENCE - Value between 0 and 1, indicating the confidence threshold above which to visualize bounding boxes
--images_dir IMAGES_DIR - Path to a local directory where images are stored. This serves as the root directory for image paths in detector_output_path. Omit if image paths are absolute.
--sample SAMPLE - Number of images to be annotated and rendered. Set to -1 (default) to annotate all images in the detector output file. There may be fewer images if some are not found in images_dir.
--output_image_width OUTPUT_IMAGE_WIDTH - Integer, desired width in pixels of the output annotated images. Use -1 to not resize. Default: 1000.
--random_seed RANDOM_SEED - Integer, for deterministic order of image sampling
--html_output_file HTML_OUTPUT_FILE - Filename to which we should write an HTML image index (off by default)
--open_html_output_file - Open the .html output file when done
--detections_only - Only render images with above-threshold detections (by default, both empty and non-empty images are rendered).
--preserve_path_structure - Preserve relative image paths (otherwise flattens and assigns unique file names)
--category_names_to_blur CATEGORY_NAMES_TO_BLUR - Comma-separated list of category names to blur (or a single category name, typically "person")
--classification_confidence CLASSIFICATION_CONFIDENCE - If classification results are present, render results above this threshold
--box_thickness BOX_THICKNESS - Line thickness in pixels for box rendering. If this is less than 1.0, it is treated as a fraction of the image width.
--box_expansion BOX_EXPANSION - Number of pixels to expand bounding boxes on each side. If this is less than 1.0, it is treated as a fraction of the image width.
--label_font_size LABEL_FONT_SIZE - Font size in pixels for detection labels. If this is less than 1.0, it is treated as a fraction of the image width.
--label_font LABEL_FONT - Font filename to use for label text (default arial.ttf).
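
The "pixels or fraction of image width" convention used by --box_thickness, --box_expansion, and --label_font_size can be sketched as a single helper. The function name is hypothetical, and the sketch simply follows the rule stated above (values below 1.0 are fractions of the image width).

```python
def resolve_pixel_value(value, image_width):
    """Convert a pixels-or-fraction setting to pixels (hypothetical helper)."""
    if value < 1.0:
        # Values below 1.0 are treated as a fraction of the image width
        return value * image_width
    # Values of 1.0 or more are already in pixels
    return value


thickness_px = resolve_pixel_value(4, 1000)        # absolute pixel value
font_size_px = resolve_pixel_value(0.02, 1000)     # 2% of the image width
```

Fractional values keep boxes and labels proportionate when images in a batch have very different resolutions.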

visualization.visualize_video_output module

visualize_video_output.py

Render a folder of videos with bounding boxes to a new folder, based on a detector output file.

class megadetector.visualization.visualize_video_output.VideoVisualizationOptions[source]

Bases: object

Options controlling the behavior of visualize_video_output()

classification_confidence_threshold: Confidence threshold for including classifications

confidence_threshold: Confidence threshold for including detections

exclude_category_name_strings

List of category name strings to skip (e.g. “none”, “bear_moose”), or None

This tests against combined name strings (“bear_moose”), not individual category names (“bear”, “moose”).

exclude_category_names

List of individual category names to skip (e.g. “bear”, “moose”) or None

This tests against individual category names (“bear”, “moose”), not combined name strings (“bear_moose”).

flatten_output: By default, relative paths are preserved in the output folder; this flattens the output structure.

fourcc: Fourcc codec specification for video encoding

include_category_names

List of individual category names to include (e.g. “bear”, “moose”), or None. At least one of these categories must be present for a video to be included.

This tests against individual category names (“bear”, “moose”), not combined name strings (“bear_moose”).

include_category_names_in_filenames

Should we include classification category names in the output filenames?

Helps for finding showcase videos. Can be “start”, “end”, or None.

min_output_length_seconds: Don’t render videos below this length

output_extension: By default, output videos use the same extension as input videos, use this to force a particular extension

parallelize_rendering: Enable parallel processing of videos

parallelize_rendering_n_cores: Number of concurrent workers (None = default based on CPU count)

parallelize_rendering_with_threads: Use threads (True) vs processes (False) for parallelization

path_separator_replacement: When flatten_output is True, path separators will be replaced with this string.

random_seed: Random seed for sampling

rendering_fs: Frame rate for output videos. Either a float (fps) or ‘auto’ to calculate based on detection frame intervals

sample: Sample N videos to process (-1 for all videos)

trim_to_detections: Skip frames before first and after last above-threshold detection

megadetector.visualization.visualize_video_output.visualize_video_output(detector_output_path, out_dir, video_dir, options=None)[source]

Renders videos with bounding boxes based on detector output.

Parameters:

detector_output_path (str) – path to .json file containing detection results
out_dir (str) – output directory for rendered videos
video_dir (str) – input video directory
options (VideoVisualizationOptions, optional) – processing options

Returns:

list of processing results for each video

Return type:

list

visualize_video_output - CLI interface

Render videos with bounding boxes predicted by a detector above a confidence threshold, and save the rendered videos.

visualize_video_output [-h] [--confidence_threshold CONFIDENCE_THRESHOLD] [--sample SAMPLE]
                       [--random_seed RANDOM_SEED]
                       [--classification_confidence_threshold CLASSIFICATION_CONFIDENCE_THRESHOLD]
                       [--rendering_fs RENDERING_FS] [--fourcc FOURCC] [--trim_to_detections]
                       detector_output_path out_dir video_dir

visualize_video_output positional arguments

detector_output_path - Path to json output file of the detector
out_dir - Path to directory where the rendered videos will be saved. The directory will be created if it does not exist.
video_dir - Path to directory containing the input videos

visualize_video_output options

-h, --help - show this help message and exit
--confidence_threshold CONFIDENCE_THRESHOLD - Confidence threshold above which detections will be rendered
--sample SAMPLE - Number of videos to randomly sample for processing. Set to -1 to process all videos
--random_seed RANDOM_SEED - Random seed for reproducible sampling
--classification_confidence_threshold CLASSIFICATION_CONFIDENCE_THRESHOLD - Value between 0 and 1, indicating the confidence threshold above which classifications will be rendered
--rendering_fs RENDERING_FS - Frame rate for output videos. Use "auto" to calculate based on detection frame intervals, positive float for explicit fps, or negative float for speedup factor (e.g. -2.0 = 2x faster)
--fourcc FOURCC - Fourcc codec specification for video encoding
--trim_to_detections - Skip frames before first and after last above-threshold detection
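
The three forms accepted by --rendering_fs can be sketched as a small resolver. How "auto" is computed is an assumption here: the sketch takes the real-time rate to be the source frame rate divided by the interval between processed frames, and applies negative speedup factors relative to that rate. The function and parameter names are hypothetical.

```python
def resolve_output_fps(rendering_fs, source_fps, frame_interval=1):
    """Turn a rendering_fs setting into an output frame rate (hypothetical helper)."""
    # Assumed real-time rate when only every Nth frame was processed
    real_time_fps = source_fps / frame_interval

    if rendering_fs == 'auto':
        return real_time_fps

    value = float(rendering_fs)
    if value > 0:
        # Positive: explicit frames per second
        return value
    if value < 0:
        # Negative: speedup factor, e.g. -2.0 means 2x faster
        return real_time_fps * abs(value)
    raise ValueError('rendering_fs must be "auto" or a non-zero number')
```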