detection package

This package contains tools for running object detectors, especially MegaDetector. With 90% probability, if you’re looking at this documentation, the function you’re looking for is run_detector_batch.run_detector_batch.

Submodules

detection.run_detector_batch module

run_detector_batch.py

Module to run MegaDetector on lots of images, writing the results to a file in the MegaDetector results format.

https://lila.science/megadetector-output-format

This enables the results to be used in our post-processing pipeline; see postprocess_batch_results.py.

This script can save results to checkpoints intermittently, in case disaster strikes. To enable this, set --checkpoint_frequency to n > 0, and results will be saved as a checkpoint every n images. Checkpoints will be written to a file in the same directory as the output_file. After all images are processed and the final results file is written to output_file, the temporary checkpoint file will be deleted. If you want to resume from a checkpoint, set the checkpoint file’s path using --resume_from_checkpoint.

Has multiprocessing support for CPUs only; if a GPU is available, it will use the GPU instead of CPUs, and the --ncores option will be ignored. Checkpointing is not supported when using a GPU.

The lack of GPU multiprocessing support might sound annoying, but in practice we run a gazillion MegaDetector images on multiple GPUs using this script; we just use only one GPU per invocation. Dividing a list of images into one chunk per GPU happens outside of this script.

Does not have a command-line option to bind the process to a particular GPU, but you can prepend “CUDA_VISIBLE_DEVICES=0” to the command to bind to GPU 0, e.g.:

CUDA_VISIBLE_DEVICES=0 python detection/run_detector_batch.py md_v4.1.0.pb ~/data ~/mdv4test.json

You can disable GPU processing entirely by setting CUDA_VISIBLE_DEVICES=''.
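The chunking step mentioned above is left to the caller. The following is a minimal sketch of one way to do it, not part of this package: it splits a list of image paths round-robin, writes one list file per GPU, and builds one command per GPU following the command line shown above. The helper names, the chunk_N.json / results_N.json file names, and the assumption that a .json image list is a plain JSON array of paths are all illustrative.

```python
import json
import os


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks round-robin chunks of near-equal size."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def build_gpu_jobs(image_paths, n_gpus, model="md_v4.1.0.pb", work_dir="."):
    """
    Write one image-list file per GPU and return one (env, command) pair
    per GPU. Nothing is launched here; each pair could be passed to
    subprocess.Popen(command, env=env).
    """
    jobs = []
    for gpu_index, chunk in enumerate(split_into_chunks(image_paths, n_gpus)):
        if not chunk:
            continue  # more GPUs than images: nothing to do for this GPU
        # Hypothetical file names, chosen for this example only
        chunk_file = os.path.join(work_dir, f"chunk_{gpu_index}.json")
        output_file = os.path.join(work_dir, f"results_{gpu_index}.json")
        # Assumption: the image-list .json file is a JSON array of paths
        with open(chunk_file, "w") as f:
            json.dump(chunk, f)
        # Bind this invocation to a single GPU via the environment
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
        command = [
            "python", "detection/run_detector_batch.py",
            model, chunk_file, output_file,
        ]
        jobs.append((env, command))
    return jobs
```

Each job produces its own results file, so merging the per-GPU outputs is a separate step that is also left to the caller.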

megadetector.detection.run_detector_batch.get_image_datetime(image)[source]

Reads EXIF datetime from a PIL Image object.

Parameters: image (Image) – the PIL Image object from which we should read datetime information
Returns: the EXIF datetime from [image] (a PIL Image object), if available, as a string; returns None if EXIF datetime is not available.
Return type: str

megadetector.detection.run_detector_batch.load_and_run_detector_batch(model_file, image_file_names, checkpoint_path=None, confidence_threshold=0.005, checkpoint_frequency=-1, results=None, n_cores=1, use_image_queue=False, quiet=False, image_size=None, class_mapping_filename=None, include_image_size=False, include_image_timestamp=False, include_exif_tags=None, augment=False, force_model_download=False, detector_options=None, loader_workers=4, preprocess_on_image_queue=False, batch_size=1, verbose_output=False, use_threads_for_queue=False)[source]

Load a model file and run it on a list of images.

Parameters:

model_file (str) – path to model file, or supported model string (e.g. “MDV5A”)
image_file_names (list or str) – list of strings (image filenames), a single image filename, a folder to recursively search for images in, or a .json or .txt file containing a list of images.
checkpoint_path (str, optional) – path to use for checkpoints (if None, checkpointing is disabled)
confidence_threshold (float, optional) – only detections above this threshold are returned
checkpoint_frequency (int, optional) – write results to a JSON checkpoint file every N images; -1 disables checkpointing
results (list, optional) – list of dicts, existing results loaded from checkpoint; generally not useful if you’re using this function outside of the CLI
n_cores (int, optional) – number of parallel workers to use, ignored if we’re running on a GPU
use_image_queue (bool, optional) – use a dedicated worker for image loading
quiet (bool, optional) – disable per-image console output
image_size (int, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
class_mapping_filename (str, optional) – use a non-default class mapping supplied in a .json file or YOLOv5 dataset.yaml file
include_image_size (bool, optional) – should we include image size in the output for each image?
include_image_timestamp (bool, optional) – should we include image timestamps in the output for each image?
include_exif_tags (str, optional) – comma-separated list of EXIF tags to include in output
augment (bool, optional) – enable image augmentation
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors. Can also be a list of k=v pairs, or a comma-delimited string containing a list of k=v pairs.
loader_workers (int, optional) – number of loaders to use, only relevant when use_image_queue is True
preprocess_on_image_queue (bool, optional) – if the image queue is enabled, should it handle image loading and preprocessing (True), or just image loading (False)?
batch_size (int, optional) – batch size for GPU processing, automatically set to 1 for CPU processing
verbose_output (bool, optional) – enable additional debug output
use_threads_for_queue (bool, optional) – use threads (rather than processes) for the data loading workers

Returns:

list of dicts; each dict represents detections on one image

Return type:

list
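A minimal usage sketch that chains this function with write_results_to_file(), using only parameters documented on this page. The wrapper name and its arguments are hypothetical. The import is done inside the function so that defining the wrapper does not require the package to be installed.

```python
def run_megadetector_on_folder(image_folder, output_file, model="MDV5A"):
    """
    Run a detector on every image in a folder and write a results file.
    Returns the dictionary that was written to output_file.
    """
    # Imported lazily; requires the megadetector package at call time
    from megadetector.detection.run_detector_batch import (
        load_and_run_detector_batch,
        write_results_to_file,
    )

    # image_file_names accepts a folder, which is searched recursively
    results = load_and_run_detector_batch(
        model_file=model,
        image_file_names=image_folder,
        confidence_threshold=0.005,  # the documented default
        quiet=True,                  # suppress per-image console output
    )

    # Store paths relative to the input folder in the output file
    return write_results_to_file(
        results,
        output_file,
        relative_path_base=image_folder,
    )
```

For example, run_megadetector_on_folder("/data/images", "/data/results.json") would download the named model if needed and process the folder; both paths here are placeholders.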

megadetector.detection.run_detector_batch.load_checkpoint(checkpoint_path)[source]

Loads results from a checkpoint file. A checkpoint file is always a dict with the key “checkpoint”.

Parameters: checkpoint_path (str) – the .json file to load
Returns: object retrieved from the checkpoint, typically a list of results
Return type: object

megadetector.detection.run_detector_batch.write_checkpoint(checkpoint_path, results)[source]

Writes the object in [results] to a json checkpoint file, as a dict with the key “checkpoint”. First backs up the checkpoint file if it exists, in case we crash while writing the file.

Parameters:

checkpoint_path (str) – the file to write the checkpoint to
results (object) – the object we should write
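The checkpoint behavior described for load_checkpoint() and write_checkpoint() can be sketched as follows. This is an illustration of the documented format (a dict with the single key “checkpoint”, backed up before each overwrite), not the package’s implementation; the function names and the "_backup" suffix are assumptions.

```python
import json
import os
import shutil


def write_checkpoint_sketch(checkpoint_path, results):
    """Write results to a JSON checkpoint, backing up any existing file first."""
    if os.path.isfile(checkpoint_path):
        # Hypothetical backup name; protects the previous checkpoint in
        # case the process crashes while the new one is being written
        shutil.copyfile(checkpoint_path, checkpoint_path + "_backup")
    with open(checkpoint_path, "w") as f:
        json.dump({"checkpoint": results}, f)


def load_checkpoint_sketch(checkpoint_path):
    """Load the object stored under the "checkpoint" key."""
    with open(checkpoint_path, "r") as f:
        data = json.load(f)
    if not isinstance(data, dict) or "checkpoint" not in data:
        raise ValueError(f"{checkpoint_path} is not a checkpoint file")
    return data["checkpoint"]
```

Copying the old file before overwriting means that at least one readable checkpoint survives an interrupted write.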

megadetector.detection.run_detector_batch.write_results_to_file(results, output_file, relative_path_base=None, detector_file=None, info=None, include_max_conf=False, custom_metadata=None, force_forward_slashes=True)[source]

Writes list of detection results to JSON output file. Format matches:

https://lila.science/megadetector-output-format

Parameters:

results (list) – list of dict, each dict represents detections on one image
output_file (str) – path to JSON output file, should end in ‘.json’
relative_path_base (str, optional) – path to a directory as the base for relative paths, can be None if the paths in [results] are absolute
detector_file (str, optional) – filename of the detector used to generate these results, only used to pull out a version number for the “info” field
info (dict, optional) – dictionary to put in the results file instead of the default “info” field
include_max_conf (bool, optional) – old files (version 1.2 and earlier) included a “max_conf” field in each image; this was removed in version 1.3. Set this flag to force the inclusion of this field.
custom_metadata (object, optional) – additional data to include as info[‘custom_metadata’]; typically a dictionary, but no type/format checks are performed
force_forward_slashes (bool, optional) – convert all slashes in filenames within [results] to forward slashes

Returns:

the MD-formatted dictionary that was written to [output_file]

Return type:

dict

run_detector_batch - CLI interface

Module to run a TF/PT animal detection model on lots of images

run_detector_batch [-h] [--recursive] [--output_relative_filenames] [--include_max_conf]
                   [--verbose] [--image_size IMAGE_SIZE] [--augment] [--use_image_queue]
                   [--preprocess_on_image_queue] [--use_threads_for_queue]
                   [--threshold THRESHOLD] [--checkpoint_frequency CHECKPOINT_FREQUENCY]
                   [--checkpoint_path CHECKPOINT_PATH]
                   [--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
                   [--allow_checkpoint_overwrite] [--ncores NCORES]
                   [--loader_workers LOADER_WORKERS]
                   [--class_mapping_filename CLASS_MAPPING_FILENAME] [--include_image_size]
                   [--include_image_timestamp] [--include_exif_tags INCLUDE_EXIF_TAGS]
                   [--overwrite_handling OVERWRITE_HANDLING] [--force_model_download]
                   [--previous_results_file PREVIOUS_RESULTS_FILE]
                   [--detector_options [KEY=VALUE ...]] [--batch_size BATCH_SIZE]
                   detector_file image_file output_file

run_detector_batch positional arguments

detector_file - Path to detector model file (.pb or .pt). Can also be the strings "MDV4", "MDV5A", or "MDV5B" to request automatic download.
image_file - Path to a single image file, a .json or .txt file containing a list of paths to images, or a directory
output_file - Path to output JSON results file, should end with a .json extension

run_detector_batch options

-h, --help - show this help message and exit
--recursive - Recurse into directories, only meaningful if image_file points to a directory
--output_relative_filenames - Output relative file names, only meaningful if image_file points to a directory
--include_max_conf - Include the "max_detection_conf" field in the output
--verbose - Enable additional debug output
--image_size IMAGE_SIZE - Force image resizing to a specific integer size on the long axis (not recommended to change this)
--augment - Enable image augmentation
--use_image_queue - Pre-load images, may help keep your GPU busy; does not currently support checkpointing. Useful if you have a very fast GPU and a very slow disk.
--preprocess_on_image_queue - Whether to do image resizing on the image queue (PyTorch detectors only)
--use_threads_for_queue - Use threads (rather than processes) for the image queue; only relevant if --use_image_queue is set
--threshold THRESHOLD - Confidence threshold between 0 and 1.0, don’t include boxes below this confidence in the output file. Default is 0.005
--checkpoint_frequency CHECKPOINT_FREQUENCY - Write results to a temporary file every N images; default is -1, which disables this feature
--checkpoint_path CHECKPOINT_PATH - File name to which checkpoints will be written if checkpoint_frequency is > 0, defaults to md_checkpoint_[date].json in the same folder as the output file
--resume_from_checkpoint RESUME_FROM_CHECKPOINT - Path to a JSON checkpoint file to resume from, or "auto" to find the most recent checkpoint in the same folder as the output file. "auto" uses checkpoint_path (rather than searching the output folder) if checkpoint_path is specified.
--allow_checkpoint_overwrite - By default, this script will bail if the specified checkpoint file already exists; this option allows it to overwrite existing checkpoints
--ncores NCORES - Number of cores to use for inference; only applies to CPU-based inference (default 1)
--loader_workers LOADER_WORKERS - Number of image loader workers to use; only relevant when --use_image_queue is set (default 4)
--class_mapping_filename CLASS_MAPPING_FILENAME - Use a non-default class mapping, supplied in a .json file with a dictionary mapping int-strings to strings. This will also disable the addition of "1" to all category IDs, so your class mapping should start at zero. Can also be a YOLOv5 dataset.yaml file.
--include_image_size - Include image dimensions in output file
--include_image_timestamp - Include image datetime (if available) in output file
--include_exif_tags INCLUDE_EXIF_TAGS - Comma-separated list of EXIF tags to include in output, or "all" to include all tags
--overwrite_handling OVERWRITE_HANDLING - What should we do if the output file exists? overwrite/skip/error (default overwrite)
--force_model_download - If a named model (e.g. "MDV5A") is supplied, force a download of that model even if the local file already exists.
--previous_results_file PREVIOUS_RESULTS_FILE - If supplied, this should point to a previous .json results file; any results in that file will be transferred to the output file without reprocessing those images. Useful for "updating" a set of results when you may have added new images to a folder you’ve already processed. Only supported when using relative paths.
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
--batch_size BATCH_SIZE - Batch size for GPU inference (default 1). CPU inference will ignore this and use batch_size=1.
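Detector options are accepted in several shapes: KEY=VALUE pairs on the command line, and a dict, a list of k=v pairs, or a comma-delimited string in load_and_run_detector_batch(). A minimal sketch of normalizing those shapes into one dict is shown below. The function name is hypothetical, and leaving all values as strings is an assumption; the package may convert types differently.

```python
def parse_detector_options(options):
    """
    Normalize detector options to a dict. Accepts None, a dict, a list of
    "k=v" strings, or a comma-delimited string of "k=v" pairs.
    """
    if options is None:
        return {}
    if isinstance(options, dict):
        return dict(options)  # copy, so the caller's dict is not modified
    if isinstance(options, str):
        options = options.split(",")

    parsed = {}
    for item in options:
        item = item.strip()
        if not item:
            continue  # tolerate empty entries, e.g. a trailing comma
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"Expected KEY=VALUE, got: {item!r}")
        parsed[key.strip()] = value.strip()
    return parsed
```

With this sketch, parse_detector_options("a=1, b=two") and parse_detector_options(["a=1", "b=two"]) both produce {"a": "1", "b": "two"}; the keys are placeholders, not real option names.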

detection.run_detector module

run_detector.py

Module to run an animal detection model on images. The main function in this script also renders the predicted bounding boxes on images and saves the resulting images (with bounding boxes).

This script is not a good way to process lots of images. It does not produce a useful output format, and it does not facilitate checkpointing the results, so if it crashes you would have to start from scratch. If you want to run a detector on lots of images, you should check out run_detector_batch.py.

That said, this script (run_detector.py) is a good way to test our detector on a handful of images and get super-satisfying, graphical results.

If you would like to not use the GPU, set the environment variable CUDA_VISIBLE_DEVICES to “-1”.

This script will only consider detections with > 0.005 confidence at all times. The threshold you provide is only for rendering the results. If you need to see lower-confidence detections, you can change DEFAULT_OUTPUT_CONFIDENCE_THRESHOLD.

megadetector.detection.run_detector.estimate_md_images_per_second(model_file, device_name=None)[source]

Estimates how fast MegaDetector will run on a particular device, based on benchmarks. Defaults to querying the current device. Returns None if no data is available for the current card/model. Estimates only available for a small handful of GPUs. Uses an absurdly simple lookup approach, e.g. if the string “4090” appears in the device name, congratulations, you have an RTX 4090.

Parameters:

model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
device_name (str, optional) – device name, e.g. blah-blah-4090-blah-blah

Returns:

the approximate number of images this model version can process on this device per second

Return type:

float

megadetector.detection.run_detector.get_detector_metadata_from_version_string(detector_version)[source]

Given a MegaDetector version string (e.g. “v4.1.0”), returns the metadata for the model. Used for writing standard defaults to batch output files.

Parameters: detector_version (str) – a detection version string, e.g. “v4.1.0”, which you can extract from a filename using get_detector_version_from_filename()
Returns: metadata for this model, suitable for writing to a MD output file
Return type: dict

megadetector.detection.run_detector.get_detector_version_from_filename(detector_filename, accept_first_match=True, verbose=False)[source]

Gets the canonical version number string of a detector from the model filename.

[detector_filename] will almost always end with one of the following:

megadetector_v2.pb
megadetector_v3.pb
megadetector_v4.1 (not produced by run_detector_batch.py, only found in output files from the deprecated Azure Batch API)
md_v4.1.0.pb
md_v5a.0.0.pt
md_v5b.0.0.pt

This function identifies the version number as “v2.0.0”, “v3.0.0”, “v4.1.0”, “v4.1.0”, “v5a.0.0”, and “v5b.0.0”, respectively. See known_models for the list of valid version numbers.

Parameters:

detector_filename (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
accept_first_match (bool, optional) – if multiple candidates match the filename, choose the first one; otherwise return the string “multiple”
verbose (bool, optional) – enable additional debug output

Returns:

a detector version string, e.g. “v5a.0.0”, or “multiple” if more than one candidate matches and accept_first_match is False

Return type:

str
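The filename-to-version mapping listed above can be sketched as a substring lookup. This is an illustration only: the pattern table covers just the six filenames named on this page, the function name is hypothetical, and returning None when nothing matches is an assumption (the real function consults known_models and may behave differently).

```python
import os

# Illustrative subset: only the filenames listed in this documentation
KNOWN_FILENAME_PATTERNS = {
    "megadetector_v2": "v2.0.0",
    "megadetector_v3": "v3.0.0",
    "megadetector_v4.1": "v4.1.0",
    "md_v4.1.0": "v4.1.0",
    "md_v5a.0.0": "v5a.0.0",
    "md_v5b.0.0": "v5b.0.0",
}


def version_from_filename_sketch(detector_filename, accept_first_match=True):
    """Map a model filename to a version string via substring matching."""
    # Normalize Windows separators so basename() works on any platform
    base = os.path.basename(detector_filename.replace("\\", "/")).lower()
    matches = [
        version
        for pattern, version in KNOWN_FILENAME_PATTERNS.items()
        if pattern in base
    ]
    # Remove duplicate versions while preserving match order
    unique = list(dict.fromkeys(matches))
    if not unique:
        return None  # assumption: the real function may return something else
    if len(unique) > 1 and not accept_first_match:
        return "multiple"
    return unique[0]
```

For example, version_from_filename_sketch("c:/x/z/md_v5a.0.0.pt") returns "v5a.0.0".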

megadetector.detection.run_detector.get_detector_version_from_model_file(detector_filename, verbose=False)[source]

Gets the canonical detection version from a model file, preferably by reading it from the file itself, otherwise based on the filename.

Parameters:

detector_filename (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
verbose (bool, optional) – enable additional debug output

Returns:

a canonical detector version string, e.g. “v5a.0.0”, or “unknown”

Return type:

str

megadetector.detection.run_detector.get_typical_confidence_threshold_from_results(results)[source]

Given the .json data loaded from a MD results file, returns a typical confidence threshold based on the detector version.

Parameters: results (dict or str) – a dict of MD results, as it would be loaded from a MD results .json file, or a .json filename
Returns: a sensible default threshold for this model
Return type: float

megadetector.detection.run_detector.is_gpu_available(model_file, context_string=None)[source]

Determines whether a GPU is available, importing PyTorch or TF depending on the extension of model_file. Does not actually load model_file, just uses that to determine how to check for GPU availability (PT vs. TF).

Parameters:

model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
context_string (str, optional) – string to print to the console to clarify the context in which is_gpu_available is being called

Returns:

whether a GPU is available

Return type:

bool

megadetector.detection.run_detector.load_and_run_detector(model_file, image_file_names, output_dir, render_confidence_threshold=0.2, crop_images=False, box_thickness=4, box_expansion=0, image_size=None, label_font_size=16, augment=False, force_model_download=False, detector_options=None, verbose=False)[source]

Loads and runs a detector on target images, and visualizes the results.

Parameters:

model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt, or a known model string, e.g. “MDV5A”
image_file_names (list) – list of absolute paths to process
output_dir (str) – folder to write visualized images to
render_confidence_threshold (float, optional) – only render boxes for detections above this threshold
crop_images (bool, optional) – whether to crop detected objects to individual images (default is to render images with boxes, rather than cropping)
box_thickness (float, optional) – thickness in pixels for box rendering
box_expansion (float, optional) – box expansion in pixels
image_size (tuple, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
label_font_size (float, optional) – font size to use for displaying class names and confidence values in the rendered images
augment (bool, optional) – enable (implementation-specific) image augmentation
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors
verbose (bool, optional) – enable additional debug output

megadetector.detection.run_detector.load_detector(model_file, force_cpu=False, force_model_download=False, detector_options=None, verbose=False)[source]

Loads a TF or PT detector, depending on the extension of model_file.

Parameters:

model_file (str) – model filename (e.g. c:/x/z/md_v5a.0.0.pt) or known model name (e.g. “MDV5A”)
force_cpu (bool, optional) – force the model to run on the CPU even if a GPU is available
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors
verbose (bool, optional) – enable additional debug output

Returns:

loaded detector object

Return type:

object

megadetector.detection.run_detector.try_download_known_detector(detector_file, force_download=False, verbose=False)[source]

Checks whether detector_file is really the name of a known model, in which case we will either read the actual filename from the corresponding environment variable or download (if necessary) to local temp space. Otherwise just returns the input string.

Parameters:

detector_file (str) – a known model string (e.g. “MDV5A”), or any other string (in which case this function is a no-op)
force_download (bool, optional) – whether to download the model even if the local target file already exists
verbose (bool, optional) – enable additional debug output

Returns:

the local filename to which the model was downloaded, or the same string that was passed in, if it’s not recognized as a well-known model name

Return type:

str

run_detector - CLI interface

Module to run an animal detection model on images

run_detector [-h] (--image_file IMAGE_FILE | --image_dir IMAGE_DIR) [--recursive]
             [--output_dir OUTPUT_DIR] [--image_size IMAGE_SIZE] [--threshold THRESHOLD]
             [--crop] [--augment] [--box_thickness BOX_THICKNESS]
             [--box_expansion BOX_EXPANSION] [--label_font_size LABEL_FONT_SIZE]
             [--process_likely_output_images] [--force_model_download] [--verbose]
             [--detector_options [KEY=VALUE ...]]
             detector_file

run_detector positional arguments

detector_file - Path to detector model file (.pt, .pth, or .pb). Can also be a model string (e.g. "MDV5A", "MDv1000-redwood") to request automatic download.

run_detector options

-h, --help - show this help message and exit
--image_file IMAGE_FILE - Single file to process, mutually exclusive with --image_dir
--image_dir IMAGE_DIR - Directory to search for images, with optional recursion by adding --recursive
--recursive - Recurse into directories, only meaningful if using --image_dir
--output_dir OUTPUT_DIR - Directory for output images (defaults to same as input)
--image_size IMAGE_SIZE - Force image resizing to a (square) integer size (not recommended to change this)
--threshold THRESHOLD - Confidence threshold between 0 and 1.0; only render boxes above this confidence (defaults to 0.2)
--crop - If set, produces separate output images for each crop, rather than adding bounding boxes to the original image
--augment - Enable image augmentation
--box_thickness BOX_THICKNESS - Line width (in pixels) for box rendering (defaults to 4)
--box_expansion BOX_EXPANSION - Number of pixels to expand boxes by (defaults to 0)
--label_font_size LABEL_FONT_SIZE - Label font size (defaults to 16)
--process_likely_output_images - By default, we skip images that end in _detections, because they probably came from this script. This option disables that behavior.
--force_model_download - If a named model (e.g. "MDV5A") is supplied, force a download of that model even if the local file already exists.
--verbose - Enable additional debug output
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs

detection.pytorch_detector module

pytorch_detector.py

Module to run YOLO-based MegaDetector models.

class megadetector.detection.pytorch_detector.PTDetector(model_path, detector_options=None, verbose=False)[source]

Bases: object

Class that runs a YOLO-based MegaDetector model. Also used as a preprocessor for images that will later be run through an instance of PTDetector.

compatibility_mode: This allows us to maintain backwards compatibility across a set of changes to the way this class does inference. Currently should start with either “default” or “classic”.

device: Either a string (‘cpu’, ‘cuda:0’) or a torch.device()

generate_detections_one_batch(img_original, image_id=None, detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]

Run a detector on a batch of images.

Parameters:

img_original (list) – list of images (Image, np.array, or dict) on which we should run the detector, with EXIF rotation already handled, or dicts representing preprocessed images with associated letterbox parameters
image_id (list or None) – list of paths to identify the images; will be in the “file” field of the output objects. Will be ignored when img_original contains preprocessed dicts.
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
augment (bool, optional) – enable (implementation-specific) image augmentation
verbose (bool, optional) – enable additional debug output

Returns:

a list of dictionaries, each with the following fields:

’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)

Return type:

list

generate_detections_one_image(img_original, image_id='unknown', detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]

Run a detector on an image (wrapper around batch function).

Parameters:

img_original (Image, np.array, or dict) – the image on which we should run the detector, with EXIF rotation already handled, or a dict representing a preprocessed image with associated letterbox parameters
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
augment (bool, optional) – enable (implementation-specific) image augmentation
verbose (bool, optional) – enable additional debug output

Returns:

a dictionary with the following fields:

’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)

Return type:

dict
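Because the default detection_threshold is very low, callers often filter the returned detections afterwards. A minimal sketch of doing so on a dictionary shaped like the one described above is shown below. The function name is hypothetical, the sample values in the usage note are placeholders, and keeping detections at or above the threshold (rather than strictly above) is a choice made for this example.

```python
def filter_detections(result, threshold):
    """
    Return a copy of one per-image result, keeping only detections whose
    'conf' is at or above threshold. The input dict is not modified.
    """
    filtered = dict(result)
    detections = result.get("detections")
    if detections is None:
        # Failed images may carry a 'failure' string and no detections;
        # pass them through unchanged
        return filtered
    filtered["detections"] = [d for d in detections if d["conf"] >= threshold]
    return filtered
```

For instance, given a result with two detections at confidences 0.9 and 0.05, filter_detections(result, 0.2) keeps only the first. The ‘bbox’ and ‘category’ values are carried through untouched.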

half_precision: Use half-precision inference… fixed by the model, generally don’t mess with this

letterbox_stride: Stride size passed to the YOLO letterbox() function

preprocess_image(img_original, image_id='unknown', image_size=None, verbose=False)[source]

Prepare an image for detection, including scaling and letterboxing.

Parameters:

img_original (Image or np.array) – the image on which we should run the detector, with EXIF rotation already handled
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
verbose (bool, optional) – enable additional debug output

Returns:

dict with fields:

file (filename)
img (the preprocessed np.array)
img_original (the input image before preprocessing, as an np.array)
img_original_pil (the input image before preprocessing, as a PIL Image)
target_shape (the 2D shape to which the image was resized during preprocessing)
scaling_shape (the 2D original size, for normalizing coordinates later)
letterbox_ratio (letterbox parameter used for normalizing coordinates later)
letterbox_pad (letterbox parameter used for normalizing coordinates later)

Return type:

dict
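The scaling and letterboxing step can be illustrated with the arithmetic used by YOLOv5-style letterbox functions: scale the image so its long side matches the target size, then pad each axis up to the next multiple of the stride. This is a sketch of that general technique under stated assumptions, not this class’s code. The function names, the dictionary keys, and the default stride of 32 are all hypothetical, and the values the library actually stores in letterbox_ratio and letterbox_pad may be defined differently.

```python
def letterbox_params(original_hw, target_size, stride=32):
    """
    Compute scale and padding for a minimal-padding letterbox.

    original_hw: (height, width) of the input image in pixels
    target_size: desired size of the long side after scaling
    stride: padded dimensions are multiples of this value
    """
    height, width = original_hw
    # One ratio for both axes, so the aspect ratio is preserved
    ratio = min(target_size / height, target_size / width)
    new_w = int(round(width * ratio))
    new_h = int(round(height * ratio))
    # Pad only as far as the next multiple of stride on each axis
    pad_w = (target_size - new_w) % stride
    pad_h = (target_size - new_h) % stride
    return {
        "ratio": ratio,
        "pad": (pad_w / 2, pad_h / 2),  # padding per side (x, y)
        "padded_shape": (new_h + pad_h, new_w + pad_w),  # (height, width)
    }


def unletterbox_point(x, y, params):
    """Map a point from letterboxed pixel coordinates back to the original."""
    pad_x, pad_y = params["pad"]
    return (x - pad_x) / params["ratio"], (y - pad_y) / params["ratio"]
```

Keeping the ratio and per-side padding alongside the image is what lets detections be mapped back to original coordinates later, which is presumably why preprocess_image() returns its letterbox parameters.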

use_model_native_classes: If this is False, we assume the underlying model is producing class indices in the set (0,1,2) (and we assert() on this), and we add 1 to get to the backwards-compatible MD classes (1,2,3) before generating output. If this is True, we use whatever indices the model provides.
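The index shift described for use_model_native_classes amounts to the following sketch. The function name is hypothetical, returning the category as a string is an assumption based on the string category IDs used elsewhere on this page, and the sketch raises ValueError where the class itself uses assert().

```python
def to_output_category(model_class_index, use_model_native_classes=False):
    """Convert a model class index to an output category ID string."""
    if use_model_native_classes:
        # Use whatever index the model provides
        return str(model_class_index)
    if model_class_index not in (0, 1, 2):
        raise ValueError(f"Unexpected class index: {model_class_index}")
    # Shift 0-based model indices to the backwards-compatible 1-based IDs
    return str(model_class_index + 1)
```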

megadetector.detection.pytorch_detector.add_metadata_to_megadetector_model_file(model_file_in, model_file_out, metadata, destination_path='megadetector_info.json')[source]

Adds a .json file to the specified MegaDetector model file containing metadata used by this module. Always over-writes the output file.

Parameters:

model_file_in (str) – The input model filename, typically .pt (.zip is also sensible)
model_file_out (str) – The output model filename, typically .pt (.zip is also sensible). May be the same as model_file_in.
metadata (dict) – The metadata dict to add to the output model file
destination_path (str, optional) – The relative path within the main folder of the model archive where we should write the metadata. This is not relative to the root of the archive, it’s relative to the one and only folder at the root of the archive (this is a PyTorch convention).

megadetector.detection.pytorch_detector.nms(prediction, conf_thres=0.25, iou_thres=0.45, max_det=300)[source]

Non-maximum suppression (a wrapper around torchvision.ops.nms())

Parameters:

prediction (torch.Tensor) – Model predictions with shape [batch_size, num_anchors, num_classes + 5] Format: [x_center, y_center, width, height, objectness, class1_conf, class2_conf, …] Coordinates are normalized to input image size.
conf_thres (float) – Confidence threshold for filtering detections
iou_thres (float) – IoU threshold for NMS
max_det (int) – Maximum number of detections per image

Returns:

List of tensors, one per image in batch. Each tensor has shape [N, 6] where:

N is the number of detections for that image
Columns are [x1, y1, x2, y2, confidence, class_id]
Coordinates are in absolute pixels relative to input image size
class_id is the integer class index (0-based)

Return type:

list
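To make the input and output layouts concrete, here is a dependency-free sketch of greedy non-maximum suppression for a single image. It is not this module’s implementation, which wraps torchvision.ops.nms() and operates on batched tensors. The function names are hypothetical, the score is taken as objectness times the best class confidence (a common YOLO convention, assumed here), and suppression is class-agnostic for simplicity.

```python
def xywh_to_xyxy(box):
    """Convert [x_center, y_center, width, height] to [x1, y1, x2, y2]."""
    xc, yc, w, h = box
    return [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_one_image(rows, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """
    rows: list of [xc, yc, w, h, objectness, class1_conf, class2_conf, ...]
    Returns a list of [x1, y1, x2, y2, confidence, class_id] rows.
    """
    candidates = []
    for row in rows:
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        confidence = row[4] * class_scores[class_id]
        if confidence > conf_thres:
            candidates.append(xywh_to_xyxy(row[:4]) + [confidence, class_id])

    # Highest-confidence boxes are considered first
    candidates.sort(key=lambda det: det[4], reverse=True)

    kept = []
    for candidate in candidates:
        if len(kept) >= max_det:
            break
        # Keep a box only if it does not overlap a kept box too heavily
        if all(iou(candidate, other) <= iou_thres for other in kept):
            kept.append(candidate)
    return kept
```

Two heavily overlapping boxes collapse to the higher-scoring one, while a distant box with a different best class is kept as a separate detection.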

megadetector.detection.pytorch_detector.read_metadata_from_megadetector_model_file(model_file, relative_path='megadetector_info.json', verbose=False)[source]

Reads custom MegaDetector metadata from a modified MegaDetector model file.

Parameters:

model_file (str) – The model filename to read, typically .pt (.zip is also sensible)
relative_path (str, optional) – The relative path within the main folder of the model archive from which we should read the metadata. This is not relative to the root of the archive, it’s relative to the one and only folder at the root of the archive (this is a PyTorch convention).
verbose (bool, optional) – enable additional debug output

Returns:

whatever we read from the metadata file, always a dict in practice. Returns None if we failed to read the specified metadata file.

Return type:

object

detection.rfdetr_detector module

rfdetr_detector.py

Module to run RF-DETR-based detectors within the MegaDetector Python package.

Supports only RF-DETR checkpoints produced by package version >= 1.8.3, which include metadata about model architecture and training resolution that was not included in earlier checkpoint formats.

The rfdetr package is not a dependency of the MegaDetector Python package, so it is imported lazily (at the time a model is loaded), rather than at module import time.
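The lazy-import pattern described above can be sketched as follows. This is a generic illustration, not the code of this module: the class name and error message are hypothetical, and the backend module name is a parameter so the pattern can be tried with any importable module.

```python
import importlib


class LazyBackendDetector:
    """Defers importing an optional backend until a model is loaded."""

    def __init__(self, backend_module_name="rfdetr"):
        self.backend_module_name = backend_module_name
        self.backend = None  # nothing is imported at construction time

    def load(self):
        """Import the backend on first use and return the module."""
        if self.backend is None:
            try:
                self.backend = importlib.import_module(self.backend_module_name)
            except ImportError as e:
                raise ImportError(
                    f"This detector requires the optional "
                    f"'{self.backend_module_name}' package"
                ) from e
        return self.backend
```

With this structure, importing the module that defines the class never fails for users who have not installed the optional dependency; the error appears only when they try to load a model that needs it.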

class megadetector.detection.rfdetr_detector.RFDETRDetector(model_path, detector_options=None, verbose=False)[source]

Bases: object

Class that runs an RF-DETR-based detector. Also used as a preprocessor for images that will later be run through an instance of RFDETRDetector.

detection_categories: Mapping from string category IDs to class names; None until the model is loaded

generate_detections_one_batch(img_original, image_id=None, detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]

Run an RF-DETR detector on a batch of images.

Parameters:

img_original (list) – list of images (Image, np.array, or dict) on which we should run the detector, with EXIF rotation already handled, or dicts representing preprocessed images (as produced by preprocess_image())
image_id (list or None) – list of paths to identify the images; will be in the “file” field of the output objects. Ignored when img_original contains preprocessed dicts.
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – included for signature compatibility with PTDetector; must be None for RF-DETR models (set the resolution via the ‘image_size’ detector option at load time instead)
augment (bool, optional) – included for signature compatibility with PTDetector; must be False for RF-DETR models
verbose (bool, optional) – enable additional debug output

Returns:

a list of dictionaries, each with the following fields:

’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, only present if inference failed)

Return type:

list

generate_detections_one_image(img_original, image_id='unknown', detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]

Run an RF-DETR detector on an image (wrapper around generate_detections_one_batch()).

Parameters:

img_original (Image, np.array, or dict) – the image on which we should run the detector, with EXIF rotation already handled, or a dict representing a preprocessed image (as produced by preprocess_image())
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – must be None for RF-DETR models (for which image size is specified at load time, not inference time)
augment (bool, optional) – must be False for RF-DETR models (which don’t support augmentation)
verbose (bool, optional) – enable additional debug output

Returns:

a dictionary with the following fields:

’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, only present if inference failed)

Return type:

dict
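
A caller typically needs to check for the optional ’failure’ field before using ’detections’. The sketch below shows one way to do that on a result dictionary shaped like the one described above; summarize_result is a hypothetical helper, and the sample values are invented for illustration.

```python
def summarize_result(result, confidence_threshold=0.2):
    """Keep detections at or above a threshold, passing failures through."""
    # 'failure' is only present when inference failed, so use .get()
    failure = result.get("failure")
    if failure is not None:
        return {"file": result["file"], "failure": failure, "detections": []}
    kept = [d for d in (result.get("detections") or [])
            if d["conf"] >= confidence_threshold]
    return {"file": result["file"], "failure": None, "detections": kept}


# Invented sample results in the documented shape
ok_result = {
    "file": "a/b/c/RECONYX0001.JPG",
    "max_detection_conf": 0.91,
    "detections": [
        {"category": "1", "conf": 0.91, "bbox": [0.1, 0.2, 0.3, 0.4]},
        {"category": "2", "conf": 0.03, "bbox": [0.5, 0.5, 0.1, 0.1]},
    ],
}
failed_result = {"file": "broken.jpg", "failure": "example failure string"}

ok_summary = summarize_result(ok_result)
failed_summary = summarize_result(failed_result)
```

The very low default detection_threshold (1e-05) means nearly everything is returned, so filtering like this is usually left to the caller or to post-processing.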

image_size: Image resolution passed to from_checkpoint(); None means “use the resolution recorded in the checkpoint”. After the model is loaded, this is updated to the resolution actually used.

model: The loaded RF-DETR model; remains None for preprocess-only instances

model_type: The resolved variant class name (e.g. ‘RFDETRNano’); None until the model is loaded

preprocess_image(img_original, image_id='unknown', image_size=None, verbose=False)[source]

Prepare an image for detection. RF-DETR resizes and letterboxes internally, so this is almost a no-op.

Parameters:

img_original (Image or np.array) – the image on which we should run the detector, with EXIF rotation already handled
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
image_size (int, optional) – included for signature compatibility with PTDetector.preprocess_image(); ignored for RF-DETR models
verbose (bool, optional) – enable additional debug output

Returns:

dict with fields:

file (filename)
img_original (the input image as an np.array)
img_original_pil (the input image as a PIL Image, or None if a numpy array was supplied)

Return type:

dict

megadetector.detection.rfdetr_detector.convert_detections_to_md_format(detections, image_width, image_height)[source]

Convert RF-DETR/Supervision detections to MegaDetector format.

Parameters:

detections – supervision Detections object with xyxy, confidence, class_id
image_width (int) – image width in pixels
image_height (int) – image height in pixels

Returns:

list of detection dicts in MegaDetector format

Return type:

list
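
The core of this conversion is turning absolute-pixel corner coordinates (xyxy) into the normalized [x_min, y_min, width, height] boxes used by the MegaDetector format. The sketch below shows that arithmetic on plain Python values rather than a supervision Detections object; xyxy_to_md_detection is a hypothetical name, and any category ID remapping or rounding the real function performs is not shown.

```python
def xyxy_to_md_detection(xyxy, confidence, class_id, image_width, image_height):
    """Convert one absolute-pixel xyxy box to an MD-style detection dict."""
    x_min, y_min, x_max, y_max = xyxy
    bbox = [
        x_min / image_width,             # normalized left edge
        y_min / image_height,            # normalized top edge
        (x_max - x_min) / image_width,   # normalized box width
        (y_max - y_min) / image_height,  # normalized box height
    ]
    return {
        "category": str(class_id),  # MD category IDs are strings
        "conf": float(confidence),
        "bbox": [round(v, 4) for v in bbox],
    }


# A 200x200 pixel box in a 1000x500 image (invented values)
detection = xyxy_to_md_detection((100, 50, 300, 250), 0.9, 0, 1000, 500)
```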

megadetector.detection.rfdetr_detector.load_model(detector_file, image_size=None, optimize_for_inference=False, batch_size=1)[source]

Load an RF-DETR model from an inference-ready .pth checkpoint via rfdetr.from_checkpoint(), which reads the architecture name (“Nano”, “Medium”, etc.), training resolution, and class names from metadata stored in the checkpoint.

Parameters:

detector_file (str) – path to .pth checkpoint file.
image_size (int, optional) – image resolution for inference. None uses the training resolution recorded in the checkpoint; a value overrides it.
optimize_for_inference (bool, optional) – whether to optimize the model for inference, which should be a free lunch, but as of 9/2025 there is some risk of accuracy regression.
batch_size (int, optional) – batch size to pass to optimize_for_inference()

Returns:

dictionary with keys:

’model’: the loaded RF-DETR model
’model_type’ (str): resolved variant class name (e.g. ‘RFDETRSmall’)
’image_size’ (int): resolved inference resolution
’detection_categories’ (dict): mapping from string category IDs to class names

Return type:

dict

detection.tf_detector module

tf_detector.py

Module containing the class TFDetector, for loading and running a TensorFlow detection model.

class megadetector.detection.tf_detector.TFDetector(model_path, detector_options=None)[source]

Bases: object

A detector model loaded at the time of initialization. It is intended to be used with TensorFlow-based versions of MegaDetector (v2, v3, or v4). If someone can find v1, I suppose you could use this class for v1 also.

generate_detections_one_image(image, image_id, detection_threshold, image_size=None, augment=False, verbose=False)[source]

Runs the detector on an image.

Parameters:

image (Image) – the PIL Image object (or numpy array) on which we should run the detector, with EXIF rotation already handled.
image_id (str) – a path to identify the image; will be in the “file” field of the output object
detection_threshold (float) – only detections above this threshold will be included in the return value
image_size (tuple, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
augment (bool, optional) – enable image augmentation. Not currently supported, but included here for compatibility with PTDetector.
verbose (bool, optional) – enable additional debug output

Returns:

a dictionary with the following fields:

’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)

Return type:

dict

detection.process_video module

process_video.py

Splits a video (or folder of videos) into frames, runs the frames through run_detector_batch.py, and optionally stitches together results into a new video with detection boxes.

When possible, video processing happens in memory, without writing intermediate frames to disk. If the caller requests that frames be saved, frames are written before processing, and the MD results correspond to the frames that were written to disk (which simplifies, for example, repeat detection elimination).

class megadetector.detection.process_video.ProcessVideoOptions[source]

Bases: object

Options controlling the behavior of process_video()

augment: Enable image augmentation

checkpoint_frequency: Write a checkpoint file (to resume processing later) every N videos; set to -1 (default) to disable checkpointing

checkpoint_path: Path to checkpoint file; None (default) for auto-generation based on output filename

detector_options: Detector-specific options

exit_on_empty_video: By default, a video with no frames (or no frames retrievable with the current parameters) is silently stored as a failure; this causes it to halt execution.

frame_sample: Sample every Nth frame; set to None (default) or 1 to sample every frame. Typically we sample down to around 3 fps, so for typical 30 fps videos, frame_sample=10 is a typical value. Mutually exclusive with [time_sample].

image_size: Run the model at this image size (don’t mess with this unless you know what you’re getting into)… if you just want to pass smaller frames to MD, use max_width

input_video_file: Video (of folder of videos) to process

json_confidence_threshold: Detections below this threshold will not be included in the output file.

model_file

Can be a model filename (.pt or .pb) or a model name (e.g. “MDV5A”)

Use the string “no_detection” to indicate that you only want to extract frames, not run a model. If you do this, you almost definitely want to set keep_extracted_frames to “True”, otherwise everything in this module is a no-op. I.e., there’s no reason to extract frames, do nothing with them, then delete them.

output_json_file: .json file to which we should write results

recursive: If [input_video_file] is a folder, should we search for videos recursively?

resume_from_checkpoint: Resume from a checkpoint file, or “auto” to use the most recent checkpoint in the output directory

time_sample: Sample frames every N seconds. Mutually exclusive with [frame_sample]

verbose: Enable additional debug console output

megadetector.detection.process_video.process_videos(options)[source]

Process a video or folder of videos through MD.

Parameters:

options (ProcessVideoOptions) – all the parameters used to control this process, including filenames; see ProcessVideoOptions for details
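
A typical programmatic call fills in a ProcessVideoOptions object and hands it to process_videos(). The sketch below uses only the attribute names documented for ProcessVideoOptions; the wrapper run_video_detection is hypothetical, and the early check reflects the documented rule that frame_sample and time_sample are mutually exclusive.

```python
def run_video_detection(model_file, input_video_file, output_json_file,
                        frame_sample=None, time_sample=None):
    """Hypothetical convenience wrapper around process_videos()."""
    # frame_sample and time_sample are documented as mutually exclusive,
    # so fail early rather than guessing which one the caller meant
    if frame_sample is not None and time_sample is not None:
        raise ValueError("Specify frame_sample or time_sample, not both")

    # Imported inside the function so the check above works even where
    # the megadetector package is not installed
    from megadetector.detection.process_video import (
        ProcessVideoOptions, process_videos)

    options = ProcessVideoOptions()
    options.model_file = model_file
    options.input_video_file = input_video_file
    options.output_json_file = output_json_file
    options.frame_sample = frame_sample
    options.time_sample = time_sample
    process_videos(options)
    return output_json_file
```

For example, run_video_detection("MDV5A", "/path/to/videos", "/path/to/results.json", frame_sample=10) would process every tenth frame; the paths here are placeholders.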

process_video - CLI interface

Run MegaDetector on each frame (or every Nth frame) in a video (or folder of videos), optionally producing a new video with detections annotated

process_video [-h] [--recursive] [--output_json_file OUTPUT_JSON_FILE]
              [--json_confidence_threshold JSON_CONFIDENCE_THRESHOLD]
              [--frame_sample FRAME_SAMPLE] [--time_sample TIME_SAMPLE] [--verbose]
              [--image_size IMAGE_SIZE] [--augment] [--exit_on_empty_video]
              [--detector_options [KEY=VALUE ...]]
              [--checkpoint_frequency CHECKPOINT_FREQUENCY]
              [--checkpoint_path CHECKPOINT_PATH]
              [--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
              model_file input_video_file

process_video positional arguments

model_file - MegaDetector model file (.pt or .pb) or model name (e.g. "MDV5A"), or the string "no_detection" to run just frame extraction
input_video_file - video file (or folder) to process

process_video options

-h, --help - show this help message and exit
--recursive - recurse into [input_video_file]; only meaningful if a folder is specified as input
--output_json_file OUTPUT_JSON_FILE - .json output file, defaults to [video file].json
--json_confidence_threshold JSON_CONFIDENCE_THRESHOLD - don’t include boxes in the .json file with confidence below this threshold (default 0.005)
--frame_sample FRAME_SAMPLE - process every Nth frame (defaults to every frame), mutually exclusive with --time_sample.
--time_sample TIME_SAMPLE - process frames every N seconds; this is converted to a frame sampling rate, so it may not be exactly the requested interval in seconds. mutually exclusive with --frame_sample
--verbose - Enable additional debug output
--image_size IMAGE_SIZE - Force image resizing to a specific integer size on the long axis (not recommended to change this)
--augment - Enable image augmentation
--exit_on_empty_video - By default, videos with no retrievable frames are stored as failures; this causes them to halt execution
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
--checkpoint_frequency CHECKPOINT_FREQUENCY - Write a checkpoint file (to resume processing later) every N videos; set to -1 to disable checkpointing (default -1)
--checkpoint_path CHECKPOINT_PATH - Path to checkpoint file; defaults to a file in the same directory as the output file
--resume_from_checkpoint RESUME_FROM_CHECKPOINT - Resume from a specific checkpoint file, or "auto" to resume from the most recent checkpoint in the output directory
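
The note on --time_sample says the interval is converted to a frame sampling rate, so the effective interval may differ from the request. The sketch below shows one plausible conversion; the rounding rule and both function names are assumptions for illustration, not the script's actual logic.

```python
def time_sample_to_frame_sample(time_sample_seconds, fps):
    """Convert a sampling interval in seconds to an every-Nth-frame rate."""
    # Assumed rule: round to the nearest whole frame, never below 1
    return max(1, int(round(time_sample_seconds * fps)))


def sampled_frame_indices(n_frames, frame_sample):
    """Indices of the frames that would be processed."""
    return list(range(0, n_frames, frame_sample))


# Asking for a frame every 0.33 s of a 30 fps video yields every 10th frame,
# i.e. an effective interval of 10/30 s rather than exactly 0.33 s
frame_sample = time_sample_to_frame_sample(0.33, 30)
indices = sampled_frame_indices(25, frame_sample)
```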

detection.run_inference_with_yolov5_val module

run_inference_with_yolov5_val.py

Runs a folder of images through MegaDetector (or another YOLOv5/YOLOv8 model) with YOLO’s val.py, converting the output to the standard MD format. The reasons this script exists, as an alternative to the standard run_detector_batch.py are:

This script provides access to YOLO’s test-time augmentation tools.
This script serves as a reference implementation: by any reasonable definition, YOLOv5’s val.py produces the “correct” result for any image, since it matches what was used in training.
This script works for any Ultralytics detection model, including YOLOv8 models

YOLOv5’s val.py uses each file’s base name as a unique identifier, which doesn’t work when you have typical camera trap images like:

a/b/c/RECONYX0001.JPG
d/e/f/RECONYX0001.JPG

…both of which would just be “RECONYX0001.JPG”. So this script jumps through a bunch of hoops to put symlinks in a flat folder, run YOLOv5 on that folder, and map the results back to the real files.
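
The flat-folder trick can be illustrated without touching the filesystem: give every image a unique flat name, remember which real path it stands for, and translate results back afterward. The naming scheme and function names below are hypothetical; the real script additionally creates the symlinks (or copies) on disk.

```python
import os


def build_flat_id_mapping(relative_paths):
    """Map unique flat filenames to the original (possibly colliding) paths."""
    flat_name_to_path = {}
    for i, path in enumerate(relative_paths):
        extension = os.path.splitext(path)[1]
        # A zero-padded counter guarantees unique base names
        flat_name = "{:08d}{}".format(i, extension)
        flat_name_to_path[flat_name] = path
    return flat_name_to_path


def map_results_back(flat_results, flat_name_to_path):
    """Re-key results from flat filenames to the original paths."""
    return {flat_name_to_path[name]: result
            for name, result in flat_results.items()}


paths = ["a/b/c/RECONYX0001.JPG", "d/e/f/RECONYX0001.JPG"]
mapping = build_flat_id_mapping(paths)

# Pretend the detector produced one result per flat file (invented values)
flat_results = {"00000000.JPG": "result for first", "00000001.JPG": "result for second"}
restored = map_results_back(flat_results, mapping)
```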

If you are running a YOLOv5 model, this script currently requires the caller to supply the path where a working YOLOv5 install lives, and assumes that the current conda environment is all set up for YOLOv5. If you are running a YOLOv8 model, the folder doesn’t matter, but it assumes that ultralytics tools are available in the current environment.

By default, this script uses symlinks to format the input images in a way that YOLO’s val.py likes, as per above. This requires admin privileges on Windows… actually technically this only requires permissions to create symbolic links, but I’ve never seen a case where someone has that permission and doesn’t have admin privileges. If you are running this script on Windows and you don’t have admin privileges, use --no_use_symlinks, which will make copies of images, rather than using symlinks.

class megadetector.detection.run_inference_with_yolov5_val.YoloInferenceOptions[source]

Bases: object

Parameters that control the behavior of run_inference_with_yolov5_val(), including the input/output filenames.

append_job_id_to_symlink_folder: By default, if we’re creating symlinks to images, we append a unique job ID to the symlink folder. If the caller is 100% sure that the symlink folder can be re-used across calls, this can be set to False.

augment: Should we enable test-time augmentation?

batch_size: Batch size… has no impact on results, but may create memory issues if you set this to large values

checkpoint_frequency: Maximum number of images to run in a single chunk

conf_thres: Detections below this threshold will not be included in the output file

device_string: Device string, typically ‘0’ for GPU 0, ‘1’ for GPU 1, etc., or ‘cpu’

half_precision_enabled: Should we enable half-precision inference?

image_filename_list

If this is None, [input_folder] can’t be None; we’ll process all images in [input_folder].

If this is not None, and [input_folder] is not None, this should be a list of relative image paths within [input_folder] to process, or a .txt or .json file containing a list of relative image paths.

If this is not None, and [input_folder] is None, this should be a list of absolute image paths, or a .txt or .json file containing a list of absolute image paths.

image_size

Image size to use; this is a single int, which in ultralytics’s terminology means “scale the long side of the image to this size, and preserve aspect ratio”.

If None, will choose based on whether augmentation is enabled.

input_folder: Folder of images to process (can be None if image_filename_list contains absolute paths)

model_filename: Model filename (ending in .pt), or a well-known model name (e.g. “MDV5A”)

model_type: Currently ‘yolov5’ and ‘ultralytics’ are supported, and really these are proxies for “the yolov5 repo” and “the ultralytics repo”.

offset_yolo_category_ids: By default, we turn category ID 0 coming out of the YOLO .json file into category 1 in the MD-formatted .json file.

output_file: .json output file, in MD results format

overwrite_handling

What should we do if the output file already exists?

Can be ‘error’, ‘skip’, or ‘overwrite’.

preview_yolo_command_only: If True, we’ll do a dry run that lets you preview the YOLO val command, without actually running it.

recursive

Whether to search for images recursively within [input_folder]

Ignored if a list of files is provided.

remove_symlink_folder: Should we remove the symlink folder when we’re done?

remove_yolo_results_folder: Should we remove the intermediate results folder when we’re done?

save_yolo_debug_output: Save YOLO console output

symlink_folder: If this is None, we’ll create a folder in system temp space.

treat_copy_failures_as_warnings: By default, if any errors occur while we’re copying images or creating symlinks, it’s game over. If this is True, those errors become warnings, and we plow ahead.

unique_id_strategy

How should we guarantee that YOLO IDs (base filenames) are unique? Choices are:

‘verify’: assume image IDs are unique, but verify and error if they’re not
‘links’: create symlinks (or copies, depending on use_symlinks) to enforce uniqueness
‘auto’: check whether IDs are unique, create links if necessary

use_symlinks: Should we use symlinks to give unique identifiers to image files (vs. copies)?

yolo_category_id_to_name

These are deliberately offset from the standard MD categories; YOLOv5 needs category IDs to start at 0.

This can also be a string that points to any class mapping file supported by read_classes_from_yolo_dataset_file(): a YOLO dataset.yaml file, a text file with a list of classes, or a .json file with an ID –> name dict

yolo_results_folder

Temporary folder to stash intermediate YOLO results.

If this is None, we’ll create a folder in system temp space.

yolo_working_folder: Required for older YOLOv5 inference, not for newer ulytralytics/YOLOv8 inference

megadetector.detection.run_inference_with_yolov5_val.get_stats_for_category(filename, category='all')[source]

Retrieve statistics for a category from the YOLO console output stored in [filename].

Parameters:

filename (str) – a text file containing console output from a YOLO val run
category (str, optional) – a category name

Returns:

a dict with fields n_images, n_labels, P, R, mAP50, and mAP50-95

Return type:

dict
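
The sketch below shows how such statistics could be pulled out of console text. It assumes the YOLO val summary table has one row per category with seven whitespace-separated columns (class, images, labels, P, R, mAP50, mAP50-95); that layout, the sample text, and the function name parse_val_stats are assumptions, and category names containing spaces are not handled.

```python
def parse_val_stats(console_text, category="all"):
    """Return the stats row for a category, or None if it is not found."""
    for line in console_text.splitlines():
        tokens = line.split()
        if len(tokens) != 7 or tokens[0] != category:
            continue
        try:
            return {
                "n_images": int(tokens[1]),
                "n_labels": int(tokens[2]),
                "P": float(tokens[3]),
                "R": float(tokens[4]),
                "mAP50": float(tokens[5]),
                "mAP50-95": float(tokens[6]),
            }
        except ValueError:
            # A seven-token line that is not a stats row (e.g. the header)
            continue
    return None


# Invented console output in the assumed layout
sample_output = """
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95
                 all        100        250      0.912      0.874      0.921      0.655
              animal        100        200       0.93       0.88       0.94       0.67
"""
all_stats = parse_val_stats(sample_output)
animal_stats = parse_val_stats(sample_output, category="animal")
```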

megadetector.detection.run_inference_with_yolov5_val.run_inference_with_yolo_val(options)[source]

Runs a folder of images through MegaDetector (or another YOLOv5/YOLOv8 model) with YOLO’s val.py, converting the output to the standard MD format.

Parameters:

options (YoloInferenceOptions) – all the parameters used to control this process, including filenames; see YoloInferenceOptions for details

run_inference_with_yolov5_val - CLI interface

run_inference_with_yolov5_val [-h] [--image_filename_list IMAGE_FILENAME_LIST]
                              [--yolo_working_folder YOLO_WORKING_FOLDER]
                              [--image_size IMAGE_SIZE] [--conf_thres CONF_THRES]
                              [--batch_size BATCH_SIZE]
                              [--half_precision_enabled HALF_PRECISION_ENABLED]
                              [--device_string DEVICE_STRING]
                              [--overwrite_handling OVERWRITE_HANDLING]
                              [--yolo_dataset_file YOLO_DATASET_FILE]
                              [--model_type MODEL_TYPE]
                              [--unique_id_strategy UNIQUE_ID_STRATEGY]
                              [--symlink_folder SYMLINK_FOLDER]
                              [--yolo_results_folder YOLO_RESULTS_FOLDER] [--no_use_symlinks]
                              [--no_remove_symlink_folder] [--no_remove_yolo_results_folder]
                              [--save_yolo_debug_output]
                              [--checkpoint_frequency CHECKPOINT_FREQUENCY]
                              [--no_append_job_id_to_symlink_folder] [--nonrecursive]
                              [--no_offset_class_ids] [--preview_yolo_command_only]
                              [--augment_enabled AUGMENT_ENABLED]
                              model_filename input_folder output_file

run_inference_with_yolov5_val positional arguments

model_filename - model file name
input_folder - folder on which to recursively run the model, or a .json or .txt file containing a list of absolute image paths
output_file - .json file where output will be written

run_inference_with_yolov5_val options

-h, --help - show this help message and exit
--image_filename_list IMAGE_FILENAME_LIST - .json or .txt file containing a list of relative image filenames within [input_folder]
--yolo_working_folder YOLO_WORKING_FOLDER - folder in which to execute val.py (not necessary for YOLOv8 inference)
--image_size IMAGE_SIZE - image size for model execution (default 1664 when augmentation is enabled, else 1280)
--conf_thres CONF_THRES - confidence threshold for including detections in the output file (default 0.001)
--batch_size BATCH_SIZE - inference batch size (default 1)
--half_precision_enabled HALF_PRECISION_ENABLED - use half-precision inference (1 or 0) (default is the underlying model’s default, probably full for YOLOv8 and half for YOLOv5)
--device_string DEVICE_STRING - CUDA device specifier, typically "0" or "1" for CUDA devices, "mps" for M1/M2 devices, or "cpu" (default 0)
--overwrite_handling OVERWRITE_HANDLING - action to take if the output file exists (skip, error, overwrite) (default skip)
--yolo_dataset_file YOLO_DATASET_FILE - YOLOv5 dataset.yaml file from which we should load category information (otherwise defaults to MD categories)
--model_type MODEL_TYPE - model type ("yolov5" or "ultralytics" ("yolov8" behaves the same as "ultralytics")) (default yolov5)
--unique_id_strategy UNIQUE_ID_STRATEGY - how should we ensure that unique filenames are passed to the YOLO val script, can be "verify", "auto", or "links", see options class docs for details (default links)
--symlink_folder SYMLINK_FOLDER - temporary folder for symlinks (defaults to a folder in the system temp dir)
--yolo_results_folder YOLO_RESULTS_FOLDER - temporary folder for YOLO intermediate output (defaults to a folder in the system temp dir)
--no_use_symlinks - copy files instead of creating symlinks when preparing the yolo input folder
--no_remove_symlink_folder - don’t remove the temporary folder full of symlinks
--no_remove_yolo_results_folder - don’t remove the temporary folder full of YOLO intermediate files
--save_yolo_debug_output - write yolo console output to a text file in the results folder, along with additional debug files
--checkpoint_frequency CHECKPOINT_FREQUENCY - break the job into chunks with no more than this many images (default None)
--no_append_job_id_to_symlink_folder - don’t append a unique job ID to the symlink folder name
--nonrecursive - disable recursive folder processing
--no_offset_class_ids - disable class ID offsetting
--preview_yolo_command_only - don’t run inference, just preview the YOLO inference command (still creates symlinks)
--augment_enabled AUGMENT_ENABLED - enable/disable augmentation (default 0)

detection.run_tiled_inference module

run_tiled_inference.py

This script is experimental, YMMV.

Runs inference on a folder, first splitting each image up into tiles of size MxN (typically the native inference size of your detector), writing those tiles out to a temporary folder, then de-duplicating the resulting detections before merging them back into a set of detections that make sense on the original images.

This approach will likely fail to detect very large animals, so if you expect both large and small animals (in terms of pixel size), this script is best used in conjunction with a traditional inference pass that looks at whole images.

Currently requires temporary storage at least as large as the input data, generally a lot more than that (depending on the overlap between adjacent tiles). This is inefficient, but easy to debug.

Programmatic invocation supports using YOLOv5’s inference scripts (and test-time augmentation); the command-line interface only supports standard inference right now.

megadetector.detection.run_tiled_inference.extract_patch_from_image(im, patch_xy, patch_size, patch_image_fn=None, patch_folder=None, image_name=None, overwrite=True)[source]

Extracts a patch from the provided image, and writes that patch out to a new file.

Parameters:

im (str or Image) – image from which we should extract a patch, can be a filename or a PIL Image object.
patch_xy (tuple) – length-2 tuple of ints (x,y) representing the upper-left corner of the patch to extract
patch_size (tuple) – length-2 tuple of ints (w,h) representing the size of the patch to extract
patch_image_fn (str, optional) – image filename to write the patch to; if this is None the filename will be generated from [image_name] and the patch coordinates
patch_folder (str, optional) – folder in which the image lives; only used to generate a patch filename, so only required if [patch_image_fn] is None
image_name (str, optional) – the identifier of the source image; only used to generate a patch filename, so only required if [patch_image_fn] is None
overwrite (bool, optional) – whether to overwrite an existing patch image

Returns:

a dictionary with fields xmin,xmax,ymin,ymax,patch_fn

Return type:

dict

megadetector.detection.run_tiled_inference.get_patch_boundaries(image_size, patch_size, patch_stride=None)[source]

Computes a list of patch starting coordinates (x,y) given an image size (w,h), a patch size (w,h), and a stride (x,y)

Patch size is guaranteed, but the stride may deviate to make sure all pixels are covered. I.e., we move by regular strides until the current patch walks off the right/bottom, at which point it backs up to one patch from the end. So if your image is 15 pixels wide, your patches are 10 pixels wide, and you have a stride of 10 pixels, you will get starting positions of 0 (from 0 to 9) and 5 (from 5 to 14).

Parameters:

image_size (tuple) – size of the image you want to divide into patches, as a length-2 tuple (w,h)
patch_size (tuple) – patch size into which you want to divide an image, as a length-2 tuple (w,h)
patch_stride (tuple or float, optional) – stride between patches, as a length-2 tuple (x,y), or a float; if this is a float, it’s interpreted as the stride relative to the patch size (0.1 == 10% stride). Defaults to half the patch size.

Returns:

list of length-2 tuples, each representing the x/y start position of a patch

Return type:

list
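
The stride-then-back-up rule described above can be sketched along one axis and then combined across both. This is an illustrative reimplementation under stated assumptions, not the package's code: the function names are hypothetical, the output ordering is arbitrary, and patches larger than the image are rejected here rather than handled.

```python
def axis_starts(image_len, patch_len, stride):
    """Patch start positions along one axis, covering every pixel."""
    if patch_len > image_len:
        raise ValueError("Patch is larger than the image")
    starts = []
    pos = 0
    while True:
        if pos + patch_len >= image_len:
            # This patch would reach or pass the edge: back up so it ends
            # exactly at the last pixel, then stop
            starts.append(image_len - patch_len)
            break
        starts.append(pos)
        pos += stride
    return starts


def patch_starts(image_size, patch_size, patch_stride=None):
    """(x, y) start positions for all patches in an image."""
    if patch_stride is None:
        patch_stride = 0.5  # documented default: half the patch size
    if isinstance(patch_stride, float):
        # A float is interpreted relative to the patch size
        patch_stride = (max(1, round(patch_size[0] * patch_stride)),
                        max(1, round(patch_size[1] * patch_stride)))
    xs = axis_starts(image_size[0], patch_size[0], patch_stride[0])
    ys = axis_starts(image_size[1], patch_size[1], patch_stride[1])
    return [(x, y) for y in ys for x in xs]


# The example from the text: 15 pixels wide, 10-pixel patches, stride 10
example = patch_starts((15, 10), (10, 10), (10, 10))
```

With the default half-patch stride, a 20x20 image and 10x10 patches give starts of 0, 5, and 10 on each axis, so nine patches in total.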

megadetector.detection.run_tiled_inference.in_place_nms(md_results, iou_thres=0.45, verbose=True)[source]

Run torch.ops.nms in-place on MD-formatted detection results.

Parameters:

md_results (dict) – detection results for a list of images, in MD results format (i.e., containing a list of image dicts with the key ‘images’, each of which has a list of detections with the key ‘detections’)
iou_thres (float, optional) – IoU threshold above which we will treat two detections as redundant
verbose (bool, optional) – enable additional debug console output
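
To show what in-place NMS on MD-formatted results means, here is a pure-Python greedy NMS that stands in for the torch-based routine the real function uses. The function names are hypothetical, and the sketch assumes suppression is applied across all categories within an image; the real function may treat categories differently.

```python
def iou(a, b):
    """IoU of two boxes in [x_min, y_min, width, height] form."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_in_place(md_results, iou_thres=0.45):
    """Greedy NMS over each image's detections, modifying md_results."""
    for im in md_results["images"]:
        detections = im.get("detections")
        if not detections:
            continue  # failed images may have no detection list
        kept = []
        # Visit boxes from most to least confident; keep a box only if it
        # does not overlap an already-kept box by more than the threshold
        for det in sorted(detections, key=lambda d: d["conf"], reverse=True):
            if all(iou(det["bbox"], k["bbox"]) <= iou_thres for k in kept):
                kept.append(det)
        im["detections"] = kept


# Invented results: two near-duplicate boxes and one separate box
results = {"images": [{"file": "a.jpg", "detections": [
    {"category": "1", "conf": 0.80, "bbox": [0.11, 0.10, 0.20, 0.20]},
    {"category": "1", "conf": 0.90, "bbox": [0.10, 0.10, 0.20, 0.20]},
    {"category": "1", "conf": 0.70, "bbox": [0.60, 0.60, 0.10, 0.10]},
]}]}
nms_in_place(results)
```

This matters for tiled inference because overlapping tiles often detect the same animal twice, and the duplicates land almost on top of each other once mapped back to the original image.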

megadetector.detection.run_tiled_inference.patch_info_to_patch_name(image_name, patch_x_min, patch_y_min)[source]

Gives a unique string name to an x/y coordinate, e.g. turns (“a.jpg”,10,20) into “a.jpg_0010_0020”.

Parameters:

image_name (str) – image identifier
patch_x_min (int) – x coordinate
patch_y_min (int) – y coordinate

Returns:

name for this patch, e.g. “a.jpg_0010_0020”

Return type:

str
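
The naming convention in the example above amounts to appending zero-padded coordinates to the image identifier. A minimal sketch follows, with the four-digit padding inferred from the documented example; the function name is hypothetical, and how the real function handles coordinates above 9999 is not specified here.

```python
def patch_name(image_name, patch_x_min, patch_y_min):
    """Build a patch identifier from an image name and a patch origin."""
    # Four-digit zero padding is inferred from the documented example
    return "{}_{:04d}_{:04d}".format(image_name, patch_x_min, patch_y_min)


name = patch_name("a.jpg", 10, 20)
```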

megadetector.detection.run_tiled_inference.run_tiled_inference(model_file, image_folder, tiling_folder, output_file, tile_size_x=1280, tile_size_y=1280, tile_overlap=0.5, checkpoint_path=None, checkpoint_frequency=-1, remove_tiles=False, yolo_inference_options=None, n_patch_extraction_workers=1, overwrite_tiles=True, image_list=None, augment=False, detector_options=None, use_image_queue=True, preprocess_on_image_queue=True, loader_workers=4, inference_size=None, verbose=False, pool_type=None, load_cached_tiles_if_available=False, create_tiles_only=False)[source]

Runs inference using [model_file] on the images in [image_folder], first splitting each image up into tiles of size [tile_size_x] x [tile_size_y], writing those tiles to [tiling_folder], then de-duplicating the results before merging them back into a set of detections that make sense on the original images and writing those results to [output_file].

[tiling_folder] can be any folder, but this function reserves the right to do whatever it wants within that folder, including deleting everything, so it’s best if it’s a new folder. Conceptually this folder is temporary, it’s just helpful in this case to not actually use the system temp folder, because the tile cache may be very large, so the caller may want it to be on a specific drive. If this is None, a new folder will be created in system temp space.

tile_overlap is the fraction of overlap between tiles.

Optionally removes the temporary tiles.

If yolo_inference_options is supplied, it should be an instance of YoloInferenceOptions; in this case the model will be run with run_inference_with_yolov5_val. The following members of the YoloInferenceOptions object will be overwritten by the corresponding parameters to this function: input_folder, model_filename, output_file.

Parameters:

model_file (str) – model filename (ending in .pt), or a well-known model name (e.g. “MDV5A”)
image_folder (str) – the folder of images to process (always recursive)
tiling_folder (str) – folder for temporary tile storage; see caveats above. Can be None to use system temp space.
output_file (str) – .json file to which we should write MD-formatted results
tile_size_x (int, optional) – tile width
tile_size_y (int, optional) – tile height
tile_overlap (float, optional) – overlap between adjacent tiles, as a fraction of the tile size
checkpoint_path (str, optional) – checkpoint path; passed directly to run_detector_batch; see run_detector_batch for details
checkpoint_frequency (int, optional) – checkpoint frequency; passed directly to run_detector_batch; see run_detector_batch for details
remove_tiles (bool, optional) – whether to delete the tiles when we’re done
yolo_inference_options (YoloInferenceOptions, optional) – if not None, will run inference with run_inference_with_yolov5_val.py, rather than with run_detector_batch.py, using these options
n_patch_extraction_workers (int, optional) – number of workers to use for patch extraction; set to <= 1 to disable parallelization
overwrite_tiles (bool, optional) – whether to overwrite image files for individual tiles if they exist
image_list (list, optional) – .json file containing a list of specific images to process. If this is supplied, and the paths are absolute, [image_folder] will be ignored. If this is supplied, and the paths are relative, they should be relative to [image_folder]
augment (bool, optional) – apply test-time augmentation
detector_options (dict, optional) – parameters to pass to run_detector, only relevant if yolo_inference_options is None
use_image_queue (bool, optional) – whether to use a loader worker queue, only relevant if yolo_inference_options is None
preprocess_on_image_queue (bool, optional) – whether the image queue should also be responsible for preprocessing
loader_workers (int, optional) – number of preprocessing loader workers to use
inference_size (int, optional) – override the default inference image size, only relevant if yolo_inference_options is None
verbose (bool, optional) – enable additional debug output
pool_type (str, optional) – ‘thread’ or ‘process’, or None to use the default (threads)
load_cached_tiles_if_available (bool, optional) – if we find tile information in the tiling folder from a previous call to this function, load tile information rather than re-tiling.
create_tiles_only (bool, optional) – return after creating tiles, before running inference

Returns:

MD-formatted results dictionary, identical to what’s written to [output_file]

Return type:

dict
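
The relationship between tile size and tile_overlap can be sketched along a single axis. This is a minimal illustration, not necessarily how run_tiled_inference lays out tiles internally: the helper name tile_origins, the rounding of the stride, and the choice to push the last tile flush against the image edge are all assumptions.

```python
def tile_origins(image_size, tile_size, overlap):
    """Return tile start offsets along one axis (hypothetical helper).

    overlap is a fraction of the tile size, as in the tile_overlap parameter.
    """
    if image_size <= tile_size:
        # The image fits in a single tile along this axis
        return [0]
    # Distance between the starts of adjacent tiles
    stride = max(1, int(round(tile_size * (1.0 - overlap))))
    origins = list(range(0, image_size - tile_size, stride))
    # Assumption: the final tile is placed flush with the far edge
    origins.append(image_size - tile_size)
    return origins


# A 3000-pixel-wide image, 1280-pixel tiles, 50% overlap
xs = tile_origins(3000, 1280, 0.5)
# A 1000-pixel-tall image is smaller than one tile
ys = tile_origins(1000, 1280, 0.5)
print(xs, ys)
```

With these assumptions, a larger overlap means a smaller stride and therefore more tiles per image, which increases inference time roughly in proportion.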

run_tiled_inference - CLI interface

Chop a folder of images up into tiles, run MD on the tiles, and stitch the results together

run_tiled_inference [-h] [--no_remove_tiles] [--augment] [--verbose]
                    [--tile_size_x TILE_SIZE_X] [--tile_size_y TILE_SIZE_Y]
                    [--tile_overlap TILE_OVERLAP] [--overwrite_handling OVERWRITE_HANDLING]
                    [--image_list IMAGE_LIST] [--detector_options DETECTOR_OPTIONS]
                    [--inference_size INFERENCE_SIZE]
                    [--n_patch_extraction_workers N_PATCH_EXTRACTION_WORKERS]
                    [--loader_workers LOADER_WORKERS]
                    model_file image_folder tiling_folder output_file

run_tiled_inference positional arguments

model_file - Path to detector model file (.pb or .pt)
image_folder - Folder containing images for inference (always recursive, unless image_list is supplied)
tiling_folder - Temporary folder where tiles and intermediate results will be stored
output_file - Path to output JSON results file, should end with a .json extension

run_tiled_inference options

-h, --help - show this help message and exit
--no_remove_tiles - Tiles are removed by default; this option suppresses tile deletion
--augment - Enable test-time augmentation
--verbose - Enable additional debug output
--tile_size_x TILE_SIZE_X - Tile width (defaults to 1280)
--tile_size_y TILE_SIZE_Y - Tile height (defaults to 1280)
--tile_overlap TILE_OVERLAP - Overlap between tiles [0,1] (defaults to 0.5)
--overwrite_handling OVERWRITE_HANDLING - Behavior when the target file exists (skip/overwrite/error) (default skip)
--image_list IMAGE_LIST - A .json list of relative filenames (or absolute paths contained within image_folder) to include
--detector_options DETECTOR_OPTIONS - A list of detector options (key-value pairs)
--inference_size INFERENCE_SIZE - Run inference at a non-default size
--n_patch_extraction_workers N_PATCH_EXTRACTION_WORKERS - Number of workers to use for patch extraction
--loader_workers LOADER_WORKERS - Number of workers to use for image loading and preprocessing (0 to disable)
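
As a rough illustration of the command line above, the following sketch assembles an invocation as an argument list, using only the flags documented here. The file and folder names (model.pt, images, tiles, tiled_results.json) are hypothetical placeholders, and the command is only printed, not executed.

```python
import shlex

# Hypothetical paths; substitute your own model file and folders
cmd = [
    "run_tiled_inference",
    "--tile_size_x", "1280",
    "--tile_size_y", "1280",
    "--tile_overlap", "0.5",
    "--no_remove_tiles",      # keep tiles around for inspection
    "model.pt",               # model_file
    "images",                 # image_folder
    "tiles",                  # tiling_folder
    "tiled_results.json",     # output_file
]

command_line = shlex.join(cmd)
print(command_line)
# To actually run it, one could pass cmd to subprocess.run(cmd, check=True)
```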

detection.run_md_and_speciesnet module

run_md_and_speciesnet.py

Script to run MegaDetector and SpeciesNet on a folder of images and/or videos. Runs MD first, then runs SpeciesNet on every above-threshold crop.

class megadetector.detection.run_md_and_speciesnet.CropBatch[source]

Bases: object

A batch of crops with their metadata for classification.

add_crop(crop_data, metadata)[source]

Parameters:

crop_data (PreprocessedImage) – preprocessed image data from SpeciesNetClassifier.preprocess()
metadata (CropMetadata) – metadata for this crop

crops: List of preprocessed images

metadata: List of CropMetadata objects

class megadetector.detection.run_md_and_speciesnet.CropMetadata(image_file: str, detection_index: int, bbox: list[float], original_width: int, original_height: int)[source]

Bases: object

Metadata for a crop extracted from an image detection.

class megadetector.detection.run_md_and_speciesnet.RunMDSpeciesNetOptions[source]

Bases: object

Class controlling the behavior of run_md_and_speciesnet()

admin1_region: Admin1 region/state code for geofencing

classification_model: SpeciesNet classifier model identifier (e.g. kaggle:google/speciesnet/pyTorch/v4.0.2a)

classifier_batch_size: Batch size for SpeciesNet classification

country: Country code (ISO 3166-1 alpha-3) for geofencing (default None, no geoferencing)

detection_confidence_threshold_for_classification: Classify detections above this threshold

detection_confidence_threshold_for_output: Include detections above this threshold in the output

detections_file: Path to existing MegaDetector output file (skips detection step)

detector_batch_size: Batch size for MegaDetector inference

detector_model: MegaDetector model identifier (MDv5a, MDv5b, MDv1000-redwood, etc.)

frame_sample: Sample every Nth frame from videos; mutually exclusive with time_sample

include_raw_classifications: Include raw (pre-rollup/geofence) classification scores in output

intermediate_file_folder: Folder for intermediate files (default: system temp)

keep_intermediate_files: Keep intermediate files (e.g. detection-only results file)

loader_workers: Number of worker threads for preprocessing

norollup: Disable taxonomic rollup

output_file: Output file for results (JSON format)

overwrite_handling: What to do if the output file exists (‘overwrite’, ‘error’, ‘skip’)

rollup_target_confidence: Target confidence threshold for taxonomic rollup

skip_images: Ignore images, only process videos

skip_video: Ignore videos, only process images

source: Folder containing images and/or videos to process

time_sample: Sample frames every N seconds from videos; mutually exclusive with frame_sample

verbose: Enable additional debug output

worker_type: Worker type for parallelization; should be “thread” or “process”

megadetector.detection.run_md_and_speciesnet.run_md_and_speciesnet(options)[source]

Main entry point, runs MegaDetector and SpeciesNet on a folder. See RunMDSpeciesNetOptions for available arguments.

Parameters:

options (RunMDSpeciesNetOptions) – options controlling MD and SN inference

run_md_and_speciesnet - CLI interface

Run MegaDetector and SpeciesNet on a folder of images/videos

run_md_and_speciesnet [-h] [--detector_model DETECTOR_MODEL]
                      [--classification_model CLASSIFICATION_MODEL]
                      [--detector_batch_size DETECTOR_BATCH_SIZE]
                      [--classifier_batch_size CLASSIFIER_BATCH_SIZE]
                      [--loader_workers LOADER_WORKERS]
                      [--detection_confidence_threshold_for_classification DETECTION_CONFIDENCE_THRESHOLD_FOR_CLASSIFICATION]
                      [--detection_confidence_threshold_for_output DETECTION_CONFIDENCE_THRESHOLD_FOR_OUTPUT]
                      [--intermediate_file_folder INTERMEDIATE_FILE_FOLDER]
                      [--keep_intermediate_files] [--norollup]
                      [--rollup_target_confidence ROLLUP_TARGET_CONFIDENCE]
                      [--country COUNTRY] [--admin1_region ADMIN1_REGION]
                      [--detections_file DETECTIONS_FILE] [--skip_video] [--skip_images]
                      [--frame_sample FRAME_SAMPLE] [--time_sample TIME_SAMPLE] [--verbose]
                      [--include_raw_classifications]
                      source output_file

run_md_and_speciesnet positional arguments

source - Folder containing images and/or videos to process
output_file - Output file for results (JSON format)

run_md_and_speciesnet options

-h, --help - show this help message and exit
--detector_model DETECTOR_MODEL - MegaDetector model identifier
--classification_model CLASSIFICATION_MODEL - SpeciesNet classifier model identifier
--detector_batch_size DETECTOR_BATCH_SIZE - Batch size for MegaDetector inference
--classifier_batch_size CLASSIFIER_BATCH_SIZE - Batch size for SpeciesNet classification
--loader_workers LOADER_WORKERS - Number of worker threads for preprocessing
--detection_confidence_threshold_for_classification DETECTION_CONFIDENCE_THRESHOLD_FOR_CLASSIFICATION - Classify detections above this threshold
--detection_confidence_threshold_for_output DETECTION_CONFIDENCE_THRESHOLD_FOR_OUTPUT - Include detections above this threshold in the output
--intermediate_file_folder INTERMEDIATE_FILE_FOLDER - Folder for intermediate files (default: system temp)
--keep_intermediate_files - Keep intermediate files (e.g. detection-only results file)
--norollup - Disable taxonomic rollup
--rollup_target_confidence ROLLUP_TARGET_CONFIDENCE - Target confidence threshold for taxonomic rollup (default 0.65), only used when geofencing is disabled
--country COUNTRY - Country code (ISO 3166-1 alpha-3) for geofencing
--admin1_region ADMIN1_REGION, --state ADMIN1_REGION - Admin1 region/state code for geofencing
--detections_file DETECTIONS_FILE - Path to existing MegaDetector output file (skips detection step)
--skip_video - Ignore videos, only process images
--skip_images - Ignore images, only process videos
--frame_sample FRAME_SAMPLE - Sample every Nth frame from videos (mutually exclusive with --time_sample)
--time_sample TIME_SAMPLE - Sample frames every N seconds from videos (default 1.0) (mutually exclusive with --frame_sample)
--verbose - Enable additional debug output
--include_raw_classifications - Include raw (pre-rollup/geofence) classification scores in output

detection.video_utils module

video_utils.py

Utilities for splitting, rendering, and assembling videos.

class megadetector.detection.video_utils.FrameToVideoOptions[source]

Bases: object

Options controlling the conversion of frame-level results to video-level results via frame_results_to_video_results()

frame_rates_are_required: Are frame rates required?

include_all_processed_frames: Should we include just a single representative frame result for each video (default), or every frame that was processed?

non_video_behavior: What to do if a file referred to in a .json results file appears not to be a video; can be ‘error’ or ‘skip_with_warning’

nth_highest_confidence: One-indexed indicator of which frame-level confidence value to use to determine detection confidence for the whole video, i.e. “1” means “use the confidence value from the highest-confidence frame”

verbose: Enable additional debug output

megadetector.detection.video_utils.find_video_strings(strings)[source]

Given a list of strings that are potentially video file names, looks for strings that actually look like video file names (based on extension).

Parameters:

strings (list) – list of strings to check for video-ness

Returns:

the subset of [strings] that look like video filenames

Return type:

list

megadetector.detection.video_utils.find_videos(dirname, recursive=False, convert_slashes=True, return_relative_paths=False)[source]

Finds all files in a directory that look like video file names.

Parameters:

dirname (str) – folder to search for video files
recursive (bool, optional) – whether to search [dirname] recursively
convert_slashes (bool, optional) – forces forward slashes in the returned files, otherwise uses the native path separator
return_relative_paths (bool, optional) – forces the returned filenames to be relative to [dirname], otherwise returns absolute paths

Returns:

A list of filenames within [dirname] that appear to be videos

megadetector.detection.video_utils.frame_results_to_video_results(input_file, output_file, options=None, video_filename_to_frame_rate=None)[source]

Given an MD results file produced at the frame level, corresponding to a directory created with video_folder_to_frames, maps those frame-level results back to the video level for use in Timelapse.

Preserves everything in the input .json file other than the images.

Parameters:

input_file (str) – the frame-level MD results file to convert to video-level results
output_file (str) – the .json file to which we should write video-level results
options (FrameToVideoOptions, optional) – parameters for converting frame-level results to video-level results, see FrameToVideoOptions for details
video_filename_to_frame_rate (dict, optional) – maps (relative) video path names to frame rates, used only to populate the output file

megadetector.detection.video_utils.frames_to_video(images, fs, output_file_name, codec_spec='h264')[source]

Given a list of image files and a sample rate, concatenates those images into a video and writes to a new video file.

Parameters:

images (list) – a list of frame file names to concatenate into a video
fs (float) – the frame rate in fps
output_file_name (str) – the output video file, no checking is performed to make sure the extension is compatible with the codec
codec_spec (str, optional) – codec to use for encoding; h264 is a sensible default and generally works on Windows, but when this fails (which is around 50% of the time on Linux), mp4v is a good second choice

megadetector.detection.video_utils.get_video_fs(input_video_file, verbose=False)[source]

Retrieves the frame rate of [input_video_file].

Parameters:

input_video_file (str) – video file for which we want the frame rate
verbose (bool, optional) – enable additional debug output

Returns:

the frame rate of [input_video_file], or None if no frame rate could be extracted

Return type:

float

megadetector.detection.video_utils.is_video_file(s, video_extensions=('.mp4', '.avi', '.mpeg', '.mpg', '.mov', '.mkv', '.flv'))[source]

Checks a file’s extension against a set of known video file extensions to determine whether it’s a video file. Performs a case-insensitive comparison.

Parameters:

s (str) – filename to check for probable video-ness
video_extensions (list, optional) – list of video file extensions

Returns:

True if this looks like a video file, else False

Return type:

bool
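
The extension check described above can be approximated in a few lines. This is a sketch of the technique (a case-insensitive extension comparison), not the library's source; the name looks_like_video is hypothetical, and the default extension tuple is copied from the signature above.

```python
import os

VIDEO_EXTENSIONS = ('.mp4', '.avi', '.mpeg', '.mpg', '.mov', '.mkv', '.flv')


def looks_like_video(filename, video_extensions=VIDEO_EXTENSIONS):
    """Return True if the file extension matches a known video extension."""
    ext = os.path.splitext(filename)[1]
    # Compare in lower case so that ".MP4" and ".mp4" are treated alike
    return ext.lower() in [e.lower() for e in video_extensions]


candidates = ['cam01/clip.MP4', 'cam01/image.jpg', 'notes.txt', 'b.mov']
videos = [s for s in candidates if looks_like_video(s)]
print(videos)
```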

megadetector.detection.video_utils.open_video(video_path, verbose=False)[source]

Open the video at [video_path], trying multiple OpenCV backends if necessary.

Parameters:

video_path (str) – the file to open
verbose (bool, optional) – enable additional debug output

Returns:

a tuple containing (a) the open video capture device (or None if no backends succeeded) and (b) the first frame of the video (or None)

Return type:

(cv2.VideoCapture,image)

megadetector.detection.video_utils.run_callback_on_frames(input_video_file, frame_callback, every_n_frames=None, verbose=False, frames_to_process=None, allow_empty_videos=False)[source]

Calls the function frame_callback(np.array,image_id) on all (or selected) frames in [input_video_file].

Parameters:

input_video_file (str) – video file to process
frame_callback (function) – callback to run on each frame; should return a single value. The callback takes two arguments: (1) a numpy array with image data, in the typical PIL image orientation/channel order, and (2) a string identifier for the frame, typically something like “frame0006.jpg” (this is just an identifier for the frame; no JPEG file is involved).
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is processed. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_process.
verbose (bool, optional) – enable additional debug console output
frames_to_process (list of int, optional) – process this specific set of frames; mutually exclusive with every_n_frames. If all values are beyond the length of the video, no frames are extracted. Can also be a single int, specifying a single frame number.
allow_empty_videos (bool, optional) – Just print a warning if a video appears to have no frames (by default, this raises an Exception).

Returns:

dict with keys ‘frame_filenames’ (list), ‘frame_rate’ (float), ‘results’ (list). ‘frame_filenames’ are synthetic filenames (e.g. frame000000.jpg). Elements in ‘results’ are whatever is returned by the callback, typically dicts in the same format used in the ‘images’ array in the MD results format. [frame_filenames] and [results] both have one element per processed frame.

Return type:

dict
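
The interpretation of every_n_frames (positive values as a frame interval, negative values as a sampling period in seconds) can be sketched as follows. The helper name frame_interval is hypothetical, and the exact rounding and the minimum interval of 1 are assumptions; the library's own conversion may differ in detail.

```python
def frame_interval(every_n_frames, frame_rate):
    """Convert an every_n_frames value into a frame step (illustrative)."""
    if every_n_frames is None:
        return 1
    if every_n_frames < 0:
        # Negative values are a sampling period in seconds
        seconds = -every_n_frames
        return max(1, round(seconds * frame_rate))
    return max(1, int(every_n_frames))


# Sample one frame every 2 seconds from a 29.97 fps video
step = frame_interval(-2, 29.97)
# Indices of the frames that would be processed in a 200-frame video
selected = list(range(0, 200, step))
print(step, selected)
```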

megadetector.detection.video_utils.run_callback_on_frames_for_folder(input_video_folder, frame_callback, every_n_frames=None, verbose=False, recursive=True, files_to_process_relative=None, error_on_empty_video=False)[source]

Calls the function frame_callback(np.array,image_id) on all (or selected) frames in all videos in [input_video_folder].

Parameters:

input_video_folder (str) – video folder to process
frame_callback (function) – callback to run on each frame; should return a single value. The callback takes two arguments: (1) a numpy array with image data, in the typical PIL image orientation/channel order, and (2) a string identifier for the frame, typically something like “frame0006.jpg” (this is just an identifier for the frame; no JPEG file is involved).
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is processed. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate.
verbose (bool, optional) – enable additional debug console output
recursive (bool, optional) – recurse into [input_video_folder]
files_to_process_relative (list, optional) – only process specific relative paths
error_on_empty_video (bool, optional) – by default, videos with errors or no valid frames are silently stored as failures; this turns them into exceptions

Returns:

dict with keys ‘video_filenames’ (list of str), ‘frame_rates’ (list of floats), ‘results’ (list of list of dicts). ‘video_filenames’ will contain relative filenames. ‘results’ is a list (one element per video) of lists (one element per frame) of whatever the callback returns, typically (but not necessarily) dicts in the MD results format.

For failed videos, the frame rate will be represented by -1, and “results” will be a dict with at least the key “failure”.

Return type:

dict
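
A caller typically needs to separate successful videos from failed ones in the returned dict. The sketch below does this on a hand-written dict that mimics the documented structure; the filenames, the failure message, and the per-frame dict contents are made up for illustration.

```python
# Mocked return value following the structure described above
folder_results = {
    'video_filenames': ['site1/clip1.mp4', 'site1/clip2.mp4'],
    'frame_rates': [30.0, -1],
    'results': [
        [{'file': 'frame000000.jpg', 'detections': []}],
        {'failure': 'could not open video'},  # hypothetical message
    ],
}

succeeded = {}
failed = {}

for name, frame_rate, video_results in zip(folder_results['video_filenames'],
                                           folder_results['frame_rates'],
                                           folder_results['results']):
    # Failed videos carry a dict with a 'failure' key instead of a list
    if isinstance(video_results, dict) and 'failure' in video_results:
        failed[name] = video_results['failure']
    else:
        succeeded[name] = {'frame_rate': frame_rate,
                           'n_frames': len(video_results)}

print(succeeded)
print(failed)
```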

megadetector.detection.video_utils.video_folder_to_frames(input_folder, output_folder_base, recursive=True, overwrite=True, n_threads=1, every_n_frames=None, verbose=False, parallelization_uses_threads=True, quality=None, max_width=None, frames_to_extract=None, allow_empty_videos=False, relative_paths_to_process=None)[source]

For every video file in input_folder, creates a folder within output_folder_base, and renders the frames of that video to images in that folder.

Parameters:

input_folder (str) – folder to process
output_folder_base (str) – root folder for output images; subfolders will be created for each input video
recursive (bool, optional) – whether to recursively process videos in [input_folder]
overwrite (bool, optional) – whether to overwrite existing frame images
n_threads (int, optional) – number of concurrent workers to use; set to <= 1 to disable parallelism
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_extract.
verbose (bool, optional) – enable additional debug console output
parallelization_uses_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization; ignored if n_threads <= 1
quality (int, optional) – JPEG quality for frame output, from 0-100. Defaults to the opencv default (typically 95).
max_width (int, optional) – resize frames to be no wider than [max_width]
frames_to_extract (int, list of int, or dict, optional) – extract this specific set of frames from each video; mutually exclusive with every_n_frames. If all values are beyond the length of a video, no frames are extracted. Can also be a single int, specifying a single frame number. In the special case where frames_to_extract is [], this function still reads video frame rates and verifies that videos are readable, but no frames are extracted. Can be a dict mapping relative paths to lists of frame numbers to extract different frames from each video.
allow_empty_videos (bool, optional) – just print a warning if a video appears to have no frames (by default, this is an error).
relative_paths_to_process (list, optional) – only process the relative paths on this list

Returns:

a length-3 tuple containing:

list of lists of frame filenames; the Nth list of frame filenames corresponds to the Nth video
list of video frame rates; the Nth value corresponds to the Nth video
list of video filenames

Return type:

tuple

megadetector.detection.video_utils.video_to_frames(input_video_file, output_folder, overwrite=True, every_n_frames=None, verbose=False, quality=None, max_width=None, frames_to_extract=None, allow_empty_videos=True)[source]

Renders frames from [input_video_file] to .jpg files in [output_folder].

With help from:

https://stackoverflow.com/questions/33311153/python-extracting-and-saving-video-frames

Parameters:

input_video_file (str) – video file to split into frames
output_folder (str) – folder to put frame images in
overwrite (bool, optional) – whether to overwrite existing frame images
every_n_frames (int, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_extract.
verbose (bool, optional) – enable additional debug console output
quality (int, optional) – JPEG quality for frame output, from 0-100. Defaults to the opencv default (typically 95).
max_width (int, optional) – resize frames to be no wider than [max_width]
frames_to_extract (list of int, optional) – extract this specific set of frames; mutually exclusive with every_n_frames. If all values are beyond the length of the video, no frames are extracted. Can also be a single int, specifying a single frame number. In the special case where frames_to_extract is [], this function still reads video frame rates and verifies that videos are readable, but no frames are extracted.
allow_empty_videos (bool, optional) – Just print a warning if a video appears to have no frames (by default, this is an error).

Returns:

length-2 tuple containing (list of frame filenames, frame rate)

Return type:

tuple