detection package
This package contains tools for running object detectors, especially MegaDetector. With 90% probability, if you’re looking at this documentation, the function you’re looking for is run_detector_batch.run_detector_batch.
Submodules
detection.run_detector_batch module
run_detector_batch.py
Module to run MegaDetector on lots of images, writing the results to a file in the MegaDetector results format.
https://lila.science/megadetector-output-format
This enables the results to be used in our post-processing pipeline; see postprocess_batch_results.py.
This script can save results to checkpoints intermittently, in case disaster strikes. To enable this, set --checkpoint_frequency to n > 0, and results will be saved as a checkpoint every n images. Checkpoints will be written to a file in the same directory as the output_file; after all images are processed and the final results file has been written to output_file, the temporary checkpoint file will be deleted. If you want to resume from a checkpoint, set the checkpoint file’s path using --resume_from_checkpoint.
Has multiprocessing support for CPUs only; if a GPU is available, it will use the GPU instead of CPUs, and the --ncores option will be ignored. Checkpointing is not supported when using a GPU.
The lack of GPU multiprocessing support might sound annoying, but in practice we run a gazillion MegaDetector images on multiple GPUs using this script; we just use only one GPU per invocation of this script. Dividing a list of images into one chunk per GPU happens outside of this script.
Does not have a command-line option to bind the process to a particular GPU, but you can prepend the command with “CUDA_VISIBLE_DEVICES=0” to bind to GPU 0, e.g.:
CUDA_VISIBLE_DEVICES=0 python detection/run_detector_batch.py md_v4.1.0.pb ~/data ~/mdv4test.json
You can disable GPU processing entirely by setting CUDA_VISIBLE_DEVICES=''.
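As noted above, dividing images into one chunk per GPU happens outside of this script. The following is a minimal sketch of one way that split could be done, not the approach the authors use. The helper names split_into_chunks and build_commands, the chunk list filenames, and the output filenames are all hypothetical.

```python
import os


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks nearly equal contiguous chunks (hypothetical helper)."""
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each get one additional item
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def build_commands(image_list_files, model_file, output_folder):
    """Build one command line per GPU, binding each process via CUDA_VISIBLE_DEVICES."""
    commands = []
    for gpu_index, list_file in enumerate(image_list_files):
        # Output filenames are placeholders chosen for this example
        output_file = os.path.join(output_folder, 'results_gpu_{}.json'.format(gpu_index))
        commands.append(
            'CUDA_VISIBLE_DEVICES={} python detection/run_detector_batch.py {} {} {}'.format(
                gpu_index, model_file, list_file, output_file))
    return commands


images = ['img_{:03d}.jpg'.format(i) for i in range(10)]
chunks = split_into_chunks(images, 3)
commands = build_commands(['chunk_0.txt', 'chunk_1.txt'], 'md_v4.1.0.pb', 'output')
```

Each chunk would be written to its own .txt or .json list file and passed as the image_file argument of a separate invocation.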
- megadetector.detection.run_detector_batch.get_image_datetime(image)[source]
Reads EXIF datetime from a PIL Image object.
- Parameters:
image (Image) – the PIL Image object from which we should read datetime information
- Returns:
the EXIF datetime from [image] (a PIL Image object), if available, as a string; returns None if EXIF datetime is not available.
- Return type:
str
- megadetector.detection.run_detector_batch.load_and_run_detector_batch(model_file, image_file_names, checkpoint_path=None, confidence_threshold=0.005, checkpoint_frequency=-1, results=None, n_cores=1, use_image_queue=False, quiet=False, image_size=None, class_mapping_filename=None, include_image_size=False, include_image_timestamp=False, include_exif_tags=None, augment=False, force_model_download=False, detector_options=None, loader_workers=4, preprocess_on_image_queue=False, batch_size=1, verbose_output=False, use_threads_for_queue=False)[source]
Load a model file and run it on a list of images.
- Parameters:
model_file (str) – path to model file, or supported model string (e.g. “MDV5A”)
image_file_names (list or str) – list of strings (image filenames), a single image filename, a folder to recursively search for images in, or a .json or .txt file containing a list of images.
checkpoint_path (str, optional) – path to use for checkpoints (if None, checkpointing is disabled)
confidence_threshold (float, optional) – only detections above this threshold are returned
checkpoint_frequency (int, optional) – write results to a JSON checkpoint file every N images; -1 disables checkpointing
results (list, optional) – list of dicts, existing results loaded from checkpoint; generally not useful if you’re using this function outside of the CLI
n_cores (int, optional) – number of parallel workers to use, ignored if we’re running on a GPU
use_image_queue (bool, optional) – use a dedicated worker for image loading
quiet (bool, optional) – disable per-image console output
image_size (int, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
class_mapping_filename (str, optional) – use a non-default class mapping supplied in a .json file or YOLOv5 dataset.yaml file
include_image_size (bool, optional) – should we include image size in the output for each image?
include_image_timestamp (bool, optional) – should we include image timestamps in the output for each image?
include_exif_tags (str, optional) – comma-separated list of EXIF tags to include in output
augment (bool, optional) – enable image augmentation
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors. Can also be a list of k=v pairs, or a comma-delimited string containing a list of k=v pairs.
loader_workers (int, optional) – number of loaders to use, only relevant when use_image_queue is True
preprocess_on_image_queue (bool, optional) – if the image queue is enabled, should it handle image loading and preprocessing (True), or just image loading (False)?
batch_size (int, optional) – batch size for GPU processing, automatically set to 1 for CPU processing
verbose_output (bool, optional) – enable additional debug output
use_threads_for_queue (bool, optional) – use threads (rather than processes) for the data loading workers
- Returns:
list of dicts; each dict represents detections on one image
- Return type:
list
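The detector_options parameter above accepts a dict, a list of k=v pairs, or a comma-delimited string of k=v pairs. The sketch below shows one way those three forms could be normalized to a single dict. The function name normalize_detector_options and the option keys are hypothetical, and this is not necessarily how the library parses the argument.

```python
def normalize_detector_options(detector_options):
    """Convert a dict, a list of 'k=v' strings, or a 'k=v,k=v' string to a dict."""
    if detector_options is None:
        return {}
    if isinstance(detector_options, dict):
        return dict(detector_options)
    if isinstance(detector_options, str):
        # Comma-delimited string: split into individual 'k=v' items
        detector_options = [s for s in detector_options.split(',') if s.strip()]
    options = {}
    for pair in detector_options:
        if '=' not in pair:
            raise ValueError('Expected key=value, got: {}'.format(pair))
        # Split on the first '=' only, so values may themselves contain '='
        key, value = pair.split('=', 1)
        options[key.strip()] = value.strip()
    return options


# 'option_a' and 'option_b' are placeholder keys, not real detector options
from_string = normalize_detector_options('option_a=1,option_b=two')
from_list = normalize_detector_options(['option_a=1', 'option_b=two'])
```

Values stay as strings in this sketch; any type conversion would be up to the detector that consumes them.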
- megadetector.detection.run_detector_batch.load_checkpoint(checkpoint_path)[source]
Loads results from a checkpoint file. A checkpoint file is always a dict with the key “checkpoint”.
- Parameters:
checkpoint_path (str) – the .json file to load
- Returns:
object retrieved from the checkpoint, typically a list of results
- Return type:
object
- megadetector.detection.run_detector_batch.write_checkpoint(checkpoint_path, results)[source]
Writes the object in [results] to a json checkpoint file, as a dict with the key “checkpoint”. First backs up the checkpoint file if it exists, in case we crash while writing the file.
- Parameters:
checkpoint_path (str) – the file to write the checkpoint to
results (object) – the object we should write
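The checkpoint format described above (a dict with the single key “checkpoint”, with a backup made before each write) can be sketched with the standard library alone. The function names and the “.bak” backup suffix below are assumptions made for illustration; the real functions may differ in naming and error handling.

```python
import json
import os
import shutil
import tempfile


def write_checkpoint_sketch(checkpoint_path, results):
    """Write results as {'checkpoint': results}, backing up any existing file first."""
    if os.path.isfile(checkpoint_path):
        # The '.bak' suffix is an assumption for this sketch
        shutil.copyfile(checkpoint_path, checkpoint_path + '.bak')
    with open(checkpoint_path, 'w') as f:
        json.dump({'checkpoint': results}, f)


def load_checkpoint_sketch(checkpoint_path):
    """Load the object stored under the 'checkpoint' key."""
    with open(checkpoint_path, 'r') as f:
        data = json.load(f)
    if 'checkpoint' not in data:
        raise ValueError('Not a checkpoint file: {}'.format(checkpoint_path))
    return data['checkpoint']


demo_path = os.path.join(tempfile.mkdtemp(), 'md_checkpoint_demo.json')
first = [{'file': 'a.jpg', 'detections': []}]
second = first + [{'file': 'b.jpg', 'detections': []}]
write_checkpoint_sketch(demo_path, first)
write_checkpoint_sketch(demo_path, second)  # backs up the first checkpoint before writing
restored = load_checkpoint_sketch(demo_path)
```

If the process dies during the second write, the backup still holds the previous checkpoint.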
- megadetector.detection.run_detector_batch.write_results_to_file(results, output_file, relative_path_base=None, detector_file=None, info=None, include_max_conf=False, custom_metadata=None, force_forward_slashes=True)[source]
Writes list of detection results to JSON output file. Format matches:
https://lila.science/megadetector-output-format
- Parameters:
results (list) – list of dict, each dict represents detections on one image
output_file (str) – path to JSON output file, should end in ‘.json’
relative_path_base (str, optional) – path to a directory as the base for relative paths, can be None if the paths in [results] are absolute
detector_file (str, optional) – filename of the detector used to generate these results, only used to pull out a version number for the “info” field
info (dict, optional) – dictionary to put in the results file instead of the default “info” field
include_max_conf (bool, optional) – old files (version 1.2 and earlier) included a “max_conf” field in each image; this was removed in version 1.3. Set this flag to force the inclusion of this field.
custom_metadata (object, optional) – additional data to include as info[‘custom_metadata’]; typically a dictionary, but no type/format checks are performed
force_forward_slashes (bool, optional) – convert all slashes in filenames within [results] to forward slashes
- Returns:
the MD-formatted dictionary that was written to [output_file]
- Return type:
dict
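To make the effect of relative_path_base and force_forward_slashes concrete, here is a small sketch of the filename transformation those two parameters describe. The helper name to_output_filename is hypothetical, and the library may implement this differently.

```python
import os


def to_output_filename(path, relative_path_base=None, force_forward_slashes=True):
    """Optionally make a path relative to a base folder and normalize its slashes."""
    if relative_path_base is not None:
        path = os.path.relpath(path, relative_path_base)
    if force_forward_slashes:
        path = path.replace('\\', '/')
    return path


# Example paths are made up for illustration
windows_style = to_output_filename('camera01\\img_0001.jpg')
relative = to_output_filename('/data/images/camera01/img_0001.jpg',
                              relative_path_base='/data/images')
```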
run_detector_batch - CLI interface
Module to run a TF/PT animal detection model on lots of images
run_detector_batch [-h] [--recursive] [--output_relative_filenames] [--include_max_conf]
[--verbose] [--image_size IMAGE_SIZE] [--augment] [--use_image_queue]
[--preprocess_on_image_queue] [--use_threads_for_queue]
[--threshold THRESHOLD] [--checkpoint_frequency CHECKPOINT_FREQUENCY]
[--checkpoint_path CHECKPOINT_PATH]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--allow_checkpoint_overwrite] [--ncores NCORES]
[--loader_workers LOADER_WORKERS]
[--class_mapping_filename CLASS_MAPPING_FILENAME] [--include_image_size]
[--include_image_timestamp] [--include_exif_tags INCLUDE_EXIF_TAGS]
[--overwrite_handling OVERWRITE_HANDLING] [--force_model_download]
[--previous_results_file PREVIOUS_RESULTS_FILE]
[--detector_options [KEY=VALUE ...]] [--batch_size BATCH_SIZE]
detector_file image_file output_file
run_detector_batch positional arguments
detector_file - Path to detector model file (.pb or .pt). Can also be the strings "MDV4", "MDV5A", or "MDV5B" to request automatic download.
image_file - Path to a single image file, a .json or .txt file containing a list of paths to images, or a directory
output_file - Path to output JSON results file, should end with a .json extension
run_detector_batch options
--recursive - Recurse into directories, only meaningful if image_file points to a directory
--output_relative_filenames - Output relative file names, only meaningful if image_file points to a directory
--include_max_conf - Include the "max_detection_conf" field in the output
--verbose - Enable additional debug output
--image_size IMAGE_SIZE - Force image resizing to a specific integer size on the long axis (not recommended to change this)
--augment - Enable image augmentation
--use_image_queue - Pre-load images, may help keep your GPU busy; does not currently support checkpointing. Useful if you have a very fast GPU and a very slow disk.
--preprocess_on_image_queue - Whether to do image resizing on the image queue (PyTorch detectors only)
--use_threads_for_queue - Use threads (rather than processes) for the image queue; only relevant if --use_image_queue is set
--threshold THRESHOLD - Confidence threshold between 0 and 1.0, don’t include boxes below this confidence in the output file. Default is 0.005
--checkpoint_frequency CHECKPOINT_FREQUENCY - Write results to a temporary file every N images; default is -1, which disables this feature
--checkpoint_path CHECKPOINT_PATH - File name to which checkpoints will be written if checkpoint_frequency is > 0, defaults to md_checkpoint_[date].json in the same folder as the output file
--resume_from_checkpoint RESUME_FROM_CHECKPOINT - Path to a JSON checkpoint file to resume from, or "auto" to find the most recent checkpoint in the same folder as the output file. "auto" uses checkpoint_path (rather than searching the output folder) if checkpoint_path is specified.
--allow_checkpoint_overwrite - By default, this script will bail if the specified checkpoint file already exists; this option allows it to overwrite existing checkpoints
--ncores NCORES - Number of cores to use for inference; only applies to CPU-based inference (default 1)
--loader_workers LOADER_WORKERS - Number of image loader workers to use; only relevant when --use_image_queue is set (default 4)
--class_mapping_filename CLASS_MAPPING_FILENAME - Use a non-default class mapping, supplied in a .json file with a dictionary mapping int-strings to strings. This will also disable the addition of "1" to all category IDs, so your class mapping should start at zero. Can also be a YOLOv5 dataset.yaml file.
--include_image_size - Include image dimensions in output file
--include_image_timestamp - Include image datetime (if available) in output file
--overwrite_handling OVERWRITE_HANDLING - What should we do if the output file exists? overwrite/skip/error (default overwrite)
--force_model_download - If a named model (e.g. "MDV5A") is supplied, force a download of that model even if the local file already exists.
--previous_results_file PREVIOUS_RESULTS_FILE - If supplied, this should point to a previous .json results file; any results in that file will be transferred to the output file without reprocessing those images. Useful for "updating" a set of results when you may have added new images to a folder you’ve already processed. Only supported when using relative paths.
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
--batch_size BATCH_SIZE - Batch size for GPU inference (default 1). CPU inference will ignore this and use batch_size=1.
detection.run_detector module
run_detector.py
Module to run an animal detection model on images. The main function in this script also renders the predicted bounding boxes on images and saves the resulting images (with bounding boxes).
This script is not a good way to process lots of images. It does not produce a useful output format, and it does not facilitate checkpointing the results, so if it crashes you would have to start from scratch. If you want to run a detector on lots of images, you should check out run_detector_batch.py.
That said, this script (run_detector.py) is a good way to test our detector on a handful of images and get super-satisfying, graphical results.
If you would like to not use the GPU, set the environment variable CUDA_VISIBLE_DEVICES to “-1”.
This script will only consider detections with > 0.005 confidence at all times. The threshold you provide is only for rendering the results. If you need to see lower-confidence detections, you can change DEFAULT_OUTPUT_CONFIDENCE_THRESHOLD.
- megadetector.detection.run_detector.estimate_md_images_per_second(model_file, device_name=None)[source]
Estimates how fast MegaDetector will run on a particular device, based on benchmarks. Defaults to querying the current device. Returns None if no data is available for the current card/model. Estimates only available for a small handful of GPUs. Uses an absurdly simple lookup approach, e.g. if the string “4090” appears in the device name, congratulations, you have an RTX 4090.
- Parameters:
model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
device_name (str, optional) – device name, e.g. blah-blah-4090-blah-blah
- Returns:
the approximate number of images this model version can process on this device per second
- Return type:
float
- megadetector.detection.run_detector.get_detector_metadata_from_version_string(detector_version)[source]
Given a MegaDetector version string (e.g. “v4.1.0”), returns the metadata for the model. Used for writing standard defaults to batch output files.
- Parameters:
detector_version (str) – a detection version string, e.g. “v4.1.0”, which you can extract from a filename using get_detector_version_from_filename()
- Returns:
metadata for this model, suitable for writing to a MD output file
- Return type:
dict
- megadetector.detection.run_detector.get_detector_version_from_filename(detector_filename, accept_first_match=True, verbose=False)[source]
Gets the canonical version number string of a detector from the model filename.
[detector_filename] will almost always end with one of the following:
megadetector_v2.pb
megadetector_v3.pb
megadetector_v4.1 (not produced by run_detector_batch.py, only found in output files from the deprecated Azure Batch API)
md_v4.1.0.pb
md_v5a.0.0.pt
md_v5b.0.0.pt
This function identifies the version number as “v2.0.0”, “v3.0.0”, “v4.1.0”, “v4.1.0”, “v5a.0.0”, and “v5b.0.0”, respectively. See known_models for the list of valid version numbers.
- Parameters:
detector_filename (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
accept_first_match (bool, optional) – if multiple candidates match the filename, choose the first one, otherwise returns the string “multiple”
verbose (bool, optional) – enable additional debug output
- Returns:
a detector version string, e.g. “v5a.0.0”, or “multiple” if I’m confused
- Return type:
str
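The filename-to-version mapping listed above can be sketched as a simple substring lookup. The table below only restates the six filenames from this section; the authoritative list is known_models. The function name is hypothetical, and returning None when nothing matches is an assumption of this sketch.

```python
import os

# Filename fragments and versions copied from the list above, for illustration only
FILENAME_FRAGMENT_TO_VERSION = {
    'megadetector_v2': 'v2.0.0',
    'megadetector_v3': 'v3.0.0',
    'megadetector_v4.1': 'v4.1.0',
    'md_v4.1.0': 'v4.1.0',
    'md_v5a.0.0': 'v5a.0.0',
    'md_v5b.0.0': 'v5b.0.0',
}


def version_from_filename_sketch(detector_filename, accept_first_match=True):
    """Guess a detector version string from a model filename by substring matching."""
    basename = os.path.basename(detector_filename.replace('\\', '/')).lower()
    matches = []
    for fragment, version in FILENAME_FRAGMENT_TO_VERSION.items():
        if fragment in basename and version not in matches:
            matches.append(version)
    if len(matches) == 0:
        return None  # assumption: the real function may behave differently here
    if len(matches) > 1 and not accept_first_match:
        return 'multiple'
    return matches[0]


v5a = version_from_filename_sketch('c:/x/z/md_v5a.0.0.pt')
v4 = version_from_filename_sketch('md_v4.1.0.pb')
ambiguous = version_from_filename_sketch('md_v5a.0.0_md_v5b.0.0.pt', accept_first_match=False)
```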
- megadetector.detection.run_detector.get_detector_version_from_model_file(detector_filename, verbose=False)[source]
Gets the canonical detection version from a model file, preferably by reading it from the file itself, otherwise based on the filename.
- Parameters:
detector_filename (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
verbose (bool, optional) – enable additional debug output
- Returns:
a canonical detector version string, e.g. “v5a.0.0”, or “unknown”
- Return type:
str
- megadetector.detection.run_detector.get_typical_confidence_threshold_from_results(results)[source]
Given the .json data loaded from a MD results file, returns a typical confidence threshold based on the detector version.
- Parameters:
results (dict or str) – a dict of MD results, as it would be loaded from a MD results .json file, or a .json filename
- Returns:
a sensible default threshold for this model
- Return type:
float
- megadetector.detection.run_detector.is_gpu_available(model_file)[source]
Determines whether a GPU is available, importing PyTorch or TF depending on the extension of model_file. Does not actually load model_file, just uses that to determine how to check for GPU availability (PT vs. TF).
- Parameters:
model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt
- Returns:
whether a GPU is available
- Return type:
bool
- megadetector.detection.run_detector.load_and_run_detector(model_file, image_file_names, output_dir, render_confidence_threshold=0.2, crop_images=False, box_thickness=4, box_expansion=0, image_size=None, label_font_size=16, augment=False, force_model_download=False, detector_options=None, verbose=False)[source]
Loads and runs a detector on target images, and visualizes the results.
- Parameters:
model_file (str) – model filename, e.g. c:/x/z/md_v5a.0.0.pt, or a known model string, e.g. “MDV5A”
image_file_names (list) – list of absolute paths to process
output_dir (str) – folder to write visualized images to
render_confidence_threshold (float, optional) – only render boxes for detections above this threshold
crop_images (bool, optional) – whether to crop detected objects to individual images (default is to render images with boxes, rather than cropping)
box_thickness (float, optional) – thickness in pixels for box rendering
box_expansion (float, optional) – box expansion in pixels
image_size (tuple, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
label_font_size (float, optional) – font size to use for displaying class names and confidence values in the rendered images
augment (bool, optional) – enable (implementation-specific) image augmentation
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors
verbose (bool, optional) – enable additional debug output
- megadetector.detection.run_detector.load_detector(model_file, force_cpu=False, force_model_download=False, detector_options=None, verbose=False)[source]
Loads a TF or PT detector, depending on the extension of model_file.
- Parameters:
model_file (str) – model filename (e.g. c:/x/z/md_v5a.0.0.pt) or known model name (e.g. “MDV5A”)
force_cpu (bool, optional) – force the model to run on the CPU even if a GPU is available
force_model_download (bool, optional) – force downloading the model file if a named model (e.g. “MDV5A”) is supplied, even if the local file already exists
detector_options (dict, optional) – key/value pairs that are interpreted differently by different detectors
verbose (bool, optional) – enable additional debug output
- Returns:
loaded detector object
- Return type:
object
- megadetector.detection.run_detector.try_download_known_detector(detector_file, force_download=False, verbose=False)[source]
Checks whether detector_file is really the name of a known model, in which case we will either read the actual filename from the corresponding environment variable or download (if necessary) to local temp space. Otherwise just returns the input string.
- Parameters:
detector_file (str) – a known model string (e.g. “MDV5A”), or any other string (in which case this function is a no-op)
force_download (bool, optional) – whether to download the model even if the local target file already exists
verbose (bool, optional) – enable additional debug output
- Returns:
the local filename to which the model was downloaded, or the same string that was passed in, if it’s not recognized as a well-known model name
- Return type:
str
run_detector - CLI interface
Module to run an animal detection model on images
run_detector [-h] (--image_file IMAGE_FILE | --image_dir IMAGE_DIR) [--recursive]
[--output_dir OUTPUT_DIR] [--image_size IMAGE_SIZE] [--threshold THRESHOLD]
[--crop] [--augment] [--box_thickness BOX_THICKNESS]
[--box_expansion BOX_EXPANSION] [--label_font_size LABEL_FONT_SIZE]
[--process_likely_output_images] [--force_model_download] [--verbose]
[--detector_options [KEY=VALUE ...]]
detector_file
run_detector positional arguments
detector_file - Path to detector model file (.pb or .pt). Can also be MDV4, MDV5A, or MDV5B to request automatic download.
run_detector options
--image_file IMAGE_FILE - Single file to process, mutually exclusive with --image_dir
--image_dir IMAGE_DIR - Directory to search for images, with optional recursion by adding --recursive
--recursive - Recurse into directories, only meaningful if using --image_dir
--output_dir OUTPUT_DIR - Directory for output images (defaults to same as input)
--image_size IMAGE_SIZE - Force image resizing to a (square) integer size (not recommended to change this)
--threshold THRESHOLD - Confidence threshold between 0 and 1.0; only render boxes above this confidence (defaults to 0.2)
--crop - If set, produces separate output images for each crop, rather than adding bounding boxes to the original image
--augment - Enable image augmentation
--box_thickness BOX_THICKNESS - Line width (in pixels) for box rendering (defaults to 4)
--box_expansion BOX_EXPANSION - Number of pixels to expand boxes by (defaults to 0)
--label_font_size LABEL_FONT_SIZE - Label font size (defaults to 16)
--process_likely_output_images - By default, we skip images that end in _detections, because they probably came from this script. This option disables that behavior.
--force_model_download - If a named model (e.g. "MDV5A") is supplied, force a download of that model even if the local file already exists.
--verbose - Enable additional debug output
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
detection.pytorch_detector module
pytorch_detector.py
Module to run YOLO-based MegaDetector models.
- class megadetector.detection.pytorch_detector.PTDetector(model_path, detector_options=None, verbose=False)[source]
Bases: object
Class that runs a PyTorch-based MegaDetector model. Also used as a preprocessor for images that will later be run through an instance of PTDetector.
- compatibility_mode
This allows us to maintain backwards compatibility across a set of changes to the way this class does inference. Currently should start with either “default” or “classic”.
- device
Either a string (‘cpu’, ‘cuda:0’) or a torch.device()
- generate_detections_one_batch(img_original, image_id=None, detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]
Run a detector on a batch of images.
- Parameters:
img_original (list) – list of images (Image, np.array, or dict) on which we should run the detector, with EXIF rotation already handled, or dicts representing preprocessed images with associated letterbox parameters
image_id (list or None) – list of paths to identify the images; will be in the “file” field of the output objects. Will be ignored when img_original contains preprocessed dicts.
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
augment (bool, optional) – enable (implementation-specific) image augmentation
verbose (bool, optional) – enable additional debug output
- Returns:
- a list of dictionaries, each with the following fields:
’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)
- Return type:
list
- generate_detections_one_image(img_original, image_id='unknown', detection_threshold=1e-05, image_size=None, augment=False, verbose=False)[source]
Run a detector on an image (wrapper around batch function).
- Parameters:
img_original (Image, np.array, or dict) – the image on which we should run the detector, with EXIF rotation already handled, or a dict representing a preprocessed image with associated letterbox parameters
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
detection_threshold (float, optional) – only detections above this confidence threshold will be included in the return value
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
augment (bool, optional) – enable (implementation-specific) image augmentation
verbose (bool, optional) – enable additional debug output
- Returns:
- a dictionary with the following fields:
’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)
- Return type:
dict
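The per-image result dictionary described above is easy to post-process. As a hedged illustration, the sketch below filters one result to a higher confidence threshold and recomputes ‘max_detection_conf’. The function name filter_result and the sample values (including the category strings) are made up for this example, and detections exactly at the threshold are kept here, which is a choice of this sketch.

```python
def filter_result(result, confidence_threshold):
    """Return a copy of a single-image result with low-confidence detections removed."""
    filtered = dict(result)
    if result.get('failure') is not None or result.get('detections') is None:
        # Failed images carry no detections to filter
        return filtered
    kept = [d for d in result['detections'] if d['conf'] >= confidence_threshold]
    filtered['detections'] = kept
    filtered['max_detection_conf'] = max([d['conf'] for d in kept], default=0.0)
    return filtered


# Sample values are invented for illustration
sample_result = {
    'file': 'camera01/img_0001.jpg',
    'max_detection_conf': 0.9,
    'detections': [
        {'category': '1', 'conf': 0.9, 'bbox': [0.1, 0.2, 0.3, 0.4]},
        {'category': '2', 'conf': 0.05, 'bbox': [0.5, 0.5, 0.1, 0.1]},
    ],
    'failure': None,
}
filtered_result = filter_result(sample_result, 0.2)
```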
- half_precision
Use half-precision inference… fixed by the model, generally don’t mess with this
- letterbox_stride
Stride size passed to the YOLO letterbox() function
- preprocess_image(img_original, image_id='unknown', image_size=None, verbose=False)[source]
Prepare an image for detection, including scaling and letterboxing.
- Parameters:
img_original (Image or np.array) – the image on which we should run the detector, with EXIF rotation already handled
image_id (str, optional) – a path to identify the image; will be in the “file” field of the output object
image_size (int, optional) – image size (long side) to use for inference, or None to use the default size specified at the time the model was loaded
verbose (bool, optional) – enable additional debug output
- Returns:
- dict with fields:
file (filename)
img (the preprocessed np.array)
img_original (the input image before preprocessing, as an np.array)
img_original_pil (the input image before preprocessing, as a PIL Image)
target_shape (the 2D shape to which the image was resized during preprocessing)
scaling_shape (the 2D original size, for normalizing coordinates later)
letterbox_ratio (letterbox parameter used for normalizing coordinates later)
letterbox_pad (letterbox parameter used for normalizing coordinates later)
- Return type:
dict
- use_model_native_classes
If this is False, we assume the underlying model is producing class indices in the set (0,1,2) (and we assert() on this), and we add 1 to get to the backwards-compatible MD classes (1,2,3) before generating output. If this is True, we use whatever indices the model provides.
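The class-index behavior controlled by use_model_native_classes can be summarized in a few lines. This is an illustrative sketch with a hypothetical function name, not the class’s actual code.

```python
def to_output_class(model_class_index, use_model_native_classes=False):
    """Map a model's class index to the index written to the output."""
    if use_model_native_classes:
        # Pass through whatever index the model produced
        return model_class_index
    # Otherwise the model is expected to use (0,1,2), shifted to MD classes (1,2,3)
    assert model_class_index in (0, 1, 2), 'Unexpected class index'
    return model_class_index + 1


md_classes = [to_output_class(i) for i in (0, 1, 2)]
native_class = to_output_class(7, use_model_native_classes=True)
```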
- megadetector.detection.pytorch_detector.add_metadata_to_megadetector_model_file(model_file_in, model_file_out, metadata, destination_path='megadetector_info.json')[source]
Adds a .json file to the specified MegaDetector model file containing metadata used by this module. Always over-writes the output file.
- Parameters:
model_file_in (str) – The input model filename, typically .pt (.zip is also sensible)
model_file_out (str) – The output model filename, typically .pt (.zip is also sensible). May be the same as model_file_in.
metadata (dict) – The metadata dict to add to the output model file
destination_path (str, optional) – The relative path within the main folder of the model archive where we should write the metadata. This is not relative to the root of the archive, it’s relative to the one and only folder at the root of the archive (this is a PyTorch convention).
- megadetector.detection.pytorch_detector.nms(prediction, conf_thres=0.25, iou_thres=0.45, max_det=300)[source]
Non-maximum suppression (a wrapper around torchvision.ops.nms())
- Parameters:
prediction (torch.Tensor) – Model predictions with shape [batch_size, num_anchors, num_classes + 5] Format: [x_center, y_center, width, height, objectness, class1_conf, class2_conf, …] Coordinates are normalized to input image size.
conf_thres (float) – Confidence threshold for filtering detections
iou_thres (float) – IoU threshold for NMS
max_det (int) – Maximum number of detections per image
- Returns:
- List of tensors, one per image in batch. Each tensor has shape [N, 6] where:
N is the number of detections for that image
Columns are [x1, y1, x2, y2, confidence, class_id]
Coordinates are in absolute pixels relative to input image size
class_id is the integer class index (0-based)
- Return type:
list
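For readers unfamiliar with non-maximum suppression, the following pure-Python sketch shows the two steps the shapes above imply: converting center-format boxes to corner format, then greedily keeping the highest-scoring box and discarding boxes that overlap it too much. It is a simplified illustration with hypothetical function names; it does not use torchvision, and it ignores objectness, per-class scores, and batching.

```python
def xywh_to_xyxy(box):
    """Convert [x_center, y_center, width, height] to [x1, y1, x2, y2]."""
    x_center, y_center, width, height = box
    return [x_center - width / 2, y_center - height / 2,
            x_center + width / 2, y_center + height / 2]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres=0.45, max_det=300):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
            if len(keep) >= max_det:
                break
    return keep


boxes = [xywh_to_xyxy(b) for b in ([50, 50, 40, 40], [52, 52, 40, 40], [200, 200, 30, 30])]
scores = [0.9, 0.8, 0.7]
kept_indices = greedy_nms(boxes, scores)
```

The second box overlaps the first heavily and has a lower score, so only the first and third boxes survive.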
- megadetector.detection.pytorch_detector.read_metadata_from_megadetector_model_file(model_file, relative_path='megadetector_info.json', verbose=False)[source]
Reads custom MegaDetector metadata from a modified MegaDetector model file.
- Parameters:
model_file (str) – The model filename to read, typically .pt (.zip is also sensible)
relative_path (str, optional) – The relative path within the main folder of the model archive from which we should read the metadata. This is not relative to the root of the archive, it’s relative to the one and only folder at the root of the archive (this is a PyTorch convention).
verbose (bool, optional) – enable additional debug output
- Returns:
whatever we read from the metadata file, always a dict in practice. Returns None if we failed to read the specified metadata file.
- Return type:
object
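The “one and only folder at the root of the archive” convention mentioned above can be illustrated with the standard zipfile module. The sketch below builds a tiny stand-in archive in memory (not a real model), adds a metadata file inside its root folder, and reads it back. The function names and metadata contents are hypothetical, and unlike the real function this sketch does not handle the case where input and output are the same file.

```python
import io
import json
import zipfile


def find_root_folder(zf):
    """Return the name of the single top-level folder in an open archive."""
    roots = {name.split('/')[0] for name in zf.namelist()}
    if len(roots) != 1:
        raise ValueError('Expected exactly one root folder, found: {}'.format(sorted(roots)))
    return roots.pop()


def add_metadata_sketch(archive_in, archive_out, metadata,
                        destination_path='megadetector_info.json'):
    """Copy an archive, adding (or replacing) a JSON metadata file in its root folder."""
    with zipfile.ZipFile(archive_in, 'r') as zin, zipfile.ZipFile(archive_out, 'w') as zout:
        target = find_root_folder(zin) + '/' + destination_path
        for info in zin.infolist():
            if info.filename != target:
                zout.writestr(info, zin.read(info.filename))
        zout.writestr(target, json.dumps(metadata))


def read_metadata_sketch(archive, relative_path='megadetector_info.json'):
    """Read the JSON metadata file from an archive's root folder, or return None."""
    with zipfile.ZipFile(archive, 'r') as zf:
        target = find_root_folder(zf) + '/' + relative_path
        if target not in zf.namelist():
            return None
        return json.loads(zf.read(target))


# Build a stand-in archive with a single root folder
source = io.BytesIO()
with zipfile.ZipFile(source, 'w') as zf:
    zf.writestr('fake_model/data.pkl', b'not a real model')

destination = io.BytesIO()
add_metadata_sketch(source, destination, {'example_key': 'example_value'})
metadata = read_metadata_sketch(destination)
```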
detection.tf_detector module
tf_detector.py
Module containing the class TFDetector, for loading and running a TensorFlow detection model.
- class megadetector.detection.tf_detector.TFDetector(model_path, detector_options=None)[source]
Bases: object
A detector model loaded at the time of initialization. It is intended to be used with TensorFlow-based versions of MegaDetector (v2, v3, or v4). If someone can find v1, I suppose you could use this class for v1 also.
- generate_detections_one_image(image, image_id, detection_threshold, image_size=None, augment=False, verbose=False)[source]
Runs the detector on an image.
- Parameters:
image (Image) – the PIL Image object (or numpy array) on which we should run the detector, with EXIF rotation already handled.
image_id (str) – a path to identify the image; will be in the “file” field of the output object
detection_threshold (float) – only detections above this threshold will be included in the return value
image_size (tuple, optional) – image size to use for inference, only mess with this if (a) you’re using a model other than MegaDetector or (b) you know what you’re doing
augment (bool, optional) – enable image augmentation. Not currently supported, but included here for compatibility with PTDetector.
verbose (bool, optional) – enable additional debug output
- Returns:
- a dictionary with the following fields:
’file’ (filename, always present)
’max_detection_conf’ (removed from MegaDetector output files by default, but generated here)
’detections’ (a list of detection objects containing keys ‘category’, ‘conf’, and ‘bbox’)
’failure’ (a failure string, or None if everything went fine)
- Return type:
dict
detection.process_video module
process_video.py
Splits a video (or folder of videos) into frames, runs the frames through run_detector_batch.py, and optionally stitches together results into a new video with detection boxes.
When possible, video processing happens in memory, without writing intermediate frames to disk. If the caller requests that frames be saved, frames are written before processing, and the MD results correspond to the frames that were written to disk (which simplifies, for example, repeat detection elimination).
- class megadetector.detection.process_video.ProcessVideoOptions[source]
Bases: object
Options controlling the behavior of process_video()
- augment
Enable image augmentation
- checkpoint_frequency
Write a checkpoint file (to resume processing later) every N videos; set to -1 (default) to disable checkpointing
- checkpoint_path
Path to checkpoint file; None (default) for auto-generation based on output filename
- detector_options
Detector-specific options
- exit_on_empty_video
By default, a video with no frames (or no frames retrievable with the current parameters) is silently stored as a failure; setting this to True causes execution to halt instead.
- frame_sample
Sample every Nth frame; set to None (default) or 1 to sample every frame. Typically we sample down to around 3 fps, so for typical 30 fps videos, frame_sample=10 is a typical value. Mutually exclusive with [time_sample].
- image_size
Run the model at this image size (don’t mess with this unless you know what you’re getting into)… if you just want to pass smaller frames to MD, use max_width
- input_video_file
Video (or folder of videos) to process
- json_confidence_threshold
Detections below this threshold will not be included in the output file.
- model_file
Can be a model filename (.pt or .pb) or a model name (e.g. “MDV5A”)
Use the string “no_detection” to indicate that you only want to extract frames, not run a model. If you do this, you almost definitely want to set keep_extracted_frames to “True”, otherwise everything in this module is a no-op. I.e., there’s no reason to extract frames, do nothing with them, then delete them.
- output_json_file
.json file to which we should write results
- recursive
If [input_video_file] is a folder, should we search for videos recursively?
- resume_from_checkpoint
Resume from a checkpoint file, or “auto” to use the most recent checkpoint in the output directory
- time_sample
Sample frames every N seconds. Mutually exclusive with [frame_sample]
- verbose
Enable additional debug console output
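The relationship between frame_sample and time_sample can be sketched with a little arithmetic. This is an illustration under stated assumptions rather than the module's actual conversion: both function names and the rounding rule are hypothetical. The only fact taken from the text above is that sampling every 10th frame of a 30 fps video gives about 3 fps.

```python
def time_sample_to_frame_sample(time_sample_seconds, fps):
    """Hypothetical conversion from "one frame every N seconds" to "every Nth frame".

    The rounding rule here (round to nearest, minimum 1) is an assumption.
    """
    if time_sample_seconds <= 0 or fps <= 0:
        raise ValueError('time_sample_seconds and fps must be positive')
    return max(1, round(time_sample_seconds * fps))


def effective_fps(frame_sample, fps):
    """Frames per second actually processed when sampling every Nth frame."""
    return fps / frame_sample


print(effective_fps(10, 30))
print(time_sample_to_frame_sample(1.0, 30))
print(time_sample_to_frame_sample(1.0, 29.97))
```

With a 29.97 fps video, a one-second interval rounds to every 30th frame in this sketch, so the real spacing is slightly longer than one second. Any conversion from seconds to a whole number of frames will have this kind of small mismatch.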
- megadetector.detection.process_video.process_videos(options)[source]
Process a video or folder of videos through MD.
- Parameters:
options (ProcessVideoOptions) – all the parameters used to control this process, including filenames; see ProcessVideoOptions for details
process_video - CLI interface
Run MegaDetector on each frame (or every Nth frame) in a video (or folder of videos), optionally producing a new video with detections annotated
process_video [-h] [--recursive] [--output_json_file OUTPUT_JSON_FILE]
[--json_confidence_threshold JSON_CONFIDENCE_THRESHOLD]
[--frame_sample FRAME_SAMPLE] [--time_sample TIME_SAMPLE] [--verbose]
[--image_size IMAGE_SIZE] [--augment] [--exit_on_empty_video]
[--detector_options [KEY=VALUE ...]]
[--checkpoint_frequency CHECKPOINT_FREQUENCY]
[--checkpoint_path CHECKPOINT_PATH]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
model_file input_video_file
process_video positional arguments
model_file - MegaDetector model file (.pt or .pb) or model name (e.g. "MDV5A"), or the string "no_detection" to run just frame extraction
input_video_file - video file (or folder) to process
process_video options
--recursive - recurse into [input_video_file]; only meaningful if a folder is specified as input
--output_json_file OUTPUT_JSON_FILE - .json output file, defaults to [video file].json
--json_confidence_threshold JSON_CONFIDENCE_THRESHOLD - don’t include boxes in the .json file with confidence below this threshold (default 0.005)
--frame_sample FRAME_SAMPLE - process every Nth frame (defaults to every frame), mutually exclusive with --time_sample
--time_sample TIME_SAMPLE - process frames every N seconds; this is converted to a frame sampling rate, so it may not be exactly the requested interval in seconds. Mutually exclusive with --frame_sample
--verbose - Enable additional debug output
--image_size IMAGE_SIZE - Force image resizing to a specific integer size on the long axis (not recommended to change this)
--augment - Enable image augmentation
--exit_on_empty_video - By default, videos with no retrievable frames are stored as failures; this causes them to halt execution
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
--checkpoint_frequency CHECKPOINT_FREQUENCY - Write a checkpoint file (to resume processing later) every N videos; set to -1 to disable checkpointing (default -1)
--checkpoint_path CHECKPOINT_PATH - Path to checkpoint file; defaults to a file in the same directory as the output file
--resume_from_checkpoint RESUME_FROM_CHECKPOINT - Resume from a specific checkpoint file, or "auto" to resume from the most recent checkpoint in the output directory
detection.run_inference_with_yolov5_val module
run_inference_with_yolov5_val.py
Runs a folder of images through MegaDetector (or another YOLOv5/YOLOv8 model) with YOLO’s val.py, converting the output to the standard MD format. The reasons this script exists, as an alternative to the standard run_detector_batch.py are:
This script provides access to YOLO’s test-time augmentation tools.
This script serves as a reference implementation: by any reasonable definition, YOLOv5’s val.py produces the “correct” result for any image, since it matches what was used in training.
This script works for any Ultralytics detection model, including YOLOv8 models
YOLOv5’s val.py uses each file’s base name as a unique identifier, which doesn’t work when you have typical camera trap images like:
a/b/c/RECONYX0001.JPG
d/e/f/RECONYX0001.JPG
…both of which would just be “RECONYX0001.JPG”. So this script jumps through a bunch of hoops to put symlinks in a flat folder, run YOLOv5 on that folder, and map the results back to the real files.
If you are running a YOLOv5 model, this script currently requires the caller to supply the path where a working YOLOv5 install lives, and assumes that the current conda environment is all set up for YOLOv5. If you are running a YOLOv8 model, the folder doesn’t matter, but it assumes that ultralytics tools are available in the current environment.
By default, this script uses symlinks to format the input images in a way that YOLO’s val.py likes, as per above. This requires admin privileges on Windows… actually technically this only requires permissions to create symbolic links, but I’ve never seen a case where someone has that permission and doesn’t have admin privileges. If you are running this script on Windows and you don’t have admin privileges, use --no_use_symlinks, which will make copies of images, rather than using symlinks.
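The base-name collision described above can be sketched in a few lines. This is not the script's implementation: the function names and the zero-padded ID scheme are hypothetical, and the sketch only builds the mapping from flat names back to the real files, without creating symlinks or copies.

```python
import os
from collections import Counter


def find_duplicate_basenames(relative_paths):
    """Return the set of base names shared by more than one path."""
    counts = Counter(os.path.basename(p) for p in relative_paths)
    return {name for name, n in counts.items() if n > 1}


def build_flat_id_mapping(relative_paths):
    """Map a unique flat file name (hypothetical scheme) to each original path."""
    mapping = {}
    for i, path in enumerate(relative_paths):
        # Keep the original extension so image tools still recognize the file
        ext = os.path.splitext(path)[1]
        mapping['{:08d}{}'.format(i, ext)] = path
    return mapping


paths = ['a/b/c/RECONYX0001.JPG', 'd/e/f/RECONYX0001.JPG']
print(find_duplicate_basenames(paths))
flat = build_flat_id_mapping(paths)
print(flat)
```

Results keyed by the flat names can then be mapped back through the dictionary to the original relative paths.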
- class megadetector.detection.run_inference_with_yolov5_val.YoloInferenceOptions[source]
Bases: object
Parameters that control the behavior of run_inference_with_yolov5_val(), including the input/output filenames.
- append_job_id_to_symlink_folder
By default, if we’re creating symlinks to images, we append a unique job ID to the symlink folder. If the caller is 100% sure that the symlink folder can be re-used across calls, this can be set to False.
- augment
Should we enable test-time augmentation?
- batch_size
Batch size… has no impact on results, but may create memory issues if you set this to large values
- checkpoint_frequency
Maximum number of images to run in a single chunk
- conf_thres
Detections below this threshold will not be included in the output file
- device_string
Device string: typically ‘0’ for GPU 0, ‘1’ for GPU 1, etc., or ‘cpu’
- half_precision_enabled
Should we enable half-precision inference?
- image_filename_list
If this is None, [input_folder] can’t be None; we’ll process all images in [input_folder].
If this is not None, and [input_folder] is not None, this should be a list of relative image paths within [input_folder] to process, or a .txt or .json file containing a list of relative image paths.
If this is not None, and [input_folder] is None, this should be a list of absolute image paths, or a .txt or .json file containing a list of absolute image paths.
- image_size
Image size to use; this is a single int, which in ultralytics’s terminology means “scale the long side of the image to this size, and preserve aspect ratio”.
If None, will choose based on whether augmentation is enabled.
- input_folder
Folder of images to process (can be None if image_filename_list contains absolute paths)
- model_filename
Model filename (ending in .pt), or a well-known model name (e.g. “MDV5A”)
- model_type
Currently ‘yolov5’ and ‘ultralytics’ are supported, and really these are proxies for “the yolov5 repo” and “the ultralytics repo”.
- offset_yolo_category_ids
By default, we turn category ID 0 coming out of the YOLO .json file into category 1 in the MD-formatted .json file.
- output_file
.json output file, in MD results format
- overwrite_handling
What should we do if the output file already exists?
Can be ‘error’, ‘skip’, or ‘overwrite’.
- preview_yolo_command_only
If True, we’ll do a dry run that lets you preview the YOLO val command, without actually running it.
- recursive
Whether to search for images recursively within [input_folder]
Ignored if a list of files is provided.
- remove_symlink_folder
Should we remove the symlink folder when we’re done?
- remove_yolo_results_folder
Should we remove the intermediate results folder when we’re done?
- save_yolo_debug_output
Save YOLO console output
- symlink_folder
If this is None, we’ll create a folder in system temp space.
- treat_copy_failures_as_warnings
By default, if any errors occur while we’re copying images or creating symlinks, it’s game over. If this is True, those errors become warnings, and we plow ahead.
- unique_id_strategy
How should we guarantee that YOLO IDs (base filenames) are unique? Choices are:
‘verify’: assume image IDs are unique, but verify and error if they’re not
‘links’: create symlinks (or copies, depending on use_symlinks) to enforce uniqueness
‘auto’: check whether IDs are unique, create links if necessary
- use_symlinks
Should we use symlinks to give unique identifiers to image files (vs. copies)?
- yolo_category_id_to_name
These are deliberately offset from the standard MD categories; YOLOv5 needs category IDs to start at 0.
This can also be a string that points to any class mapping file supported by read_classes_from_yolo_dataset_file(): a YOLO dataset.yaml file, a text file with a list of classes, or a .json file with an ID -> name dict
- yolo_results_folder
Temporary folder to stash intermediate YOLO results.
If this is None, we’ll create a folder in system temp space.
- yolo_working_folder
Required for older YOLOv5 inference, not for newer ultralytics/YOLOv8 inference
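The category ID offset described under offset_yolo_category_ids amounts to adding one to each YOLO class index. A minimal sketch, in which the function name is hypothetical and the class names are used only as an example:

```python
def yolo_to_md_categories(yolo_category_id_to_name, offset_ids=True):
    """Convert a {yolo_int_id: name} dict to an MD-style {str_id: name} dict.

    Hypothetical helper: with offset_ids=True, YOLO class 0 becomes MD
    category '1', as described for offset_yolo_category_ids.
    """
    offset = 1 if offset_ids else 0
    return {str(k + offset): v for k, v in yolo_category_id_to_name.items()}


# Class names below are illustrative
yolo_categories = {0: 'animal', 1: 'person', 2: 'vehicle'}
print(yolo_to_md_categories(yolo_categories))
print(yolo_to_md_categories(yolo_categories, offset_ids=False))
```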
- megadetector.detection.run_inference_with_yolov5_val.get_stats_for_category(filename, category='all')[source]
Retrieve statistics for a category from the YOLO console output stored in [filename].
- Parameters:
filename (str) – a text file containing console output from a YOLO val run
category (str, optional) – a category name
- Returns:
a dict with fields n_images, n_labels, P, R, mAP50, and mAP50-95
- Return type:
dict
- megadetector.detection.run_inference_with_yolov5_val.run_inference_with_yolo_val(options)[source]
Runs a folder of images through MegaDetector (or another YOLOv5/YOLOv8 model) with YOLO’s val.py, converting the output to the standard MD format.
- Parameters:
options (YoloInferenceOptions) – all the parameters used to control this process, including filenames; see YoloInferenceOptions for details
run_inference_with_yolov5_val - CLI interface
run_inference_with_yolov5_val [-h] [--image_filename_list IMAGE_FILENAME_LIST]
[--yolo_working_folder YOLO_WORKING_FOLDER]
[--image_size IMAGE_SIZE] [--conf_thres CONF_THRES]
[--batch_size BATCH_SIZE]
[--half_precision_enabled HALF_PRECISION_ENABLED]
[--device_string DEVICE_STRING]
[--overwrite_handling OVERWRITE_HANDLING]
[--yolo_dataset_file YOLO_DATASET_FILE]
[--model_type MODEL_TYPE]
[--unique_id_strategy UNIQUE_ID_STRATEGY]
[--symlink_folder SYMLINK_FOLDER]
[--yolo_results_folder YOLO_RESULTS_FOLDER] [--no_use_symlinks]
[--no_remove_symlink_folder] [--no_remove_yolo_results_folder]
[--save_yolo_debug_output]
[--checkpoint_frequency CHECKPOINT_FREQUENCY]
[--no_append_job_id_to_symlink_folder] [--nonrecursive]
[--no_offset_class_ids] [--preview_yolo_command_only]
[--augment_enabled AUGMENT_ENABLED]
model_filename input_folder output_file
run_inference_with_yolov5_val positional arguments
model_filename - model file name
input_folder - folder on which to recursively run the model, or a .json or .txt file containing a list of absolute image paths
output_file - .json file where output will be written
run_inference_with_yolov5_val options
--image_filename_list IMAGE_FILENAME_LIST - .json or .txt file containing a list of relative image filenames within [input_folder]
--yolo_working_folder YOLO_WORKING_FOLDER - folder in which to execute val.py (not necessary for YOLOv8 inference)
--image_size IMAGE_SIZE - image size for model execution (default 1664 when augmentation is enabled, else 1280)
--conf_thres CONF_THRES - confidence threshold for including detections in the output file (default 0.001)
--batch_size BATCH_SIZE - inference batch size (default 1)
--half_precision_enabled HALF_PRECISION_ENABLED - use half-precision inference (1 or 0) (default is the underlying model’s default, probably full for YOLOv8 and half for YOLOv5)
--device_string DEVICE_STRING - CUDA device specifier, typically "0" or "1" for CUDA devices, "mps" for M1/M2 devices, or "cpu" (default 0)
--overwrite_handling OVERWRITE_HANDLING - action to take if the output file exists (skip, error, overwrite) (default skip)
--yolo_dataset_file YOLO_DATASET_FILE - YOLOv5 dataset.yaml file from which we should load category information (otherwise defaults to MD categories)
--model_type MODEL_TYPE - model type ("yolov5" or "ultralytics"; "yolov8" behaves the same as "ultralytics") (default yolov5)
--unique_id_strategy UNIQUE_ID_STRATEGY - how should we ensure that unique filenames are passed to the YOLO val script, can be "verify", "auto", or "links", see options class docs for details (default links)
--symlink_folder SYMLINK_FOLDER - temporary folder for symlinks (defaults to a folder in the system temp dir)
--yolo_results_folder YOLO_RESULTS_FOLDER - temporary folder for YOLO intermediate output (defaults to a folder in the system temp dir)
--no_use_symlinks - copy files instead of creating symlinks when preparing the yolo input folder
--no_remove_symlink_folder - don’t remove the temporary folder full of symlinks
--no_remove_yolo_results_folder - don’t remove the temporary folder full of YOLO intermediate files
--save_yolo_debug_output - write yolo console output to a text file in the results folder, along with additional debug files
--checkpoint_frequency CHECKPOINT_FREQUENCY - break the job into chunks with no more than this many images (default None)
--no_append_job_id_to_symlink_folder - don’t append a unique job ID to the symlink folder name
--nonrecursive - disable recursive folder processing
--no_offset_class_ids - disable class ID offsetting
--preview_yolo_command_only - don’t run inference, just preview the YOLO inference command (still creates symlinks)
--augment_enabled AUGMENT_ENABLED - enable/disable augmentation (default 0)
detection.run_tiled_inference module
run_tiled_inference.py
This script is experimental, YMMV.
Runs inference on a folder, first splitting each image up into tiles of size MxN (typically the native inference size of your detector), writing those tiles out to a temporary folder, then de-duplicating the resulting detections before merging them back into a set of detections that make sense on the original images.
This approach will likely fail to detect very large animals, so if you expect both large and small animals (in terms of pixel size), this script is best used in conjunction with a traditional inference pass that looks at whole images.
Currently requires temporary storage at least as large as the input data, generally a lot more than that (depending on the overlap between adjacent tiles). This is inefficient, but easy to debug.
Programmatic invocation supports using YOLOv5’s inference scripts (and test-time augmentation); the command-line interface only supports standard inference right now.
- megadetector.detection.run_tiled_inference.extract_patch_from_image(im, patch_xy, patch_size, patch_image_fn=None, patch_folder=None, image_name=None, overwrite=True)[source]
Extracts a patch from the provided image, and writes that patch out to a new file.
- Parameters:
im (str or Image) – image from which we should extract a patch, can be a filename or a PIL Image object.
patch_xy (tuple) – length-2 tuple of ints (x,y) representing the upper-left corner of the patch to extract
patch_size (tuple) – length-2 tuple of ints (w,h) representing the size of the patch to extract
patch_image_fn (str, optional) – image filename to write the patch to; if this is None the filename will be generated from [image_name] and the patch coordinates
patch_folder (str, optional) – folder in which the image lives; only used to generate a patch filename, so only required if [patch_image_fn] is None
image_name (str, optional) – the identifier of the source image; only used to generate a patch filename, so only required if [patch_image_fn] is None
overwrite (bool, optional) – whether to overwrite an existing patch image
- Returns:
a dictionary with fields xmin,xmax,ymin,ymax,patch_fn
- Return type:
dict
- megadetector.detection.run_tiled_inference.get_patch_boundaries(image_size, patch_size, patch_stride=None)[source]
Computes a list of patch starting coordinates (x,y) given an image size (w,h), a patch size (w,h), and a stride (x,y)
Patch size is guaranteed, but the stride may deviate to make sure all pixels are covered. I.e., we move by regular strides until the current patch walks off the right/bottom, at which point it backs up to one patch from the end. So if your image is 15 pixels wide, your patches are 10 pixels wide, and you have a stride of 10 pixels, you will get starting positions of 0 (from 0 to 9) and 5 (from 5 to 14).
- Parameters:
image_size (tuple) – size of the image you want to divide into patches, as a length-2 tuple (w,h)
patch_size (tuple) – patch size into which you want to divide an image, as a length-2 tuple (w,h)
patch_stride (tuple or float, optional) – stride between patches, as a length-2 tuple (x,y), or a float; if this is a float, it’s interpreted as the stride relative to the patch size (0.1 == 10% stride). Defaults to half the patch size.
- Returns:
list of length-2 tuples, each representing the x/y start position of a patch
- Return type:
list
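The "back up to one patch from the end" behavior described above can be sketched along one axis and then combined for both axes. This is a sketch of the idea, not the library's implementation: the function names and the ordering of the returned tuples are assumptions.

```python
def patch_starts_1d(image_length, patch_length, stride):
    """Start positions along one axis; the last patch is pulled back to fit."""
    if stride <= 0:
        raise ValueError('stride must be positive')
    if patch_length >= image_length:
        return [0]
    starts = []
    pos = 0
    # Take regular strides while the patch ends before the image edge
    while pos + patch_length < image_length:
        starts.append(pos)
        pos += stride
    # Back up so the final patch ends exactly at the image edge
    starts.append(image_length - patch_length)
    return starts


def patch_starts_2d(image_size, patch_size, patch_stride=None):
    """List of (x, y) start positions; stride defaults to half the patch size."""
    if patch_stride is None:
        patch_stride = (max(1, patch_size[0] // 2), max(1, patch_size[1] // 2))
    xs = patch_starts_1d(image_size[0], patch_size[0], patch_stride[0])
    ys = patch_starts_1d(image_size[1], patch_size[1], patch_stride[1])
    return [(x, y) for y in ys for x in xs]


print(patch_starts_1d(15, 10, 10))  # [0, 5]
print(patch_starts_2d((15, 10), (10, 10), (10, 10)))  # [(0, 0), (5, 0)]
```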
- megadetector.detection.run_tiled_inference.in_place_nms(md_results, iou_thres=0.45, verbose=True)[source]
Run torch.ops.nms in-place on MD-formatted detection results.
- Parameters:
md_results (dict) – detection results for a list of images, in MD results format (i.e., containing a list of image dicts with the key ‘images’, each of which has a list of detections with the key ‘detections’)
iou_thres (float, optional) – IoU threshold above which we will treat two detections as redundant
verbose (bool, optional) – enable additional debug console output
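To make the de-duplication step concrete, here is a pure-Python sketch of greedy non-maximum suppression over MD-formatted results. It deliberately avoids torch, so it is a stand-in for the idea rather than the function documented above. The function names are hypothetical, boxes are assumed to be [x_min, y_min, width, height], and the sketch ignores categories, which may or may not match the real behavior.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [x_min, y_min, width, height]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_thres=0.45):
    """Keep the highest-confidence detections, dropping overlapping ones."""
    kept = []
    for det in sorted(detections, key=lambda d: d['conf'], reverse=True):
        if all(iou_xywh(det['bbox'], k['bbox']) <= iou_thres for k in kept):
            kept.append(det)
    return kept


def nms_in_place(md_results, iou_thres=0.45):
    """Replace each image's detection list with its de-duplicated version."""
    for im in md_results['images']:
        if im.get('detections'):
            im['detections'] = greedy_nms(im['detections'], iou_thres)


results = {'images': [{'file': 'a.jpg', 'detections': [
    {'category': '1', 'conf': 0.6, 'bbox': [0.12, 0.1, 0.4, 0.4]},
    {'category': '1', 'conf': 0.9, 'bbox': [0.1, 0.1, 0.4, 0.4]},
    {'category': '1', 'conf': 0.5, 'bbox': [0.7, 0.7, 0.2, 0.2]},
]}]}
nms_in_place(results)
print([d['conf'] for d in results['images'][0]['detections']])
```

In the example, the 0.6 box overlaps the 0.9 box almost entirely, so only the 0.9 and 0.5 boxes survive.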
- megadetector.detection.run_tiled_inference.patch_info_to_patch_name(image_name, patch_x_min, patch_y_min)[source]
Gives a unique string name to an x/y coordinate, e.g. turns (“a.jpg”,10,20) into “a.jpg_0010_0020”.
- Parameters:
image_name (str) – image identifier
patch_x_min (int) – x coordinate
patch_y_min (int) – y coordinate
- Returns:
name for this patch, e.g. “a.jpg_0010_0020”
- Return type:
str
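The naming scheme in the example above is consistent with zero-padding each coordinate to four digits. A one-line sketch, where the function name is hypothetical and the padding width is inferred from the example rather than confirmed by the source:

```python
def patch_name_sketch(image_name, patch_x_min, patch_y_min):
    """Hypothetical re-creation of the naming scheme shown in the example above."""
    return '{}_{:04d}_{:04d}'.format(image_name, patch_x_min, patch_y_min)


print(patch_name_sketch('a.jpg', 10, 20))  # a.jpg_0010_0020
```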
- megadetector.detection.run_tiled_inference.run_tiled_inference(model_file, image_folder, tiling_folder, output_file, tile_size_x=1280, tile_size_y=1280, tile_overlap=0.5, checkpoint_path=None, checkpoint_frequency=-1, remove_tiles=False, yolo_inference_options=None, n_patch_extraction_workers=1, overwrite_tiles=True, image_list=None, augment=False, detector_options=None, use_image_queue=True, preprocess_on_image_queue=True, loader_workers=4, inference_size=None, verbose=False, pool_type=None, load_cached_tiles_if_available=False, create_tiles_only=False)[source]
Runs inference using [model_file] on the images in [image_folder], first splitting each image up into tiles of size [tile_size_x] x [tile_size_y], writing those tiles to [tiling_folder], then de-duplicating the results before merging them back into a set of detections that make sense on the original images and writing those results to [output_file].
[tiling_folder] can be any folder, but this function reserves the right to do whatever it wants within that folder, including deleting everything, so it’s best if it’s a new folder. Conceptually this folder is temporary, it’s just helpful in this case to not actually use the system temp folder, because the tile cache may be very large, so the caller may want it to be on a specific drive. If this is None, a new folder will be created in system temp space.
tile_overlap is the fraction of overlap between tiles.
Optionally removes the temporary tiles.
If yolo_inference_options is supplied, it should be an instance of YoloInferenceOptions; in this case the model will be run with run_inference_with_yolov5_val. The following members in the YoloInferenceOptions object will be overwritten by the corresponding parameters to this function: input_folder, model_filename, output_file.
- Parameters:
model_file (str) – model filename (ending in .pt), or a well-known model name (e.g. “MDV5A”)
image_folder (str) – the folder of images to process (always recursive)
tiling_folder (str) – folder for temporary tile storage; see caveats above. Can be None to use system temp space.
output_file (str) – .json file to which we should write MD-formatted results
tile_size_x (int, optional) – tile width
tile_size_y (int, optional) – tile height
tile_overlap (float, optional) – overlap between adjacent tiles, as a fraction of the tile size
checkpoint_path (str, optional) – checkpoint path; passed directly to run_detector_batch; see run_detector_batch for details
checkpoint_frequency (int, optional) – checkpoint frequency; passed directly to run_detector_batch; see run_detector_batch for details
remove_tiles (bool, optional) – whether to delete the tiles when we’re done
yolo_inference_options (YoloInferenceOptions, optional) – if not None, will run inference with run_inference_with_yolov5_val.py, rather than with run_detector_batch.py, using these options
n_patch_extraction_workers (int, optional) – number of workers to use for patch extraction; set to <= 1 to disable parallelization
overwrite_tiles (bool, optional) – whether to overwrite image files for individual tiles if they exist
image_list (list, optional) – .json file containing a list of specific images to process. If this is supplied, and the paths are absolute, [image_folder] will be ignored. If this is supplied, and the paths are relative, they should be relative to [image_folder]
augment (bool, optional) – apply test-time augmentation
detector_options (dict, optional) – parameters to pass to run_detector, only relevant if yolo_inference_options is None
use_image_queue (bool, optional) – whether to use a loader worker queue, only relevant if yolo_inference_options is None
preprocess_on_image_queue (bool, optional) – whether the image queue should also be responsible for preprocessing
loader_workers (int, optional) – number of preprocessing loader workers to use
inference_size (int, optional) – override the default inference image size, only relevant if yolo_inference_options is None
verbose (bool, optional) – enable additional debug output
pool_type (str, optional) – ‘thread’ or ‘process’, or None to use the default (threads)
load_cached_tiles_if_available (bool, optional) – if we find tile information in the tiling folder from a previous call to this function, load tile information rather than re-tiling.
create_tiles_only (bool, optional) – return after creating tiles, before running inference
- Returns:
MD-formatted results dictionary, identical to what’s written to [output_file]
- Return type:
dict
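Merging tile-level detections "back into a set of detections that make sense on the original images" comes down to a coordinate change. The sketch below assumes MD-style normalized boxes ([x_min, y_min, width, height], relative to the tile) and a known tile origin in pixels; the function name and argument layout are hypothetical, not the module's internals.

```python
def tile_bbox_to_image_bbox(bbox, tile_xy, tile_size, image_size):
    """Map a normalized [x, y, w, h] box from tile coordinates to image coordinates."""
    tile_x, tile_y = tile_xy
    tile_w, tile_h = tile_size
    image_w, image_h = image_size
    # Convert to absolute pixels in the original image
    x_px = tile_x + bbox[0] * tile_w
    y_px = tile_y + bbox[1] * tile_h
    w_px = bbox[2] * tile_w
    h_px = bbox[3] * tile_h
    # Re-normalize against the full image size
    return [x_px / image_w, y_px / image_h, w_px / image_w, h_px / image_h]


# A 1280x1280 tile whose upper-left corner sits at (640, 0) in a 2560x1280 image
print(tile_bbox_to_image_bbox([0.5, 0.25, 0.25, 0.5], (640, 0), (1280, 1280), (2560, 1280)))
```

Because adjacent tiles overlap, the same animal can be reported by more than one tile after this mapping. That is why de-duplication is applied before the results are written.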
run_tiled_inference - CLI interface
Chop a folder of images up into tiles, run MD on the tiles, and stitch the results together
run_tiled_inference [-h] [--no_remove_tiles] [--augment] [--verbose]
[--tile_size_x TILE_SIZE_X] [--tile_size_y TILE_SIZE_Y]
[--tile_overlap TILE_OVERLAP] [--overwrite_handling OVERWRITE_HANDLING]
[--image_list IMAGE_LIST] [--detector_options DETECTOR_OPTIONS]
[--inference_size INFERENCE_SIZE]
[--n_patch_extraction_workers N_PATCH_EXTRACTION_WORKERS]
[--loader_workers LOADER_WORKERS]
model_file image_folder tiling_folder output_file
run_tiled_inference positional arguments
model_file - Path to detector model file (.pb or .pt)
image_folder - Folder containing images for inference (always recursive, unless image_list is supplied)
tiling_folder - Temporary folder where tiles and intermediate results will be stored
output_file - Path to output JSON results file, should end with a .json extension
run_tiled_inference options
--no_remove_tiles - Tiles are removed by default; this option suppresses tile deletion
--augment - Enable test-time augmentation
--verbose - Enable additional debug output
--tile_size_x TILE_SIZE_X - Tile width (defaults to 1280)
--tile_size_y TILE_SIZE_Y - Tile height (defaults to 1280)
--tile_overlap TILE_OVERLAP - Overlap between tiles [0,1] (defaults to 0.5)
--overwrite_handling OVERWRITE_HANDLING - Behavior when the target file exists (skip/overwrite/error) (default skip)
--image_list IMAGE_LIST - A .json list of relative filenames (or absolute paths contained within image_folder) to include
--detector_options DETECTOR_OPTIONS - A list of detector options (key-value pairs)
--inference_size INFERENCE_SIZE - Run inference at a non-default size
--n_patch_extraction_workers N_PATCH_EXTRACTION_WORKERS - Number of workers to use for patch extraction
--loader_workers LOADER_WORKERS - Number of workers to use for image loading and preprocessing (0 to disable)
detection.run_md_and_speciesnet module
run_md_and_speciesnet.py
Script to run MegaDetector and SpeciesNet on a folder of images and/or videos. Runs MD first, then runs SpeciesNet on every above-threshold crop.
- class megadetector.detection.run_md_and_speciesnet.CropBatch[source]
Bases: object
A batch of crops with their metadata for classification.
- add_crop(crop_data, metadata)[source]
- Parameters:
crop_data (PreprocessedImage) – preprocessed image data from SpeciesNetClassifier.preprocess()
metadata (CropMetadata) – metadata for this crop
- crops
List of preprocessed images
- metadata
List of CropMetadata objects
- class megadetector.detection.run_md_and_speciesnet.CropMetadata(image_file: str, detection_index: int, bbox: list[float], original_width: int, original_height: int)[source]
Bases: object
Metadata for a crop extracted from an image detection.
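The fields in the CropMetadata signature are enough to recover a pixel-space crop rectangle. The sketch below uses a stand-in dataclass with the same field names (not the real class) and assumes that bbox is a normalized [x_min, y_min, width, height] box, as in the MD results format; the helper name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CropMetadataSketch:
    """Stand-in mirroring the fields in the CropMetadata signature above."""
    image_file: str
    detection_index: int
    bbox: list
    original_width: int
    original_height: int


def pixel_crop_box(meta):
    """Return (left, top, right, bottom) in pixels, clamped to the image."""
    x, y, w, h = meta.bbox
    left = max(0, round(x * meta.original_width))
    top = max(0, round(y * meta.original_height))
    right = min(meta.original_width, round((x + w) * meta.original_width))
    bottom = min(meta.original_height, round((y + h) * meta.original_height))
    return (left, top, right, bottom)


meta = CropMetadataSketch('a/b/c/RECONYX0001.JPG', 0, [0.25, 0.5, 0.5, 0.25], 1920, 1080)
print(pixel_crop_box(meta))  # (480, 540, 1440, 810)
```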
- class megadetector.detection.run_md_and_speciesnet.RunMDSpeciesNetOptions[source]
Bases: object
Class controlling the behavior of run_md_and_speciesnet()
- admin1_region
Admin1 region/state code for geofencing
- classification_model
SpeciesNet classifier model identifier (e.g. kaggle:google/speciesnet/pyTorch/v4.0.2a)
- classifier_batch_size
Batch size for SpeciesNet classification
- country
Country code (ISO 3166-1 alpha-3) for geofencing (default None, no geofencing)
- detection_confidence_threshold_for_classification
Classify detections above this threshold
- detection_confidence_threshold_for_output
Include detections above this threshold in the output
- detections_file
Path to existing MegaDetector output file (skips detection step)
- detector_batch_size
Batch size for MegaDetector inference
- detector_model
MegaDetector model identifier (MDv5a, MDv5b, MDv1000-redwood, etc.)
- frame_sample
Sample every Nth frame from videos
Mutually exclusive with time_sample
- include_raw_classifications
Include raw (pre-rollup/geofence) classification scores in output
- intermediate_file_folder
Folder for intermediate files (default: system temp)
- keep_intermediate_files
Keep intermediate files (e.g. detection-only results file)
- loader_workers
Number of worker threads for preprocessing
- norollup
Disable taxonomic rollup
- output_file
Output file for results (JSON format)
- overwrite_handling
What to do if the output file exists (‘overwrite’, ‘error’, ‘skip’)
- rollup_target_confidence
Target confidence threshold for taxonomic rollup
- skip_images
Ignore images, only process videos
- skip_video
Ignore videos, only process images
- source
Folder containing images and/or videos to process
- time_sample
Sample frames every N seconds from videos
Mutually exclusive with frame_sample
- verbose
Enable additional debug output
- worker_type
Worker type for parallelization; should be “thread” or “process”
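The two detection thresholds above play different roles: one decides which detections are sent to the classifier, the other decides which detections appear in the output. A small sketch of that split, in which the function name, the threshold values, and the inclusive comparison are all assumptions made for illustration:

```python
def split_by_thresholds(detections, classify_threshold, output_threshold):
    """Return (detections to classify, detections to include in the output)."""
    to_classify = [d for d in detections if d['conf'] >= classify_threshold]
    to_output = [d for d in detections if d['conf'] >= output_threshold]
    return to_classify, to_output


# Confidence values and thresholds below are illustrative only
detections = [{'conf': 0.9}, {'conf': 0.15}, {'conf': 0.02}]
to_classify, to_output = split_by_thresholds(detections, 0.2, 0.1)
print(len(to_classify), len(to_output))
```

With these illustrative values, the 0.15 detection is kept in the output but never classified.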
- megadetector.detection.run_md_and_speciesnet.run_md_and_speciesnet(options)[source]
Main entry point, runs MegaDetector and SpeciesNet on a folder. See RunMDSpeciesNetOptions for available arguments.
- Parameters:
options (RunMDSpeciesNetOptions) – options controlling MD and SN inference
run_md_and_speciesnet - CLI interface
Run MegaDetector and SpeciesNet on a folder of images/videos
run_md_and_speciesnet [-h] [--detector_model DETECTOR_MODEL]
[--classification_model CLASSIFICATION_MODEL]
[--detector_batch_size DETECTOR_BATCH_SIZE]
[--classifier_batch_size CLASSIFIER_BATCH_SIZE]
[--loader_workers LOADER_WORKERS]
[--detection_confidence_threshold_for_classification DETECTION_CONFIDENCE_THRESHOLD_FOR_CLASSIFICATION]
[--detection_confidence_threshold_for_output DETECTION_CONFIDENCE_THRESHOLD_FOR_OUTPUT]
[--intermediate_file_folder INTERMEDIATE_FILE_FOLDER]
[--keep_intermediate_files] [--norollup]
[--rollup_target_confidence ROLLUP_TARGET_CONFIDENCE]
[--country COUNTRY] [--admin1_region ADMIN1_REGION]
[--detections_file DETECTIONS_FILE] [--skip_video] [--skip_images]
[--frame_sample FRAME_SAMPLE] [--time_sample TIME_SAMPLE] [--verbose]
[--include_raw_classifications]
source output_file
run_md_and_speciesnet positional arguments
source - Folder containing images and/or videos to process
output_file - Output file for results (JSON format)
run_md_and_speciesnet options
--detector_model DETECTOR_MODEL - MegaDetector model identifier
--classification_model CLASSIFICATION_MODEL - SpeciesNet classifier model identifier
--detector_batch_size DETECTOR_BATCH_SIZE - Batch size for MegaDetector inference
--classifier_batch_size CLASSIFIER_BATCH_SIZE - Batch size for SpeciesNet classification
--loader_workers LOADER_WORKERS - Number of worker threads for preprocessing
--detection_confidence_threshold_for_classification DETECTION_CONFIDENCE_THRESHOLD_FOR_CLASSIFICATION - Classify detections above this threshold
--detection_confidence_threshold_for_output DETECTION_CONFIDENCE_THRESHOLD_FOR_OUTPUT - Include detections above this threshold in the output
--intermediate_file_folder INTERMEDIATE_FILE_FOLDER - Folder for intermediate files (default: system temp)
--keep_intermediate_files - Keep intermediate files (e.g. detection-only results file)
--norollup - Disable taxonomic rollup
--rollup_target_confidence ROLLUP_TARGET_CONFIDENCE - Target confidence threshold for taxonomic rollup (default 0.65), only used when geofencing is disabled
--country COUNTRY - Country code (ISO 3166-1 alpha-3) for geofencing
--admin1_region ADMIN1_REGION, --state ADMIN1_REGION - Admin1 region/state code for geofencing
--detections_file DETECTIONS_FILE - Path to existing MegaDetector output file (skips detection step)
--skip_video - Ignore videos, only process images
--skip_images - Ignore images, only process videos
--frame_sample FRAME_SAMPLE - Sample every Nth frame from videos (mutually exclusive with --time_sample)
--time_sample TIME_SAMPLE - Sample frames every N seconds from videos (default 1.0) (mutually exclusive with --frame_sample)
--verbose - Enable additional debug output
--include_raw_classifications - Include raw (pre-rollup/geofence) classification scores in output
detection.video_utils module
video_utils.py
Utilities for splitting, rendering, and assembling videos.
- class megadetector.detection.video_utils.FrameToVideoOptions[source]
Bases: object
Options controlling the conversion of frame-level results to video-level results via frame_results_to_video_results()
- frame_rates_are_required
Are frame rates required?
- include_all_processed_frames
Should we include just a single representative frame result for each video (default), or every frame that was processed?
- non_video_behavior
What to do if a file referred to in a .json results file appears not to be a video; can be ‘error’ or ‘skip_with_warning’
- nth_highest_confidence
One-indexed indicator of which frame-level confidence value to use to determine detection confidence for the whole video, i.e. “1” means “use the confidence value from the highest-confidence frame”
- verbose
Enable additional debug output
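To make the one-indexed nth_highest_confidence option concrete, here is a minimal sketch of choosing a video-level confidence from a list of frame-level confidences. The helper name video_confidence and the sample values are hypothetical, and falling back to the lowest value when a video has fewer than n frames is an assumption; this illustrates the documented meaning of the option, not necessarily how frame_results_to_video_results() implements it.

```python
def video_confidence(frame_confidences, nth_highest_confidence=1):
    """Return the nth-highest (one-indexed) frame confidence, or None if there are no frames."""
    if nth_highest_confidence < 1:
        raise ValueError('nth_highest_confidence is one-indexed and must be >= 1')
    if not frame_confidences:
        return None
    ranked = sorted(frame_confidences, reverse=True)
    # Assumption: with fewer than n frames, use the lowest available confidence
    index = min(nth_highest_confidence, len(ranked)) - 1
    return ranked[index]


# Hypothetical per-frame confidences for one detection category in one video
frame_confidences = [0.12, 0.91, 0.45, 0.88]
print(video_confidence(frame_confidences, nth_highest_confidence=1))
print(video_confidence(frame_confidences, nth_highest_confidence=2))
```

Using a value greater than 1 makes the video-level confidence less sensitive to a single spurious high-confidence frame.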
- megadetector.detection.video_utils.find_video_strings(strings)[source]
Given a list of strings that are potentially video file names, looks for strings that actually look like video file names (based on extension).
- Parameters:
strings (list) – list of strings to check for video-ness
- Returns:
the subset of [strings] that look like video filenames
- Return type:
list
- megadetector.detection.video_utils.find_videos(dirname, recursive=False, convert_slashes=True, return_relative_paths=False)[source]
Finds all files in a directory that look like video file names.
- Parameters:
dirname (str) – folder to search for video files
recursive (bool, optional) – whether to search [dirname] recursively
convert_slashes (bool, optional) – forces forward slashes in the returned files, otherwise uses the native path separator
return_relative_paths (bool, optional) – forces the returned filenames to be relative to [dirname], otherwise returns absolute paths
- Returns:
A list of filenames within [dirname] that appear to be videos
- megadetector.detection.video_utils.frame_results_to_video_results(input_file, output_file, options=None, video_filename_to_frame_rate=None)[source]
Given an MD results file produced at the frame level, corresponding to a directory created with video_folder_to_frames, maps those frame-level results back to the video level for use in Timelapse.
Preserves everything in the input .json file other than the images.
- Parameters:
input_file (str) – the frame-level MD results file to convert to video-level results
output_file (str) – the .json file to which we should write video-level results
options (FrameToVideoOptions, optional) – parameters for converting frame-level results to video-level results, see FrameToVideoOptions for details
video_filename_to_frame_rate (dict, optional) – maps (relative) video path names to frame rates, used only to populate the output file
- megadetector.detection.video_utils.frames_to_video(images, fs, output_file_name, codec_spec='h264')[source]
Given a list of image files and a frame rate, concatenates those images into a video and writes the result to a new video file.
- Parameters:
images (list) – a list of frame file names to concatenate into a video
fs (float) – the frame rate in fps
output_file_name (str) – the output video file, no checking is performed to make sure the extension is compatible with the codec
codec_spec (str, optional) – codec to use for encoding; h264 is a sensible default and generally works on Windows, but when this fails (which is around 50% of the time on Linux), mp4v is a good second choice
- megadetector.detection.video_utils.get_video_fs(input_video_file, verbose=False)[source]
Retrieves the frame rate of [input_video_file].
- Parameters:
input_video_file (str) – video file for which we want the frame rate
verbose (bool, optional) – enable additional debug output
- Returns:
the frame rate of [input_video_file], or None if no frame rate could be extracted
- Return type:
float
- megadetector.detection.video_utils.is_video_file(s, video_extensions=('.mp4', '.avi', '.mpeg', '.mpg', '.mov', '.mkv', '.flv'))[source]
Checks a file’s extension against a set of known video file extensions to determine whether it’s a video file. Performs a case-insensitive comparison.
- Parameters:
s (str) – filename to check for probable video-ness
video_extensions (list, optional) – list of video file extensions
- Returns:
True if this looks like a video file, else False
- Return type:
bool
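The extension check described above can be sketched in a few lines of standard-library Python. The function name looks_like_video and the sample filenames are hypothetical; the extension tuple is copied from the documented default. This is an illustration of a case-insensitive extension test, not the library's own code.

```python
import os

# Same extensions as the documented default for is_video_file()
DEFAULT_VIDEO_EXTENSIONS = ('.mp4', '.avi', '.mpeg', '.mpg', '.mov', '.mkv', '.flv')


def looks_like_video(filename, video_extensions=DEFAULT_VIDEO_EXTENSIONS):
    """Case-insensitive check of a filename's extension against known video extensions."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in [e.lower() for e in video_extensions]


# Hypothetical filenames; filtering a list this way mirrors what find_video_strings() describes
candidates = ['cam01/IMG_0001.JPG', 'cam01/VID_0002.MP4', 'notes.txt', 'clip.mov']
videos = [s for s in candidates if looks_like_video(s)]
print(videos)
```

Because only the extension is examined, a non-video file with a video extension still passes this test.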
- megadetector.detection.video_utils.open_video(video_path, verbose=False)[source]
Open the video at [video_path], trying multiple OpenCV backends if necessary.
- Parameters:
video_path (str) – the file to open
verbose (bool, optional) – enable additional debug output
- Returns:
a tuple containing (a) the open video capture device (or None if no backends succeeded) and (b) the first frame of the video (or None)
- Return type:
(cv2.VideoCapture,image)
- megadetector.detection.video_utils.run_callback_on_frames(input_video_file, frame_callback, every_n_frames=None, verbose=False, frames_to_process=None, allow_empty_videos=False)[source]
Calls the function frame_callback(np.array,image_id) on all (or selected) frames in [input_video_file].
- Parameters:
input_video_file (str) – video file to process
frame_callback (function) – callback to run on each frame; it should take two arguments and return a single value. The arguments are (1) a numpy array with image data, in the typical PIL image orientation/channel order, and (2) a string identifier for the frame, typically something like “frame0006.jpg” (even though it’s not a JPEG image, this is just an identifier for the frame).
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is processed. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_process.
verbose (bool, optional) – enable additional debug console output
frames_to_process (list of int, optional) – process this specific set of frames; mutually exclusive with every_n_frames. If all values are beyond the length of the video, no frames are extracted. Can also be a single int, specifying a single frame number.
allow_empty_videos (bool, optional) – Just print a warning if a video appears to have no frames (by default, this raises an Exception).
- Returns:
dict with keys ‘frame_filenames’ (list), ‘frame_rate’ (float), ‘results’ (list). ‘frame_filenames’ are synthetic filenames (e.g. frame000000.jpg). Elements in ‘results’ are whatever is returned by the callback, typically dicts in the same format used in the ‘images’ array in the MD results format. [frame_filenames] and [results] both have one element per processed frame.
- Return type:
dict
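The every_n_frames convention (positive values count frames, negative values are seconds) is easy to misread, so here is a minimal sketch of how such a value could be resolved to a frame step. The helper names frame_interval and sampled_frame_indices are hypothetical, and the exact rounding and the minimum step of 1 are assumptions rather than documented behavior.

```python
def frame_interval(every_n_frames, frame_rate):
    """Resolve an every_n_frames value to a positive integer frame step."""
    if every_n_frames is None:
        return 1
    if every_n_frames < 0:
        # Negative values are a sampling interval in seconds
        seconds = -every_n_frames
        # Assumption: round to the nearest frame count, never below 1
        return max(1, round(seconds * frame_rate))
    return max(1, int(every_n_frames))


def sampled_frame_indices(n_frames, every_n_frames, frame_rate):
    """Zero-based indices of the frames that would be sampled, starting from the first frame."""
    step = frame_interval(every_n_frames, frame_rate)
    return list(range(0, n_frames, step))


# A hypothetical 100-frame, 30 fps video sampled once per second
print(sampled_frame_indices(100, -1, 30))
```

Under these assumptions, every_n_frames=-2 on a 30 fps video selects the same frames as every_n_frames=60.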
- megadetector.detection.video_utils.run_callback_on_frames_for_folder(input_video_folder, frame_callback, every_n_frames=None, verbose=False, recursive=True, files_to_process_relative=None, error_on_empty_video=False)[source]
Calls the function frame_callback(np.array,image_id) on all (or selected) frames in all videos in [input_video_folder].
- Parameters:
input_video_folder (str) – video folder to process
frame_callback (function) – callback to run on each frame; it should take two arguments and return a single value. The arguments are (1) a numpy array with image data, in the typical PIL image orientation/channel order, and (2) a string identifier for the frame, typically something like “frame0006.jpg” (even though it’s not a JPEG image, this is just an identifier for the frame).
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is processed. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate.
verbose (bool, optional) – enable additional debug console output
recursive (bool, optional) – recurse into [input_video_folder]
files_to_process_relative (list, optional) – only process specific relative paths
error_on_empty_video (bool, optional) – by default, videos with errors or no valid frames are silently stored as failures; this turns them into exceptions
- Returns:
dict with keys ‘video_filenames’ (list of str), ‘frame_rates’ (list of floats), ‘results’ (list of list of dicts). ‘video_filenames’ will contain relative filenames. ‘results’ is a list (one element per video) of lists (one element per frame) of whatever the callback returns, typically (but not necessarily) dicts in the MD results format.
For failed videos, the frame rate will be represented by -1, and “results” will be a dict with at least the key “failure”.
- Return type:
dict
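Since failed videos are reported in-band (a frame rate of -1 and a dict containing the key “failure”), callers typically need to separate them from successful videos. A minimal sketch, assuming only the return format described above; the helper name split_failures and the sample dict, including the failure message, are made up for illustration.

```python
def split_failures(folder_results):
    """Split a folder-level results dict into per-video successes and failures."""
    succeeded = {}
    failed = {}
    for filename, frame_rate, results in zip(folder_results['video_filenames'],
                                             folder_results['frame_rates'],
                                             folder_results['results']):
        # Failed videos carry a dict with a "failure" key instead of a list of frame results
        if isinstance(results, dict) and 'failure' in results:
            failed[filename] = results['failure']
        else:
            succeeded[filename] = {'frame_rate': frame_rate, 'n_frames': len(results)}
    return succeeded, failed


# Hand-written example in the documented shape (not real output)
example = {
    'video_filenames': ['a/clip1.mp4', 'a/clip2.avi'],
    'frame_rates': [30.0, -1],
    'results': [
        [{'file': 'frame000000.jpg', 'detections': []},
         {'file': 'frame000030.jpg', 'detections': []}],
        {'failure': 'could not open video'},
    ],
}

succeeded, failed = split_failures(example)
print(succeeded)
print(failed)
```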
- megadetector.detection.video_utils.video_folder_to_frames(input_folder, output_folder_base, recursive=True, overwrite=True, n_threads=1, every_n_frames=None, verbose=False, parallelization_uses_threads=True, quality=None, max_width=None, frames_to_extract=None, allow_empty_videos=False, relative_paths_to_process=None)[source]
For every video file in input_folder, creates a folder within output_folder_base, and renders the frames of that video to images in that folder.
- Parameters:
input_folder (str) – folder to process
output_folder_base (str) – root folder for output images; subfolders will be created for each input video
recursive (bool, optional) – whether to recursively process videos in [input_folder]
overwrite (bool, optional) – whether to overwrite existing frame images
n_threads (int, optional) – number of concurrent workers to use; set to <= 1 to disable parallelism
every_n_frames (int or float, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_extract.
verbose (bool, optional) – enable additional debug console output
parallelization_uses_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization; ignored if n_threads <= 1
quality (int, optional) – JPEG quality for frame output, from 0-100. Defaults to the opencv default (typically 95).
max_width (int, optional) – resize frames to be no wider than [max_width]
frames_to_extract (int, list of int, or dict, optional) – extract this specific set of frames from each video; mutually exclusive with every_n_frames. If all values are beyond the length of a video, no frames are extracted. Can also be a single int, specifying a single frame number. In the special case where frames_to_extract is [], this function still reads video frame rates and verifies that videos are readable, but no frames are extracted. Can be a dict mapping relative paths to lists of frame numbers to extract different frames from each video.
allow_empty_videos (bool, optional) – just print a warning if a video appears to have no frames (by default, this is an error).
relative_paths_to_process (list, optional) – only process the relative paths on this list
- Returns:
a length-3 tuple containing (a) a list of lists of frame filenames, where the Nth list of frame filenames corresponds to the Nth video, (b) a list of video frame rates, where the Nth value corresponds to the Nth video, and (c) a list of video filenames
- Return type:
tuple
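The three returned lists are parallel, so it can be convenient to re-key them by video filename. A minimal sketch, assuming only the tuple layout described above; the helper name index_frames_by_video and the sample values are hypothetical.

```python
def index_frames_by_video(frame_filenames_by_video, frame_rates, video_filenames):
    """Combine the three parallel lists into a dict keyed by video filename."""
    if not (len(frame_filenames_by_video) == len(frame_rates) == len(video_filenames)):
        raise ValueError('expected three lists with one element per video')
    return {
        video: {'frame_rate': frame_rate, 'frames': frames}
        for frames, frame_rate, video in zip(frame_filenames_by_video,
                                             frame_rates,
                                             video_filenames)
    }


# Hand-written stand-in for a returned tuple (not real output)
returned = (
    [['out/clip1.mp4/frame000000.jpg', 'out/clip1.mp4/frame000001.jpg'], []],
    [30.0, 25.0],
    ['clip1.mp4', 'clip2.mp4'],
)

by_video = index_frames_by_video(*returned)
print(by_video['clip1.mp4']['frame_rate'], len(by_video['clip1.mp4']['frames']))
```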
- megadetector.detection.video_utils.video_to_frames(input_video_file, output_folder, overwrite=True, every_n_frames=None, verbose=False, quality=None, max_width=None, frames_to_extract=None, allow_empty_videos=True)[source]
Renders frames from [input_video_file] to .jpg files in [output_folder].
With help from:
https://stackoverflow.com/questions/33311153/python-extracting-and-saving-video-frames
- Parameters:
input_video_file (str) – video file to split into frames
output_folder (str) – folder to put frame images in
overwrite (bool, optional) – whether to overwrite existing frame images
every_n_frames (int, optional) – sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with frames_to_extract.
verbose (bool, optional) – enable additional debug console output
quality (int, optional) – JPEG quality for frame output, from 0-100. Defaults to the opencv default (typically 95).
max_width (int, optional) – resize frames to be no wider than [max_width]
frames_to_extract (list of int, optional) – extract this specific set of frames; mutually exclusive with every_n_frames. If all values are beyond the length of the video, no frames are extracted. Can also be a single int, specifying a single frame number. In the special case where frames_to_extract is [], this function still reads video frame rates and verifies that videos are readable, but no frames are extracted.
allow_empty_videos (bool, optional) – Just print a warning if a video appears to have no frames (by default, this is an error).
- Returns:
length-2 tuple containing (list of frame filenames,frame rate)
- Return type:
tuple