utils package

This package contains utility functions for string manipulation, filename manipulation, downloading files from URLs, and similar tasks: things one does when working with camera trap data that aren’t directly related to MegaDetector.

Submodules

utils.ct_utils module

ct_utils.py

Numeric/geometry/array utility functions.

megadetector.utils.ct_utils.args_to_object(args, obj)[source]

Copies all fields from a Namespace (typically the output from parse_args) to an object. Skips fields starting with _. Does not check existence in the target object.

Parameters:

args (argparse.Namespace) – the namespace to convert to an object
obj (object) – object whose attributes will be updated

Returns:

the modified object (modified in place, but also returned)

Return type:

object

megadetector.utils.ct_utils.compare_values_nan_equal(v0, v1)[source]

Utility function for comparing two values when we want to return True if both values are NaN.

Parameters:

v0 (object) – the first value to compare
v1 (object) – the second value to compare

Returns:

True if v0 == v1, or if both v0 and v1 are NaN

Return type:

bool

megadetector.utils.ct_utils.convert_xywh_to_xyxy(api_box)[source]

Converts an xywh bounding box (the MD output format) to an xyxy bounding box (the format produced by TF-based MD models).

Parameters:: api_box (list) – bbox formatted as [x_min, y_min, width_of_box, height_of_box]
Returns:: bbox formatted as [x_min, y_min, x_max, y_max]
Return type:: list
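
As a rough illustration of the conversion described above, the following standalone sketch adds the box width and height to the top-left corner. The name xywh_to_xyxy_sketch is hypothetical, and this is not the library's implementation.

```python
def xywh_to_xyxy_sketch(box):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


print(xywh_to_xyxy_sketch([0.25, 0.5, 0.5, 0.25]))  # [0.25, 0.5, 0.75, 0.75]
```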

megadetector.utils.ct_utils.convert_yolo_to_xywh(yolo_box)[source]

Converts a YOLO format bounding box [x_center, y_center, w, h] to [x_min, y_min, width_of_box, height_of_box].

Parameters:: yolo_box (list) – bounding box of format [x_center, y_center, width_of_box, height_of_box]
Returns:: bbox with coordinates represented as [x_min, y_min, width_of_box, height_of_box]
Return type:: list
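
The arithmetic behind this conversion can be sketched as follows: the top-left corner is the center minus half the box size. The name yolo_to_xywh_sketch is hypothetical, and this is an illustration rather than the library's code.

```python
def yolo_to_xywh_sketch(yolo_box):
    """Convert [x_center, y_center, width, height] to [x_min, y_min, width, height]."""
    x_center, y_center, width, height = yolo_box
    # Shift from the box center to the top-left corner
    return [x_center - width / 2, y_center - height / 2, width, height]


print(yolo_to_xywh_sketch([0.5, 0.5, 0.5, 0.25]))  # [0.25, 0.375, 0.5, 0.25]
```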

megadetector.utils.ct_utils.dict_to_kvp_list(d, item_separator=' ', kv_separator='=', non_string_value_handling='error')[source]

Converts a string <–> string dict into a string containing a list of key-value pairs, e.g. converts {'a':'dog','b':'cat'} to 'a=dog b=cat'. If d is None, returns None. If d is empty, returns ''.

Parameters:

d (dict) – the dictionary to convert, must contain only strings
item_separator (str, optional) – the delimiter between KV pairs
kv_separator (str, optional) – the separator between a key and its value
non_string_value_handling (str, optional) – what to do with non-string values, can be “omit”, “error”, or “convert”

Returns:

the string representation of [d]

Return type:

str
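
A minimal sketch of the documented behavior for string-only dictionaries is shown below. The name kvp_string_sketch is hypothetical, and the non_string_value_handling logic is deliberately left out, so this is not a substitute for the real function.

```python
def kvp_string_sketch(d, item_separator=' ', kv_separator='='):
    """Join the key-value pairs of a str -> str dict into a single string."""
    if d is None:
        return None
    # An empty dict produces an empty string
    return item_separator.join(f'{k}{kv_separator}{v}' for k, v in d.items())


print(kvp_string_sketch({'a': 'dog', 'b': 'cat'}))  # a=dog b=cat
```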

megadetector.utils.ct_utils.dict_to_object(d, obj)[source]

Copies all fields from a dict to an object. Skips fields starting with _. Does not check existence in the target object.

Parameters:

d (dict) – the dict to convert to an object
obj (object) – object whose attributes will be updated

Returns:

the modified object (modified in place, but also returned)

Return type:

object

megadetector.utils.ct_utils.environment_is_wsl()[source]

Determines whether we’re running in WSL.

Returns:: True if we’re running in WSL

megadetector.utils.ct_utils.get_iou(bb1, bb2)[source]

Calculates the intersection over union (IoU) of two bounding boxes.

Adapted from:

https://stackoverflow.com/questions/25349178/calculating-percentage-of-bounding-box-overlap-for-image-detector-evaluation

Parameters:

bb1 (list) – [x_min, y_min, width_of_box, height_of_box]
bb2 (list) – [x_min, y_min, width_of_box, height_of_box]

Returns:

intersection_over_union, a float in [0, 1]

Return type:

float
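
For readers unfamiliar with IoU, here is a standalone sketch of the calculation for boxes in the [x_min, y_min, width, height] format. The name iou_xywh_sketch is hypothetical, and edge-case handling in the real function may differ.

```python
def iou_xywh_sketch(bb1, bb2):
    """Intersection over union for two [x_min, y_min, width, height] boxes."""
    x1, y1, w1, h1 = bb1
    x2, y2, w2, h2 = bb2

    # Corners of the intersection rectangle
    left = max(x1, x2)
    top = max(y1, y2)
    right = min(x1 + w1, x2 + w2)
    bottom = min(y1 + h1, y2 + h2)

    # No overlap (or the boxes only touch along an edge)
    if right <= left or bottom <= top:
        return 0.0

    intersection = (right - left) * (bottom - top)
    union = w1 * h1 + w2 * h2 - intersection
    return intersection / union


print(iou_xywh_sketch([0, 0, 2, 2], [0, 0, 2, 1]))  # 0.5
```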

megadetector.utils.ct_utils.get_max_conf(im)[source]

Given an image dict in the MD output format, computes the maximum detection confidence for any class. Returns 0.0 if there were no detections, if there was a failure, or if ‘detections’ isn’t present.

Parameters:: im (dict) – image dictionary in the MD output format (with a ‘detections’ field)
Returns:: the maximum detection confidence across all classes
Return type:: float

megadetector.utils.ct_utils.image_file_to_camera_folder(image_fn)[source]

Removes common overflow folders (e.g. RECNX101, RECNX102) from paths, i.e. turn:

a/b/c/RECNX101/image001.jpg

…into:

a/b/c

Returns the same thing as os.path.dirname() (i.e., just the folder name) if no overflow folders are present.

Always converts backslashes to slashes.

Parameters:: image_fn (str) – the image filename from which we should remove overflow folders
Returns:: a version of [image_fn] from which camera overflow folders have been removed
Return type:: str

megadetector.utils.ct_utils.invert_dictionary(d, verify_unique=False)[source]

Creates a new dictionary that maps d.values() to d.keys()

Parameters:

d (dict) – dictionary to invert
verify_unique (bool, optional) – error if values are not unique

Returns:

inverted copy of [d]

Return type:

dict
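
The inversion can be sketched in a few lines. The name invert_sketch is hypothetical, and raising ValueError for duplicate values is an assumption, since the documentation above only says that an error occurs.

```python
def invert_sketch(d, verify_unique=False):
    """Return a new dict mapping each value in d to its key."""
    if verify_unique:
        values = list(d.values())
        # Duplicate values would silently collapse into one key after inversion
        if len(set(values)) != len(values):
            raise ValueError('Dictionary values are not unique')
    return {v: k for k, v in d.items()}


print(invert_sketch({'a': 1, 'b': 2}))  # {1: 'a', 2: 'b'}
```

Without verify_unique, later keys overwrite earlier ones when values collide, which is why the uniqueness check is useful for category-name mappings.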

megadetector.utils.ct_utils.is_empty(v, strip_strings=True)[source]

A common definition of “empty” used throughout the repo, particularly when loading data from .csv files. “empty” includes None, ‘’, and NaN.

Parameters:

v (obj) – the object to evaluate for emptiness
strip_strings (bool, optional) – if v is a string, should whitespace be considered empty?

Returns:

True if [v] is None, ‘’, or NaN, otherwise False

Return type:

bool
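
A sketch of this definition of “empty”, assuming NaN is only detected for float values, follows. The name looks_empty_sketch is hypothetical.

```python
import math


def looks_empty_sketch(v, strip_strings=True):
    """Return True if v is None, an empty string, or a NaN float."""
    if v is None:
        return True
    if isinstance(v, float) and math.isnan(v):
        return True
    if isinstance(v, str):
        # Optionally treat whitespace-only strings as empty
        return (v.strip() if strip_strings else v) == ''
    return False


print(looks_empty_sketch('   '))                       # True
print(looks_empty_sketch('   ', strip_strings=False))  # False
```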

megadetector.utils.ct_utils.is_function_name(s, calling_namespace)[source]

Determines whether [s] is a callable function in the global or local scope, or a built-in function.

Parameters:

s (str) – the string to test for function-ness
calling_namespace (dict) – typically pass the output of locals()

megadetector.utils.ct_utils.is_iterable(x)[source]

Uses duck typing to assess whether [x] is iterable (list, set, dict, etc.).

Parameters:: x (object) – the object to test
Returns:: True if [x] appears to be iterable, otherwise False
Return type:: bool

megadetector.utils.ct_utils.is_list_sorted(L, reverse=False)[source]

Returns True if the list L appears to be sorted, otherwise False.

Calling is_list_sorted(L,reverse=True) is the same as calling is_list_sorted(L[::-1],reverse=False), i.e. evaluating a reversed copy of L.

Parameters:

L (list) – list to evaluate
reverse (bool, optional) – whether to reverse the list before evaluating sort status

Returns:

True if the list L appears to be sorted, otherwise False

Return type:

bool
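
One way to express this check is to compare adjacent pairs. The sketch below is an illustration only, with the hypothetical name list_is_sorted_sketch, and assumes that equal neighbors count as sorted.

```python
def list_is_sorted_sketch(L, reverse=False):
    """Return True if L is in ascending order (descending if reverse=True)."""
    pairs = zip(L, L[1:])
    if reverse:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


print(list_is_sorted_sketch([1, 2, 2, 3]))           # True
print(list_is_sorted_sketch([3, 2, 1]))              # False
print(list_is_sorted_sketch([3, 2, 1], reverse=True))  # True
```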

megadetector.utils.ct_utils.is_running_in_gha()[source]

Determine whether we are running on a GitHub Actions runner.

Returns:: True if we’re running in a GHA runner
Return type:: bool

megadetector.utils.ct_utils.is_sphinx_build()[source]

Determine whether we are running in the context of our Sphinx build.

Returns:: True if we’re running a Sphinx build
Return type:: bool

megadetector.utils.ct_utils.isnan(v)[source]

Returns True if v is a nan-valued float, otherwise returns False.

Parameters:: v (obj) – the object to evaluate for nan-ness
Returns:: True if v is a nan-valued float, otherwise False
Return type:: bool

megadetector.utils.ct_utils.json_serialize_datetime(obj)[source]

Serializes datetime.datetime and datetime.date objects to ISO format.

Parameters:: obj (object) – The object to serialize.
Returns:: The ISO format string representation of the datetime object.
Return type:: str
Raises:: TypeError – If the object is not a datetime.datetime or datetime.date instance.
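
A serializer of this kind is typically passed as the default argument to json.dumps or json.dump. The sketch below uses a hypothetical stand-in named serialize_datetime_sketch rather than the library function itself.

```python
import datetime
import json


def serialize_datetime_sketch(obj):
    """Return an ISO-format string for datetime/date objects, else raise TypeError."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f'Object of type {type(obj)} is not JSON serializable')


record = {'captured': datetime.datetime(2024, 1, 2, 3, 4, 5)}

# json calls the "default" hook only for objects it cannot serialize natively
print(json.dumps(record, default=serialize_datetime_sketch))
# {"captured": "2024-01-02T03:04:05"}
```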

megadetector.utils.ct_utils.make_temp_folder(top_level_folder='megadetector', subfolder=None, append_guid=True)[source]

Creates a temporary folder within the system temp folder, by default in a subfolder called megadetector/some_guid. Used for testing without making too much of a mess.

Parameters:

top_level_folder (str, optional) – the top-level folder to use within the system temp folder
subfolder (str, optional) – the subfolder within [top_level_folder]
append_guid (bool, optional) – append a guid to the subfolder

Returns:

the new directory

Return type:

str

megadetector.utils.ct_utils.make_test_folder(subfolder=None)[source]

Wrapper around make_temp_folder that creates folders within megadetector/tests

Parameters:: subfolder (str) – specific subfolder to create within the default megadetector temp folder.

megadetector.utils.ct_utils.max_none(a, b)[source]

Returns the maximum of a and b. If both are None, returns None. If one is None, returns the other.

Parameters:

a (numeric) – the first value to compare
b (numeric) – the second value to compare

Returns:

the maximum of a and b, or None

Return type:

numeric

megadetector.utils.ct_utils.min_none(a, b)[source]

Returns the minimum of a and b. If both are None, returns None. If one is None, returns the other.

Parameters:

a (numeric) – the first value to compare
b (numeric) – the second value to compare

Returns:

the minimum of a and b, or None

Return type:

numeric
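
The None-tolerant comparison used by max_none and min_none can be sketched as below. The names max_or_none_sketch and min_or_none_sketch are hypothetical.

```python
def max_or_none_sketch(a, b):
    """Maximum of a and b, ignoring None; None only if both are None."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def min_or_none_sketch(a, b):
    """Minimum of a and b, ignoring None; None only if both are None."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


print(max_or_none_sketch(3, None))     # 3
print(min_or_none_sketch(3, 1))        # 1
print(max_or_none_sketch(None, None))  # None
```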

megadetector.utils.ct_utils.parse_bool_string(s, strict=False)[source]

Convert the strings “true” or “false” to boolean values. Case-insensitive, discards leading and trailing whitespace. If s is already a bool, returns s.

Parameters:

s (str or bool) – the string to parse, or the bool to return
strict (bool, optional) – if True, only allow “true” or “false”; otherwise also handles “1”, “0”, “yes”, and “no”.

Returns:

the parsed value

Return type:

bool
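
The parsing rules above might be sketched as follows. The name parse_bool_sketch is hypothetical, and raising ValueError for unrecognized strings is an assumption, since the documentation does not say how invalid input is reported.

```python
def parse_bool_sketch(s, strict=False):
    """Parse "true"/"false" (and, unless strict, "1"/"0"/"yes"/"no") into a bool."""
    if isinstance(s, bool):
        return s
    s = s.strip().lower()
    true_strings = {'true'} if strict else {'true', '1', 'yes'}
    false_strings = {'false'} if strict else {'false', '0', 'no'}
    if s in true_strings:
        return True
    if s in false_strings:
        return False
    # Assumed error type for anything we do not recognize
    raise ValueError(f'Cannot parse a bool from {s!r}')


print(parse_bool_sketch(' True '))  # True
print(parse_bool_sketch('no'))      # False
```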

megadetector.utils.ct_utils.parse_kvp(s, kv_separator='=')[source]

Parse a key/value pair, separated by [kv_separator]. Errors if s is not a valid key/value pair string. Strips leading/trailing whitespace from the key and value.

Parameters:

s (str) – the string to parse
kv_separator (str, optional) – the string separating keys from values.

Returns:

a 2-tuple formatted as (key,value)

Return type:

tuple

megadetector.utils.ct_utils.parse_kvp_list(items, kv_separator='=', d=None)[source]

Parse a list of key-value pairs into a dictionary. If items is None or [], returns {}.

Parameters:

items (list) – the list of KVPs to parse
kv_separator (str, optional) – the string separating keys from values.
d (dict, optional) – the initial dictionary, defaults to {}

Returns:

a dict mapping keys to values

Return type:

dict
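
The parsing described for parse_kvp and parse_kvp_list can be sketched as below. The names parse_pair_sketch and parse_pairs_sketch are hypothetical. The sketch assumes that a valid pair contains the separator exactly once and that invalid input raises ValueError; neither detail is stated in the documentation above.

```python
def parse_pair_sketch(s, kv_separator='='):
    """Split "key=value" into a (key, value) tuple, stripping whitespace."""
    parts = s.split(kv_separator)
    if len(parts) != 2:
        raise ValueError(f'{s!r} is not a valid key-value pair')
    return parts[0].strip(), parts[1].strip()


def parse_pairs_sketch(items, kv_separator='=', d=None):
    """Parse a list of "key=value" strings into a dict (optionally extending d)."""
    d = {} if d is None else d
    for item in items or []:
        key, value = parse_pair_sketch(item, kv_separator)
        d[key] = value
    return d


print(parse_pairs_sketch(['a=dog', ' b = cat ']))  # {'a': 'dog', 'b': 'cat'}
```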

megadetector.utils.ct_utils.point_dist(p1, p2)[source]

Computes the distance between two points, represented as length-two tuples.

Parameters:

p1 (list or tuple) – point, formatted as (x,y)
p2 (list or tuple) – point, formatted as (x,y)

Returns:

the Euclidean distance between p1 and p2

Return type:

float

megadetector.utils.ct_utils.pretty_print_object(obj, b_print=True)[source]

Converts an arbitrary object to .json, optionally printing the .json representation.

Parameters:

obj (object) – object to print
b_print (bool, optional) – whether to print the object

Returns:

.json representation of [obj]

Return type:

str

megadetector.utils.ct_utils.rect_distance(r1, r2, format='x0y0x1y1')[source]

Computes the minimum distance between two axis-aligned rectangles, each represented as (x0,y0,x1,y1) by default.

Can also specify “format” as x0y0wh for MD-style bbox formatting (x0,y0,w,h).

Parameters:

r1 (list or tuple) – rectangle, formatted as (x0,y0,x1,y1) or (x0,y0,w,h)
r2 (list or tuple) – rectangle, formatted as (x0,y0,x1,y1) or (x0,y0,w,h)
format (str, optional) – whether the boxes are formatted as ‘x0y0x1y1’ (default) or ‘x0y0wh’

Returns:

the minimum distance between r1 and r2

Return type:

float
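
One common way to compute this distance is to measure the gap between the rectangles along each axis, treating the gap as zero where their projections overlap, and then combine the two gaps. The sketch below follows that approach; it is not necessarily how the library does it. The names rect_distance_sketch and box_format are hypothetical, and the sketch assumes x0 <= x1 and y0 <= y1.

```python
import math


def rect_distance_sketch(r1, r2, box_format='x0y0x1y1'):
    """Minimum distance between two axis-aligned rectangles (0.0 if they overlap)."""
    if box_format == 'x0y0wh':
        # Convert (x0, y0, w, h) to (x0, y0, x1, y1)
        r1 = (r1[0], r1[1], r1[0] + r1[2], r1[1] + r1[3])
        r2 = (r2[0], r2[1], r2[0] + r2[2], r2[1] + r2[3])

    # Gap along each axis; zero when the projections overlap
    dx = max(r2[0] - r1[2], r1[0] - r2[2], 0)
    dy = max(r2[1] - r1[3], r1[1] - r2[3], 0)
    return math.hypot(dx, dy)


print(rect_distance_sketch((0, 0, 1, 1), (4, 5, 6, 7)))  # 5.0
```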

megadetector.utils.ct_utils.round_float(x, precision=3)[source]

Convenience wrapper for the native Python round()

Parameters:

x (float) – number to round
precision (int, optional) – the number of significant digits to preserve, should be >= 1

Returns:

rounded value

Return type:

float

megadetector.utils.ct_utils.round_float_array(xs, precision=3)[source]

Rounds each floating-point value in the array [xs] to a specific number of digits.

Parameters:

xs (list) – list of floats to round
precision (int, optional) – the number of significant digits to preserve, should be >= 1

Returns:

list of rounded floats

Return type:

list

megadetector.utils.ct_utils.round_floats_in_nested_dict(obj, decimal_places=5, allow_iterator_conversion=False)[source]

Recursively rounds all floating point values in a nested structure to the specified number of decimal places. Handles dictionaries, lists, tuples, sets, and other iterables. Modifies mutable objects in place by default.

Parameters:

obj (obj) – The object to process (can be a dict, list, set, tuple, or primitive value)
decimal_places (int, optional) – Number of decimal places to round to
allow_iterator_conversion (bool, optional) – for iterator types, should we convert to lists? Otherwise we error.

Returns:

The processed object (useful for recursive calls)

megadetector.utils.ct_utils.run_all_module_tests()[source]: Run all tests in the ct_utils module. This is not invoked by pytest; this is just a convenience wrapper for debugging the tests.

megadetector.utils.ct_utils.sets_overlap(set1, set2)[source]

Determines whether two sets overlap.

Parameters:

set1 (set) – the first set to compare (converted to a set if it’s not already)
set2 (set) – the second set to compare (converted to a set if it’s not already)

Returns:

True if any elements are shared between set1 and set2

Return type:

bool

megadetector.utils.ct_utils.sort_dictionary_by_key(d, reverse=False)[source]

Sorts the dictionary [d] by key.

Parameters:

d (dict) – dictionary to sort
reverse (bool, optional) – whether to sort in reverse (descending) order

Returns:

sorted copy of [d]

Return type:

dict

megadetector.utils.ct_utils.sort_dictionary_by_value(d, sort_values=None, reverse=False)[source]

Sorts the dictionary [d] by value. If sort_values is None, uses d.values(), otherwise uses the dictionary sort_values as the sorting criterion. Always returns a new standard dict, so if [d] is, for example, a defaultdict, the returned value is not.

Parameters:

d (dict) – dictionary to sort
sort_values (dict, optional) – dictionary mapping keys in [d] to sort values (defaults to None, uses [d] itself for sorting)
reverse (bool, optional) – whether to sort in reverse (descending) order

Returns:

sorted copy of [d]

Return type:

dict
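
The role of sort_values can be illustrated with a short sketch, using the hypothetical name sorted_by_value_sketch. When sort_values is supplied, the ordering comes from that second dictionary, but the returned dictionary still carries the values from [d].

```python
def sorted_by_value_sketch(d, sort_values=None, reverse=False):
    """Return a plain dict with the items of d ordered by value (or by sort_values)."""
    key_source = d if sort_values is None else sort_values
    ordered = sorted(d.items(), key=lambda kv: key_source[kv[0]], reverse=reverse)
    return dict(ordered)


counts = {'deer': 3, 'fox': 1, 'elk': 2}
print(sorted_by_value_sketch(counts))  # {'fox': 1, 'elk': 2, 'deer': 3}

# Order by an external criterion instead of the values in counts
priority = {'deer': 0, 'fox': 2, 'elk': 1}
print(sorted_by_value_sketch(counts, sort_values=priority))  # {'deer': 3, 'elk': 2, 'fox': 1}
```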

megadetector.utils.ct_utils.sort_list_of_dicts_by_key(L, k, reverse=False, none_handling='smallest')[source]

Sorts the list of dictionaries [L] by the key [k].

Parameters:

L (list) – list of dictionaries to sort
k (object, typically str) – the sort key
reverse (bool, optional) – whether to sort in reverse (descending) order
none_handling (str, optional) – how to handle None values: “smallest” (default) treats None as smaller than all other values, “largest” treats None as larger than all other values, and “error” raises an error when None is compared with a non-None value

Returns:

sorted copy of [L]

Return type:

list
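
One way to implement the none_handling options is to wrap each value in a tuple whose first element decides where None lands, so that None is never compared directly with another value. The sketch below takes that approach under the hypothetical name sort_dicts_sketch; it is not necessarily how the library does it.

```python
def sort_dicts_sketch(L, k, reverse=False, none_handling='smallest'):
    """Return a copy of L sorted by d[k], with configurable placement of None."""
    if none_handling == 'error':
        # Python itself raises TypeError when None is compared with, e.g., an int
        return sorted(L, key=lambda d: d[k], reverse=reverse)

    none_rank = 1 if none_handling == 'largest' else -1

    def sort_key(d):
        v = d[k]
        # The rank is compared first, so None never meets a real value
        return (none_rank, 0) if v is None else (0, v)

    return sorted(L, key=sort_key, reverse=reverse)


rows = [{'x': 2}, {'x': None}, {'x': 1}]
print(sort_dicts_sketch(rows, 'x'))
# [{'x': None}, {'x': 1}, {'x': 2}]
```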

megadetector.utils.ct_utils.sort_results_for_image(im)[source]

Sort classification and detection results in descending order by confidence (in place).

Parameters:: im (dict) – image dictionary in the MD output format (with a ‘detections’ field)

megadetector.utils.ct_utils.split_list_into_fixed_size_chunks(L, n)[source]

Split the list or tuple L into chunks of size n (allowing at most one chunk with size less than n, i.e. len(L) does not have to be a multiple of n).

Parameters:

L (list) – list to split into chunks
n (int) – preferred chunk size

Returns:

list of chunks, where each chunk is a list of length n, except possibly the last, which may be shorter

Return type:

list
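
Fixed-size chunking amounts to slicing at multiples of n, as in this sketch with the hypothetical name fixed_size_chunks_sketch. Only the final chunk can be shorter than n.

```python
def fixed_size_chunks_sketch(L, n):
    """Split L into consecutive chunks of length n; the last chunk may be shorter."""
    return [list(L[i:i + n]) for i in range(0, len(L), n)]


print(fixed_size_chunks_sketch([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```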

megadetector.utils.ct_utils.split_list_into_n_chunks(L, n, chunk_strategy='greedy')[source]

Splits the list or tuple L into n equally-sized chunks (some chunks may be one element smaller than others, i.e. len(L) does not have to be a multiple of n).

chunk_strategy can be “greedy” (default, if there are k samples per chunk, the first k go into the first chunk) or “balanced” (alternate between chunks when pulling items from the list).

Parameters:

L (list) – list to split into chunks
n (int) – number of chunks
chunk_strategy (str, optional) – “greedy” or “balanced”; see above

Returns:

list of chunks, each of which is a list

Return type:

list
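
The difference between the two strategies is easiest to see side by side. In the sketch below, with the hypothetical name n_chunks_sketch, the greedy strategy keeps consecutive items together and the balanced strategy deals items out in turn. Giving the extra items to the leading chunks in the greedy case is an assumption; the real function may distribute remainders differently.

```python
def n_chunks_sketch(L, n, chunk_strategy='greedy'):
    """Split L into n chunks whose sizes differ by at most one."""
    if chunk_strategy == 'balanced':
        # Deal items out round-robin: chunk i gets items i, i+n, i+2n, ...
        return [list(L[i::n]) for i in range(n)]
    if chunk_strategy != 'greedy':
        raise ValueError(f'Unknown chunk strategy: {chunk_strategy}')

    # Greedy: consecutive runs; the first "extra" chunks get one more item
    base, extra = divmod(len(L), n)
    chunks = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(list(L[start:start + size]))
        start += size
    return chunks


print(n_chunks_sketch([1, 2, 3, 4, 5], 2))              # [[1, 2, 3], [4, 5]]
print(n_chunks_sketch([1, 2, 3, 4, 5], 2, 'balanced'))  # [[1, 3, 5], [2, 4]]
```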

megadetector.utils.ct_utils.test_bounding_box_operations()[source]: Test bounding box conversion and IoU calculation.

megadetector.utils.ct_utils.test_datetime_serialization()[source]: Test datetime serialization functions.

megadetector.utils.ct_utils.test_detection_processing()[source]: Test functions related to processing detection results.

megadetector.utils.ct_utils.test_dictionary_operations()[source]: Test dictionary manipulation and sorting functions.

megadetector.utils.ct_utils.test_float_rounding_and_truncation()[source]: Test float rounding, truncation, and nested rounding functions.

megadetector.utils.ct_utils.test_geometric_operations()[source]: Test geometric calculations like distances.

megadetector.utils.ct_utils.test_list_operations()[source]: Test list sorting and chunking functions.

megadetector.utils.ct_utils.test_object_conversion_and_presentation()[source]: Test functions that convert or present objects.

megadetector.utils.ct_utils.test_path_operations()[source]: Test path manipulation functions.

megadetector.utils.ct_utils.test_string_parsing()[source]: Test string parsing utilities like KVP and boolean parsing.

megadetector.utils.ct_utils.test_temp_folder_creation()[source]: Test temporary folder creation and cleanup.

megadetector.utils.ct_utils.test_type_checking_and_validation()[source]: Test type checking and validation utility functions.

megadetector.utils.ct_utils.test_write_json()[source]: Test driver for write_json.

megadetector.utils.ct_utils.to_bool(v)[source]

Convert an object to a bool with specific rules.

Parameters:

v (object) – The object to convert

Returns:

For strings: True if ‘true’ (case-insensitive), False if ‘false’, recursively applied if int-like
For int/bytes: False if 0, True otherwise
For bool: returns the bool as-is
For other types: None

Return type:

bool or None

megadetector.utils.ct_utils.truncate_float(x, precision=3)[source]

Truncates the fractional portion of a floating-point value to a specific number of floating-point digits.

For example:

truncate_float(0.0003214884) –> 0.000321
truncate_float(1.0003214884) –> 1.000321

This function is primarily used to achieve a certain float representation before exporting to JSON.

Parameters:

x (float) – scalar to truncate
precision (int, optional) – the number of significant digits to preserve, should be >= 1

Returns:

truncated version of [x]

Return type:

float

megadetector.utils.ct_utils.truncate_float_array(xs, precision=3)[source]

Truncates the fractional portion of each floating-point value in the array [xs] to a specific number of floating-point digits.

Parameters:

xs (list) – list of floats to truncate
precision (int, optional) – the number of significant digits to preserve, should be >= 1

Returns:

list of truncated floats

Return type:

list

megadetector.utils.ct_utils.write_json(path, content, indent=1, force_str=False, serialize_datetimes=False, ensure_ascii=True, encoding='utf-8')[source]

Standardized wrapper for json.dump().

Parameters:

path (str) – filename to write to
content (object) – object to dump
indent (int, optional) – indentation depth passed to json.dump
force_str (bool, optional) – whether to force string conversion for non-serializable objects
serialize_datetimes (bool, optional) – whether to serialize datetime objects to ISO format
ensure_ascii (bool, optional) – whether to ensure ASCII characters in the output
encoding (str, optional) – string encoding to use

utils.directory_listing module

directory_listing.py

Script for creating Apache-style HTML directory listings for a local directory and all its subdirectories.

Also includes a preview of a jpg file (the first in an alphabetical list), if present.

megadetector.utils.directory_listing.create_html_index(dir, overwrite=False, template_fun=<function _create_plain_index>, basepath=None, recursive=True)[source]

Recursively traverses the local directory [dir] and generates an index file for each folder using [template_fun] to generate the HTML output. Excludes hidden files.

Parameters:

dir (str) – directory to process
overwrite (bool, optional) – whether to overwrite existing index files
template_fun (func, optional) – function taking three arguments (string, list of string, list of string) representing the current root, the list of folders, and the list of files. Should return the HTML source of the index file.
basepath (str, optional) – if not None, the name used for each subfolder in [dir] in the output files will be relative to [basepath]
recursive (bool, optional) – recurse into subfolders
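
To make the [template_fun] contract concrete, here is a sketch of a custom template function matching the three-argument signature described above. The function name simple_index_template is hypothetical, and the sketch assumes the folder and file lists contain names relative to the current root, which the documentation does not specify.

```python
import html


def simple_index_template(root, folders, files):
    """Build a minimal HTML index page for one folder."""
    lines = ['<html><body>', f'<h1>Index of {html.escape(root)}</h1>', '<ul>']
    for folder in folders:
        name = html.escape(folder)
        lines.append(f'<li><a href="{name}/">{name}/</a></li>')
    for fn in files:
        name = html.escape(fn)
        lines.append(f'<li><a href="{name}">{name}</a></li>')
    lines += ['</ul>', '</body></html>']
    return '\n'.join(lines)


print(simple_index_template('site_01', ['camera_a'], ['image001.jpg']))
```

A function like this would presumably be passed as template_fun=simple_index_template when calling create_html_index.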

directory_listing - CLI interface

directory_listing [-h] [--basepath BASEPATH] [--overwrite] directory

directory_listing positional arguments

directory - Path to directory which should be traversed.

directory_listing options

-h, --help - show this help message and exit
--basepath BASEPATH - Folder names will be printed relative to basepath, if specified
--overwrite - If set, the script will overwrite existing index.html files.

utils.md_tests module

md_tests.py

A series of tests to validate basic repo functionality and verify either “correct” inference behavior, or - when operating in environments other than the training environment - acceptable deviation from the correct results.

This module should not depend on anything else in this repo outside of the tests themselves, even if it means some duplicated code (e.g. for downloading files), since much of what it tries to test is, e.g., imports.

“Correctness” is determined by agreement with a file that this script fetches from lila.science.

class megadetector.utils.md_tests.MDTestOptions[source]

Bases: object

Options controlling test behavior

alt_model: For comparison tests, use a model that produces slightly different output

alternative_batch_size: Batch size to use when testing batches of size > 1

cli_test_pythonpath: PYTHONPATH to set for CLI tests; if None, inherits from the parent process. Only impacts the called functions, not the parent process.

cli_working_dir

Current working directory when running CLI tests

If this is None, we won’t mess with the inherited working directory.

cpu_execution_is_error: If GPU execution is requested, but a GPU is not available, should we error?

default_model: Default model to use for testing (filename, URL, or well-known model string)

detector_options: Detector options passed to PTDetector

disable_gpu: Force CPU execution

force_data_download: Download test data even if it appears to have already been downloaded

force_data_unzip: Unzip test data even if it appears to have already been unzipped

iou_threshold_for_file_comparison: IoU threshold used to determine whether boxes in two detection files likely correspond to the same box.

max_conf_error: How much deviation from the expected confidence values should we allow before a disrepancy becomes an error?

max_coord_error: How much deviation from the expected detection coordinates should we allow before a disrepancy becomes an error?

model_folder: Used to drive a series of tests (typically with a low value for python_test_depth) over a folder of models.

n_cores_for_multiprocessing_tests: Number of cores to use for multi-CPU inference tests

python_test_depth: Used as a knob to control the level of Python tests, typically used when we want to run a series of simple tests on a small number of models, rather than a deep test of tests on a small number of models. The gestalt is that this is a range from 0-100.

scratch_dir: Force a specific folder for temporary input/output

skip_cli_tests: Skip CLI tests

skip_cpu_tests: Skip force-CPU tests

skip_download_tests: Skip download tests

skip_image_tests: Skip tests related to still image processing

skip_import_tests: Skip module import tests

skip_localhost_downloads: Skip download tests for local URLs

skip_python_tests: Skip tests launched via Python functions (as opposed to CLIs)

skip_video_tests: Skip tests related to video processing

test_data_url: Where does the test data live?

test_mode: Currently should be ‘all’ or ‘utils-only’

warning_mode: By default, any unexpected behavior is an error; this forces most errors to be treated as warnings.

yolo_working_dir

YOLOv5 installation, only relevant if we’re testing run_inference_with_yolov5_val.

If this is None, we’ll skip that test.

megadetector.utils.md_tests.compare_detection_lists(detections_a, detections_b, options, bidirectional_comparison=True)[source]

Compare two lists of MD-formatted detections, matching detections across lists using IoU criteria. Generally used to compare detections for the same image when two sets of results are expected to be more or less the same.

Parameters:

detections_a (list) – the first set of detection dicts
detections_b (list) – the second set of detection dicts
options (MDTestOptions) – options that determine tolerable differences between files
bidirectional_comparison (bool, optional) – reverse the arguments and make a recursive call.

Returns:

a dictionary with keys ‘max_conf_error’ and ‘max_coord_error’.

Return type:

dict

megadetector.utils.md_tests.compare_results(inference_output_file, expected_results_file, options, expected_results_file_is_absolute=False)[source]

Compare two MD-formatted output files that should be nearly identical, allowing small changes (e.g. rounding differences). Generally used to compare a new results file to an expected results file.

Parameters:

inference_output_file (str) – the first results file to compare
expected_results_file (str) – the second results file to compare
options (MDTestOptions) – options that determine tolerable differences between files
expected_results_file_is_absolute (bool, optional) – by default, expected_results_file is appended to options.scratch_dir; this option specifies that it’s an absolute path.

Returns:

dictionary with keys ‘max_coord_error’ and ‘max_conf_error’

Return type:

dict

megadetector.utils.md_tests.download_test_data(options=None)[source]

Downloads the test zipfile if necessary, unzips if necessary. Initializes temporary fields in [options], particularly [options.scratch_dir].

Parameters:: options (MDTestOptions, optional) – see MDTestOptions for details
Returns:: the same object passed in as input, or the options that were used if [options] was supplied as None
Return type:: MDTestOptions

megadetector.utils.md_tests.execute(cmd)[source]

Runs [cmd] (a single string) in a shell, yielding each line of output to the caller.

Parameters:: cmd (str) – command to run
Returns:: the command’s return code, which is always zero; a nonzero return code raises a CalledProcessError instead
Return type:: int

megadetector.utils.md_tests.execute_and_print(cmd, print_output=True, catch_exceptions=False, echo_command=True)[source]

Runs [cmd] (a single string) in a shell, capturing (and optionally printing) output.

Parameters:

cmd (str) – command to run
print_output (bool, optional) – whether to print output from [cmd]
catch_exceptions (bool, optional) – whether to catch exceptions, rather than raising them
echo_command (bool, optional) – whether to print [cmd] to stdout prior to execution

Returns:

a dictionary with fields “status” (the process return code) and “output” (the content of stdout)

Return type:

dict

megadetector.utils.md_tests.get_expected_results_filename(gpu_is_available, model_string='mdv5a', test_type='image', augment=False, options=None)[source]

Expected results vary just a little across inference environments, particularly between PT 1.x and 2.x, so when making sure things are working acceptably, we compare to a reference file that matches the current environment.

This function gets the correct filename to compare to current results, depending on whether a GPU is available.

Parameters:

gpu_is_available (bool) – whether a GPU is available
model_string (str, optional) – the model for which we’re retrieving expected results
test_type (str, optional) – the test type we’re running (“image” or “video”)
augment (bool, optional) – whether we’re running this test with image augmentation
options (MDTestOptions, optional) – additional control flow options

Returns:

relative filename of the results file we should use (within the test data zipfile)

Return type:

str

megadetector.utils.md_tests.is_gpu_available(verbose=True)[source]

Checks whether a GPU (including M1/M2 MPS) is available, according to PyTorch. Returns False if PyTorch fails to import.

Parameters:: verbose (bool, optional) – enable additional debug console output
Returns:: whether a GPU is available
Return type:: bool

megadetector.utils.md_tests.output_files_are_identical(fn1, fn2, verbose=False)[source]

Checks whether two MD-formatted output files are identical other than file sorting.

Parameters:

fn1 (str) – the first filename to compare
fn2 (str) – the second filename to compare
verbose (bool, optional) – enable additional debug output

Returns:

whether [fn1] and [fn2] are identical other than file sorting.

Return type:

bool

megadetector.utils.md_tests.run_cli_tests(options)[source]

Runs CLI (as opposed to Python-based) package tests.

Parameters:: options (MDTestOptions) – see MDTestOptions for details

megadetector.utils.md_tests.run_download_tests(options)[source]

Test automatic model downloads.

Parameters:: options (MDTestOptions) – see MDTestOptions for details

megadetector.utils.md_tests.run_python_tests(options)[source]

Runs Python-based (as opposed to CLI-based) package tests.

Parameters:: options (MDTestOptions) – see MDTestOptions for details

megadetector.utils.md_tests.run_tests(options)[source]

Runs Python-based and/or CLI-based package tests.

Parameters:: options (MDTestOptions) – see MDTestOptions for details

megadetector.utils.md_tests.test_package_imports(package_name, exceptions=None, verbose=True)[source]

Imports all modules in [package_name]

Parameters:

package_name (str) – the package name to test
exceptions (list, optional) – exclude any modules that contain any of these strings
verbose (bool, optional) – enable additional debug output

megadetector.utils.md_tests.test_suite_entry_point()[source]: This is the entry point when running tests via pytest; we run a subset of tests in this environment, e.g. we don’t run CLI or video tests.

md_tests - CLI interface

MegaDetector test suite

md_tests [-h] [--disable_gpu] [--cpu_execution_is_error] [--scratch_dir SCRATCH_DIR]
         [--skip_image_tests] [--skip_video_tests] [--skip_video_rendering_tests]
         [--skip_python_tests] [--skip_cli_tests] [--skip_download_tests]
         [--skip_import_tests] [--skip_cpu_tests] [--force_data_download]
         [--force_data_unzip] [--warning_mode] [--max_conf_error MAX_CONF_ERROR]
         [--max_coord_error MAX_COORD_ERROR] [--cli_working_dir CLI_WORKING_DIR]
         [--yolo_working_dir YOLO_WORKING_DIR] [--cli_test_pythonpath CLI_TEST_PYTHONPATH]
         [--test_mode TEST_MODE] [--python_test_depth PYTHON_TEST_DEPTH]
         [--model_folder MODEL_FOLDER] [--detector_options [KEY=VALUE ...]]
         [--default_model DEFAULT_MODEL]

md_tests options

-h, --help - show this help message and exit
--disable_gpu - Disable GPU operation
--cpu_execution_is_error - Fail if the GPU appears not to be available
--scratch_dir SCRATCH_DIR - Directory for temporary storage (defaults to system temp dir)
--skip_image_tests - Skip tests related to still images
--skip_video_tests - Skip tests related to video
--skip_video_rendering_tests - Skip tests related to rendering video
--skip_python_tests - Skip python tests
--skip_cli_tests - Skip CLI tests
--skip_download_tests - Skip model download tests
--skip_import_tests - Skip module import tests
--skip_cpu_tests - Skip force-CPU tests
--force_data_download - Force download of the test data file, even if it’s already available
--force_data_unzip - Force extraction of all files in the test data file, even if they’re already available
--warning_mode - Turns numeric/content errors into warnings
--max_conf_error MAX_CONF_ERROR - Maximum tolerable confidence value deviation from expected (default 0.005)
--max_coord_error MAX_COORD_ERROR - Maximum tolerable coordinate value deviation from expected (default 0.001)
--cli_working_dir CLI_WORKING_DIR - Working directory for CLI tests
--yolo_working_dir YOLO_WORKING_DIR - Working directory for yolo inference tests
--cli_test_pythonpath CLI_TEST_PYTHONPATH - PYTHONPATH to set for CLI tests; if None, inherits from the parent process
--test_mode TEST_MODE - Test mode: "all" or "utils-only"
--python_test_depth PYTHON_TEST_DEPTH - Used as a knob to control the level of Python tests (0-100)
--model_folder MODEL_FOLDER - Run Python tests on every model in this folder
--detector_options KEY=VALUE - Detector-specific options, as a space-separated list of key-value pairs
--default_model DEFAULT_MODEL - Default model file or well-known model name (used for most tests)

utils.path_utils module

path_utils.py

Miscellaneous useful utils for path manipulation, i.e. things that could almost be in os.path, but aren’t.

class megadetector.utils.path_utils.TestPathUtils[source]

Bases: object

Tests for path_utils.py

set_up()[source]: Create a temporary directory for testing.

tear_down()[source]: Remove the temporary directory after tests.

test_add_files_to_single_tar_file()[source]: Test add_files_to_single_tar_file.

test_compute_file_hash()[source]: Test compute_file_hash and parallel_compute_file_hashes.

test_filename_cleaning()[source]: Test clean_filename, clean_path, and flatten_path functions.

test_fileparts()[source]: Test the fileparts function.

test_find_image_strings()[source]: Test the find_image_strings function.

test_find_images()[source]: Test the find_images function.

test_folder_list()[source]: Test the folder_list function.

test_folder_summary()[source]: Test the folder_summary function.

test_get_file_sizes()[source]: Test get_file_sizes function.

test_insert_before_extension()[source]: Test the insert_before_extension function.

test_is_executable()[source]: Test the is_executable function. This is a basic test; comprehensive testing is environment-dependent.

test_is_image_file()[source]: Test the is_image_file function.

test_parallel_copy_files()[source]: Test the parallel_copy_files function (with max_workers=1 for test simplicity).

test_parallel_unzip_files()[source]: Test the parallel_unzip_files function.

test_parallel_zip_individual_files_and_folders()[source]: Test parallel_zip_files, parallel_zip_folders, and zip_each_file_in_folder.

test_path_is_abs()[source]: Test the path_is_abs function.

test_path_join()[source]: Test the path_join function.

test_recursive_file_list_and_file_list()[source]: Test the recursive_file_list and file_list functions.

test_remove_empty_folders()[source]: Test the remove_empty_folders function.

test_safe_create_link_unix()[source]: Test the safe_create_link function on Unix-like systems.

test_split_path()[source]: Test the split_path function.

test_write_read_list_to_file()[source]: Test write_list_to_file and read_list_from_file functions.

test_zip_file_and_unzip_file()[source]: Test zip_file and unzip_file functions.

test_zip_files_into_single_zipfile()[source]: Test zip_files_into_single_zipfile.

test_zip_folder()[source]: Test the zip_folder function.

megadetector.utils.path_utils.add_files_to_single_tar_file(input_files, output_fn, arc_name_base, overwrite=False, verbose=False, mode='x')[source]

Adds all the files in [input_files] to the tar file [output_fn]. Archive names are relative to arc_name_base.

Parameters:

input_files (list) – list of absolute filenames to include in the .tar file
output_fn (str) – .tar file to create
arc_name_base (str) – absolute folder from which relative paths should be determined; behavior is undefined if there are files in [input_files] that don’t live within [arc_name_base]
overwrite (bool, optional) – whether to overwrite an existing .tar file
verbose (bool, optional) – enable additional debug console output
mode (str, optional) – compression type, can be ‘x’ (no compression), ‘x:gz’, or ‘x:bz2’.

Returns:

the output tar file, whether we created it or determined that it already exists

Return type:

str
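
The relationship between [input_files], [arc_name_base], and the names stored in the archive can be sketched with the standard tarfile module. This is an illustration under assumptions, not the library's code: the function name add_files_to_tar_sketch is hypothetical, and the choice to delete an existing archive when overwrite is True is a guess at reasonable behavior.

```python
import os
import tarfile

def add_files_to_tar_sketch(input_files, output_fn, arc_name_base,
                            overwrite=False, mode='x'):
    """Hypothetical sketch: store each file under its path relative to arc_name_base."""
    if os.path.isfile(output_fn):
        if not overwrite:
            # The archive already exists; report it without rewriting
            return output_fn
        # 'x' modes refuse to clobber an existing file, so remove it first
        os.remove(output_fn)
    with tarfile.open(output_fn, mode) as tar:
        for input_fn in input_files:
            arc_name = os.path.relpath(input_fn, arc_name_base)
            tar.add(input_fn, arcname=arc_name)
    return output_fn
```

With arc_name_base set to /data, a file at /data/sub/b.txt would be stored in the archive as sub/b.txt.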

megadetector.utils.path_utils.clean_filename(filename, allow_list='~-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', char_limit=255, force_lower=False, remove_trailing_leading_whitespace=True, replace_whitespace=None)[source]

Removes non-ASCII and other invalid filename characters (on any reasonable OS) from a filename, then optionally trims to a maximum length.

Does not allow :/ by default; use clean_path if you want to preserve those.

Adapted from https://gist.github.com/wassname/1393c4a57cfcbf03641dbc31886123b8

Parameters:

filename (str) – filename to clean
allow_list (str, optional) – string containing all allowable filename characters
char_limit (int, optional) – maximum allowable filename length, if None will skip this step
force_lower (bool, optional) – convert the resulting filename to lowercase
remove_trailing_leading_whitespace (bool, optional) – remove trailing and leading whitespace from each component of a path, e.g. does not allow a/b/c /d.jpg
replace_whitespace (str, optional) – replace all contiguous whitespace with this string, or None to leave whitespace intact

Returns:

cleaned version of [filename]

Return type:

str
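
The allow-list approach can be sketched as follows. This is a simplified illustration, not the library's implementation: the name clean_filename_sketch is hypothetical, the Unicode normalization step is an assumption borrowed from the general technique, and the whitespace-handling options are omitted.

```python
import unicodedata

DEFAULT_ALLOW_LIST = '~-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def clean_filename_sketch(filename, allow_list=DEFAULT_ALLOW_LIST,
                          char_limit=255, force_lower=False):
    """Hypothetical sketch: keep only allow-listed characters, then trim."""
    # Decompose accented characters, then drop anything that is not plain ASCII
    ascii_only = unicodedata.normalize('NFKD', filename).encode('ascii', 'ignore').decode('ascii')
    # Keep only characters that appear in the allow list
    cleaned = ''.join(c for c in ascii_only if c in allow_list)
    if char_limit is not None:
        cleaned = cleaned[:char_limit]
    if force_lower:
        cleaned = cleaned.lower()
    return cleaned
```

Under this sketch, 'a:b/c?.jpg' becomes 'abc.jpg', because ':', '/', and '?' are not in the default allow list.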

megadetector.utils.path_utils.clean_path(pathname, allow_list='~-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789:\\/', char_limit=255, force_lower=False, remove_trailing_leading_whitespace=True)[source]

Removes non-ASCII and other invalid path characters (on any reasonable OS) from a path, then optionally trims to a maximum length.

Parameters:

pathname (str) – path name to clean
allow_list (str, optional) – string containing all allowable filename characters
char_limit (int, optional) – maximum allowable filename length, if None will skip this step
force_lower (bool, optional) – convert the resulting filename to lowercase
remove_trailing_leading_whitespace (bool, optional) – remove trailing and leading whitespace from each component of a path, e.g. does not allow a/b/c /d.jpg

Returns:

cleaned version of [pathname]

Return type:

str

megadetector.utils.path_utils.compute_file_hash(file_path, algorithm='sha256', allow_failures=True)[source]

Compute the hash of a file.

Adapted from:

https://www.geeksforgeeks.org/python-program-to-find-hash-of-file/

Parameters:

file_path (str) – the file to hash
algorithm (str, optional) – the hashing algorithm to use (e.g. md5, sha256)
allow_failures (bool, optional) – if True, read failures will silently return None; if False, read failures will raise exceptions

Returns:

the hash value for this file

Return type:

str
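
A minimal sketch of chunked file hashing with the allow_failures behavior described above, using only hashlib. The function name compute_file_hash_sketch, the 64 KB chunk size, and the choice to return a hex digest are assumptions for illustration; the library's actual return format is not specified here beyond being a string.

```python
import hashlib

def compute_file_hash_sketch(file_path, algorithm='sha256', allow_failures=True):
    """Hypothetical sketch: hash a file in chunks so large files fit in memory."""
    try:
        hasher = hashlib.new(algorithm)
        with open(file_path, 'rb') as f:
            # Read fixed-size chunks until read() returns an empty bytes object
            for chunk in iter(lambda: f.read(65536), b''):
                hasher.update(chunk)
        return hasher.hexdigest()
    except OSError:
        # Read failures either become None or propagate to the caller
        if allow_failures:
            return None
        raise
```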

megadetector.utils.path_utils.delete_file(input_file, verbose=False)[source]

Deletes a single file.

Parameters:

input_file (str) – file to delete
verbose (bool, optional) – enable additional debug console output

Returns:

True if file was deleted successfully, False otherwise

Return type:

bool

megadetector.utils.path_utils.file_list(base_dir, convert_slashes=True, return_relative_paths=False, sort_files=True, recursive=False)[source]

Trivial wrapper for recursive_file_list. That name was a poor choice in hindsight: I later wanted to add non-recursive lists, but it doesn’t make sense to have a “recursive” option in a function called “recursive_file_list”.

Parameters:

base_dir (str) – folder to enumerate
convert_slashes (bool, optional) – force forward slashes; if this is False, will use the native path separator
return_relative_paths (bool, optional) – return paths that are relative to [base_dir], rather than absolute paths
sort_files (bool, optional) – force files to be sorted, otherwise uses the sorting provided by os.walk()
recursive (bool, optional) – enumerate recursively

Returns:

list of filenames

Return type:

list

megadetector.utils.path_utils.fileparts(path)[source]

Breaks down a path into the directory path, filename, and extension.

Note that the ‘.’ lives with the extension, and separators are removed.

Examples:

>>> fileparts('file')
('', 'file', '')
>>> fileparts(r'c:/dir/file.jpg')
('c:/dir', 'file', '.jpg')
>>> fileparts('/dir/subdir/file.jpg')
('/dir/subdir', 'file', '.jpg')

Parameters:

path (str) – path name to separate into parts

Returns:

tuple containing (p,n,e):

p: str, directory path
n: str, filename without extension
e: str, extension including the ‘.’

Return type:

tuple
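
The (p,n,e) decomposition can be reproduced with two os.path calls. This is a sketch that matches the documented examples for forward-slash paths; the name fileparts_sketch is hypothetical and this is not claimed to be the library's implementation.

```python
import os

def fileparts_sketch(path):
    """Hypothetical sketch: split a path into (directory, name, extension)."""
    # os.path.split drops the separator between the directory and the filename
    p, tail = os.path.split(path)
    # os.path.splitext keeps the '.' with the extension
    n, e = os.path.splitext(tail)
    return p, n, e
```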

megadetector.utils.path_utils.find_image_strings(strings)[source]

Given a list of strings that are potentially image file names, looks for strings that actually look like image file names (based on extension).

Parameters:: strings (list) – list of filenames to check for image-ness
Returns:: the subset of [strings] that appear to be image filenames
Return type:: list

megadetector.utils.path_utils.find_images(dirname, recursive=False, return_relative_paths=False, convert_slashes=True)[source]

Finds all files in a directory that look like image file names. Returns absolute paths unless return_relative_paths is set. Uses the OS-native path separator unless convert_slashes is set, in which case will always use ‘/’.

Parameters:

dirname (str) – the folder to search for images
recursive (bool, optional) – whether to search recursively
return_relative_paths (bool, optional) – return paths that are relative to [dirname], rather than absolute paths
convert_slashes (bool, optional) – force forward slashes in return values

Returns:

list of image filenames found in [dirname]

Return type:

list

megadetector.utils.path_utils.flatten_path(pathname, separator_chars=':\\/', separator_char_replacement='~')[source]

Removes non-ASCII and other invalid path characters (on any reasonable OS) from a path, then trims to a maximum length. Replaces all valid separators with [separator_char_replacement].

Parameters:

pathname (str) – path name to flatten
separator_chars (str, optional) – string containing all known path separators
separator_char_replacement (str, optional) – string to insert in place of path separators.

Returns:

flattened version of [pathname]

Return type:

str

megadetector.utils.path_utils.folder_list(base_dir, convert_slashes=True, return_relative_paths=False, sort_folders=True, recursive=False)[source]

Enumerates folders (not files) in [base_dir].

Parameters:

base_dir (str) – folder to enumerate
convert_slashes (bool, optional) – force forward slashes; if this is False, will use the native path separator
return_relative_paths (bool, optional) – return paths that are relative to [base_dir], rather than absolute paths
sort_folders (bool, optional) – force folders to be sorted, otherwise uses the sorting provided by os.walk()
recursive (bool, optional) – enumerate recursively

Returns:

list of folder names

Return type:

list

megadetector.utils.path_utils.folder_summary(folder, print_summary=True)[source]

Returns (and optionally prints) a summary of [folder], including:

The total number of files
The total number of folders
The number of files for each extension

Parameters:

folder (str) – folder to summarize
print_summary (bool, optional) – whether to print the summary

Returns:

dict with fields “n_files”, “n_folders”, and “extension_to_count”

Return type:

dict

megadetector.utils.path_utils.get_file_sizes(filenames, max_workers=1, use_threads=True, verbose=False, recursive=True, convert_slashes=True, return_relative_paths=True)[source]

Returns a dictionary mapping every file in [filenames] to the corresponding file size, or None for errors. If [filenames] is a folder, will enumerate the folder (optionally recursively).

Parameters:

filenames (list or str) – list of filenames for which we should read sizes, or a folder within which we should read all file sizes recursively
max_workers (int, optional) – number of concurrent workers; set <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
verbose (bool, optional) – enable debug output
recursive (bool, optional) – enumerate recursively, only relevant if [filenames] is a folder.
convert_slashes (bool, optional) – convert backslashes to forward slashes
return_relative_paths (bool, optional) – return relative paths; only relevant if [filenames] is a folder.

Returns:

mapping filename to file size in bytes, or None for files that error

Return type:

dict

megadetector.utils.path_utils.insert_before_extension(filename, s=None, separator='.')[source]

Insert string [s] before the extension in [filename], separated with [separator].

If [s] is empty, generates a date/timestamp. If [filename] has no extension, appends [s].

Examples:

>>> insert_before_extension('/dir/subdir/file.ext', 'insert')
'/dir/subdir/file.insert.ext'
>>> insert_before_extension('/dir/subdir/file', 'insert')
'/dir/subdir/file.insert'
>>> insert_before_extension('/dir/subdir/file')
'/dir/subdir/file.2020.07.20.10.54.38'

Parameters:

filename (str) – filename to manipulate
s (str, optional) – string to insert before the extension in [filename], or None to insert a datestamp
separator (str, optional) – separator to place between the filename base and the inserted string

Returns:

modified string

Return type:

str
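
The three documented cases (with extension, without extension, and no [s]) can be sketched as follows. The name insert_before_extension_sketch is hypothetical, and the timestamp format is inferred from the example output above rather than taken from the library source.

```python
import datetime
import os

def insert_before_extension_sketch(filename, s=None, separator='.'):
    """Hypothetical sketch: place [s] between the filename base and its extension."""
    if not s:
        # Assumed format, chosen to match the documented example (2020.07.20.10.54.38)
        s = datetime.datetime.now().strftime('%Y.%m.%d.%H.%M.%S')
    base, ext = os.path.splitext(filename)
    # When there is no extension, ext is '' and [s] is simply appended
    return base + separator + s + ext
```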

megadetector.utils.path_utils.is_executable(filename)[source]

Checks whether [filename] is on the system path and marked as executable.

Parameters:: filename (str) – filename to check for executable status
Returns:: True if [filename] is on the system path and marked as executable, otherwise False
Return type:: bool

megadetector.utils.path_utils.is_image_file(s, img_extensions=('.jpg', '.jpeg', '.gif', '.png', '.tif', '.tiff', '.bmp', '.webp', '.avif'))[source]

Checks a file’s extension against a hard-coded set of image file extensions. Uses case-insensitive comparison.

Does not check whether the file exists, only determines whether the filename implies it’s an image file.

Parameters:

s (str) – filename to evaluate for image-ness
img_extensions (list or tuple, optional) – known image file extensions

Returns:

True if [s] appears to be an image file, else False

Return type:

bool
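
The extension check, and the list filter that find_image_strings builds on it, can be sketched as below. The function names ending in _sketch are hypothetical, and the code only illustrates the documented behavior (case-insensitive comparison, no check that the file exists).

```python
import os

IMG_EXTENSIONS = ('.jpg', '.jpeg', '.gif', '.png', '.tif', '.tiff', '.bmp', '.webp', '.avif')

def is_image_file_sketch(s, img_extensions=IMG_EXTENSIONS):
    """Hypothetical sketch: decide image-ness from the extension alone."""
    ext = os.path.splitext(s)[1]
    # Lowercase the extension so 'IMG_0001.JPG' and 'img_0001.jpg' are treated alike
    return ext.lower() in img_extensions

def find_image_strings_sketch(strings):
    """Hypothetical sketch: keep only the strings that look like image filenames."""
    return [s for s in strings if is_image_file_sketch(s)]
```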

megadetector.utils.path_utils.make_executable(filename, catch_exceptions=False)[source]

Make [filename] executable.

Parameters:

filename (str) – filename to make executable
catch_exceptions (bool, optional) – treat errors as warnings

megadetector.utils.path_utils.open_file(filename, attempt_to_open_in_wsl_host=False, browser_name=None)[source]

Opens [filename] in the default OS file handler for this file type.

If browser_name is not None, uses the webbrowser module to open the filename in the specified browser; see https://docs.python.org/3/library/webbrowser.html for supported browsers. Falls back to the default file handler if webbrowser.open() fails. In this case, attempt_to_open_in_wsl_host is ignored unless webbrowser.open() fails.

If browser_name is ‘default’, uses the system default. This is different from the parameter to webbrowser.get(), where None implies the system default.

Parameters:

filename (str) – file to open
attempt_to_open_in_wsl_host (bool, optional) – if this is True, and we’re in WSL, attempts to open [filename] in the Windows host environment. Only supported for Windows files that are being provided as /mnt/… paths.
browser_name (str, optional) – see above

megadetector.utils.path_utils.open_file_in_chrome(filename)[source]

Open a file in Chrome, regardless of file type. I typically use this to open .md files in Chrome.

Parameters:: filename (str) – file to open
Returns:: whether the operation was successful
Return type:: bool

megadetector.utils.path_utils.parallel_compute_file_hashes(filenames, max_workers=16, use_threads=True, recursive=True, algorithm='sha256', verbose=False)[source]

Compute file hashes for a list or folder of images.

Parameters:

filenames (list or str) – a list of filenames or a folder
max_workers (int, optional) – the number of parallel workers to use; set to <=1 to disable parallelization
use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallelization
algorithm (str, optional) – the hashing algorithm to use (e.g. md5, sha256)
recursive (bool, optional) – if [filenames] is a folder, whether to enumerate recursively. Ignored if [filenames] is a list.
verbose (bool, optional) – enable additional debug output

Returns:

a dict mapping filenames to hash values; values will be None for files that fail to load.

Return type:

dict

megadetector.utils.path_utils.parallel_copy_files(input_file_to_output_file, max_workers=16, use_threads=True, overwrite=False, verbose=False, move=False)[source]

Copy (or move) files from source to target according to the dict input_file_to_output_file.

Parameters:

input_file_to_output_file (dict) – dictionary mapping source files to the target files to which they should be copied
max_workers (int, optional) – number of concurrent workers; set to <=1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False) for parallel copying; ignored if max_workers <= 1
overwrite (bool, optional) – whether to overwrite existing destination files
verbose (bool, optional) – enable additional debug output
move (bool, optional) – move instead of copying
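
The pattern shared by the parallel_* functions (a per-item worker, dispatched serially when max_workers <= 1 and through a pool otherwise) can be sketched for the copy case. This sketch is thread-only and is not the library's code; the names ending in _sketch, the creation of missing target folders, and the returned list of booleans are assumptions for illustration.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def _copy_one_sketch(source, target, overwrite=False, move=False):
    """Copy or move one file; returns False if the target exists and is kept."""
    if os.path.exists(target) and not overwrite:
        return False
    # Assumption: create the target folder if it does not exist yet
    os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
    if move:
        shutil.move(source, target)
    else:
        shutil.copyfile(source, target)
    return True

def parallel_copy_files_sketch(input_file_to_output_file, max_workers=16,
                               overwrite=False, move=False):
    """Hypothetical sketch: run the per-file worker serially or on a thread pool."""
    items = list(input_file_to_output_file.items())
    if max_workers <= 1:
        return [_copy_one_sketch(s, t, overwrite, move) for s, t in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda st: _copy_one_sketch(st[0], st[1], overwrite, move), items))
```

Threads are usually sufficient for I/O-bound work like copying; a process pool mainly helps when the per-file work is CPU-bound.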

megadetector.utils.path_utils.parallel_delete_files(input_files, max_workers=16, use_threads=True, verbose=False)[source]

Deletes one or more files in parallel.

Parameters:

input_files (list) – list of files to delete
max_workers (int, optional) – number of concurrent workers, set to <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
verbose (bool, optional) – enable additional debug console output

megadetector.utils.path_utils.parallel_unzip_files(input_files, output_folder=None, max_workers=16, use_threads=True, verbose=False)[source]

Unzips one or more zipfiles in parallel.

Parameters:

input_files (list) – list of zipfiles to unzip
output_folder (str, optional) – folder to which we should unzip all files in [input_files], defaults to unzipping each file to the folder where it lives
max_workers (int, optional) – number of concurrent workers, set to <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
verbose (bool, optional) – enable additional debug console output

megadetector.utils.path_utils.parallel_zip_files(input_files, max_workers=16, use_threads=True, compress_level=9, overwrite=False, verbose=False)[source]

Zips one or more files to separate output files in parallel, leaving the original files in place. Each file is zipped to [filename].zip.

Parameters:

input_files (list) – list of files to zip
max_workers (int, optional) – number of concurrent workers, set to <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
compress_level (int, optional) – zip compression level between 0 and 9
overwrite (bool, optional) – whether to overwrite an existing .zip file
verbose (bool, optional) – enable additional debug console output

megadetector.utils.path_utils.parallel_zip_folders(input_folders, max_workers=16, use_threads=True, compress_level=9, overwrite=False, verbose=False)[source]

Zips one or more folders to separate output files in parallel, leaving the original folders in place. Each folder is zipped to [folder_name].zip.

Parameters:

input_folders (list) – list of folders to zip
max_workers (int, optional) – number of concurrent workers, set to <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
compress_level (int, optional) – zip compression level between 0 and 9
overwrite (bool, optional) – whether to overwrite an existing .zip file
verbose (bool, optional) – enable additional debug console output

megadetector.utils.path_utils.path_is_abs(p)[source]

Determines whether [p] is an absolute path. An absolute path is defined as one that starts with slash, backslash, or a letter followed by a colon.

Parameters:: p (str) – path to evaluate
Returns:: True if [p] is an absolute path, else False
Return type:: bool
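
The definition above is platform-independent, unlike os.path.isabs, and translates directly into a few lines. The name path_is_abs_sketch is hypothetical; the logic follows the stated definition only.

```python
import re

def path_is_abs_sketch(p):
    """Hypothetical sketch: absolute means a leading slash, backslash, or drive letter."""
    if len(p) == 0:
        return False
    # Leading '/' or '\' (including UNC-style paths that start with '\\')
    if p[0] in '/\\':
        return True
    # A single letter followed by a colon, e.g. 'c:'
    return re.match(r'^[a-zA-Z]:', p) is not None
```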

megadetector.utils.path_utils.path_join(*paths, convert_slashes=True)[source]

Wrapper for os.path.join that optionally converts backslashes to forward slashes.

Parameters:

*paths (variable-length set of strings) – Path components to be joined.
convert_slashes (bool, optional) – whether to convert \ to /

Returns:

A string with the joined path components.

megadetector.utils.path_utils.read_list_from_file(filename)[source]

Reads a json-formatted list of strings from a file.

Parameters:: filename (str) – .json filename to read
Returns:: list of strings read from [filename]
Return type:: list

megadetector.utils.path_utils.recursive_file_list(base_dir, convert_slashes=True, return_relative_paths=False, sort_files=True, recursive=True)[source]

Enumerates files (not directories) in [base_dir].

Parameters:

base_dir (str) – folder to enumerate
convert_slashes (bool, optional) – force forward slashes; if this is False, will use the native path separator
return_relative_paths (bool, optional) – return paths that are relative to [base_dir], rather than absolute paths
sort_files (bool, optional) – force files to be sorted, otherwise uses the sorting provided by os.walk()
recursive (bool, optional) – enumerate recursively

Returns:

list of filenames

Return type:

list

megadetector.utils.path_utils.remove_empty_folders(path, remove_root=False)[source]

Recursively removes empty folders within the specified path.

Parameters:

path (str) – the folder from which we should recursively remove empty folders.
remove_root (bool, optional) – whether to remove the root directory if it’s empty after removing all empty subdirectories. This will always be True during recursive calls.

Returns:

True if the directory is empty after processing, False otherwise

Return type:

bool
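
The recursion described above, including why remove_root is always True for recursive calls, can be sketched like this. The name remove_empty_folders_sketch is hypothetical, and skipping symlinked directories is an assumption made to keep the sketch safe.

```python
import os

def remove_empty_folders_sketch(path, remove_root=False):
    """Hypothetical sketch: delete empty subfolders bottom-up; return True if [path] is empty."""
    is_empty = True
    for entry in os.listdir(path):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            # Subfolders are always removable, so recursive calls pass remove_root=True
            if not remove_empty_folders_sketch(full_path, remove_root=True):
                is_empty = False
        else:
            # Any file (or link) means this folder has to stay
            is_empty = False
    if is_empty and remove_root:
        os.rmdir(path)
    return is_empty
```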

megadetector.utils.path_utils.safe_create_link(link_exists, link_new)[source]

Creates a symlink at [link_new] pointing to [link_exists].

If [link_new] already exists, make sure it’s a link (not a file), and if it has a different target than [link_exists], removes and re-creates it.

Creates a real directory if necessary.

Errors if [link_new] already exists but it’s not a link.

Parameters:

link_exists (str) – the existing path that the (possibly-new) symlink should point to
link_new (str) – the path at which the (possibly-new) symlink is created

megadetector.utils.path_utils.split_path(path)[source]

Splits [path] into all its constituent file/folder tokens.

Examples:

>>> split_path(r'c:\dir\subdir\file.txt')
['c:\\', 'dir', 'subdir', 'file.txt']
>>> split_path('/dir/subdir/file.jpg')
['/', 'dir', 'subdir', 'file.jpg']
>>> split_path('c:\\')
['c:\\']
>>> split_path('/')
['/']

Parameters:: path (str) – path to split into tokens
Returns:: list of path tokens
Return type:: list

megadetector.utils.path_utils.test_file_write(fn, overwrite=True)[source]

Writes an empty file to [fn], used to test that we have appropriate permissions. If [fn] exists and overwrite is False, this function errors. Creates the directory containing [fn] if necessary. Does not delete the test file.

Parameters:

fn (str) – the filename to which we should perform a test write
overwrite (bool, optional) – if [fn] exists, whether we should overwrite (True) or error (False)

Returns:

currently always returns True or errors

Return type:

bool

megadetector.utils.path_utils.test_path_utils()[source]: Runs all tests in the TestPathUtils class.

megadetector.utils.path_utils.unzip_file(input_file, output_folder=None)[source]

Unzips a zipfile to the specified output folder, defaulting to the same location as the input file.

Parameters:

input_file (str) – zipfile to unzip
output_folder (str, optional) – folder to which we should unzip [input_file], defaults to unzipping to the folder where [input_file] lives

megadetector.utils.path_utils.windows_path_to_wsl_path(filename, failure_behavior='none')[source]

Converts a Windows path to a WSL path, or returns None if that’s not possible. E.g. converts:

e:\a\b\c

…to:

/mnt/e/a/b/c

Parameters:

filename (str) – filename to convert
failure_behavior (str, optional) – what to do if the path can’t be processed as a Windows path. ‘none’ to return None in this case, ‘original’ to return the original path.

Returns:

WSL equivalent to the Windows path [filename]

Return type:

str

megadetector.utils.path_utils.write_list_to_file(output_file, strings)[source]

Writes a list of strings to either a JSON file or text file, depending on extension of the given file name.

Parameters:

output_file (str) – file to write
strings (list) – list of strings to write to [output_file]

megadetector.utils.path_utils.wsl_path_to_windows_path(filename, failure_behavior='none')[source]

Converts a WSL path to a Windows path. For example, converts:

/mnt/e/a/b/c

…to:

e:\a\b\c

Parameters:

filename (str) – filename to convert
failure_behavior (str, optional) – what to do if the path can’t be processed as a WSL path. ‘none’ to return None in this case, ‘original’ to return the original path.

Returns:

Windows equivalent to the WSL path [filename]

Return type:

str
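
Both conversion directions come down to mapping a drive letter to a /mnt/ prefix and swapping separators. The regular expressions and function names below are assumptions for illustration, not the library's implementation; the failure_behavior handling follows the parameter description.

```python
import re

def windows_path_to_wsl_path_sketch(filename, failure_behavior='none'):
    """Hypothetical sketch: r'e:\\a\\b\\c' -> '/mnt/e/a/b/c'."""
    m = re.match(r'^([a-zA-Z]):[\\/]*(.*)$', filename)
    if m is None:
        return None if failure_behavior == 'none' else filename
    drive = m.group(1).lower()
    rest = m.group(2).replace('\\', '/')
    return '/mnt/' + drive + '/' + rest

def wsl_path_to_windows_path_sketch(filename, failure_behavior='none'):
    """Hypothetical sketch: '/mnt/e/a/b/c' -> r'e:\\a\\b\\c'."""
    # The drive must be a single letter followed by '/' or the end of the string
    m = re.match(r'^/mnt/([a-zA-Z])(?:/(.*))?$', filename)
    if m is None:
        return None if failure_behavior == 'none' else filename
    rest = (m.group(2) or '').replace('/', '\\')
    return m.group(1) + ':\\' + rest
```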

megadetector.utils.path_utils.zip_each_file_in_folder(folder_name, recursive=False, max_workers=16, use_threads=True, compress_level=9, overwrite=False, required_token=None, verbose=False, exclude_zip=True)[source]

Zips each file in [folder_name] to its own zipfile (filename.zip), optionally recursing. To zip a whole folder into a single zipfile, use zip_folder().

Parameters:

folder_name (str) – the folder within which we should zip files
recursive (bool, optional) – whether to recurse within [folder_name]
max_workers (int, optional) – number of concurrent workers, set to <= 1 to disable parallelism
use_threads (bool, optional) – whether to use threads (True) or processes (False); ignored if max_workers <= 1
compress_level (int, optional) – zip compression level between 0 and 9
overwrite (bool, optional) – whether to overwrite an existing .zip file
required_token (str, optional) – only zip files whose names contain this string
verbose (bool, optional) – enable additional debug console output
exclude_zip (bool, optional) – skip files ending in .zip

megadetector.utils.path_utils.zip_file(input_fn, output_fn=None, overwrite=False, verbose=False, compress_level=9)[source]

Zips a single file.

Parameters:

input_fn (str) – file to zip
output_fn (str, optional) – target zipfile; if this is None, we’ll use [input_fn].zip
overwrite (bool, optional) – whether to overwrite an existing target file
verbose (bool, optional) – enable additional debug console output
compress_level (int, optional) – compression level to use, between 0 and 9

Returns:

the output zipfile, whether we created it or determined that it already exists

Return type:

str

megadetector.utils.path_utils.zip_files_into_single_zipfile(input_files, output_fn, arc_name_base, overwrite=False, verbose=False, compress_level=9)[source]

Zip all the files in [input_files] into [output_fn]. Archive names are relative to arc_name_base.

Parameters:

input_files (list) – list of absolute filenames to include in the .zip file
output_fn (str) – .zip file to create
arc_name_base (str) – absolute folder from which relative paths should be determined; behavior is undefined if there are files in [input_files] that don’t live within [arc_name_base]
overwrite (bool, optional) – whether to overwrite an existing .zip file
verbose (bool, optional) – enable additional debug console output
compress_level (int, optional) – compression level to use, between 0 and 9

Returns:

the output zipfile, whether we created it or determined that it already exists

Return type:

str

megadetector.utils.path_utils.zip_folder(input_folder, output_fn=None, overwrite=False, verbose=False, compress_level=9)[source]

Recursively zip everything in [input_folder] into a single zipfile, storing files as paths relative to [input_folder].

Parameters:

input_folder (str) – folder to zip
output_fn (str, optional) – output filename; if this is None, we’ll write to [input_folder].zip
overwrite (bool, optional) – whether to overwrite an existing .zip file
verbose (bool, optional) – enable additional debug console output
compress_level (int, optional) – compression level to use, between 0 and 9

Returns:

the output zipfile, whether we created it or determined that it already exists

Return type:

str
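
Storing files "as paths relative to [input_folder]" can be sketched with the standard zipfile module. The name zip_folder_sketch is hypothetical, and the early return when the output already exists is an assumption based on the documented return value.

```python
import os
import zipfile

def zip_folder_sketch(input_folder, output_fn=None, overwrite=False, compress_level=9):
    """Hypothetical sketch: recursively zip a folder using folder-relative archive names."""
    if output_fn is None:
        output_fn = input_folder.rstrip('/\\') + '.zip'
    if os.path.isfile(output_fn) and not overwrite:
        # The zipfile already exists; report it without rewriting
        return output_fn
    with zipfile.ZipFile(output_fn, 'w', zipfile.ZIP_DEFLATED,
                         compresslevel=compress_level) as zf:
        for root, _, files in os.walk(input_folder):
            for fn in files:
                full_path = os.path.join(root, fn)
                # Archive names are relative to the folder being zipped
                zf.write(full_path, arcname=os.path.relpath(full_path, input_folder))
    return output_fn
```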

utils.process_utils module

process_utils.py

Run something at the command line and capture the output, based on:

https://stackoverflow.com/questions/4417546/constantly-print-subprocess-output-while-process-is-running

Includes handy example code for doing this on multiple processes/threads.

megadetector.utils.process_utils.execute(cmd, encoding=None, errors=None, env=None, verbose=False)[source]

Run [cmd] (a single string) in a shell, yielding each line of output to the caller.

The “encoding”, “errors”, and “env” parameters are passed directly to subprocess.Popen().

“verbose” only impacts output about process management, it is not related to printing output from the child process.

Parameters:

cmd (str) – command to run
encoding (str, optional) – stdout encoding, see Popen() documentation
errors (str, optional) – error handling, see Popen() documentation
env (dict, optional) – environment variables, see Popen() documentation
verbose (bool, optional) – enable additional debug console output

Returns:

the command’s return code, which is always zero; a nonzero return code raises CalledProcessError instead

Return type:

int
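
The line-by-line streaming pattern from the Stack Overflow answer referenced above can be sketched as a generator. The name execute_sketch is hypothetical, and merging stderr into stdout is an assumption made for this example rather than a statement about the library.

```python
import subprocess

def execute_sketch(cmd, encoding=None, errors=None, env=None):
    """Hypothetical sketch: run [cmd] in a shell and yield its output line by line."""
    popen = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                             shell=True, text=True,
                             encoding=encoding, errors=errors, env=env)
    # readline() returns '' only at end-of-stream, which ends the loop
    for line in iter(popen.stdout.readline, ''):
        yield line
    popen.stdout.close()
    return_code = popen.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
    return return_code
```

Because this is a generator, nothing runs until the caller iterates, e.g. `for line in execute_sketch('echo hello'): print(line, end='')`.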

megadetector.utils.process_utils.execute_and_print(cmd, print_output=True, encoding=None, errors=None, env=None, verbose=False, catch_exceptions=True, echo_command=False)[source]

Run [cmd] (a single string) in a shell, capturing and printing output. Returns a dictionary with fields “status” and “output”.

The “encoding”, “errors”, and “env” parameters are passed directly to subprocess.Popen().

“verbose” only impacts output about process management, it is not related to printing output from the child process.

Parameters:

cmd (str) – command to run
print_output (bool, optional) – whether to print output from [cmd] (stdout is captured regardless of the value of print_output)
encoding (str, optional) – stdout encoding, see Popen() documentation
errors (str, optional) – error handling, see Popen() documentation
env (dict, optional) – environment variables, see Popen() documentation
verbose (bool, optional) – enable additional debug console output
catch_exceptions (bool, optional) – catch exceptions and include in the output, otherwise raise
echo_command (bool, optional) – print the command before executing

Returns:

a dictionary with fields “status” (the process return code) and “output” (the content of stdout)

Return type:

dict

utils.split_locations_into_train_val module

split_locations_into_train_val.py

Splits a list of location IDs into training and validation, targeting a specific train/val split for each category, but allowing some categories to be tighter or looser than others. Does nothing particularly clever, just randomly splits locations into train/val lots of times using the target val fraction, and picks the one that meets the specified constraints and minimizes weighted error, where “error” is defined as the sum of each class’s absolute divergence from the target val fraction.
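
The random-search strategy described above can be sketched in a few dozen lines. This is a simplified illustration, not the module's implementation: the name split_locations_sketch and the returned dict layout are hypothetical, all categories share one maximum allowable error, and every category has a weight of 1.0.

```python
import random

def split_locations_sketch(location_to_category_counts, n_random_seeds=1000,
                           target_val_fraction=0.15, max_allowable_error=0.1):
    """Hypothetical sketch: try many random splits and keep the lowest-error one."""
    locations = sorted(location_to_category_counts.keys())

    # Total count per category across all locations
    category_totals = {}
    for counts in location_to_category_counts.values():
        for category, count in counts.items():
            category_totals[category] = category_totals.get(category, 0) + count

    best = None
    for seed in range(n_random_seeds):
        rng = random.Random(seed)
        # Each location independently lands in val with the target probability
        val_locations = [loc for loc in locations if rng.random() < target_val_fraction]

        # Per-category absolute divergence from the target val fraction
        category_errors = {}
        for category, total in category_totals.items():
            if total == 0:
                continue
            val_count = sum(location_to_category_counts[loc].get(category, 0)
                            for loc in val_locations)
            category_errors[category] = abs(val_count / total - target_val_fraction)

        # Hard constraint: discard splits where any category is too far off
        if any(e > max_allowable_error for e in category_errors.values()):
            continue

        total_error = sum(category_errors.values())
        if best is None or total_error < best['error']:
            best = {'seed': seed, 'val_locations': val_locations,
                    'error': total_error, 'category_errors': category_errors}

    if best is None:
        raise ValueError('No random split satisfied the error constraints')
    return best
```

Seeding a fresh generator per attempt makes the search reproducible: the same input and seed count always yield the same split.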

megadetector.utils.split_locations_into_train_val.split_locations_into_train_val(location_to_category_counts, n_random_seeds=10000, target_val_fraction=0.15, category_to_max_allowable_error=None, category_to_error_weight=None, default_max_allowable_error=0.1, require_complete_coverage=True)[source]

Splits a list of location IDs into training and validation, targeting a specific train/val split for each category, but allowing some categories to be tighter or looser than others. Does nothing particularly clever, just randomly splits locations into train/val lots of times using the target val fraction, and picks the one that meets the specified constraints and minimizes weighted error, where “error” is defined as the sum of each class’s absolute divergence from the target val fraction.

Parameters:

location_to_category_counts (dict) –
a dict mapping location IDs to dicts, with each dict mapping a category name to a count. Any categories not present in a particular dict are assumed to have a count of zero for that location.

For example:
```
{'location-000': {'bear':4,'wolf':10},
 'location-001': {'bear':12,'elk':20}}
```
n_random_seeds (int, optional) – number of random seeds to try, always starting from zero
target_val_fraction (float, optional) – fraction of images containing each species we’d like to put in the val split
category_to_max_allowable_error (dict, optional) – a dict mapping category names to maximum allowable errors. These are hard constraints (i.e., we will error if we can’t meet them). Does not need to include all categories; categories not included will be assigned a maximum error according to [default_max_allowable_error]. If this is None, no hard constraints are applied.
category_to_error_weight (dict, optional) – a dict mapping category names to error weights. You can specify a subset of categories; categories not included here have a weight of 1.0. If None, all categories have the same weight.
default_max_allowable_error (float, optional) – the maximum allowable error for categories not present in [category_to_max_allowable_error]. Set to None (or >= 1.0) to disable hard constraints for categories not present in [category_to_max_allowable_error]
require_complete_coverage (bool, optional) – require that every category appear in both train and val

Returns:

A two-element tuple:

list of location IDs in the val split
a dict mapping category names to the fraction of images in the val split

Return type:

tuple
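
The random-restart search described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name split_locations is hypothetical, all categories share one error limit, all error weights are 1.0, and the complete-coverage check is omitted.

```python
import random


def split_locations(location_to_category_counts, n_random_seeds=1000,
                    target_val_fraction=0.15, max_allowable_error=0.1):
    """Hypothetical simplified search: unweighted error, one shared hard constraint."""
    locations = sorted(location_to_category_counts.keys())

    # Total count per category across all locations
    totals = {}
    for counts in location_to_category_counts.values():
        for category, count in counts.items():
            totals[category] = totals.get(category, 0) + count
    categories = sorted(c for c in totals if totals[c] > 0)

    best = None
    for seed in range(n_random_seeds):
        rng = random.Random(seed)
        # Each location lands in val with probability target_val_fraction
        val_locations = [loc for loc in locations if rng.random() < target_val_fraction]
        val_fractions = {}
        for category in categories:
            val_count = sum(location_to_category_counts[loc].get(category, 0)
                            for loc in val_locations)
            val_fractions[category] = val_count / totals[category]
        errors = [abs(f - target_val_fraction) for f in val_fractions.values()]
        # Hard constraint: discard seeds where any category is too far off target
        if max(errors) > max_allowable_error:
            continue
        total_error = sum(errors)
        if best is None or total_error < best[0]:
            best = (total_error, val_locations, val_fractions)

    if best is None:
        raise ValueError('No seed satisfied the error constraint')
    return best[1], best[2]


# Synthetic example data, invented for illustration only
location_to_category_counts = {
    'location-{:03d}'.format(i): {'bear': 5 + (i % 4), 'wolf': 3 + (i % 3)}
    for i in range(20)
}
val_locations, val_fractions = split_locations(location_to_category_counts,
                                               n_random_seeds=200)
```

Because seeds are tried in order starting from zero, the result is reproducible for a given input.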

utils.string_utils module

string_utils.py

Miscellaneous string utilities.

class megadetector.utils.string_utils.TestStringUtils[source]

Bases: object

Tests for string_utils.py

test_human_readable_to_bytes()[source]: Test the human_readable_to_bytes function.

test_is_float()[source]: Test the is_float function.

test_remove_ansi_codes()[source]: Test the remove_ansi_codes function.

megadetector.utils.string_utils.human_readable_to_bytes(size)[source]

Given a human-readable byte string (e.g. 2G, 10GB, 30MB, 20KB), returns the number of bytes. Will return 0 if the argument has unexpected form.

https://gist.github.com/beugley/ccd69945346759eb6142272a6d69b4e0

Parameters:: size (str) – string representing a size
Returns:: the corresponding size in bytes
Return type:: int
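
A minimal sketch of this kind of parsing, assuming binary (1024-based) multiples and only the K, M, and G units shown in the examples above. The name parse_size is hypothetical, and the exact set of inputs accepted by the real function may differ.

```python
import re


def parse_size(size):
    """Hypothetical parser for strings like '2G', '10GB', '30MB', or '20KB'."""
    match = re.fullmatch(r'(\d+)\s*([KMG]?)B?', size.strip().upper())
    if match is None:
        # Unexpected form: mirror the documented behavior and return 0
        return 0
    # Assumption: units are powers of 1024, not powers of 1000
    multipliers = {'': 1, 'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}
    return int(match.group(1)) * multipliers[match.group(2)]
```

Returning 0 for malformed input means callers cannot distinguish a bad string from a genuine size of zero, so input that may be malformed is worth validating separately.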

megadetector.utils.string_utils.is_float(s)[source]

Checks whether [s] is an object (typically a string) that can be cast to a float

Parameters:: s (object) – object to evaluate
Returns:: True if s successfully casts to a float, otherwise False
Return type:: bool

megadetector.utils.string_utils.is_int(s)[source]

Checks whether [s] is an object (typically a string) that can be cast to an int.

Parameters:: s (object) – object to evaluate
Returns:: True if s successfully casts to an int, otherwise False
Return type:: bool
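
Both is_float and is_int describe a "try the cast and see" check. A minimal sketch of that pattern, using a hypothetical helper name can_cast:

```python
def can_cast(value, target_type):
    """Hypothetical helper: True if target_type(value) succeeds, otherwise False."""
    try:
        target_type(value)
        return True
    except (ValueError, TypeError, OverflowError):
        # ValueError: malformed string; TypeError: unsupported type (e.g. None);
        # OverflowError: e.g. int(float('inf'))
        return False
```

Note that under this pattern a string such as '1.5' can be cast to a float but not to an int.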

megadetector.utils.string_utils.remove_ansi_codes(s)[source]

Removes ANSI escape codes from a string.

https://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string-in-python#14693789

Parameters:: s (str) – the string to de-ANSI-i-fy
Returns:: A copy of [s] without ANSI codes
Return type:: str
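
A minimal sketch of regex-based ANSI stripping, using a widely circulated pattern for 7-bit C1 escape sequences. Whether the package uses exactly this pattern is an assumption; the name strip_ansi is hypothetical.

```python
import re

# Matches ESC followed by either a single-character escape or a CSI sequence
# such as ESC [ 3 1 m
ANSI_ESCAPE = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')


def strip_ansi(s):
    """Hypothetical helper: return a copy of [s] without ANSI escape codes."""
    return ANSI_ESCAPE.sub('', s)
```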

megadetector.utils.string_utils.test_string_utils()[source]: Runs all tests in the TestStringUtils class.

utils.url_utils module

url_utils.py

Frequently-used functions for downloading, manipulating, or serving URLs

class megadetector.utils.url_utils.DownloadProgressBar[source]

Bases: object

Progress updater based on the progressbar2 package.

https://stackoverflow.com/questions/37748105/how-to-use-progressbar-module-with-urlretrieve

class megadetector.utils.url_utils.QuietHTTPRequestHandler(*args, directory=None, **kwargs)[source]

Bases: SimpleHTTPRequestHandler

SimpleHTTPRequestHandler subclass that suppresses console printouts

log_message(format, *args)[source]

Log an arbitrary message.

This is used by all other logging functions. Override it if you have specific logging wishes.

The first argument, FORMAT, is a format string for the message to be logged. If the format string contains any % escapes requiring parameters, they should be specified as subsequent arguments (it’s just like printf!).

The client ip and current date/time are prefixed to every message.

Unicode control characters are replaced with escaped hex before writing the output to stderr.
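
Suppressing console output from SimpleHTTPRequestHandler comes down to overriding log_message with a no-op. A minimal sketch, using a hypothetical class name:

```python
from http.server import SimpleHTTPRequestHandler


class SilentHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler that discards all per-request log output."""

    def log_message(self, format, *args):
        # Intentionally do nothing, so nothing is written to stderr
        pass
```

Because the other logging methods of the base class route through log_message, overriding this one method is enough to silence request and error logging.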

class megadetector.utils.url_utils.SingletonHTTPServer[source]

Bases: object

HTTP server that runs on a local port, serving a particular local folder. Runs as a singleton, so starting a server in a new folder closes the previous server. I use this primarily to serve MD/SpeciesNet previews from manage_local_batch, which can exceed the 260-character filename length limitation imposed by browsers on Windows, so really the point here is just to remove characters from the URL.

classmethod is_running()[source]

Check whether the server is currently running.

Returns:: True if the server is running
Return type:: bool

classmethod start_server(directory, port=8000, host='localhost')[source]

Start or restart the HTTP server with a specific directory

Parameters:

directory (str) – the root folder served by the server
port (int, optional) – the port on which to create the server
host (str, optional) – the host on which to listen, typically either “localhost” (default) or “0.0.0.0”

Returns:

URL to the running host

Return type:

str

classmethod stop_server()[source]: Stop the current server (if one is running)

class megadetector.utils.url_utils.TestUrlUtils[source]

Bases: object

Tests for url_utils.py

set_up()[source]: Create a temporary directory for testing.

tear_down()[source]: Remove the temporary directory after tests and restore module temp_dir.

test_download_relative_filename()[source]: Test download_relative_filename.

test_download_url_escape_spaces()[source]: Test download_url with spaces in the URL.

test_download_url_force_download()[source]: Test the force_download parameter of download_url.

test_download_url_non_existent()[source]: Test download_url with a non-existent URL.

test_download_url_to_specified_file()[source]: Test download_url with a specified destination filename.

test_download_url_to_temp_file()[source]: Test download_url when destination_filename is None.

test_get_url_size_and_sizes()[source]: Test get_url_size and get_url_sizes functions.

test_parallel_download_urls()[source]: Test parallel_download_urls (with n_workers=1 for simplicity).

test_test_url_and_test_urls()[source]: Test test_url and test_urls functions.

megadetector.utils.url_utils.download_relative_filename(url, output_base, verbose=False)[source]

Download a URL to output_base, preserving relative path. Path is relative to the site, so:

https://abc.com/xyz/123.txt

…will get downloaded to:

output_base/xyz/123.txt

Parameters:

url (str) – the URL to download
output_base (str) – the base folder to which we should download this file
verbose (bool, optional) – enable additional debug console output

Returns:

the local destination filename

Return type:

str
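
The URL-to-path mapping described above can be sketched without any network access. The helper name url_to_local_path is hypothetical, and this sketch does not decode percent-escapes or create folders, which the real function may do.

```python
import os
from urllib.parse import urlparse


def url_to_local_path(url, output_base):
    """Hypothetical helper: map a URL to a path under [output_base]."""
    # Path relative to the site root, e.g. 'xyz/123.txt'
    relative_path = urlparse(url).path.lstrip('/')
    return os.path.join(output_base, *relative_path.split('/'))


local_path = url_to_local_path('https://abc.com/xyz/123.txt', 'output_base')
```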

megadetector.utils.url_utils.download_url(url, destination_filename=None, progress_updater=None, force_download=False, verbose=True, escape_spaces=True)[source]

Downloads a URL to a file. If no file is specified, creates a temporary file, making a best effort to avoid filename collisions.

Prints some diagnostic information and makes sure to omit SAS tokens from printouts.

Parameters:

url (str) – the URL to download
destination_filename (str, optional) – the target filename; if None, will create a file in system temp space
progress_updater (object or bool, optional) – can be “None”, “False”, “True”, or a specific callable object. If None or False, no progress updates will be displayed. If True, a default progress bar will be created.
force_download (bool, optional) – download this file even if [destination_filename] exists.
verbose (bool, optional) – enable additional debug console output
escape_spaces (bool, optional) – replace ‘ ‘ with ‘%20’

Returns:

the filename to which [url] was downloaded, the same as [destination_filename] if [destination_filename] was not None

Return type:

str

megadetector.utils.url_utils.get_url_size(url, verbose=False, timeout=None)[source]

Get the size of the file pointed to by a URL, based on the Content-Length property. If the URL is not available, or the Content-Length property is not available, or the Content-Length property is not an integer, returns None.

Parameters:

url (str) – the url to test
verbose (bool, optional) – enable additional debug output
timeout (int, optional) – timeout in seconds to wait before considering this access attempt to be a failure; see requests.head() for precise documentation

Returns:

the file size in bytes, or None if it can’t be retrieved

Return type:

int
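
The three "returns None" cases above share one header-parsing step, sketched here in isolation so it runs without network access. The name size_from_headers is hypothetical, and the sketch takes a plain dict, so the header name must match exactly; HTTP libraries typically offer case-insensitive header lookup.

```python
def size_from_headers(headers):
    """Hypothetical helper: extract an integer Content-Length, or None."""
    if headers is None:
        # Stands in for "the URL is not available"
        return None
    value = headers.get('Content-Length')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # Content-Length is present but is not an integer
        return None
```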

megadetector.utils.url_utils.get_url_sizes(urls, n_workers=1, pool_type='thread', timeout=None, verbose=False)[source]

Retrieve file sizes for the URLs specified by [urls]. Returns None for any URLs that we can’t access, or URLs for which the Content-Length property is not set.

Parameters:

urls (list) – list of URLs for which we should retrieve sizes
n_workers (int, optional) – number of concurrent workers, set to <=1 to disable parallelization
pool_type (str, optional) – worker type to use; should be ‘thread’ or ‘process’
timeout (int, optional) – timeout in seconds to wait before considering this access attempt to be a failure; see requests.head() for precise documentation
verbose (bool, optional) – print additional debug information

Returns:

maps urls to file sizes, which will be None for URLs for which we were unable to retrieve a valid size.

Return type:

dict

megadetector.utils.url_utils.parallel_download_urls(url_to_target_file, verbose=False, overwrite=False, n_workers=20, pool_type='thread')[source]

Downloads a list of URLs to local files.

Catches exceptions and reports them in the returned “results” array.

Parameters:

url_to_target_file (dict) – a dict mapping URLs to local filenames.
verbose (bool, optional) – enable additional debug console output
overwrite (bool, optional) – whether to overwrite existing local files
n_workers (int, optional) – number of concurrent workers, set to <=1 to disable parallelization
pool_type (str, optional) – worker type to use; should be ‘thread’ or ‘process’

Returns:

list of dicts with keys:

’url’: the url this item refers to
’status’: ‘skipped’, ‘success’, or a string starting with ‘error’
’target_file’: the local filename to which we downloaded (or tried to download) this URL

Return type:

list
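
A minimal sketch of the catch-and-report pattern described above, using a thread pool and an injected download function so the example runs offline. The names download_all and fake_download are hypothetical; the real function performs actual downloads and supports additional options.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool


def download_all(url_to_target_file, download_fn, overwrite=False, n_workers=4):
    """Hypothetical helper: download each URL, reporting per-item status."""

    def process_one(item):
        url, target_file = item
        result = {'url': url, 'target_file': target_file}
        if os.path.isfile(target_file) and not overwrite:
            result['status'] = 'skipped'
            return result
        try:
            download_fn(url, target_file)
            result['status'] = 'success'
        except Exception as e:
            # Exceptions are reported in the results rather than raised
            result['status'] = 'error: {}'.format(str(e))
        return result

    with ThreadPool(n_workers) as pool:
        return pool.map(process_one, list(url_to_target_file.items()))


def fake_download(url, target_file):
    """Stand-in downloader: fails for URLs containing 'missing'."""
    if 'missing' in url:
        raise IOError('simulated 404')
    with open(target_file, 'w') as f:
        f.write('placeholder')


tmp_dir = tempfile.mkdtemp()
url_to_target_file = {
    'https://example.com/a.txt': os.path.join(tmp_dir, 'a.txt'),
    'https://example.com/missing.txt': os.path.join(tmp_dir, 'missing.txt'),
}
first_pass = download_all(url_to_target_file, fake_download)
second_pass = download_all(url_to_target_file, fake_download)
```

On the second pass, the file that was written during the first pass is reported as 'skipped' because overwrite is False.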

megadetector.utils.url_utils.test_url(url, error_on_failure=True, timeout=None)[source]

Tests the availability of [url], returning an http status code.

Parameters:

url (str) – URL to test
error_on_failure (bool, optional) – whether to error (vs. just returning an error code) if accessing this URL fails
timeout (int, optional) – timeout in seconds to wait before considering this access attempt to be a failure; see requests.head() for precise documentation

Returns:

http status code (200 for success)

Return type:

int

megadetector.utils.url_utils.test_urls(urls, error_on_failure=True, n_workers=1, pool_type='thread', timeout=None, verbose=False)[source]

Verify that URLs are available (i.e., return status 200). By default, errors if any URL is unavailable.

Parameters:

urls (list) – list of URLs to test
error_on_failure (bool, optional) – whether to error (vs. just returning an error code) if accessing this URL fails
n_workers (int, optional) – number of concurrent workers, set to <=1 to disable parallelization
pool_type (str, optional) – worker type to use; should be ‘thread’ or ‘process’
timeout (int, optional) – timeout in seconds to wait before considering this access attempt to be a failure; see requests.head() for precise documentation
verbose (bool, optional) – enable additional debug output

Returns:

a list of http status codes, the same length and order as [urls]

Return type:

list

utils.gpu_test module

gpu_test.py

Simple script to verify CUDA availability, used to verify a CUDA environment for TF or PyTorch

megadetector.utils.gpu_test.directml_test()[source]

Check whether DirectML support is available.

Returns:: Whether DirectML support is available.
Return type:: bool

megadetector.utils.gpu_test.tf_test()[source]

Print diagnostic information about TF/CUDA status.

Returns:: The number of CUDA devices reported by TensorFlow.
Return type:: int

megadetector.utils.gpu_test.torch_test()[source]

Print diagnostic information about Torch/CUDA status, including Torch/CUDA versions and all available CUDA device names.

Returns:: The number of CUDA devices reported by PyTorch.
Return type:: int

utils.wi_taxonomy_utils module

wi_taxonomy_utils.py

Functions related to working with the SpeciesNet / Wildlife Insights taxonomy.

class megadetector.utils.wi_taxonomy_utils.TaxonomyHandler(taxonomy_file, geofencing_file, country_code_file)[source]

Bases: object

Handler for taxonomy mapping and geofencing operations.

binomial_name_to_taxonomy_info: Maps a binomial name (one, two, or three whitespace-delimited tokens) to the same dict described under taxonomy_string_to_taxonomy_info.

common_name_to_taxonomy_info: Maps a common name to the same dict described under taxonomy_string_to_taxonomy_info.

country_code_to_country: Maps upper-case country codes to lower-case country names

country_to_country_code: Maps lower-case country names to upper-case country codes

export_geofence_data_to_csv(csv_fn=None, include_common_names=True)[source]

Converts the geofence .json representation into an equivalent .csv representation, with one taxon per row and one region per column. Empty values indicate non-allowed combinations, positive numbers indicate allowed combinations. Negative values are reserved for specific non-allowed combinations.

Parameters:

csv_fn (str) – output .csv file
include_common_names (bool, optional) – include a column for common names

Returns:

the pandas representation of the csv output file

Return type:

dataframe

generate_csv_rows_for_species(species_string, allow_countries=None, block_countries=None, allow_states=None, block_states=None)[source]

Generate rows in the format expected by geofence_fixes.csv, representing a list of allow and/or block rules for the specified species and countries/states. Does not check that the rules make sense; e.g. nothing will stop you in this function from both allowing and blocking a country.

Parameters:

species_string (str) – five-token string in semicolon-delimited WI taxonomy format
allow_countries (list or str, optional) – three-letter country code, list of country codes, or comma-separated list of country codes to allow
block_countries (list or str, optional) – three-letter country code, list of country codes, or comma-separated list of country codes to block
allow_states (list or str, optional) – two-letter state code, list of state codes, or comma-separated list of state codes to allow
block_states (list or str, optional) – two-letter state code, list of state codes, or comma-separated list of state codes to block

Returns:

lines ready to be pasted into geofence_fixes.csv

Return type:

list of str

generate_csv_rows_to_block_all_countries_except(species_string, block_except_list)[source]

Generate rows in the format expected by geofence_fixes.csv, representing a list of allow and block rules to block all countries currently allowed for this species except [block_except_list], and add allow rules for these countries.

Parameters:

species_string (str) – five-token taxonomy string
block_except_list (list) – list of country codes not to block

Returns:

strings compatible with geofence_fixes.csv

Return type:

list of str

species_allowed_in_country(species, country, state=None, return_status=False)[source]

Determines whether [species] is allowed in [country], according to already-initialized geofencing rules.

Parameters:

species (str) – can be a common name, a binomial name, or a species string
country (str) – country name or three-letter code
state (str, optional) – two-letter US state code
return_status (bool, optional) – by default, this function returns a bool; if you want to know why [species] is allowed/not allowed, setting return_status to True will return additional information.

Returns:

typically returns True if [species] is allowed in [country], else False. Returns a more detailed string if return_status is set.

Return type:

bool or str

species_string_to_canonical_species_string(species)[source]

Convert a string that may be a 5-token species string, a binomial name, or a common name into a 5-token species string, using taxonomic lookup.

Parameters:: species (str) – 5-token species string, binomial name, or common name
Returns:: the canonical 5-token species string
Return type:: str
Raises:: ValueError – if [species] is not in our dictionary

species_string_to_taxonomy_info(species)[source]

Convert a string that may be a 5-token species string, a binomial name, or a common name into a taxonomic info dictionary, using taxonomic lookup.

Parameters:: species (str) – 5-token species string, binomial name, or common name
Returns:: taxonomy information
Return type:: dict
Raises:: ValueError – if [species] is not in our dictionary

taxonomy_string_to_geofencing_rules: Dict mapping 5-token semicolon-delimited taxonomy strings to geofencing rules

taxonomy_string_to_taxonomy_info: Maps a taxonomy string (e.g. mammalia;cetartiodactyla;cervidae;odocoileus;virginianus) to a dict with keys taxon_id, common_name, kingdom, phylum, class, order, family, genus, species

class megadetector.utils.wi_taxonomy_utils.TestWITaxonomyUtils[source]

Bases: object

Tests for wi_taxonomy_utils.py

test_get_common_name_from_prediction_string()[source]: Test driver for get_common_name_from_prediction_string(…)

megadetector.utils.wi_taxonomy_utils.clean_taxonomy_string(s, truncate_multiple_description_strings=True)[source]

If [s] is a seven-token prediction string, trim the GUID and common name to produce a “clean” taxonomy string. Else if [s] is a five-token string, return it. Else error.

Parameters:

s (str) – the seven- or five-token taxonomy/prediction string to clean
truncate_multiple_description_strings (bool, optional) – we use | to delimit multiple descriptions in the same string; if this is True, clean and return just the first, else error.

Returns:

the five-token taxonomy string

Return type:

str
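
The token trimming described above can be sketched as follows. The helper name to_five_token_string is hypothetical, and this sketch does not handle the | delimiter for multiple descriptions.

```python
def to_five_token_string(s):
    """Hypothetical helper: reduce a seven-token prediction string to five tokens."""
    tokens = s.split(';')
    if len(tokens) == 7:
        # Drop the leading GUID and the trailing common name
        tokens = tokens[1:6]
    elif len(tokens) != 5:
        raise ValueError('Expected five or seven tokens, found {}'.format(len(tokens)))
    return ';'.join(tokens)


cleaned = to_five_token_string(
    '90d950db-2106-4bd9-a4c1-777604c3eada;mammalia;rodentia;;;;rodent')
```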

megadetector.utils.wi_taxonomy_utils.find_geofence_adjustments(ensemble_json_file, use_latin_names=False)[source]

Count the number of instances of each unique change made by the geofence.

Parameters:

ensemble_json_file (str) – SpeciesNet-formatted .json file produced by the full ensemble.
use_latin_names (bool, optional) – return a mapping using binomial names rather than common names.

Returns:

maps strings that look like “puma,felidae family” to integers, where that entry would indicate the number of times that “puma” was predicted, but mapped to family level by the geofence. Sorted in descending order by count.

Return type:

dict

megadetector.utils.wi_taxonomy_utils.generate_geofence_adjustment_html_summary(rollup_pair_to_count, min_count=10)[source]

Given a list of geofence rollups, likely generated by find_geofence_adjustments, generate an HTML summary of the changes made by geofencing. The resulting HTML is wrapped in <div>, but not, for example, in <html> or <body>.

Parameters:

rollup_pair_to_count (dict) – list of changes made by geofencing, see find_geofence_adjustments for details
min_count (int, optional) – minimum number of changes a pair needs in order to be included in the report.

megadetector.utils.wi_taxonomy_utils.generate_instances_json_from_folder(folder, country=None, admin1_region=None, lat=None, lon=None, output_file=None, filename_replacements=None, tokens_to_ignore=['$RECYCLE.BIN'])[source]

Generate an instances.json record that contains all images in [folder], optionally including location information, in a format suitable for run_model.py. Optionally writes the results to [output_file].

Parameters:

folder (str) – the folder to recursively search for images
country (str, optional) – a three-letter country code
admin1_region (str, optional) – an administrative region code, typically a two-letter US state code
lat (float, optional) – latitude to associate with all images
lon (float, optional) – longitude to associate with all images
output_file (str, optional) – .json file to which we should write instance records
filename_replacements (dict, optional) – str –> str dict indicating filename substrings that should be replaced with other strings. Replacement occurs after converting backslashes to forward slashes.
tokens_to_ignore (list, optional) – ignore any images with these tokens in their names, typically used to avoid $RECYCLE.BIN. Can be None.

Returns:

dict with at least the field “instances”

Return type:

dict

megadetector.utils.wi_taxonomy_utils.generate_md_results_from_predictions_json(predictions_json_file, md_results_file=None, base_folder=None, max_decimals=5, convert_human_to_person=True, convert_homo_species_to_human=True, verbose=False)[source]

Generate an MD-formatted .json file from a predictions.json file, generated by the SpeciesNet ensemble. Typically, MD results files use relative paths, and predictions.json files use absolute paths, so this function optionally removes the leading string [base_folder] from all file names.

Uses the classification from the “prediction” field if it’s available, otherwise uses the “classifications” field.

When using the “prediction” field, records the top class in the “classifications” field to a field in each image called “top_classification_common_name”. This is often different from the value of the “prediction” field.

speciesnet_to_md.py is a command-line driver for this function.

Parameters:

predictions_json_file (str) – path to a predictions.json file, or a dict
md_results_file (str, optional) – path to which we should write an MD-formatted .json file
base_folder (str, optional) – leading string to remove from each path in the predictions.json file. Typically the folder on which you ran run_model.py. If base_folder does not end in a slash, but filenames start with base_folder + ‘/’, this function assumes that you meant to add the slash.
max_decimals (int, optional) – number of decimal places to which we should round all values
convert_human_to_person (bool, optional) – WI predictions.json files sometimes use the detection category “human”; MD files usually use “person”. If True, this function will change the detection category name “human” to “person”.
convert_homo_species_to_human (bool, optional) – the ensemble often rolls human predictions up to “homo species”, which isn’t wrong, but looks odd. This forces these back to “homo sapiens”.
verbose (bool, optional) – enable additional debug output

Returns:

results in MD format

Return type:

dict

megadetector.utils.wi_taxonomy_utils.generate_predictions_json_from_md_results(md_results_file, predictions_json_file, base_folder=None)[source]

Generate a predictions.json file from the MD-formatted .json file [md_results_file]. Typically, MD results files use relative paths, and predictions.json files use absolute paths, so this function optionally prepends [base_folder]. Does not handle classification results in MD format, since this is intended to prepare data for passing through the WI classifier.

md_to_wi.py is a command-line driver for this function.

Parameters:

md_results_file (str) – path to an MD-formatted .json file
predictions_json_file (str) – path to which we should write a predictions.json file
base_folder (str, optional) – folder name to prepend to each path in md_results_file, to convert relative paths to absolute paths. If [base_folder] is non-empty and doesn’t end in a slash, a slash will be added.

megadetector.utils.wi_taxonomy_utils.generate_whole_image_detections_for_classifications(classifications_json_file, detections_json_file, ensemble_json_file=None, ignore_blank_classifications=True, verbose=True)[source]

Given a set of classification results in SpeciesNet format that were likely run on already-cropped images, generate a file of [fake] detections in SpeciesNet format in which each image is covered in a single whole-image detection.

Parameters:

classifications_json_file (str) – SpeciesNet-formatted file containing classifications
detections_json_file (str) – SpeciesNet-formatted file to write with detections
ensemble_json_file (str, optional) – SpeciesNet-formatted file to write with detections and classifications
ignore_blank_classifications (bool, optional) – use non-top classifications when the top classification is “blank” or “no CV result”
verbose (bool, optional) – enable additional debug output

Returns:

the contents of [detections_json_file]

Return type:

dict

megadetector.utils.wi_taxonomy_utils.get_common_name_from_prediction_string(s)[source]

Extract the common name from the seven-token prediction string [s], or generate a reasonable one (e.g. “vulpes genus”). Prediction strings look like:

‘90d950db-2106-4bd9-a4c1-777604c3eada;mammalia;rodentia;;;;rodent’

Parameters:: s (str) – the string for which we should extract a common name
Returns:: the extracted common name
Return type:: str
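
A minimal sketch of the extract-or-generate behavior described above. It assumes the five middle tokens are class, order, family, genus, and species, and that a generated name is the most specific non-empty token followed by its level name (as in “vulpes genus”). The helper name is hypothetical, and how the real function names a species-level string with no common name is not documented here.

```python
LEVEL_NAMES = ['class', 'order', 'family', 'genus', 'species']


def common_name_from_prediction(s):
    """Hypothetical helper: return or synthesize a common name."""
    tokens = s.split(';')
    if len(tokens) != 7:
        raise ValueError('Expected a seven-token prediction string')
    common_name = tokens[6].strip()
    if common_name:
        return common_name
    # No common name: walk from species up to class and use the most
    # specific level that is defined
    for level_name, token in reversed(list(zip(LEVEL_NAMES, tokens[1:6]))):
        if token:
            return '{} {}'.format(token, level_name)
    return ''
```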

megadetector.utils.wi_taxonomy_utils.get_kingdom(prediction_string)[source]

Return the kingdom field from a WI prediction string

Parameters:: prediction_string (str) – a string in the semicolon-delimited prediction string format
Returns:: the kingdom field from the input string
Return type:: str

megadetector.utils.wi_taxonomy_utils.is_animal_classification(prediction_string)[source]

Determines whether the input string represents an animal classification, which excludes, e.g., humans, blanks, vehicles, unknowns

Parameters:: prediction_string (str) – a string in the semicolon-delimited prediction string format
Returns:: whether this string corresponds to an animal category
Return type:: bool

megadetector.utils.wi_taxonomy_utils.is_human_classification(prediction_string)[source]

Determines whether the input string represents a human classification, which includes a variety of common names (hiker, person, etc.)

Parameters:: prediction_string (str) – a string in the semicolon-delimited prediction string format
Returns:: whether this string corresponds to a human category
Return type:: bool

megadetector.utils.wi_taxonomy_utils.is_taxonomic_prediction_string(s)[source]

Determines whether [s] is a classification string that has taxonomic properties; this does not include, e.g., blanks/vehicles/no cv result. It also excludes “animal”.

Parameters:: s (str) – a five- or seven-token taxonomic string
Returns:: whether [s] is a taxonomic category
Return type:: bool

megadetector.utils.wi_taxonomy_utils.is_valid_prediction_string(s)[source]

Determine whether [s] is a valid WI prediction string. Prediction strings look like:

‘90d950db-2106-4bd9-a4c1-777604c3eada;mammalia;rodentia;;;;rodent’

Parameters:: s (str) – the string to be tested for validity
Returns:: True if this looks more or less like a WI prediction string
Return type:: bool
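
One plausible reading of "looks more or less like a WI prediction string" is a structural check: seven semicolon-delimited tokens with a GUID in the first position. The sketch below implements only that reading; the helper name is hypothetical and the real function's criteria may differ.

```python
import uuid


def looks_like_prediction_string(s):
    """Hypothetical check: seven tokens, the first of which parses as a UUID."""
    if not isinstance(s, str):
        return False
    tokens = s.split(';')
    if len(tokens) != 7:
        return False
    try:
        uuid.UUID(tokens[0])
    except ValueError:
        return False
    return True
```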

megadetector.utils.wi_taxonomy_utils.is_valid_taxonomy_string(s)[source]

Determine whether [s] is a valid 5-token WI taxonomy string. Taxonomy strings look like:

‘mammalia;rodentia;;;;rodent’
‘mammalia;chordata;canidae;canis;lupus dingo’

Parameters:: s (str) – the string to be tested for validity
Returns:: True if this looks more or less like a WI taxonomy string
Return type:: bool

megadetector.utils.wi_taxonomy_utils.is_vehicle_classification(prediction_string)[source]

Determines whether the input string represents a vehicle classification.

Parameters:: prediction_string (str) – a string in the semicolon-delimited prediction string format
Returns:: whether this string corresponds to the vehicle category
Return type:: bool

megadetector.utils.wi_taxonomy_utils.load_md_or_speciesnet_file(fn, verbose=True)[source]

Load a .json file that may be in MD or SpeciesNet format. Typically used so SpeciesNet files can be supplied to functions originally written to support MD format.

Parameters:

fn (str) – a .json file in predictions.json (MD or SpeciesNet) format
verbose (bool, optional) – enable additional debug output

Returns:

the contents of [fn], in MD format.

Return type:

dict

megadetector.utils.wi_taxonomy_utils.merge_prediction_json_files(input_prediction_files, output_prediction_file)[source]

Merge all predictions.json files in [input_prediction_files] into a single .json file.

Parameters:

input_prediction_files (list) – list of predictions.json files to merge
output_prediction_file (str) – output .json file

megadetector.utils.wi_taxonomy_utils.split_instances_into_n_batches(instances_json, n_batches, output_files=None)[source]

Given an instances.json file, split it into batches of equal size.

Parameters:

instances_json (str) – input .json file in instances.json format
n_batches (int) – number of new files to generate
output_files (list, optional) – output .json files for each batch. If supplied, should have length [n_batches]. If not supplied, filenames will be generated based on [instances_json].

Returns:

list of output files that were written; identical to [output_files] if it was supplied as input.

Return type:

list

megadetector.utils.wi_taxonomy_utils.taxonomy_info_to_taxonomy_string(taxonomy_info, include_taxon_id_and_common_name=False)[source]

Convert a taxonomy record in dict format to a five- or seven-token semicolon-delimited string

Parameters:

taxonomy_info (dict) – dict in the format stored in, e.g., taxonomy_string_to_taxonomy_info
include_taxon_id_and_common_name (bool, optional) – by default, this function returns a five-token string of latin names; if this argument is True, it includes the leading (GUID) and trailing (common name) tokens

Returns:

string in the format used as keys in, e.g., taxonomy_string_to_taxonomy_info

Return type:

str

megadetector.utils.wi_taxonomy_utils.taxonomy_level_index(s)[source]

Returns the taxonomy level up to which [s] is defined (0 for non-taxonomic, 1 for kingdom, 2 for phylum, etc.). Empty strings and non-taxonomic strings are treated as level 0. 1 and 2 will never be returned; “animal” doesn’t look like other taxonomic strings, so here we treat it as non-taxonomic.

Parameters:: s (str) – 5-token or 7-token taxonomy string
Returns:: taxonomy level
Return type:: int

megadetector.utils.wi_taxonomy_utils.taxonomy_level_string_to_index(s)[source]

Maps strings (‘kingdom’, ‘species’, etc.) to level indices.

Parameters:: s (str) – taxonomy level string
Returns:: taxonomy level index
Return type:: int

megadetector.utils.wi_taxonomy_utils.taxonomy_level_to_string(k)[source]

Maps taxonomy level indices (0 for kingdom, 1 for phylum, etc.) to strings.

Parameters:: k (int) – taxonomy level index
Returns:: taxonomy level string
Return type:: str

megadetector.utils.wi_taxonomy_utils.test_wi_taxonomy_utils()[source]: Module-level test entry point.

megadetector.utils.wi_taxonomy_utils.validate_predictions_file(fn, instances=None, verbose=True)[source]

Validate the predictions.json file [fn].

Parameters:

fn (str) – a .json file in predictions.json (SpeciesNet) format
instances (str or list, optional) – a folder, instances.json file, or dict loaded from an instances.json file. If supplied, this function will verify that [fn] contains the same number of images as [instances].
verbose (bool, optional) – enable additional debug output

Returns:

the contents of [fn]

Return type:

dict

utils.wi_platform_utils module

wi_platform_utils.py

Utility functions for working with the Wildlife Insights platform, specifically:

Retrieving images based on .csv downloads
Pushing results to the ProcessCVResponse() API (requires an API key)

megadetector.utils.wi_platform_utils.find_images_in_identify_tab(download_folder_with_identify, download_folder_excluding_identify)[source]

Based on extracted download packages with and without the “exclude images in ‘identify’ tab” checkbox checked, figures out which images are in the identify tab. Returns a list of dicts (one per image).

Parameters:

download_folder_with_identify (str) – the folder containing the download bundle that includes images from the “identify” tab
download_folder_excluding_identify (str) – the folder containing the download bundle that excludes images from the “identify” tab

Returns:

list of image records that are present in the identify tab

Return type:

list of dict
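
The comparison can be pictured as a set difference over image IDs. The sketch below assumes both bundles have already been loaded into dicts mapping image IDs to lists of records, and that an image found only in the bundle that includes “identify” images is in the identify tab. The function name and the sample records are hypothetical.

```python
def sketch_find_identify_images(records_with_identify, records_excluding_identify):
    """Return one record per image that appears only in the 'with identify' bundle."""
    excluded_ids = set(records_excluding_identify.keys())
    identify_records = []
    for image_id, records in records_with_identify.items():
        if image_id not in excluded_ids:
            # Keep a single record per image, even if the image has several rows
            identify_records.append(records[0])
    return identify_records


with_identify = {
    'img_0': [{'image_id': 'img_0'}],
    'img_1': [{'image_id': 'img_1'}, {'image_id': 'img_1'}],
}
excluding_identify = {'img_0': [{'image_id': 'img_0'}]}
in_identify_tab = sketch_find_identify_images(with_identify, excluding_identify)
```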

megadetector.utils.wi_platform_utils.generate_blank_prediction_payload(data_file_id, project_id, blank_confidence=0.9, model_version='3.1.2', prediction_source='manual_update')[source]

Generate a payload that will set a single image to the blank classification, with no detections. Suitable for upload via push_results_for_images.

Parameters:

data_file_id (str) – unique identifier for this image used in the WI DB
project_id (int) – WI project ID
blank_confidence (float, optional) – confidence value to associate with this prediction
model_version (str, optional) – model version string to include in the payload
prediction_source (str, optional) – prediction source string to include in the payload

Returns:

dictionary suitable for uploading via push_results_for_images

Return type:

dict

megadetector.utils.wi_platform_utils.generate_no_cv_result_payload(data_file_id, project_id, no_cv_confidence=0.9, model_version='3.1.2', prediction_source='manual_update')[source]

Generate a payload that will set a single image to the “no CV result” classification, with no detections. Suitable for uploading via push_results_for_images.

Parameters:

data_file_id (str) – unique identifier for this image used in the WI DB
project_id (int) – WI project ID
no_cv_confidence (float, optional) – confidence value to associate with this prediction
model_version (str, optional) – model version string to include in the payload
prediction_source (str, optional) – prediction source string to include in the payload

Returns:

dictionary suitable for uploading via push_results_for_images

Return type:

dict

megadetector.utils.wi_platform_utils.generate_payload_for_prediction_string(data_file_id, project_id, prediction_string, prediction_confidence=0.8, detections=None, model_version='3.1.2', prediction_source='manual_update')[source]

Generate a payload that will set a single image to a particular prediction, optionally including detections. Suitable for uploading via push_results_for_images.

Parameters:

data_file_id (str) – unique identifier for this image used in the WI DB
project_id (int) – WI project ID
prediction_string (str) – WI-formatted prediction string to include in the payload
prediction_confidence (float, optional) – confidence value to associate with this prediction
detections (list, optional) – list of MD-formatted detection dicts, with fields ‘category’ and ‘conf’
model_version (str, optional) – model version string to include in the payload
prediction_source (str, optional) – prediction source string to include in the payload

Returns:

dictionary suitable for uploading via push_results_for_images

Return type:

dict

megadetector.utils.wi_platform_utils.generate_payload_with_replacement_detections(wi_result, detections, prediction_score=0.9, model_version='3.1.2', prediction_source='manual_update')[source]

Generate a payload for a single image that keeps the classifications from [wi_result], but replaces the detections with the MD-formatted list [detections].

Parameters:

wi_result (dict) – dict representing a WI prediction result, with at least the fields in the constant wi_result_fields
detections (list) – list of MD-formatted detection dicts (with fields ‘conf’ and ‘category’)
prediction_score (float, optional) – confidence value to use for the combined prediction
model_version (str, optional) – model version string to include in the payload
prediction_source (str, optional) – prediction source string to include in the payload

Returns:

dictionary suitable for uploading via push_results_for_images

Return type:

dict

megadetector.utils.wi_platform_utils.parallel_push_results_for_images(payloads, headers, url='https://placeholder', verbose=False, pool_type='thread', n_workers=10)[source]

Push results for the list of payloads in [payloads] to the process_cv_response API, parallelized over multiple workers.

Parameters:

payloads (list of dict) – payloads to upload to the API
headers (dict) – authorization headers, see prepare_data_update_auth_headers
url (str, optional) – API URL
verbose (bool, optional) – enable additional debug output
pool_type (str, optional) – ‘thread’ or ‘process’
n_workers (int, optional) – number of parallel workers

Returns:

list of http response codes, one per payload

Return type:

list of int
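
The thread-based case can be sketched with the standard library. This is an illustration under assumptions, not the library's implementation: push_fn stands in for a per-payload upload call, and fake_push is a hypothetical stub that makes no network request.

```python
from concurrent.futures import ThreadPoolExecutor


def sketch_parallel_push(payloads, push_fn, n_workers=10):
    """Apply push_fn to every payload using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(push_fn, payloads))


def fake_push(payload):
    """Hypothetical stand-in for an upload call; pretends every upload succeeds."""
    return 200


status_codes = sketch_parallel_push([{'id': 0}, {'id': 1}, {'id': 2}], fake_push, n_workers=2)
```

Because pool.map returns results in input order, the i-th status code corresponds to the i-th payload.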

megadetector.utils.wi_platform_utils.prepare_data_update_auth_headers(auth_token_file)[source]

Read the authorization token from a text file and prepare http headers.

Parameters:

auth_token_file (str) – a single-line text file containing a write-enabled API token

Returns:

http headers, with fields ‘Authorization’ and ‘Content-Type’

Return type:

dict
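
A minimal sketch of this step, assuming a bearer-token scheme and a JSON content type. Both values are assumptions, since this page only names the two header fields. The function name and the token below are hypothetical.

```python
import os
import tempfile


def sketch_prepare_headers(auth_token_file):
    """Read a single-line token file and build HTTP headers from it."""
    with open(auth_token_file, 'r') as f:
        token = f.read().strip()
    # 'Bearer' and 'application/json' are assumptions, not confirmed by the docs
    return {'Authorization': 'Bearer ' + token,
            'Content-Type': 'application/json'}


# Usage with a throwaway token file
with tempfile.TemporaryDirectory() as tmp_dir:
    token_file = os.path.join(tmp_dir, 'token.txt')
    with open(token_file, 'w') as f:
        f.write('not-a-real-token\n')
    example_headers = sketch_prepare_headers(token_file)
```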

megadetector.utils.wi_platform_utils.push_results_for_images(payload, headers, url='https://placeholder', verbose=False)[source]

Push results for one or more images represented in [payload] to the process_cv_response API, to write to the WI DB.

Parameters:

payload (dict) – payload to upload to the API
headers (dict) – authorization headers, see prepare_data_update_auth_headers
url (str, optional) – API URL
verbose (bool, optional) – enable additional debug output

Returns:

response status code

Return type:

int

megadetector.utils.wi_platform_utils.read_images_from_download_bundle(download_folder)[source]

Reads all images.csv files from [download_folder], returns a dict mapping image IDs to a list of dicts that describe each image. It’s a list of dicts rather than a single dict because images may appear more than once, typically indicating multiple species.

Parameters:

download_folder (str) – a folder containing one or more images.csv files, typically representing a Wildlife Insights download bundle. If this is a single .csv file, reads just that file.

Returns:

Maps image GUIDs to lists of dicts, each with at least the following fields:

project_id (int)
deployment_id (str)
image_id (str, should match the key)
filename (str, the filename without path at the time of upload)
location (str, starting with gs://)

May also contain classification fields: wi_taxon_id (str), species, etc. Returns None if no image .csv files are available.

Return type:

dict
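
The grouping behavior (one list of records per image ID) can be sketched with the standard csv module. This is an illustration, not the library's reader: the sample rows are invented, and csv.DictReader leaves every value as a string, so project_id is not converted to int here.

```python
import csv
import io
from collections import defaultdict


def sketch_group_rows_by_image_id(csv_file):
    """Group the rows of an images.csv-style file by their image_id column."""
    image_id_to_records = defaultdict(list)
    for row in csv.DictReader(csv_file):
        image_id_to_records[row['image_id']].append(row)
    return dict(image_id_to_records)


# Hypothetical sample data; img_0 appears twice (two species in one image)
sample_csv = io.StringIO(
    'project_id,deployment_id,image_id,filename,location,species\n'
    '1,dep_a,img_0,a.jpg,gs://example-bucket/dep_a/img_0.jpg,species_x\n'
    '1,dep_a,img_0,a.jpg,gs://example-bucket/dep_a/img_0.jpg,species_y\n'
    '1,dep_a,img_1,b.jpg,gs://example-bucket/dep_a/img_1.jpg,species_x\n'
)
grouped_records = sketch_group_rows_by_image_id(sample_csv)
```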

megadetector.utils.wi_platform_utils.read_sequences_from_download_bundle(download_folder)[source]

Reads all sequences.csv files from [download_folder], returns a dict mapping sequence_id values to a list of dicts that describe each sequence. It’s a list of dicts rather than a single dict because sequences may appear more than once, typically indicating multiple species.

Parameters:

download_folder (str) – a folder containing one or more sequences.csv files, typically representing a Wildlife Insights download bundle. If this is a single .csv file, reads just that file.

Returns:

Maps string-formatted sequence IDs to lists of dicts, each with at least the following fields:

project_id (int)
deployment_id (str)

May also contain classification fields: wi_taxon_id (str), species, etc. Returns None if no sequence .csv files are available.

Return type:

dict

megadetector.utils.wi_platform_utils.record_is_unidentified(record)[source]

A record is considered “unidentified” if the “identified_by” field is either NaN or “computer vision”.

Parameters:: record (dict) – dict representing a WI result loaded from a .csv file, with at least the field “identified_by”
Returns:: True if the “identified_by” field is either NaN or a string indicating that this record has not yet been human-reviewed.
Return type:: bool
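
The rule above can be sketched as follows. This is a hedged illustration rather than the library's code: the case-insensitive string comparison is an assumption, and the function name is hypothetical.

```python
import math


def sketch_record_is_unidentified(record):
    """True if 'identified_by' is NaN or indicates a computer-vision-only label."""
    value = record['identified_by']
    # Values loaded from .csv files via pandas are typically float NaN when empty
    if isinstance(value, float) and math.isnan(value):
        return True
    # Case-insensitive matching is an assumption made for this sketch
    return isinstance(value, str) and value.strip().lower() == 'computer vision'


nan_result = sketch_record_is_unidentified({'identified_by': float('nan')})
cv_result = sketch_record_is_unidentified({'identified_by': 'Computer vision'})
human_result = sketch_record_is_unidentified({'identified_by': 'A. Reviewer'})
```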

megadetector.utils.wi_platform_utils.record_lists_are_identical(records_0, records_1, verbose=False)[source]

Takes two lists of records in the form returned by read_images_from_download_bundle and determines whether they are the same.

Parameters:

records_0 (list of dict) – the first list of records to compare
records_1 (list of dict) – the second list of records to compare
verbose (bool, optional) – enable additional debug output

Returns:

True if the two lists are identical

Return type:

bool

megadetector.utils.wi_platform_utils.url_to_relative_path(url, image_flattening='deployment')[source]

Convert a WI gs:// URL to a relative path.

Parameters:

url (str) – the URL to convert to a relative path
image_flattening (str, optional) – if ‘none’, relative paths will be stored as the entire URL for each image, other than gs://. Can be ‘guid’ (just store [GUID].JPG) or ‘deployment’ (store as [deployment]/[GUID].JPG).

Returns:

converted path

Return type:

str
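
The three flattening modes can be sketched as below. The URL layout is an assumption: this sketch treats the last path component as [GUID].JPG and the one before it as the deployment, which may not match real Wildlife Insights URLs. The function name and example URL are hypothetical.

```python
def sketch_url_to_relative_path(url, image_flattening='deployment'):
    """Convert a gs:// URL to a relative path under one of three flattening modes."""
    if not url.startswith('gs://'):
        raise ValueError('Expected a gs:// URL, got {}'.format(url))
    relative_path = url[len('gs://'):]
    if image_flattening == 'none':
        # Keep everything other than the gs:// scheme
        return relative_path
    parts = relative_path.split('/')
    if image_flattening == 'guid':
        return parts[-1]
    if image_flattening == 'deployment':
        # Assumes the parent folder of the image is the deployment
        return '/'.join(parts[-2:])
    raise ValueError('Unknown flattening mode: {}'.format(image_flattening))


example_url = 'gs://example-bucket/project_1/deployment_01/abc123.JPG'
path_none = sketch_url_to_relative_path(example_url, 'none')
path_guid = sketch_url_to_relative_path(example_url, 'guid')
path_deployment = sketch_url_to_relative_path(example_url, 'deployment')
```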

megadetector.utils.wi_platform_utils.validate_payload(payload)[source]

Verifies that the dict [payload] is compatible with the ProcessCVResponse() API. Throws an error if [payload] is invalid.

Parameters:: payload (dict) – payload in the format expected by push_results_for_images.
Returns:: True if validation succeeds; the return value is just future-proofing, as this function currently never returns False
Return type:: bool

megadetector.utils.wi_platform_utils.wi_result_to_prediction_string(r)[source]

Convert the dict [r] - typically loaded from a row in a downloaded .csv file - to a valid prediction string, e.g.:

1f689929-883d-4dae-958c-3d57ab5b6c16;;;;;;animal
90d950db-2106-4bd9-a4c1-777604c3eada;mammalia;rodentia;;;;rodent

Parameters:: r (dict) – dict containing WI prediction information, with at least the fields specified in wi_result_fields.
Returns:: the result in [r], as a semicolon-delimited prediction string
Return type:: str

megadetector.utils.wi_platform_utils.write_download_commands(image_records, download_dir_base, force_download=False, n_download_workers=25, download_command_file_base=None, image_flattening='deployment')[source]

Given a list of dicts with at least the field ‘location’ (a gs:// URL), prepare a set of “gcloud storage” commands to download images, and write those to a series of .sh scripts, along with one .sh script that runs all the others and blocks.

gcloud commands will use relative paths.

Parameters:

image_records (list of dict) – list of dicts with at least the field ‘location’. Can also be a dict whose values are lists of record dicts.
download_dir_base (str) – local destination folder
force_download (bool, optional) – include gs commands even if the target file exists
n_download_workers (int, optional) – number of scripts to write (that’s our hacky way of controlling parallelization)
download_command_file_base (str, optional) – path of the .sh script we should write, defaults to “download_wi_images.sh” in the destination folder. Individual worker scripts will have a number added, e.g. download_wi_images_00.sh.
image_flattening (str, optional) – if ‘none’, relative paths will be preserved representing the entire URL for each image. Can be ‘guid’ (just download to [GUID].JPG) or ‘deployment’ (download to [deployment]/[GUID].JPG).
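
The idea of controlling parallelism through the number of scripts can be sketched as follows. This is an illustration under assumptions: commands are distributed round-robin, the script names follow the download_wi_images_00.sh pattern mentioned above, and nothing is written to disk. The function name and sample commands are hypothetical.

```python
def sketch_split_commands(commands, n_download_workers=25,
                          script_base='download_wi_images'):
    """Distribute commands round-robin across per-worker script names."""
    n_scripts = max(1, min(n_download_workers, len(commands)))
    script_to_commands = {}
    for i_script in range(n_scripts):
        script_name = '{}_{:02d}.sh'.format(script_base, i_script)
        # Every n_scripts-th command, starting at offset i_script
        script_to_commands[script_name] = commands[i_script::n_scripts]
    return script_to_commands


sample_commands = ['gcloud storage cp "gs://example-bucket/dep/{}.JPG" "dep/{}.JPG"'.format(i, i)
                   for i in range(5)]
worker_scripts = sketch_split_commands(sample_commands, n_download_workers=2)
```

A top-level script could then launch each worker script in the background and wait for all of them to finish.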

megadetector.utils.wi_platform_utils.write_prefix_download_command(image_records, download_dir_base, force_download=False, download_command_file=None)[source]

Write a .sh script to download all images (using gcloud) from the longest common URL prefix in the images represented in [image_records].

Parameters:

image_records (list of dict) – list of dicts with at least the field ‘location’. Can also be a dict whose values are lists of record dicts.
download_dir_base (str) – local destination folder
force_download (bool, optional) – overwrite existing files
download_command_file (str, optional) – path of the .sh script we should write, defaults to “download_wi_images_with_prefix.sh” in the destination folder.
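
Finding the longest common URL prefix can be sketched with os.path.commonprefix. Because that function compares character by character, this sketch trims the result back to the last “/” so the prefix ends on a folder boundary; that trimming is a choice made for the illustration, and the function name and URLs are hypothetical.

```python
import os


def sketch_common_url_prefix(urls):
    """Return the longest common prefix of the URLs, ending at a '/' boundary."""
    prefix = os.path.commonprefix(urls)
    # commonprefix is character-based, so it may stop mid-folder-name
    return prefix[:prefix.rfind('/') + 1]


sample_urls = [
    'gs://example-bucket/project_1/dep_1/a.JPG',
    'gs://example-bucket/project_1/dep_2/b.JPG',
]
common_prefix = sketch_common_url_prefix(sample_urls)
```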

utils.write_html_image_list module

write_html_image_list.py

Given a list of image file names, writes an HTML file that shows all those images, with optional one-line headers above each.

Each “filename” can also be a dict with elements ‘filename’, ‘title’, ‘imageStyle’, ‘textStyle’, ‘linkTarget’.

megadetector.utils.write_html_image_list.write_html_image_list(filename=None, images=None, options=None)[source]

Given a list of image file names, writes an HTML file that shows all those images, with optional one-line headers above each.

Parameters:

filename (str, optional) – the .html output file; if None, just returns a valid options dict
images (list, optional) –
the images to write to the .html file; if None, just returns a valid options dict. This can be a flat list of image filenames, or this can be a list of dictionaries with one or more of the following fields:
- filename (image filename) (required, all other fields are optional)
- imageStyle (css style for this image)
- textStyle (css style for the title associated with this image)
- title (text label for this image)
- linkTarget (URL to which this image should link on click)
options (dict, optional) –
a dict with one or more of the following fields:
- f_html (file pointer to write to, used for splitting write operations over multiple calls)
- pageTitle (HTML page title)
- headerHtml (html text to include before the image list)
- subPageHeaderHtml (html text to include before the images when images are broken into pages)
- trailerHtml (html text to include after the image list)
- defaultImageStyle (default css style for images)
- defaultTextStyle (default css style for image titles)
- maxFiguresPerHtmlFile (max figures for a single HTML file; overflow will be handled by creating multiple files and a TOC with links)
- urlEncodeFilenames (default True, e.g. ‘#’ will be replaced by ‘%23’)
- urlEncodeLinkTargets (default True, e.g. ‘#’ will be replaced by ‘%23’)
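
The two accepted input shapes (plain filenames or dicts) and the URL-encoding option can be sketched as below. This is not the library's code: the normalization helper and its parameter names are hypothetical, and only the dict field names come from the list above.

```python
from urllib.parse import quote


def sketch_normalize_images(images, default_image_style='', default_text_style='',
                            url_encode_filenames=True):
    """Convert a mixed list of filenames and dicts into uniform image dicts."""
    normalized = []
    for item in images:
        if isinstance(item, str):
            item = {'filename': item}
        record = {
            'filename': item['filename'],
            'title': item.get('title', ''),
            'imageStyle': item.get('imageStyle', default_image_style),
            'textStyle': item.get('textStyle', default_text_style),
            'linkTarget': item.get('linkTarget', ''),
        }
        if url_encode_filenames:
            # e.g. '#' becomes '%23'; '/' is left alone by default
            record['filename'] = quote(record['filename'])
        normalized.append(record)
    return normalized


normalized_images = sketch_normalize_images(
    ['images/img#1.jpg', {'filename': 'images/img2.jpg', 'title': 'Second image'}],
    default_image_style='width:600px;')
```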

utils.extract_frames_from_video module

extract_frames_from_video.py

Extracts frames from a source video or folder of videos and writes those frames to jpeg files. For single videos, writes frame images to the destination folder. For folders of videos, creates subfolders in the destination folder (one per video) and writes frame images to those subfolders.

class megadetector.utils.extract_frames_from_video.FrameExtractionOptions[source]

Bases: object

Parameters controlling the behavior of extract_frames().

detector_output_file: Path to MegaDetector .json output file. When specified, extracts frames referenced in this file. Mutually exclusive with frame_sample. [source] must be a folder when this is specified.

frame_sample: Sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate. Mutually exclusive with detector_output_file.

max_width: Maximum width for extracted frames (defaults to None)

n_workers: Number of workers to use for parallel processing

parallelize_with_threads: Use threads for parallel processing

quality: JPEG quality for extracted frames

verbose: Enable additional debug output

megadetector.utils.extract_frames_from_video.extract_frames(source, destination, options=None)[source]

Extracts frames from a video or folder of videos.

Parameters:

source (str) – path to a single video file or folder of videos
destination (str) – folder to write frame images to (will be created if it doesn’t exist)
options (FrameExtractionOptions, optional) – parameters controlling frame extraction

Returns:

for single videos, returns (list of frame filenames, frame rate); for folders, returns (list of lists of frame filenames, list of frame rates, list of video filenames)

Return type:

tuple

extract_frames_from_video - CLI interface

Extract frames from videos and save as JPEG files

extract_frames_from_video [-h] [--n_workers N_WORKERS] [--parallelize_with_threads]
                          [--quality QUALITY] [--max_width MAX_WIDTH] [--verbose]
                          [--frame_sample FRAME_SAMPLE | --detector_output_file DETECTOR_OUTPUT_FILE]
                          source destination

extract_frames_from_video positional arguments

source - Path to a single video file or folder containing videos
destination - Output folder for extracted frames (will be created if it does not exist)

extract_frames_from_video options

-h, --help - show this help message and exit
--n_workers N_WORKERS - Number of workers to use for parallel processing
--parallelize_with_threads - Use threads for parallel processing (default: use processes)
--quality QUALITY - JPEG quality for extracted frames
--max_width MAX_WIDTH - Maximum width for extracted frames (default: no resizing)
--verbose - Enable additional debug output
--frame_sample FRAME_SAMPLE - Sample every Nth frame starting from the first frame; if this is None or 1, every frame is extracted. If this is a negative value, it’s interpreted as a sampling rate in seconds, which is rounded to the nearest frame sampling rate
--detector_output_file DETECTOR_OUTPUT_FILE - Path to MegaDetector .json output file. When specified, extracts frames referenced in this file. Source must be a folder when this is specified.