data_management.databases package

This package contains tools for working with .json files in COCO Camera Traps format.

Submodules

data_management.databases.combine_coco_camera_traps_files module

combine_coco_camera_traps_files.py

Merges two or more .json files in COCO Camera Traps format, optionally writing the results to another .json file.

Concatenates image lists, erroring if images are not unique.
Errors on unrecognized fields.
Checks compatibility in info structs, within reason.
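
The merge rules above can be sketched in plain Python. This is a minimal illustration and not the module's implementation: the helper name merge_cct_sketch is hypothetical, uniqueness is assumed to be keyed on the image 'id' field, and category reconciliation and the info-struct checks are omitted.

```python
def merge_cct_sketch(input_dicts, require_uniqueness=True):
    """Concatenate the image and annotation lists of several CCT-style dicts.

    Hypothetical helper for illustration only; assumes each dict has an
    'images' list and that images are identified by their 'id' field.
    """
    merged = {'images': [], 'annotations': []}
    seen_ids = set()
    for d in input_dicts:
        for im in d['images']:
            # Error on repeated image IDs when uniqueness is required
            if require_uniqueness and im['id'] in seen_ids:
                raise ValueError('Duplicate image ID: {}'.format(im['id']))
            seen_ids.add(im['id'])
            merged['images'].append(im)
        merged['annotations'].extend(d.get('annotations', []))
    return merged


# Two tiny made-up inputs
a = {'images': [{'id': 'a1', 'file_name': 'a/1.jpg'}], 'annotations': []}
b = {'images': [{'id': 'b1', 'file_name': 'b/1.jpg'}], 'annotations': []}
merged = merge_cct_sketch([a, b])
```

Passing the same dict twice raises ValueError in this sketch unless require_uniqueness is False.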

Example command-line invocation

combine_coco_camera_traps_files input1.json input2.json … inputN.json output.json

megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_dictionaries(input_dicts, require_uniqueness=True)

Merges the list of COCO Camera Traps dictionaries [input_dicts]. See module header comment for details on merge rules.

Parameters:

input_dicts (list of dict) – list of CCT dicts
require_uniqueness (bool, optional) – whether to require that the images in each input_dict be unique

Returns:

the merged COCO-formatted .json dict

Return type:

dict

megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_files(input_files, output_file=None, require_uniqueness=True, filename_prefixes=None)

Merges the list of COCO Camera Traps files [input_files] into a single dictionary, optionally writing the result to [output_file].

Parameters:

input_files (list) – paths to CCT .json files
output_file (str, optional) – path to write merged .json file
require_uniqueness (bool, optional) – whether to require that the images in each input file be unique
filename_prefixes (dict, optional) – dict mapping input filenames to strings that should be prepended to image filenames from that source

Returns:

the merged COCO-formatted .json dict

Return type:

dict
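
To illustrate what filename_prefixes describes, the sketch below prepends a per-source prefix to each image's 'file_name'. The helper name, the sample filenames, and the choice to concatenate the prefix with no separator are assumptions made for illustration; combine_cct_files may join the prefix differently.

```python
import copy


def apply_filename_prefix_sketch(cct_dict, prefix):
    """Return a copy of cct_dict with prefix prepended to every file_name.

    Hypothetical helper; plain string concatenation is an assumption.
    """
    out = copy.deepcopy(cct_dict)
    for im in out['images']:
        im['file_name'] = prefix + im['file_name']
    return out


# Hypothetical mapping shaped like the filename_prefixes argument:
# input filename -> string to prepend to that source's image filenames
filename_prefixes = {'site_a.json': 'site_a/', 'site_b.json': 'site_b/'}

# Stand-ins for the dicts that would be loaded from those files
loaded = {
    'site_a.json': {'images': [{'id': '1', 'file_name': 'img001.jpg'}]},
    'site_b.json': {'images': [{'id': '2', 'file_name': 'img001.jpg'}]},
}

prefixed = {fn: apply_filename_prefix_sketch(d, filename_prefixes[fn])
            for fn, d in loaded.items()}
```

Prefixing in this way keeps filenames distinct when two sources use the same relative paths.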

combine_coco_camera_traps_files - CLI interface

combine_coco_camera_traps_files [-h] input_paths [input_paths ...] output_path

combine_coco_camera_traps_files positional arguments

input_paths - List of input .json files
output_path - Output .json file

combine_coco_camera_traps_files options

-h, --help - show this help message and exit

data_management.databases.integrity_check_json_db module

integrity_check_json_db.py

Performs integrity checks on a COCO Camera Traps .json file and computes basic statistics, specifically:

Verifies that required fields are present and have the right types
Verifies that annotations refer to valid images
Verifies that annotations refer to valid categories
Verifies that image, category, and annotation IDs are unique
Optionally checks file existence
Finds un-annotated images
Finds unused categories
Prints a list of categories sorted by count
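
Several of the checks listed above can be sketched in a few lines of plain Python. The function name and return shape below are hypothetical and only loosely mirror the module; field-type validation and file-existence checks are left out.

```python
from collections import Counter


def basic_integrity_check_sketch(db):
    """Run a few referential checks on a CCT-style dict (illustrative only).

    Returns (sorted_categories, unannotated_images, unused_categories).
    """
    image_ids = [im['id'] for im in db['images']]
    category_ids = [c['id'] for c in db['categories']]
    annotation_ids = [a['id'] for a in db['annotations']]

    # IDs must be unique within each list
    for name, ids in (('image', image_ids), ('category', category_ids),
                      ('annotation', annotation_ids)):
        if len(ids) != len(set(ids)):
            raise ValueError('Duplicate {} IDs'.format(name))

    # Every annotation must refer to a known image and a known category
    image_id_set, category_id_set = set(image_ids), set(category_ids)
    for ann in db['annotations']:
        if ann['image_id'] not in image_id_set:
            raise ValueError('Unknown image in annotation {}'.format(ann['id']))
        if ann['category_id'] not in category_id_set:
            raise ValueError('Unknown category in annotation {}'.format(ann['id']))

    annotated = {ann['image_id'] for ann in db['annotations']}
    counts = Counter(ann['category_id'] for ann in db['annotations'])
    id_to_name = {c['id']: c['name'] for c in db['categories']}

    unannotated_images = [i for i in image_ids if i not in annotated]
    unused_categories = [c for c in category_ids if counts[c] == 0]
    # Category names paired with annotation counts, most frequent first
    sorted_categories = [(id_to_name[c], n) for c, n in counts.most_common()]
    return sorted_categories, unannotated_images, unused_categories


# Small made-up database
db = {
    'images': [{'id': 'im0', 'file_name': 'a.jpg'},
               {'id': 'im1', 'file_name': 'b.jpg'},
               {'id': 'im2', 'file_name': 'c.jpg'}],
    'categories': [{'id': 0, 'name': 'empty'},
                   {'id': 1, 'name': 'deer'},
                   {'id': 2, 'name': 'fox'}],
    'annotations': [{'id': 'an0', 'image_id': 'im0', 'category_id': 1},
                    {'id': 'an1', 'image_id': 'im1', 'category_id': 1},
                    {'id': 'an2', 'image_id': 'im1', 'category_id': 0}],
}
sorted_categories, unannotated_images, unused_categories = \
    basic_integrity_check_sketch(db)
```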

class megadetector.data_management.databases.integrity_check_json_db.IntegrityCheckOptions

Bases: object

Options for integrity_check_json_db()

allowIntIDs: Allow integer-valued image and annotation IDs (COCO uses this, CCT files use strings)

bCheckImageExistence: Should we check that all the images in the .json file exist on disk?

bCheckImageSizes: Should we validate the image sizes?

bFindUnusedImages: Should we search [baseDir] for images that are not used in the .json file?

bRequireLocation: Should we require that all images in the .json file have a ‘location’ field?

baseDir: Image path; the filenames in the .json file should be relative to this folder

iMaxNumImages: For debugging, limit the number of images we’ll process

nThreads: Number of threads to use for parallelization, set to <= 1 to disable parallelization

parallelizeWithThreads: Whether to use threads (rather than processes for parallelization)

requireInfo: If True, error if the ‘info’ field is not present

validateBoxes: Validate that boxes have positive width/height values, can be ‘error’, ‘warning’, or None

verbose: Enable additional debug output

megadetector.data_management.databases.integrity_check_json_db.integrity_check_json_db(json_file, options=None)

Does some integrity-checking and computes basic statistics on a COCO Camera Traps .json file; see module header comment for a list of the validation steps.

Parameters:

json_file (str or dict) – filename to validate, or an already-loaded dict
options (IntegrityCheckOptions, optional) – see IntegrityCheckOptions

Returns:

tuple containing:

sorted_categories (dict): list of categories used in [json_file], sorted by frequency
data (dict): the data loaded from [json_file]
error_info (dict): specific validation errors

Return type:

tuple

integrity_check_json_db - CLI interface

integrity_check_json_db [-h] [--bCheckImageSizes] [--bCheckImageExistence]
                        [--bFindUnusedImages] [--baseDir BASEDIR] [--bAllowNoLocation]
                        [--iMaxNumImages IMAXNUMIMAGES] [--nThreads NTHREADS]
                        json_file

integrity_check_json_db positional arguments

json_file - COCO-formatted .json file to validate

integrity_check_json_db options

-h, --help - show this help message and exit
--bCheckImageSizes - Validate image size, requires baseDir to be specified. Implies existence checking.
--bCheckImageExistence - Validate image existence, requires baseDir to be specified
--bFindUnusedImages - Check for images in baseDir that aren’t in the database, requires baseDir to be specified
--baseDir BASEDIR - Base directory for images
--bAllowNoLocation - Disable errors when no location is specified for an image
--iMaxNumImages IMAXNUMIMAGES - Cap on total number of images to check
--nThreads NTHREADS - Number of threads (only relevant when verifying image sizes and/or existence)

data_management.databases.subset_json_db module

subset_json_db.py

Select a subset of images (and associated annotations) from a .json file in COCO Camera Traps format based on a string query.

To subset .json files in the MegaDetector output format, see subset_json_detector_output.py.

megadetector.data_management.databases.subset_json_db.main()

megadetector.data_management.databases.subset_json_db.subset_json_db(input_json, query, output_json=None, ignore_case=False, remap_categories=True, verbose=False)

Given a json file (or dictionary already loaded from a json file), produce a new database containing only the images whose filenames contain the string ‘query’, optionally writing that DB output to a new json file.

Parameters:

input_json (str or dict) – COCO Camera Traps .json file to load, or an already-loaded dict
query (str or list) – string to query for, only include images in the output whose filenames contain this string. If this is a list, test for exact matches.
output_json (str, optional) – file to write the resulting .json file to
ignore_case (bool, optional) – whether to perform a case-insensitive search for [query]
remap_categories (bool, optional) – trim the category list to only the categories used in the subset
verbose (bool, optional) – enable additional debug output

Returns:

CCT dictionary containing a subset of the images and annotations in the input dict

Return type:

dict
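
The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the function's implementation: the helper name is hypothetical, ignore_case is assumed to apply to list queries as well, and "remapping" is reduced to dropping unused categories without renumbering IDs.

```python
def subset_by_filename_sketch(db, query, ignore_case=False,
                              remap_categories=True):
    """Keep images whose file_name contains query (substring match), or,
    when query is a list, whose file_name exactly equals one of its entries.
    """
    def norm(s):
        return s.lower() if ignore_case else s

    if isinstance(query, list):
        wanted = {norm(q) for q in query}

        def keep(file_name):
            return norm(file_name) in wanted
    else:
        needle = norm(query)

        def keep(file_name):
            return needle in norm(file_name)

    images = [im for im in db['images'] if keep(im['file_name'])]
    kept_ids = {im['id'] for im in images}
    # Carry over only the annotations attached to the kept images
    annotations = [a for a in db['annotations'] if a['image_id'] in kept_ids]

    categories = db['categories']
    if remap_categories:
        used = {a['category_id'] for a in annotations}
        categories = [c for c in categories if c['id'] in used]

    return {'images': images, 'annotations': annotations,
            'categories': categories}


# Made-up database with two images from different folders
db = {
    'images': [{'id': 'i0', 'file_name': 'siteA/IMG_1.JPG'},
               {'id': 'i1', 'file_name': 'siteB/img_2.jpg'}],
    'annotations': [{'id': 'a0', 'image_id': 'i0', 'category_id': 1},
                    {'id': 'a1', 'image_id': 'i1', 'category_id': 2}],
    'categories': [{'id': 1, 'name': 'deer'}, {'id': 2, 'name': 'fox'}],
}

case_sensitive = subset_by_filename_sketch(db, 'img')
case_insensitive = subset_by_filename_sketch(db, 'img', ignore_case=True)
exact = subset_by_filename_sketch(db, ['siteA/IMG_1.JPG'])
```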

subset_json_db - CLI interface

subset_json_db [-h] [--ignore_case] input_json output_json query

subset_json_db positional arguments

input_json - Input file (a COCO Camera Traps .json file)
output_json - Output file
query - Filename query

subset_json_db options

-h, --help - show this help message and exit
--ignore_case - Perform a case-insensitive search for the query