data_management.databases package
This package contains tools for working with .json files in COCO Camera Traps format.
Submodules
data_management.databases.combine_coco_camera_traps_files module
combine_coco_camera_traps_files.py
Merges two or more .json files in COCO Camera Traps format, optionally writing the results to another .json file.
Concatenates image lists, erroring if images are not unique.
Errors on unrecognized fields.
Checks compatibility in info structs, within reason.
Example command-line invocation
combine_coco_camera_traps_files input1.json input2.json … inputN.json output.json
- megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_dictionaries(input_dicts, require_uniqueness=True)
Merges the list of COCO Camera Traps dictionaries [input_dicts]. See module header comment for details on merge rules.
- Parameters:
input_dicts (list of dict) – list of CCT dicts
require_uniqueness (bool, optional) – whether to require that the images in each input_dict be unique
- Returns:
the merged COCO-formatted .json dict
- Return type:
dict
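The merge rules described in the module header (concatenate image lists, error on duplicate images, error on unrecognized fields) can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the helper name `merge_cct_dicts`, the set of recognized top-level fields, the category-conflict check, and the choice to keep the first info struct are all assumptions made for the example.

```python
def merge_cct_dicts(input_dicts, require_uniqueness=True):
    """Illustrative merge of CCT-style dicts (hypothetical helper, not library code)."""
    # Assumption: these are the only top-level fields this sketch recognizes.
    known_fields = {'info', 'images', 'annotations', 'categories'}
    merged = {'info': None, 'images': [], 'annotations': [], 'categories': []}
    seen_image_ids = set()
    category_names = {}  # category ID -> name, used to detect conflicts

    for d in input_dicts:
        unknown = set(d) - known_fields
        if unknown:
            raise ValueError(f'Unrecognized fields: {sorted(unknown)}')

        # Concatenate image lists, optionally erroring on repeated image IDs
        for im in d.get('images', []):
            if require_uniqueness and im['id'] in seen_image_ids:
                raise ValueError(f"Duplicate image ID: {im['id']}")
            seen_image_ids.add(im['id'])
            merged['images'].append(im)

        merged['annotations'].extend(d.get('annotations', []))

        # Assumption: categories sharing an ID must also share a name
        for cat in d.get('categories', []):
            if cat['id'] not in category_names:
                category_names[cat['id']] = cat['name']
                merged['categories'].append(cat)
            elif category_names[cat['id']] != cat['name']:
                raise ValueError(f"Conflicting names for category {cat['id']}")

        # Assumption: keep the first info struct without further comparison
        if merged['info'] is None:
            merged['info'] = d.get('info')

    return merged


first = {
    'info': {'version': '1.0'},
    'images': [{'id': 'a1', 'file_name': 'a1.jpg'}],
    'annotations': [{'id': 'ann_a1', 'image_id': 'a1', 'category_id': 1}],
    'categories': [{'id': 1, 'name': 'deer'}],
}
second = {
    'info': {'version': '1.0'},
    'images': [{'id': 'b1', 'file_name': 'b1.jpg'}],
    'annotations': [{'id': 'ann_b1', 'image_id': 'b1', 'category_id': 2}],
    'categories': [{'id': 1, 'name': 'deer'}, {'id': 2, 'name': 'fox'}],
}
merged = merge_cct_dicts([first, second])
```

Passing the same dict twice with require_uniqueness=True raises an error in this sketch, which mirrors the documented behavior of erroring when images are not unique.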
- megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_files(input_files, output_file=None, require_uniqueness=True, filename_prefixes=None)
Merges the list of COCO Camera Traps files [input_files] into a single dictionary, optionally writing the result to [output_file].
- Parameters:
input_files (list) – paths to CCT .json files
output_file (str, optional) – path to write merged .json file
require_uniqueness (bool, optional) – whether to require that the images in each input_dict be unique
filename_prefixes (dict, optional) – dict mapping input filenames to strings that should be prepended to image filenames from that source
- Returns:
the merged COCO-formatted .json dict
- Return type:
dict
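To make the filename_prefixes parameter concrete, the sketch below shows one way a per-source prefix could be applied to image filenames before merging. It is an illustration under stated assumptions, not the library's code: the helper name `apply_filename_prefixes` is hypothetical, and the prefix is joined to the filename by plain string concatenation, since this page does not say whether a separator is inserted.

```python
import copy


def apply_filename_prefixes(dicts_by_source, filename_prefixes):
    """Prepend a per-source prefix to every image filename (hypothetical helper)."""
    prefixed = {}
    for source, cct in dicts_by_source.items():
        # Assumption: sources without an entry get no prefix
        prefix = filename_prefixes.get(source, '')
        cct = copy.deepcopy(cct)  # leave the caller's data untouched
        for im in cct['images']:
            # Assumption: plain concatenation, no separator added
            im['file_name'] = prefix + im['file_name']
        prefixed[source] = cct
    return prefixed


dicts_by_source = {
    'site_a.json': {'images': [{'id': '1', 'file_name': 'img001.jpg'}]},
    'site_b.json': {'images': [{'id': '2', 'file_name': 'img001.jpg'}]},
}
filename_prefixes = {'site_a.json': 'site_a/', 'site_b.json': 'site_b/'}
prefixed = apply_filename_prefixes(dicts_by_source, filename_prefixes)
```

Prefixing is useful when two sources contain identically named files, as in the example data, because the resulting filenames no longer collide.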
combine_coco_camera_traps_files - CLI interface
combine_coco_camera_traps_files [-h] input_paths [input_paths ...] output_path
combine_coco_camera_traps_files positional arguments
input_paths - List of input .json files
output_path - Output .json file
combine_coco_camera_traps_files options
data_management.databases.integrity_check_json_db module
integrity_check_json_db.py
Does some integrity-checking and computes basic statistics on a COCO Camera Traps .json file, specifically:
Verifies that required fields are present and have the right types
Verifies that annotations refer to valid images
Verifies that annotations refer to valid categories
Verifies that image, category, and annotation IDs are unique
Optionally checks file existence
Finds un-annotated images
Finds unused categories
Prints a list of categories sorted by count
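Several of the checks listed above (ID uniqueness, valid image and category references, un-annotated images, unused categories, categories sorted by count) can be sketched with the standard library alone. This is a simplified illustration, not the module's implementation: the helper name `basic_cct_checks` and the shape of its return value are assumptions, and it skips type checking and file existence checks.

```python
from collections import Counter


def basic_cct_checks(data):
    """Run a few consistency checks on a CCT-style dict (hypothetical helper)."""
    errors = []
    images = data.get('images', [])
    annotations = data.get('annotations', [])
    categories = data.get('categories', [])

    # IDs must be unique within each list
    for label, items in (('image', images), ('category', categories),
                         ('annotation', annotations)):
        counts = Counter(item['id'] for item in items)
        for item_id, n in counts.items():
            if n > 1:
                errors.append(f'duplicate {label} ID: {item_id}')

    image_ids = {im['id'] for im in images}
    category_ids = {c['id'] for c in categories}
    annotated_image_ids = set()
    category_counts = Counter()

    # Annotations must refer to valid images and valid categories
    for ann in annotations:
        if ann['image_id'] in image_ids:
            annotated_image_ids.add(ann['image_id'])
        else:
            errors.append(f"annotation {ann['id']} refers to unknown image")
        if ann['category_id'] in category_ids:
            category_counts[ann['category_id']] += 1
        else:
            errors.append(f"annotation {ann['id']} refers to unknown category")

    id_to_name = {c['id']: c['name'] for c in categories}
    return {
        'errors': errors,
        'unannotated_images': [im['id'] for im in images
                               if im['id'] not in annotated_image_ids],
        'unused_categories': [c['name'] for c in categories
                              if category_counts[c['id']] == 0],
        # (name, count) pairs, most frequent first
        'sorted_categories': [(id_to_name[cid], n)
                              for cid, n in category_counts.most_common()],
    }


sample = {
    'images': [{'id': 'i1'}, {'id': 'i2'}, {'id': 'i3'}],
    'categories': [{'id': 1, 'name': 'deer'}, {'id': 2, 'name': 'fox'},
                   {'id': 3, 'name': 'bear'}],
    'annotations': [
        {'id': 'a1', 'image_id': 'i1', 'category_id': 1},
        {'id': 'a2', 'image_id': 'i2', 'category_id': 1},
        {'id': 'a3', 'image_id': 'i1', 'category_id': 2},
        {'id': 'a4', 'image_id': 'missing', 'category_id': 99},
    ],
}
report = basic_cct_checks(sample)
```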
- class megadetector.data_management.databases.integrity_check_json_db.IntegrityCheckOptions
Bases: object
Options for integrity_check_json_db()
- allowIntIDs
Allow integer-valued image and annotation IDs (COCO uses this, CCT files use strings)
- bCheckImageExistence
Should we check that all the images in the .json file exist on disk?
- bCheckImageSizes
Should we validate the image sizes?
- bFindUnusedImages
Should we search [baseDir] for images that are not used in the .json file?
- bRequireLocation
Should we require that all images in the .json file have a ‘location’ field?
- baseDir
Image path; the filenames in the .json file should be relative to this folder
- iMaxNumImages
For debugging, limit the number of images we’ll process
- nThreads
Number of threads to use for parallelization, set to <= 1 to disable parallelization
- parallelizeWithThreads
Whether to use threads (rather than processes) for parallelization
- requireInfo
If True, error if the ‘info’ field is not present
- validateBoxes
Validate that boxes have positive width/height values; can be ‘error’, ‘warning’, or None
- verbose
Enable additional debug output
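As an illustration of a three-way setting like validateBoxes (‘error’, ‘warning’, or None), the sketch below checks that each bounding box has positive width and height. It is not the library's implementation: the helper name `find_bad_boxes` is hypothetical, and the code assumes the standard COCO box layout [x, y, width, height] stored in a ‘bbox’ field.

```python
import warnings


def find_bad_boxes(annotations, mode='error'):
    """Return IDs of annotations whose boxes lack positive width/height.

    mode is 'error' (raise), 'warning' (warn and continue), or None (skip).
    Hypothetical helper for illustration only.
    """
    if mode is None:
        return []
    if mode not in ('error', 'warning'):
        raise ValueError(f'Unrecognized mode: {mode}')

    bad_ids = []
    for ann in annotations:
        bbox = ann.get('bbox')
        if bbox is None:
            continue  # annotations without boxes are not checked here
        # Assumption: COCO layout, [x, y, width, height]
        _, _, width, height = bbox
        if width <= 0 or height <= 0:
            bad_ids.append(ann['id'])

    if bad_ids:
        message = f'{len(bad_ids)} box(es) with non-positive size: {bad_ids}'
        if mode == 'error':
            raise ValueError(message)
        warnings.warn(message)
    return bad_ids


sample_annotations = [
    {'id': 'a1', 'bbox': [10, 20, 100, 50]},
    {'id': 'a2', 'bbox': [10, 20, 0, 50]},   # zero width
    {'id': 'a3'},                            # no box at all
]
skipped = find_bad_boxes(sample_annotations, mode=None)
```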
- megadetector.data_management.databases.integrity_check_json_db.integrity_check_json_db(json_file, options=None)
Does some integrity-checking and computes basic statistics on a COCO Camera Traps .json file; see module header comment for a list of the validation steps.
- Parameters:
json_file (str or dict) – filename to validate, or an already-loaded dict
options (IntegrityCheckOptions, optional) – see IntegrityCheckOptions
- Returns:
- tuple containing:
sorted_categories (dict): list of categories used in [json_file], sorted by frequency
data (dict): the data loaded from [json_file]
error_info (dict): specific validation errors
- Return type:
tuple
integrity_check_json_db - CLI interface
integrity_check_json_db [-h] [--bCheckImageSizes] [--bCheckImageExistence]
[--bFindUnusedImages] [--baseDir BASEDIR] [--bAllowNoLocation]
[--iMaxNumImages IMAXNUMIMAGES] [--nThreads NTHREADS]
json_file
integrity_check_json_db positional arguments
json_file - COCO-formatted .json file to validate
integrity_check_json_db options
--bCheckImageSizes - Validate image size, requires baseDir to be specified. Implies existence checking.
--bCheckImageExistence - Validate image existence, requires baseDir to be specified
--bFindUnusedImages - Check for images in baseDir that aren’t in the database, requires baseDir to be specified
--baseDir BASEDIR - Base directory for images
--bAllowNoLocation - Disable errors when no location is specified for an image
--iMaxNumImages IMAXNUMIMAGES - Cap on total number of images to check
--nThreads NTHREADS - Number of threads (only relevant when verifying image sizes and/or existence)
data_management.databases.subset_json_db module
subset_json_db.py
Select a subset of images (and associated annotations) from a .json file in COCO Camera Traps format based on a string query.
To subset .json files in the MegaDetector output format, see subset_json_detector_output.py.
- megadetector.data_management.databases.subset_json_db.subset_json_db(input_json, query, output_json=None, ignore_case=False, remap_categories=True, verbose=False)
Given a json file (or dictionary already loaded from a json file), produce a new database containing only the images whose filenames contain the string ‘query’, optionally writing that DB output to a new json file.
- Parameters:
input_json (str or dict) – COCO Camera Traps .json file to load, or an already-loaded dict
query (str or list) – string to query for, only include images in the output whose filenames contain this string. If this is a list, test for exact matches.
output_json (str, optional) – file to write the resulting .json file to
ignore_case (bool, optional) – whether to perform a case-insensitive search for [query]
remap_categories (bool, optional) – trim the category list to only the categories used in the subset
verbose (bool, optional) – enable additional debug output
- Returns:
CCT dictionary containing a subset of the images and annotations in the input dict
- Return type:
dict
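The subsetting behavior described above can be sketched as follows: substring matching for a string query, exact matching for a list query, and optional trimming of unused categories. This is a minimal illustration rather than the library's implementation. The helper name `subset_cct` is hypothetical, applying ignore_case to list queries is an assumption, and the sketch only trims the category list without renumbering category IDs.

```python
def subset_cct(data, query, ignore_case=False, remap_categories=True):
    """Keep only images whose filenames match query (hypothetical helper)."""

    def normalize(s):
        return s.lower() if ignore_case else s

    if isinstance(query, list):
        # A list query is tested for exact filename matches
        wanted = {normalize(q) for q in query}

        def matches(file_name):
            return normalize(file_name) in wanted
    else:
        # A string query is tested as a substring of the filename
        needle = normalize(query)

        def matches(file_name):
            return needle in normalize(file_name)

    images = [im for im in data['images'] if matches(im['file_name'])]
    kept_image_ids = {im['id'] for im in images}
    annotations = [ann for ann in data['annotations']
                   if ann['image_id'] in kept_image_ids]

    categories = data['categories']
    if remap_categories:
        # Assumption: "remap" here only drops unused categories
        used = {ann['category_id'] for ann in annotations}
        categories = [c for c in categories if c['id'] in used]

    # Copy any other top-level fields (such as 'info') unchanged
    result = {k: v for k, v in data.items()
              if k not in ('images', 'annotations', 'categories')}
    result.update(images=images, annotations=annotations, categories=categories)
    return result


database = {
    'info': {'version': '1.0'},
    'images': [
        {'id': 'i1', 'file_name': 'siteA/cam1/001.jpg'},
        {'id': 'i2', 'file_name': 'siteB/cam1/001.jpg'},
    ],
    'annotations': [
        {'id': 'a1', 'image_id': 'i1', 'category_id': 1},
        {'id': 'a2', 'image_id': 'i2', 'category_id': 2},
    ],
    'categories': [{'id': 1, 'name': 'deer'}, {'id': 2, 'name': 'fox'}],
}
site_a = subset_cct(database, 'siteA')
```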
subset_json_db - CLI interface
subset_json_db [-h] [--ignore_case] input_json output_json query
subset_json_db positional arguments
input_json - Input file (a COCO Camera Traps .json file)
output_json - Output file
query - Filename query