data_management.databases package

This package contains tools for working with .json files in COCO Camera Traps format.

Submodules

data_management.databases.combine_coco_camera_traps_files module

combine_coco_camera_traps_files.py

Merges two or more .json files in COCO Camera Traps format, optionally writing the results to another .json file.

  • Concatenates image lists, erroring if images are not unique.

  • Errors on unrecognized fields.

  • Checks compatibility in info structs, within reason.
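The merge rules above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the function name `merge_cct_dicts_sketch` is hypothetical, uniqueness is assumed to be keyed on image ID, the set of recognized top-level fields is assumed to be `info`, `images`, `annotations`, and `categories`, and the `info` compatibility check is reduced to keeping the first `info` struct encountered.

```python
def merge_cct_dicts_sketch(input_dicts, require_uniqueness=True):
    """Illustrative merge of CCT-style dicts (hypothetical helper, not the library code)."""
    # Assumption: these are the only recognized top-level fields
    known_fields = {"info", "images", "annotations", "categories"}
    merged = {"info": None, "images": [], "annotations": [], "categories": []}
    seen_image_ids = set()
    category_names = {}  # category ID -> category name

    for d in input_dicts:
        # Error on unrecognized fields
        unknown = set(d) - known_fields
        if unknown:
            raise ValueError(f"Unrecognized fields: {sorted(unknown)}")

        # Assumption: keep the first 'info' struct we see
        if merged["info"] is None:
            merged["info"] = d.get("info")

        # Concatenate image lists, erroring on duplicate IDs if requested
        for im in d.get("images", []):
            if require_uniqueness and im["id"] in seen_image_ids:
                raise ValueError(f"Duplicate image ID: {im['id']}")
            seen_image_ids.add(im["id"])
            merged["images"].append(im)

        merged["annotations"].extend(d.get("annotations", []))

        # Assumption: a category ID must map to the same name in every input
        for cat in d.get("categories", []):
            existing = category_names.get(cat["id"])
            if existing is None:
                category_names[cat["id"]] = cat["name"]
                merged["categories"].append(cat)
            elif existing != cat["name"]:
                raise ValueError(f"Category ID {cat['id']} has conflicting names")

    return merged
```

The real module may handle categories and `info` differently; the sketch is only meant to make the "concatenate, but fail loudly on conflicts" behavior concrete.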

Example command-line invocation

combine_coco_camera_traps_files input1.json input2.json … inputN.json output.json

megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_dictionaries(input_dicts, require_uniqueness=True)[source]

Merges the list of COCO Camera Traps dictionaries [input_dicts]. See module header comment for details on merge rules.

Parameters:
  • input_dicts (list of dict) – list of CCT dicts

  • require_uniqueness (bool, optional) – whether to require that the images in each input_dict be unique

Returns:

the merged COCO-formatted .json dict

Return type:

dict

megadetector.data_management.databases.combine_coco_camera_traps_files.combine_cct_files(input_files, output_file=None, require_uniqueness=True, filename_prefixes=None)[source]

Merges the list of COCO Camera Traps files [input_files] into a single dictionary, optionally writing the result to [output_file].

Parameters:
  • input_files (list) – paths to CCT .json files

  • output_file (str, optional) – path to write merged .json file

  • require_uniqueness (bool, optional) – whether to require that the images in each input file be unique

  • filename_prefixes (dict, optional) – dict mapping input filenames to strings that should be prepended to image filenames from that source

Returns:

the merged COCO-formatted .json dict

Return type:

dict
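As a rough illustration of what filename_prefixes is described as doing, the sketch below prepends a per-source string to every image filename in a CCT-style dict. The helper name `apply_filename_prefix_sketch` is hypothetical, and plain string concatenation (with no separator added automatically) is an assumption, not confirmed behavior of combine_cct_files().

```python
import copy


def apply_filename_prefix_sketch(cct_dict, prefix):
    """Return a copy of cct_dict with prefix prepended to each image filename.

    Hypothetical helper; assumes plain string concatenation.
    """
    out = copy.deepcopy(cct_dict)  # leave the caller's dict untouched
    for im in out["images"]:
        im["file_name"] = prefix + im["file_name"]
    return out


# Mapping from input filename to prefix, in the shape described for filename_prefixes
filename_prefixes = {"site_a.json": "site_a/", "site_b.json": "site_b/"}

# Stand-ins for dicts that would normally be loaded from the files above
loaded = {
    "site_a.json": {"images": [{"id": "1", "file_name": "img0001.jpg"}]},
    "site_b.json": {"images": [{"id": "2", "file_name": "img0001.jpg"}]},
}

prefixed = [
    apply_filename_prefix_sketch(d, filename_prefixes.get(fn, ""))
    for fn, d in loaded.items()
]
```

Prefixing is useful when two sources reuse the same relative filenames, since the prefixed names no longer collide after the merge.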

combine_coco_camera_traps_files - CLI interface

combine_coco_camera_traps_files [-h] input_paths [input_paths ...] output_path

combine_coco_camera_traps_files positional arguments

combine_coco_camera_traps_files options

  • -h, --help - show this help message and exit

data_management.databases.integrity_check_json_db module

integrity_check_json_db.py

Does some integrity-checking and computes basic statistics on a COCO Camera Traps .json file, specifically:

  • Verifies that required fields are present and have the right types

  • Verifies that annotations refer to valid images

  • Verifies that annotations refer to valid categories

  • Verifies that image, category, and annotation IDs are unique

  • Optionally checks file existence

  • Finds un-annotated images

  • Finds unused categories

  • Prints a list of categories sorted by count
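Several of the checks listed above can be sketched in a few lines of standalone Python. This is an illustration under stated assumptions, not the module's code: `check_cct_sketch` is a hypothetical name, the input is assumed to be an already-loaded dict with `images`, `annotations`, and `categories` lists, and type checking, file-existence checks, and image-size checks are omitted.

```python
from collections import Counter


def check_cct_sketch(data):
    """Run a subset of the integrity checks on a CCT-style dict (hypothetical helper)."""
    errors = []
    image_ids = [im["id"] for im in data["images"]]
    category_ids = [c["id"] for c in data["categories"]]
    annotation_ids = [a["id"] for a in data["annotations"]]

    # Image, category, and annotation IDs should each be unique
    for label, ids in (("image", image_ids),
                       ("category", category_ids),
                       ("annotation", annotation_ids)):
        dupes = [i for i, n in Counter(ids).items() if n > 1]
        if dupes:
            errors.append(f"duplicate {label} IDs: {dupes}")

    valid_images = set(image_ids)
    valid_categories = set(category_ids)
    annotated_images = set()
    category_counts = Counter()

    # Annotations should refer to valid images and valid categories
    for ann in data["annotations"]:
        if ann["image_id"] in valid_images:
            annotated_images.add(ann["image_id"])
        else:
            errors.append(f"annotation {ann['id']} refers to unknown image")
        if ann["category_id"] in valid_categories:
            category_counts[ann["category_id"]] += 1
        else:
            errors.append(f"annotation {ann['id']} refers to unknown category")

    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    return {
        "errors": errors,
        "unannotated_images": [i for i in image_ids if i not in annotated_images],
        "unused_categories": [c["name"] for c in data["categories"]
                              if category_counts.get(c["id"], 0) == 0],
        # (name, count) pairs, most frequent first
        "sorted_categories": [(id_to_name[cid], n)
                              for cid, n in category_counts.most_common()],
    }
```

Collecting problems into a list rather than raising on the first one is a design choice of this sketch; it makes it possible to report every inconsistency in one pass.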

class megadetector.data_management.databases.integrity_check_json_db.IntegrityCheckOptions[source]

Bases: object

Options for integrity_check_json_db()

allowIntIDs

Allow integer-valued image and annotation IDs (COCO uses integers; CCT files use strings)

bCheckImageExistence

Should we check that all the images in the .json file exist on disk?

bCheckImageSizes

Should we validate the image sizes?

bFindUnusedImages

Should we search [baseDir] for images that are not used in the .json file?

bRequireLocation

Should we require that all images in the .json file have a ‘location’ field?

baseDir

Image path; the filenames in the .json file should be relative to this folder

iMaxNumImages

For debugging, limit the number of images we’ll process

nThreads

Number of threads to use for parallelization; set to <= 1 to disable parallelization

parallelizeWithThreads

Whether to use threads (rather than processes) for parallelization

requireInfo

If True, error if the ‘info’ field is not present

validateBoxes

Validate that boxes have positive width/height values; can be ‘error’, ‘warning’, or None

verbose

Enable additional debug output
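To make the three validateBoxes modes concrete, here is a small standalone sketch. The function name `validate_boxes_sketch` is hypothetical, boxes are assumed to use the COCO `[x, y, width, height]` layout under a `bbox` key, and the exact error and warning mechanics of the real option are not confirmed by this page.

```python
import warnings


def validate_boxes_sketch(annotations, validate_boxes="error"):
    """Check that boxes have positive width and height (hypothetical helper).

    validate_boxes can be 'error' (raise), 'warning' (warn and continue),
    or None (skip validation). Returns the number of invalid boxes found.
    """
    if validate_boxes is None:
        return 0
    if validate_boxes not in ("error", "warning"):
        raise ValueError(f"Unknown validation mode: {validate_boxes}")

    n_invalid = 0
    for ann in annotations:
        bbox = ann.get("bbox")
        if bbox is None:
            continue  # annotations without boxes are not checked here
        # Assumption: COCO-style [x, y, width, height]
        _, _, width, height = bbox
        if width <= 0 or height <= 0:
            n_invalid += 1
            message = f"Annotation {ann.get('id')} has a non-positive box size: {bbox}"
            if validate_boxes == "error":
                raise ValueError(message)
            warnings.warn(message)
    return n_invalid
```

A 'warning' mode like this is convenient when auditing a dataset for the first time, since it reports every degenerate box instead of stopping at the first one.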

megadetector.data_management.databases.integrity_check_json_db.integrity_check_json_db(json_file, options=None)[source]

Does some integrity-checking and computes basic statistics on a COCO Camera Traps .json file; see module header comment for a list of the validation steps.

Parameters:
  • json_file (str or dict) – filename to validate, or an already-loaded dict

  • options (IntegrityCheckOptions, optional) – see IntegrityCheckOptions

Returns:

tuple containing:
  • sorted_categories (dict): list of categories used in [json_file], sorted by frequency

  • data (dict): the data loaded from [json_file]

  • error_info (dict): specific validation errors

Return type:

tuple

integrity_check_json_db - CLI interface

integrity_check_json_db [-h] [--bCheckImageSizes] [--bCheckImageExistence]
                        [--bFindUnusedImages] [--baseDir BASEDIR] [--bAllowNoLocation]
                        [--iMaxNumImages IMAXNUMIMAGES] [--nThreads NTHREADS]
                        json_file

integrity_check_json_db positional arguments

  • json_file - COCO-formatted .json file to validate

integrity_check_json_db options

  • -h, --help - show this help message and exit

  • --bCheckImageSizes - Validate image size, requires baseDir to be specified. Implies existence checking.

  • --bCheckImageExistence - Validate image existence, requires baseDir to be specified

  • --bFindUnusedImages - Check for images in baseDir that aren’t in the database, requires baseDir to be specified

  • --baseDir BASEDIR - Base directory for images

  • --bAllowNoLocation - Disable errors when no location is specified for an image

  • --iMaxNumImages IMAXNUMIMAGES - Cap on total number of images to check

  • --nThreads NTHREADS - Number of threads (only relevant when verifying image sizes and/or existence)

data_management.databases.subset_json_db module

subset_json_db.py

Select a subset of images (and associated annotations) from a .json file in COCO Camera Traps format based on a string query.

To subset .json files in the MegaDetector output format, see subset_json_detector_output.py.

megadetector.data_management.databases.subset_json_db.main()[source]
megadetector.data_management.databases.subset_json_db.subset_json_db(input_json, query, output_json=None, ignore_case=False, remap_categories=True, verbose=False)[source]

Given a json file (or dictionary already loaded from a json file), produce a new database containing only the images whose filenames contain the string ‘query’, optionally writing that DB output to a new json file.

Parameters:
  • input_json (str or dict) – COCO Camera Traps .json file to load, or an already-loaded dict

  • query (str or list) – string to query for, only include images in the output whose filenames contain this string. If this is a list, test for exact matches.

  • output_json (str, optional) – file to write the resulting .json file to

  • ignore_case (bool, optional) – whether to perform a case-insensitive search for [query]

  • remap_categories (bool, optional) – trim the category list to only the categories used in the subset

  • verbose (bool, optional) – enable additional debug output

Returns:

CCT dictionary containing a subset of the images and annotations in the input dict

Return type:

dict
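The filtering behavior described above can be sketched without the library as follows. This is an approximation under stated assumptions: `subset_cct_sketch` is a hypothetical name, ignore_case is assumed to apply to both substring and exact-match queries, and "remapping" is reduced to trimming unused categories while leaving category IDs unchanged.

```python
def subset_cct_sketch(data, query, ignore_case=False, remap_categories=True):
    """Keep only images whose filenames match query (hypothetical helper)."""

    def normalize(s):
        return s.lower() if ignore_case else s

    if isinstance(query, list):
        # A list of queries means exact filename matches
        targets = {normalize(q) for q in query}

        def matches(file_name):
            return normalize(file_name) in targets
    else:
        # A single string means a substring match
        needle = normalize(query)

        def matches(file_name):
            return needle in normalize(file_name)

    images = [im for im in data["images"] if matches(im["file_name"])]
    kept_image_ids = {im["id"] for im in images}

    # Keep only annotations attached to the surviving images
    annotations = [a for a in data["annotations"] if a["image_id"] in kept_image_ids]

    categories = data["categories"]
    if remap_categories:
        # Assumption: trim unused categories, keep IDs as they are
        used = {a["category_id"] for a in annotations}
        categories = [c for c in categories if c["id"] in used]

    subset = {"images": images, "annotations": annotations, "categories": categories}
    if "info" in data:
        subset["info"] = data["info"]
    return subset
```

For example, calling this sketch with a folder name as the query keeps every image under that folder, along with the annotations and categories those images use.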

subset_json_db - CLI interface

subset_json_db [-h] [--ignore_case] input_json output_json query

subset_json_db positional arguments

subset_json_db options