postprocessing.repeat_detection_elimination package
This package contains tools for running MegaDetector’s repeat detection elimination (RDE) process, for quickly getting rid of false positives that are frequently detected as objects of interest. The RDE page on GitHub provides documentation about how to run that process.
Submodules
postprocessing.repeat_detection_elimination.find_repeat_detections module
find_repeat_detections.py
If you want to use this script, we recommend that you read the RDE user’s guide first.
Really, don’t try to run this script without reading the user’s guide; you’ll think it’s more magical than it is.
This script looks through a sequence of detections in the API output json file, and finds candidates that might be “repeated false positives”, i.e. that random branch that the detector thinks is an animal/person/vehicle.
Typically after running this script, you would do a manual step to remove true positives, then run remove_repeat_detections to produce a final output file.
There’s no way that statement was self-explanatory; see the user’s guide.
This script is just a command-line driver for repeat_detections_core.py.
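The find, review, remove workflow described above can be sketched as a pair of command lines. The snippet below only assembles and prints the argument lists; it does not run anything. All file and folder names are hypothetical placeholders, the threshold values are arbitrary illustrations rather than recommendations, and invoking the scripts via `python -m <module>` is an assumption about how the package is installed.

```python
import shlex

# Module paths as they appear in this package's documentation
FIND_MODULE = "megadetector.postprocessing.repeat_detection_elimination.find_repeat_detections"
REMOVE_MODULE = "megadetector.postprocessing.repeat_detection_elimination.remove_repeat_detections"

# Hypothetical paths; replace these with your own
md_results = "md_results.json"
image_base = "/data/images"
output_base = "/data/rde"
filtering_dir = "/data/rde/filtering_folder"  # placeholder for the folder written in step 1
final_results = "md_results_filtered.json"

# Step 1: find candidate repeat detections and render review images.
# The numeric values are illustrative only.
find_cmd = [
    "python", "-m", FIND_MODULE,
    md_results,
    "--imageBase", image_base,
    "--outputBase", output_base,
    "--confidenceMin", "0.1",
    "--iouThreshold", "0.9",
    "--occurrenceThreshold", "10",
]

# Step 2 is manual: review the rendered images and delete the ones that
# show real animals/people/vehicles, as described in the user's guide.

# Step 3: write the final filtered results file.
remove_cmd = ["python", "-m", REMOVE_MODULE, md_results, final_results, filtering_dir]

print(shlex.join(find_cmd))
print(shlex.join(remove_cmd))
```

Note that `--outputFile` is deliberately not passed in step 1, in line with the advice in the option list to avoid it when doing manual review.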
find_repeat_detections - CLI interface
find_repeat_detections [-h] [--outputFile OUTPUTFILE] [--imageBase IMAGEBASE]
[--outputBase OUTPUTBASE] [--confidenceMin CONFIDENCEMIN]
[--confidenceMax CONFIDENCEMAX] [--iouThreshold IOUTHRESHOLD]
[--occurrenceThreshold OCCURRENCETHRESHOLD]
[--minSuspiciousDetectionSize MINSUSPICIOUSDETECTIONSIZE]
[--maxSuspiciousDetectionSize MAXSUSPICIOUSDETECTIONSIZE]
[--maxImagesPerFolder MAXIMAGESPERFOLDER]
[--excludeClasses EXCLUDECLASSES [EXCLUDECLASSES ...]]
[--pass_detections_to_processes_method PASS_DETECTIONS_TO_PROCESSES_METHOD]
[--nWorkers NWORKERS] [--parallelizationUsesProcesses]
[--filterFileToLoad FILTERFILETOLOAD] [--omitFilteringFolder]
[--debugMaxDir DEBUGMAXDIR] [--debugMaxRenderDir DEBUGMAXRENDERDIR]
[--debugMaxRenderDetection DEBUGMAXRENDERDETECTION]
[--debugMaxRenderInstance DEBUGMAXRENDERINSTANCE]
[--forceSerialComparisons] [--forceSerialRendering]
[--maxOutputImageWidth MAXOUTPUTIMAGEWIDTH]
[--lineThickness LINETHICKNESS] [--boxExpansion BOXEXPANSION]
[--nDirLevelsFromLeaf NDIRLEVELSFROMLEAF] [--bRenderOtherDetections]
[--bRenderDetectionTiles]
[--detectionTilesPrimaryImageWidth DETECTIONTILESPRIMARYIMAGEWIDTH]
inputFile
find_repeat_detections positional arguments
inputFile - MD results .json file to process
find_repeat_detections options
--outputFile OUTPUTFILE - .json file to write filtered results to… do not use this if you are going to do manual review of the repeat detection images (which you should)
--imageBase IMAGEBASE - Image base dir
--outputBase OUTPUTBASE - filtering folder output dir
--confidenceMin CONFIDENCEMIN - Detection confidence threshold; don’t process anything below this
--confidenceMax CONFIDENCEMAX - Detection confidence threshold; don’t process anything above this
--iouThreshold IOUTHRESHOLD - Detections with IOUs greater than this are considered "the same detection"
--occurrenceThreshold OCCURRENCETHRESHOLD - More than this many near-identical detections in a group (e.g. a folder) is considered suspicious
--minSuspiciousDetectionSize MINSUSPICIOUSDETECTIONSIZE - Detections smaller than this fraction of image area are not considered suspicious
--maxSuspiciousDetectionSize MAXSUSPICIOUSDETECTIONSIZE - Detections larger than this fraction of image area are not considered suspicious
--maxImagesPerFolder MAXIMAGESPERFOLDER - Ignore folders with more than this many images in them
--excludeClasses EXCLUDECLASSES - List of integer classes we don’t want to treat as suspicious, separated by spaces.
--pass_detections_to_processes_method PASS_DETECTIONS_TO_PROCESSES_METHOD - Pass detections information to/from workers via "memory" (default) or "files"
--nWorkers NWORKERS - Level of parallelism for rendering and IOU computation
--parallelizationUsesProcesses - Parallelize with processes (defaults to threads)
--filterFileToLoad FILTERFILETOLOAD - Path to detectionIndex.json, which should be inside a folder of images that are manually verified to _not_ contain valid animals
--omitFilteringFolder - Should we skip creating the folder of rendered detections filtering?
--debugMaxDir DEBUGMAXDIR - For debugging only, limit the number of directories we process
--debugMaxRenderDir DEBUGMAXRENDERDIR - For debugging only, limit the number of directories we render
--debugMaxRenderDetection DEBUGMAXRENDERDETECTION - For debugging only, limit the number of detections we process per folder
--debugMaxRenderInstance DEBUGMAXRENDERINSTANCE - For debugging only, limit the number of instances we process per detection
--forceSerialComparisons - Disable parallelization during the comparison stage
--forceSerialRendering - Disable parallelization during the rendering stage
--maxOutputImageWidth MAXOUTPUTIMAGEWIDTH - Maximum output size for thumbnail images
--lineThickness LINETHICKNESS - Line thickness thumbnail images
--boxExpansion BOXEXPANSION - Box expansion for thumbnail images
--nDirLevelsFromLeaf NDIRLEVELSFROMLEAF - Number of levels from the leaf folders to use for repeat detection (0 == leaves)
--bRenderOtherDetections - Show non-target detections in light gray on each image
--bRenderDetectionTiles - Should we render a grid showing every instance (up to a limit) for each detection?
--detectionTilesPrimaryImageWidth DETECTIONTILESPRIMARYIMAGEWIDTH - The width of the main image when rendering images with detection tiles
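To make the confidence, size, and class options concrete, here is a minimal sketch of how they could combine into a single eligibility check. It is an illustrative reading of the option descriptions above, not the library's code. It assumes each detection is a dict with `category`, `conf`, and a normalized `bbox` of `[x_min, y_min, width, height]`, so that width times height is the fraction of image area; the function name and parameter names are hypothetical.

```python
def is_candidate(detection, confidence_min, confidence_max,
                 min_size, max_size, exclude_classes=()):
    """Return True if a detection is eligible to be flagged as suspicious.

    Illustrative only: mirrors the wording of --confidenceMin/--confidenceMax,
    --minSuspiciousDetectionSize/--maxSuspiciousDetectionSize, and
    --excludeClasses.
    """
    conf = detection["conf"]
    # "don't process anything below/above this"
    if conf < confidence_min or conf > confidence_max:
        return False
    # Excluded classes are given as integers; category IDs are assumed
    # to be strings such as "1" in the results file
    if int(detection["category"]) in exclude_classes:
        return False
    # Assumption: bbox is normalized, so w * h is a fraction of image area
    _, _, w, h = detection["bbox"]
    area_fraction = w * h
    return min_size <= area_fraction <= max_size


detections = [
    {"category": "1", "conf": 0.50, "bbox": [0.1, 0.1, 0.2, 0.2]},  # eligible
    {"category": "1", "conf": 0.05, "bbox": [0.1, 0.1, 0.2, 0.2]},  # too low confidence
    {"category": "2", "conf": 0.50, "bbox": [0.1, 0.1, 0.2, 0.2]},  # excluded class
    {"category": "1", "conf": 0.50, "bbox": [0.0, 0.0, 0.9, 0.9]},  # too large
]

flags = [
    is_candidate(d, confidence_min=0.1, confidence_max=1.0,
                 min_size=0.01, max_size=0.5, exclude_classes=(2, 3))
    for d in detections
]
print(flags)
```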
postprocessing.repeat_detection_elimination.remove_repeat_detections module
remove_repeat_detections.py
Used after running find_repeat_detections, then manually filtering the results, to create a final filtered output file.
If you want to use this script, we recommend that you read the RDE user’s guide first.
- megadetector.postprocessing.repeat_detection_elimination.remove_repeat_detections.remove_repeat_detections(input_file, output_file, filtering_dir)[source]
Given an index file that was produced in a first pass through find_repeat_detections, and a folder of images (from which the user has deleted images they don’t want removed), remove the identified repeat detections from a set of MD results and write to a new file.
- Parameters:
input_file (str) – .json file of MD results, from which we should remove repeat detections
output_file (str) – output .json file to which we should write MD results (with repeat detections removed)
filtering_dir (str) – the folder produced by find_repeat_detections, containing a detectionIndex.json file
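The deletion convention is easy to get backwards: an image that is still in the filtering folder marks a detection to remove, while an image the reviewer deleted marks a detection to keep. The sketch below illustrates only that rule. The file names and the function are hypothetical, and the real layout of detectionIndex.json is not described here, so plain lists of names stand in for it.

```python
def detections_to_remove(candidate_images, remaining_images):
    """Return the candidates whose review image is still in the folder."""
    remaining = set(remaining_images)
    return [name for name in candidate_images if name in remaining]


# Hypothetical review images written by the first pass, one per candidate
candidate_images = ["cam01_0000.jpg", "cam01_0001.jpg", "cam02_0000.jpg"]

# The reviewer deleted cam01_0001.jpg because it showed a real animal
remaining_images = ["cam01_0000.jpg", "cam02_0000.jpg"]

to_remove = detections_to_remove(candidate_images, remaining_images)
kept_as_real = [name for name in candidate_images if name not in to_remove]

print("removed from results:", to_remove)
print("kept in results:", kept_as_real)
```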
remove_repeat_detections - CLI interface
remove_repeat_detections [-h] input_file output_file filtering_dir
remove_repeat_detections positional arguments
input_file - .json file containing the original, unfiltered API results
output_file - .json file to which you want to write the final, filtered API results
filtering_dir - directory where you looked at lots of images and decided which ones were really false positives
remove_repeat_detections options
postprocessing.repeat_detection_elimination.repeat_detections_core module
repeat_detections_core.py
Core utilities shared by find_repeat_detections and remove_repeat_detections.
Nothing in this file (in fact nothing in this subpackage) will make sense until you read the RDE user’s guide.
- class megadetector.postprocessing.repeat_detection_elimination.repeat_detections_core.DetectionLocation(instance, detection, relative_dir, category, id=None)[source]
Bases: object
A unique-ish detection location, meaningful in the context of one directory. All detections within an IoU threshold of self.bbox will be stored in IndexedDetection objects.
- bbox
bbox as x,y,w,h
- category
category ID (not name) for this detection
- clusterLabel
only used when doing cluster-based sorting
- id
ID for this detection; this ID is only guaranteed to be unique within a directory
- instances
list of IndexedDetections that match this detection
- relativeDir
relative folder (i.e., camera name) in which this detection was found
- sampleImageDetections
list of detections on that canonical image that match this detection
- sampleImageRelativeFileName
relative path to the canonical image representing this detection
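The idea of a location that collects every detection within an IoU threshold of a reference box can be sketched as follows. This is a simplified greedy grouping written for illustration, not the library's algorithm. Boxes are assumed to be `[x_min, y_min, width, height]`, the function names are hypothetical, and the strict "greater than" comparisons follow the wording of the `--iouThreshold` and `--occurrenceThreshold` help text.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def group_by_location(boxes, iou_threshold):
    """Assign each box to the first location whose reference box it matches."""
    locations = []  # each entry: {"bbox": reference box, "instances": [box indices]}
    for i, box in enumerate(boxes):
        for location in locations:
            if iou(location["bbox"], box) > iou_threshold:
                location["instances"].append(i)
                break
        else:
            # No existing location matched, so this box starts a new one
            locations.append({"bbox": box, "instances": [i]})
    return locations


def suspicious_locations(boxes, iou_threshold, occurrence_threshold):
    """Keep locations with more than occurrence_threshold matching boxes."""
    return [
        location for location in group_by_location(boxes, iou_threshold)
        if len(location["instances"]) > occurrence_threshold
    ]


# Boxes from one folder: the same "branch" three times, plus one unrelated box
boxes = [
    [0.10, 0.10, 0.20, 0.20],
    [0.10, 0.10, 0.20, 0.20],
    [0.60, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.20],
]
flagged = suspicious_locations(boxes, iou_threshold=0.9, occurrence_threshold=2)
print([location["instances"] for location in flagged])
```

A greedy pass like this depends on the order of the boxes; it is meant to show the role of the two thresholds, not to reproduce the package's results.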
- class megadetector.postprocessing.repeat_detection_elimination.repeat_detections_core.IndexedDetection(i_detection=-1, filename='', bbox=None, confidence=-1, category='unknown')[source]
Bases: object
A single detection event on a single image
- bbox
[x_min, y_min, width_of_box, height_of_box]
- category
category ID (not name) of this detection
- confidence
confidence value of this detection
- filename
path to the image corresponding to this detection
- i_detection
index of this detection within all detections for this filename
- class megadetector.postprocessing.repeat_detection_elimination.repeat_detections_core.RepeatDetectionOptions[source]
Bases: object
Options that control the behavior of repeat detection elimination
- bFailOnRenderError
Determines whether bounding-box rendering errors (typically network errors) should be treated as failures
- bParallelizeComparisons
Should we parallelize (across cameras) comparisons to find repeat detections?
- bParallelizeRendering
Should we parallelize image rendering?
- bPrintMissingImageWarnings
Should we print a warning if images referred to in the MD results file are missing?
- bRenderDetectionTiles
Optionally show a grid that includes a sample image for the detection, plus the top N additional detections
- bRenderOtherDetections
Optionally show other detections (i.e., detections other than the one the user is evaluating), typically in a light gray.
- bWriteFilteringFolder
Should we write the folder of images used to manually review repeat detections?
- boxExpansion
Box expansion (in pixels)
- categoryAgnosticComparisons
If this is False (default), a detection from class A is not considered to be “the same” as a detection from class B, even if they’re at the same location.
- confidenceMax
Don’t consider detections with confidence higher than this as suspicious
- confidenceMin
Don’t consider detections with confidence lower than this as suspicious
- customDirNameFunction
An optional function that takes a string (an image file name) and returns a string (the corresponding folder ID), typically used when multiple folders actually correspond to the same camera in a manufacturer-specific way (e.g. a/b/c/RECONYX100 and a/b/c/RECONYX101 may really be the same camera).
See ct_utils for a common replacement function that handles most common manufacturer folder names:
from megadetector.utils import ct_utils
self.customDirNameFunction = ct_utils.image_file_to_camera_folder
- debugMaxDir
For debugging: limit comparisons to a specific number of folders
- debugMaxRenderDetection
For debugging: limit comparisons to a specific number of detections
- debugMaxRenderDir
For debugging: limit rendering to a specific number of folders
- debugMaxRenderInstance
For debugging: limit comparisons to a specific number of instances
- detectionTilesCroppedGridWidth
Width to use for the grid of detection instances.
Can be a width in pixels, or a number from 0 to 1 representing a fraction of the primary image width.
If you want to render the grid at exactly 1 pixel wide, I guess you’re out of luck.
- detectionTilesMaxCrops
Maximum number of individual detection instances to include in the mosaic
- detectionTilesPrimaryImageLocation
Location of the primary image within the mosaic (‘right’ or ‘left’)
- detectionTilesPrimaryImageWidth
Width of the original image (within the larger output image) when bRenderDetectionTiles is True.
If this is None, we’ll render the original image in the detection tile image at its original width.
- excludeClasses
A list of category IDs (ints) that we don’t want to consider as candidate repeat detections.
Typically used to say, e.g., “don’t bother analyzing people or vehicles for repeat detections”, which you could do by saying excludeClasses = [2,3].
- excludeFolders
Exclude specific folders, mutually exclusive with [includeFolders]
- filenameReplacements
Replace filename tokens after reading, useful when the directory structure has changed relative to the structure the detector saw.
- filterFileToLoad
If this is not empty, we’ll load detections from a filter file rather than finding them from the detector output. This should be a .json file containing detections, generally this is the detectionIndex.json file in the filtering_* folder produced by find_repeat_detections().
- filteredFileListToLoad
(optional) List of filenames remaining after deletion of identified repeated detections that are actually animals. This should be a flat text file, one relative filename per line.
This is a pretty esoteric code path and a candidate for removal.
The scenario where I see it being most useful is the very hypothetical one where we use an external tool for image handling that allows us to do something smarter and less destructive than deleting images to mark them as non-false-positives.
- imageBase
Folder where images live; filenames in the MD results .json file should be relative to this folder.
imageBase can also be a SAS URL, in which case some error-checking is disabled.
- includeFolders
Include only specific folders, mutually exclusive with [excludeFolders]
- iouThreshold
What’s the IOU threshold for considering two boxes the same?
- lineThickness
Line thickness (in pixels) for box rendering
- maxImagesPerFolder
Ignore folders with more than this many images in them
- maxOutputImageWidth
Image width for rendered images (it’s called “max” because we don’t resize smaller images).
Original size is preserved if this is None.
This does not include the tile image grid.
- maxSuspiciousDetectionSize
Ignore “suspicious” detections larger than some size; these are often animals taking up the whole image. This is expressed as a fraction of the image size.
- minSuspiciousDetectionSize
Ignore “suspicious” detections smaller than some size
- missingImageWarningType
If bPrintMissingImageWarnings is True, should we print a warning about missing images just once (‘once’) or every time (‘all’)?
- nDirLevelsFromLeaf
How many folders up from the leaf nodes should we be going to aggregate images into cameras?
If this is zero, each leaf folder is treated as a camera.
- nWorkers
Number of workers to use for parallel operations
- occurrenceThreshold
How many occurrences of a single location (as defined by the IOU threshold) are required before we declare it suspicious?
- otherDetectionsColors
If bRenderOtherDetections is True, what color should we use to render the (hopefully pretty subtle) non-target detections?
In theory I’d like these “other detection” rectangles to be partially transparent, but this is not straightforward, and the alpha is ignored here. But maybe if I leave it here and wish hard enough, someday it will work.
otherDetectionsColors = [‘dimgray’]
- otherDetectionsLineWidth
Line width (in pixels) for other detections
- otherDetectionsThreshold
Threshold to use for other detections
- outputBase
Folder where we should write temporary output.
- parallelizationUsesThreads
Should we use threads (True) or processes (False) for parallelization?
Not relevant if nWorkers <= 1, or if bParallelizeComparisons and bParallelizeRendering are both False.
- pass_detections_to_processes_method
For very large sets of results, passing chunks of results to and from workers as parameters (‘memory’) can be memory-intensive, so we can serialize to intermediate files instead (‘file’).
The use of ‘file’ here is still experimental.
- smartSort
Sort detections within a directory so nearby detections are adjacent in the list, for faster review.
Can be None, ‘xsort’, or ‘clustersort’
None sorts detections chronologically by first occurrence
‘xsort’ sorts detections from left to right
‘clustersort’ clusters detections and sorts by cluster
- smartSortDistanceThreshold
Only relevant if smartSort == ‘clustersort’
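As a rough illustration of the smartSort choices, the sketch below orders a list of candidate locations for review. It covers only None and ‘xsort’. It assumes the incoming list is already in order of first occurrence and that "left to right" means sorting by the box's x_min; both are assumptions, and the function and the dict layout are hypothetical. ‘clustersort’ is left out because its clustering method is not described here.

```python
def sort_for_review(locations, smart_sort=None):
    """Order candidate locations for manual review (illustrative only)."""
    if smart_sort is None:
        # Keep the incoming order, assumed here to be order of first occurrence
        return list(locations)
    if smart_sort == "xsort":
        # Left to right, using x_min of each [x_min, y_min, width, height] box
        return sorted(locations, key=lambda location: location["bbox"][0])
    raise ValueError(f"smart_sort={smart_sort!r} is not covered by this sketch")


locations = [
    {"id": 0, "bbox": [0.70, 0.20, 0.10, 0.10]},
    {"id": 1, "bbox": [0.10, 0.50, 0.10, 0.10]},
    {"id": 2, "bbox": [0.40, 0.30, 0.10, 0.10]},
]

print([loc["id"] for loc in sort_for_review(locations)])
print([loc["id"] for loc in sort_for_review(locations, "xsort")])
```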
- class megadetector.postprocessing.repeat_detection_elimination.repeat_detections_core.RepeatDetectionResults[source]
Bases: object
The results of an entire repeat detection analysis
- detectionResults
The data table (Pandas DataFrame), as loaded from the input json file via load_api_results(). Has columns [‘file’, ‘detections’, ‘failure’].
- detectionResultsFiltered
The data table after modification
- filename_to_row
dict mapping filenames to rows in the master table
- filterFile
The location of the .json file written with information about the RDE review images (typically detectionIndex.json)
- otherFields
The other fields in the input json file, loaded via load_api_results()
- rows_by_directory
dict mapping folder names to whole rows from the data table
- suspicious_detections
An array of length nDirs, where each element is a list of DetectionLocation objects for that directory that have been flagged as suspicious
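Because rows_by_directory and suspicious_detections are both organized per folder, it helps to see how a folder key might be derived from a relative image path and how nDirLevelsFromLeaf changes it. The following is a minimal sketch using the RECONYX example from the options above. The function names are hypothetical, paths are assumed to use forward slashes, and the package's own path handling may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def camera_folder(relative_path, n_dir_levels_from_leaf=0):
    """Folder key for an image: its leaf folder, moved up n levels."""
    folder = PurePosixPath(relative_path).parent
    for _ in range(n_dir_levels_from_leaf):
        folder = folder.parent
    return str(folder)


def group_files_by_folder(relative_paths, n_dir_levels_from_leaf=0):
    """Map each folder key to the list of image paths that fall under it."""
    groups = defaultdict(list)
    for path in relative_paths:
        groups[camera_folder(path, n_dir_levels_from_leaf)].append(path)
    return dict(groups)


paths = [
    "a/b/c/RECONYX100/img_0001.jpg",
    "a/b/c/RECONYX101/img_0002.jpg",
]

# 0 levels: each leaf folder is treated as its own camera
print(group_files_by_folder(paths, 0))

# 1 level: both leaf folders collapse into the same camera, a/b/c
print(group_files_by_folder(paths, 1))
```

Going up one level is a blunt way to merge RECONYX100 and RECONYX101; the customDirNameFunction option described above exists for cases where the mapping from folders to cameras needs more care than a fixed number of levels.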
- megadetector.postprocessing.repeat_detection_elimination.repeat_detections_core.find_repeat_detections(input_filename, output_file_name=None, options=None)[source]
Find detections in a MD results file that occur repeatedly and are likely to be rocks/sticks.
- Parameters:
input_filename (str) – the MD results .json file to analyze
output_file_name (str, optional) – the filename to which we should write results with repeat detections removed, typically set to None during the first part of the RDE process.
options (RepeatDetectionOptions, optional) – all the interesting options controlling this process; see RepeatDetectionOptions for details.
- Returns:
results of the RDE process; see RepeatDetectionResults for details.
- Return type: