data_management.lila package

This package contains tools for updating and working with datasets hosted on lila.science.

The following scripts are not formal modules and so are not documented here, but they provide examples of using the lila_common module:

create_lila_blank_set.py: a script for downloading a large, diverse set of blank images from LILA
create_lila_test_set.py: a script for creating a test set of empty and non-empty images from LILA
download_lila_subset.py: a script for downloading a specific set of images from LILA, e.g. “all the foxes”
get_lila_image_counts.py: a script for counting the number of images in each LILA camera trap dataset

Submodules

data_management.lila.lila_common module

lila_common.py

Common constants and functions related to LILA data management/retrieval.

megadetector.data_management.lila.lila_common.read_lila_all_images_file(metadata_dir, force_download=False, read_to_dataframe=True)

Downloads (if necessary) and unzips (if necessary) the .csv file with label mappings for all LILA files, then, if read_to_dataframe is True, opens the resulting .csv file as a Pandas DataFrame.

Parameters:

metadata_dir (str) – folder to use for temporary LILA metadata files
force_download (bool, optional) – download the metadata file even if the local file exists.
read_to_dataframe (bool, optional) – read the .csv file into a DataFrame

Returns:

a DataFrame containing one row per identification in a LILA camera trap image, or None if read_to_dataframe is False

Return type:

pd.DataFrame or None
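The download-then-unzip caching behavior described above can be sketched with the standard library. This is a minimal illustration, not the lila_common implementation: the helper name download_and_unzip_if_necessary is hypothetical, and the sketch assumes the remote file is a .zip archive whose local name is the last component of its URL.

```python
import os
import urllib.request
import zipfile


def download_and_unzip_if_necessary(url, metadata_dir, force_download=False):
    """
    Hypothetical helper: fetch a .zip file into metadata_dir unless it is
    already there (or force_download is True), then extract each member
    unless it already exists on disk. Returns the list of extracted paths.
    """
    os.makedirs(metadata_dir, exist_ok=True)
    zip_path = os.path.join(metadata_dir, os.path.basename(url))

    # Step 1: download only if the .zip is missing or a refresh was requested
    if force_download or not os.path.isfile(zip_path):
        urllib.request.urlretrieve(url, zip_path)

    # Step 2: unzip only the members that are not already on disk
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            target = os.path.join(metadata_dir, name)
            if force_download or not os.path.exists(target):
                zf.extract(name, metadata_dir)
            extracted.append(target)
    return extracted
```

Under these assumptions, the extracted .csv path could then be passed to pd.read_csv() when read_to_dataframe is True. In practice, calling read_lila_all_images_file(metadata_dir) handles all of this directly.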

megadetector.data_management.lila.lila_common.read_lila_metadata(metadata_dir, force_download=False)

Reads LILA metadata (URLs to each dataset), downloading the .csv file if necessary.

Parameters:

metadata_dir (str) – folder to use for temporary LILA metadata files
force_download (bool, optional) – download the metadata file even if the local file exists.

Returns:

a dict mapping dataset names (e.g. “Caltech Camera Traps”) to dicts with keys corresponding to the headers in the .csv file, currently:

name
short_name
continent
country
region
image_base_url_relative
bbox_url_relative
image_base_url_gcp
metadata_url_gcp
bbox_url_gcp
image_base_url_aws
metadata_url_aws
bbox_url_aws
image_base_url_azure
metadata_url_azure
box_url_azure
mdv4_results_raw
mdv5b_results_raw
md_results_with_rde
json_filename

Return type:

dict
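Because the return value is a plain dict of dicts, ordinary dict operations are enough to query it. The sketch below filters datasets by one of the fields listed above (for example continent or country). The helper datasets_matching and the toy table are hypothetical; with real data, the table would come from read_lila_metadata(metadata_dir).

```python
def datasets_matching(metadata_table, field, value):
    """
    Hypothetical helper: return the sorted names of datasets whose record
    has record[field] equal to value, ignoring case.

    metadata_table is assumed to have the shape returned by
    read_lila_metadata(): {dataset name: {column name: value}}.
    """
    target = value.lower()
    return sorted(
        name
        for name, record in metadata_table.items()
        if str(record.get(field, '')).lower() == target
    )


# Toy table with made-up entries, for illustration only
toy_table = {
    'Dataset A': {'name': 'Dataset A', 'continent': 'Africa', 'country': 'Kenya'},
    'Dataset B': {'name': 'Dataset B', 'continent': 'North America', 'country': 'USA'},
    'Dataset C': {'name': 'Dataset C', 'continent': 'Africa', 'country': 'Tanzania'},
}

print(datasets_matching(toy_table, 'continent', 'africa'))  # ['Dataset A', 'Dataset C']
```

Missing fields are treated as empty strings, so a dataset without a value for the requested field simply does not match.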

megadetector.data_management.lila.lila_common.read_lila_taxonomy_mapping(metadata_dir, force_download=False)

Reads the LILA taxonomy mapping file, downloading the .csv file if necessary.

Parameters:

metadata_dir (str) – folder to use for temporary LILA metadata files
force_download (bool, optional) – download the taxonomy mapping file even if the local file exists.

Returns:

a DataFrame with one row per identification

Return type:

pd.DataFrame

megadetector.data_management.lila.lila_common.read_metadata_file_for_dataset(ds_name, metadata_dir, metadata_table=None, json_url=None, preferred_cloud='gcp', force_download=False)

Downloads (if necessary) and unzips (if necessary) the .json metadata file for a specific dataset.

Parameters:

ds_name (str) – the name of the dataset for which you want to retrieve metadata (e.g. “Caltech Camera Traps”)
metadata_dir (str) – folder to use for temporary LILA metadata files
metadata_table (dict, optional) – a dictionary previously returned by read_lila_metadata()
json_url (str, optional) – the URL of the metadata file; if None, the URL is retrieved via read_lila_metadata()
preferred_cloud (str, optional) – the cloud provider to download from: ‘gcp’ (default), ‘azure’, or ‘aws’
force_download (bool, optional) – download the metadata file even if the local file exists.

Returns:

the .json filename on the local disk

Return type:

str
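When json_url is None, the URL has to come from the metadata table. One plausible approach, assuming preferred_cloud selects among the metadata_url_gcp, metadata_url_aws, and metadata_url_azure fields listed under read_lila_metadata(), is sketched below. The helper resolve_metadata_url and the example.com URLs are hypothetical; this is not necessarily how read_metadata_file_for_dataset() resolves URLs internally.

```python
# Cloud names accepted by preferred_cloud, per the parameter list above
VALID_CLOUDS = ('gcp', 'azure', 'aws')


def resolve_metadata_url(metadata_table, ds_name, preferred_cloud='gcp'):
    """
    Hypothetical helper: look up the metadata URL for ds_name in a table
    shaped like the output of read_lila_metadata(), using the
    metadata_url_<cloud> field that matches preferred_cloud.
    """
    if preferred_cloud not in VALID_CLOUDS:
        raise ValueError(
            f'preferred_cloud must be one of {VALID_CLOUDS}, got {preferred_cloud!r}')
    if ds_name not in metadata_table:
        raise KeyError(f'unknown dataset: {ds_name!r}')
    return metadata_table[ds_name][f'metadata_url_{preferred_cloud}']


# Toy table with placeholder URLs, for illustration only
toy_table = {
    'Dataset A': {
        'metadata_url_gcp': 'https://example.com/gcp/dataset_a.json.zip',
        'metadata_url_aws': 'https://example.com/aws/dataset_a.json.zip',
        'metadata_url_azure': 'https://example.com/azure/dataset_a.json.zip',
    },
}

print(resolve_metadata_url(toy_table, 'Dataset A', preferred_cloud='aws'))
```

Validating preferred_cloud up front gives a clearer error than a KeyError on a misspelled column name.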

megadetector.data_management.lila.lila_common.read_wildlife_insights_taxonomy_mapping(metadata_dir, force_download=False)

Reads the Wildlife Insights (WI) taxonomy mapping file, downloading the .json data (and writing it to .csv) if necessary.

Parameters:

metadata_dir (str) – folder to use for temporary LILA metadata files
force_download (bool, optional) – download the taxonomy mapping file even if the local file exists.

Returns:

a DataFrame with taxonomy information

Return type:

pd.DataFrame
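The "download .json, write .csv" pattern can be sketched as a one-time conversion followed by reading the cached .csv. This minimal sketch assumes the JSON is a list of (possibly nested) records already saved locally. The download step is omitted, the helper name load_json_records_cached is hypothetical, and the actual Wildlife Insights schema is not documented here.

```python
import json
import os

import pandas as pd


def load_json_records_cached(json_path, csv_path, force=False):
    """
    Hypothetical helper: convert a JSON list of records to .csv on the
    first call (or when force is True), then read the cached .csv.
    Returns a pandas DataFrame.
    """
    if force or not os.path.isfile(csv_path):
        with open(json_path, encoding='utf-8') as f:
            records = json.load(f)
        # One row per record; nested dicts become dotted column names
        pd.json_normalize(records).to_csv(csv_path, index=False)
    return pd.read_csv(csv_path)
```

Reading back from the .csv on every call, rather than returning the freshly normalized frame, keeps the first and subsequent calls consistent in column types.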