data_management.lila package

This package contains tools for updating and working with datasets hosted on lila.science.

Though not documented here, since they’re not formal modules, see the following for examples of using the lila_common module:

Submodules

data_management.lila.lila_common module

lila_common.py

Common constants and functions related to LILA data management/retrieval.

megadetector.data_management.lila.lila_common.read_lila_all_images_file(metadata_dir, force_download=False, read_to_dataframe=True)

Downloads (if necessary) and then unzips (if necessary) the .csv file with label mappings for all LILA files, and optionally reads the resulting .csv file into a pandas DataFrame.

Parameters:
  • metadata_dir (str) – folder to use for temporary LILA metadata files

  • force_download (bool, optional) – download the metadata file even if the local file exists.

  • read_to_dataframe (bool, optional) – read the .csv file into a dataframe

Returns:

a DataFrame containing one row per identification in a LILA camera trap image, or None if read_to_dataframe is False

Return type:

pd.DataFrame
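
The "download if necessary, then unzip if necessary" behavior described above can be illustrated with a small standard-library sketch. This is not the module's implementation; the helper names `needs_fetch` and `unzip_if_necessary` are hypothetical, and the sketch assumes the archive contains exactly one file.

```python
import os
import zipfile


def needs_fetch(local_path, force_download=False):
    """Return True if the file should be (re)downloaded."""
    return force_download or not os.path.isfile(local_path)


def unzip_if_necessary(zip_path, output_dir, force=False):
    """
    Extract a single-file .zip archive into output_dir, unless the
    extracted file is already present.

    Returns a tuple (extracted_path, did_extract).
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # Assumption for this sketch: one file per archive
        if len(names) != 1:
            raise ValueError('Expected exactly one file in ' + zip_path)
        target = os.path.join(output_dir, names[0])
        if os.path.isfile(target) and not force:
            # Cached copy exists, skip extraction
            return target, False
        zf.extract(names[0], output_dir)
    return target, True
```

With the package installed, the documented call itself would look like `read_lila_all_images_file(metadata_dir)`; passing `force_download=True` bypasses the local copy.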

megadetector.data_management.lila.lila_common.read_lila_metadata(metadata_dir, force_download=False)

Reads LILA metadata (URLs to each dataset), downloading the .csv file if necessary.

Parameters:
  • metadata_dir (str) – folder to use for temporary LILA metadata files

  • force_download (bool, optional) – download the metadata file even if the local file exists.

Returns:

a dict mapping dataset names (e.g. “Caltech Camera Traps”) to dicts with keys corresponding to the headers in the .csv file, currently:

  • name

  • short_name

  • continent

  • country

  • region

  • image_base_url_relative

  • bbox_url_relative

  • image_base_url_gcp

  • metadata_url_gcp

  • bbox_url_gcp

  • image_base_url_aws

  • metadata_url_aws

  • bbox_url_aws

  • image_base_url_azure

  • metadata_url_azure

  • bbox_url_azure

  • mdv4_results_raw

  • mdv5b_results_raw

  • md_results_with_rde

  • json_filename

Return type:

dict
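
Because the returned dict uses a `_gcp` / `_aws` / `_azure` suffix on its URL keys, selecting the URLs for one cloud is a simple key lookup. The following is a minimal sketch: `get_cloud_urls` is a hypothetical helper (not part of lila_common), and the table below is a hand-written placeholder with made-up URLs standing in for the real return value of read_lila_metadata().

```python
VALID_CLOUDS = ('gcp', 'aws', 'azure')


def get_cloud_urls(metadata_table, ds_name, cloud='gcp'):
    """
    Return the image base URL and metadata URL for one dataset on one cloud.

    metadata_table is assumed to have the structure described above:
    dataset name -> dict keyed by the .csv column headers.
    """
    if cloud not in VALID_CLOUDS:
        raise ValueError('Unknown cloud: {}'.format(cloud))
    ds_info = metadata_table[ds_name]
    return {
        'image_base_url': ds_info['image_base_url_' + cloud],
        'metadata_url': ds_info['metadata_url_' + cloud],
    }


# Placeholder table for illustration only; the URLs are not real
example_table = {
    'Caltech Camera Traps': {
        'image_base_url_gcp': 'https://example.com/gcp/images/',
        'metadata_url_gcp': 'https://example.com/gcp/metadata.json.zip',
        'image_base_url_aws': 'https://example.com/aws/images/',
        'metadata_url_aws': 'https://example.com/aws/metadata.json.zip',
    }
}

urls = get_cloud_urls(example_table, 'Caltech Camera Traps', cloud='aws')
print(urls['metadata_url'])
```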

megadetector.data_management.lila.lila_common.read_lila_taxonomy_mapping(metadata_dir, force_download=False)

Reads the LILA taxonomy mapping file, downloading the .csv file if necessary.

Parameters:
  • metadata_dir (str) – folder to use for temporary LILA metadata files

  • force_download (bool, optional) – download the taxonomy mapping file even if the local file exists.

Returns:

a DataFrame with one row per identification

Return type:

pd.DataFrame

megadetector.data_management.lila.lila_common.read_metadata_file_for_dataset(ds_name, metadata_dir, metadata_table=None, json_url=None, preferred_cloud='gcp', force_download=False)

Downloads if necessary - then unzips if necessary - the .json file for a specific dataset.

Parameters:
  • ds_name (str) – the name of the dataset for which you want to retrieve metadata (e.g. “Caltech Camera Traps”)

  • metadata_dir (str) – folder to use for temporary LILA metadata files

  • metadata_table (dict, optional) – an optional dictionary already loaded via read_lila_metadata()

  • json_url (str, optional) – the URL of the metadata file, if None will be retrieved via read_lila_metadata()

  • preferred_cloud (str, optional) – ‘gcp’ (default), ‘azure’, or ‘aws’

  • force_download (bool, optional) – download the metadata file even if the local file exists.

Returns:

the .json filename on the local disk

Return type:

str
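
A possible way to combine read_lila_metadata() with this function is sketched below. The wrapper `fetch_dataset_json` and the helper `summarize_json` are hypothetical names introduced for illustration; the sketch also assumes the downloaded .json file has a JSON object at its top level, which is not stated above.

```python
import json


def fetch_dataset_json(ds_name, metadata_dir, preferred_cloud='gcp'):
    """
    Load the LILA metadata table once, then fetch the .json file for one
    dataset. Requires the megadetector package and network access.
    """
    # Imported lazily so summarize_json() works without the package
    from megadetector.data_management.lila.lila_common import (
        read_lila_metadata, read_metadata_file_for_dataset)

    metadata_table = read_lila_metadata(metadata_dir)
    return read_metadata_file_for_dataset(ds_name=ds_name,
                                          metadata_dir=metadata_dir,
                                          metadata_table=metadata_table,
                                          preferred_cloud=preferred_cloud)


def summarize_json(json_filename):
    """
    Map each top-level key to the length of its value (for lists and
    dicts) or None (for scalars). Assumes a top-level JSON object.
    """
    with open(json_filename, 'r', encoding='utf-8') as f:
        data = json.load(f)
    summary = {}
    for key, value in data.items():
        summary[key] = len(value) if isinstance(value, (list, dict)) else None
    return summary
```

Passing an already-loaded `metadata_table` lets repeated calls for several datasets reuse a single table.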

megadetector.data_management.lila.lila_common.read_wildlife_insights_taxonomy_mapping(metadata_dir, force_download=False)

Reads the Wildlife Insights (WI) taxonomy mapping file, downloading the .json data (and writing it to .csv) if necessary.

Parameters:
  • metadata_dir (str) – folder to use for temporary LILA metadata files

  • force_download (bool, optional) – download the taxonomy mapping file even if the local file exists.

Returns:

a DataFrame with taxonomy information

Return type:

pd.DataFrame
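
The "write to .csv if necessary" step can be pictured as a cached conversion. The sketch below is an illustration only, not the module's code: `json_records_to_csv` is a hypothetical helper, and it assumes the .json data is a list of flat records, which may not match the actual WI format.

```python
import csv
import json
import os


def json_records_to_csv(json_path, csv_path, force=False):
    """
    Write a list of flat JSON records to csv_path, unless csv_path
    already exists. Returns True if the .csv file was (re)written.
    """
    if os.path.isfile(csv_path) and not force:
        return False
    with open(json_path, 'r', encoding='utf-8') as f:
        records = json.load(f)
    # Use the union of record keys, in first-seen order, as the header
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # Keys missing from a record are written as empty cells
        writer.writerows(records)
    return True
```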