Find and download files from ESGF#

esmvalcore.esgf#

Find files on the ESGF and download them.

Note

This module uses esgf-pyclient to search for and download files from the Earth System Grid Federation (ESGF). esgf-pyclient uses a deprecated API that is scheduled to be taken offline and replaced by new APIs based on STAC (ESGF East) and Globus (ESGF West). An ESGF node mimicking the deprecated API but built on top of Globus will be kept online for some time at https://esgf-node.ornl.gov/esgf-1-5-bridge, but users are encouraged to migrate to the new APIs as soon as possible by using the esmvalcore.io.intake_esgf module instead.

This module provides the function esmvalcore.esgf.find_files() for searching for files on ESGF using the ESMValTool vocabulary. It returns esmvalcore.esgf.ESGFFile objects, which have a convenient esmvalcore.esgf.ESGFFile.download() method for downloading the file. An esmvalcore.esgf.download() function for downloading multiple files in parallel is also available.
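The search-then-download workflow described above can be sketched as follows. The helper name `find_and_download` is a hypothetical convenience wrapper, not part of esmvalcore; it only calls `find_files()` and `download()` with the signatures documented on this page. The import is deferred so that the function can be defined without esmvalcore installed or a connection to ESGF.

```python
from pathlib import Path


def find_and_download(dest_folder, n_jobs=4, **facets):
    """Search ESGF for files matching ``facets`` and download them.

    Hypothetical wrapper for illustration; not part of esmvalcore.
    """
    # Imported lazily: needs esmvalcore and network access to ESGF.
    from esmvalcore.esgf import download, find_files

    # find_files() only accepts keyword arguments.
    files = find_files(**facets)
    # Download all files that were found, n_jobs files at a time.
    download(files, dest_folder=Path(dest_folder).expanduser(), n_jobs=n_jobs)
    return files
```

A call such as `find_and_download("~/climate_data", project="CMIP6", mip="Amon", short_name="tas", dataset="CanESM5", exp="historical", ensemble="r1i1p1f1")` would then search for the CMIP6 example used further down this page and download whatever is found.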

It also provides an esmvalcore.esgf.ESGFDataSource that can be used to find files on ESGF from the Dataset or the recipe. To use it, run the command

esmvalcore config copy data-esmvalcore-esgf.yml

to copy the default configuration file for this module to your configuration directory. This will create a file with the following content:

Listing 8 Contents of data-esmvalcore-esgf.yml#
# Download CMIP, CORDEX, and obs4MIPs data from ESGF using the `esmvalcore.esgf`
# module, which uses the legacy ESGF search interface.
projects:
  CMIP6: &esgf-pyclient-data
    data:
      esgf-pyclient:
        type: "esmvalcore.esgf.ESGFDataSource"
        download_dir: ~/climate_data
        # Use a lower priority than for esmvalcore.local.LocalDataSource
        # to avoid searching ESGF with the setting `search_esgf: when_missing`.
        priority: 10
  CMIP5:
    <<: *esgf-pyclient-data
  CMIP3:
    <<: *esgf-pyclient-data
  CORDEX:
    <<: *esgf-pyclient-data
  obs4MIPs:
    <<: *esgf-pyclient-data

See Data sources for more information on configuring data sources, and see ESGF configuration for additional configuration options of this module.
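The listing above relies on a YAML anchor (`&esgf-pyclient-data`) and merge keys (`<<: *esgf-pyclient-data`) so that every project receives the same settings. As a rough illustration, the content a YAML loader would produce is equivalent to the Python dictionary below. The variable names are chosen for this sketch only.

```python
# Settings defined once under the anchor, copied from the listing above.
esgf_pyclient_data = {
    "data": {
        "esgf-pyclient": {
            "type": "esmvalcore.esgf.ESGFDataSource",
            "download_dir": "~/climate_data",
            "priority": 10,
        },
    },
}

# Each merge key gives the project the same content as the anchored mapping.
projects = {
    project: esgf_pyclient_data
    for project in ("CMIP6", "CMIP5", "CMIP3", "CORDEX", "obs4MIPs")
}

for project, settings in projects.items():
    source = settings["data"]["esgf-pyclient"]
    print(project, source["type"], source["priority"])
```

To give one project different settings, that project would need its own mapping in the YAML file instead of the merge key.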

Classes:

ESGFDataSource(name, project, priority, ...)

ESGFFile(results[, dest_folder])

File on the ESGF.

Functions:

download(files[, dest_folder, n_jobs])

Download multiple ESGFFiles in parallel.

find_files(*, project, short_name, dataset, ...)

Search for files on ESGF.

class esmvalcore.esgf.ESGFDataSource(name: str, project: str, priority: int, download_dir: Path)[source]#

Bases: DataSource

Attributes:

debug_info

A string containing debug information when no data is found.

download_dir

The destination directory where data will be downloaded.

name

A name identifying the data source.

priority

The priority of the data source.

project

The project that the data source provides data for.

Methods:

find_data(**facets)

Find data.

Parameters:
  • name (str)

  • project (str)

  • priority (int)

  • download_dir (Path)

debug_info: str = ''#

A string containing debug information when no data is found.

download_dir: Path#

The destination directory where data will be downloaded.

find_data(**facets: FacetValue) list[ESGFFile][source]#

Find data.

Parameters:

**facets (FacetValue) – Find data matching these facets.

Returns:

A list of files that have been found on ESGF.

Return type:

list of esmvalcore.esgf.ESGFFile

name: str#

A name identifying the data source.

priority: int#

The priority of the data source. Lower values take precedence.

project: str#

The project that the data source provides data for.
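Because a lower priority value ranks a data source first, ordering several data sources amounts to an ascending sort on priority. The sketch below uses a hypothetical stand-in class and an assumed priority of 1 for a local data source; it illustrates the ordering only, not how ESMValCore consults its data sources internally.

```python
from dataclasses import dataclass


@dataclass
class StandInSource:
    """Hypothetical stand-in with the documented name and priority fields."""

    name: str
    priority: int


sources = [
    StandInSource(name="esgf-pyclient", priority=10),
    # Assumed value for illustration; see your own configuration.
    StandInSource(name="local", priority=1),
]

# The lowest value ranks first, so sort in ascending order.
ordered = sorted(sources, key=lambda source: source.priority)
print([source.name for source in ordered])
```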

class esmvalcore.esgf.ESGFFile(results: Iterable[FileResult], dest_folder: Path | None = None)[source]#

Bases: DataElement

File on the ESGF.

This is the object returned by esmvalcore.esgf.find_files().

Parameters:
  • results (Iterable[FileResult])

  • dest_folder (Path | None)

dataset#

The name of the dataset that the file is part of.

Type:

str

facets#

Facets describing the file.

Type:

dict[str,str]

name#

The name of the file.

Type:

str

size#

The size of the file in bytes.

Type:

int

urls#

The URLs where the file can be downloaded.

Type:

list[str]
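Since size is given in bytes, the total volume of a download can be estimated before starting it. The sketch below uses hypothetical stand-in objects with made-up names and sizes in place of real ESGFFile objects; with real search results, the same sum over the size attribute should apply.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for ESGFFile objects; names and sizes are made up.
files = [
    SimpleNamespace(name="example_a.nc", size=52_428_800),
    SimpleNamespace(name="example_b.nc", size=104_857_600),
]

# Sum the sizes in bytes and convert to mebibytes for display.
total_bytes = sum(file.size for file in files)
total_mib = total_bytes / 1024**2
print(f"{len(files)} files, {total_mib:.1f} MiB in total")
```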

Attributes:

attributes

Attributes are key-value pairs describing the data.

Methods:

download(dest_folder)

Download the file.

local_file(dest_folder)

Return the path to the local file after download.

prepare()

Prepare the data for access.

to_iris()

Load the data as Iris cubes.

property attributes: dict[str, Any]#

Attributes are key-value pairs describing the data.

download(dest_folder: Path | None) LocalFile[source]#

Download the file.

Parameters:

dest_folder (Path | None) – The destination folder.

Raises:

DownloadError – Raised if downloading the file failed.

Returns:

The path where the file will be located after download.

Return type:

LocalFile

local_file(dest_folder: Path | None) LocalFile[source]#

Return the path to the local file after download.

Parameters:

dest_folder (Path | None) – The destination folder.

Returns:

The path where the file will be located after download.

Return type:

LocalFile

prepare() None[source]#

Prepare the data for access.

Return type:

None

to_iris() iris.cube.CubeList[source]#

Load the data as Iris cubes.

Returns:

The loaded data.

Return type:

iris.cube.CubeList

esmvalcore.esgf.download(files, dest_folder=None, n_jobs=4)[source]#

Download multiple ESGFFiles in parallel.

Parameters:
  • files (list of ESGFFile) – The files to download.

  • dest_folder (Path or None) – The destination folder.

  • n_jobs (int) – The number of files to download in parallel.

Raises:

DownloadError – Raised if one or more files failed to download.

esmvalcore.esgf.find_files(*, project, short_name, dataset, **facets)[source]#

Search for files on ESGF.

Parameters:
  • project (str) – Choose from CMIP3, CMIP5, CMIP6, CORDEX, or obs4MIPs.

  • short_name (str) – The name of the variable.

  • dataset (str) – The name of the dataset.

  • **facets (Union[str, list[str]]) – Any other search facets. An '*' can be used to match any value. By default, only the latest version of a file is returned; to select all versions, use version='*'. All other omitted facets default to '*'. It is also possible to specify multiple values for a facet, e.g. exp=['historical', 'ssp585'] matches any file that belongs to either the historical or the ssp585 experiment. The timerange facet can be specified in ISO 8601 format.

Note

A value of timerange='*' is supported. Combining a '*' with a time or period, as is possible in the recipe, is currently not supported and returns all files found.

Examples

Examples of how to use this function for all supported projects.

Search for a CMIP3 dataset:

>>> find_files(
...     project='CMIP3',
...     frequency='mon',
...     short_name='tas',
...     dataset='cccma_cgcm3_1',
...     exp='historical',
...     ensemble='run1',
... )
[ESGFFile:cmip3/CCCma/cccma_cgcm3_1/historical/mon/atmos/run1/tas/v1/tas_a1_20c3m_1_cgcm3.1_t47_1850_2000.nc]

Search for a CMIP5 dataset:

>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='inmcm4',
...     exp='historical',
...     ensemble='r1i1p1',
... )
[ESGFFile:cmip5/output1/INM/inmcm4/historical/mon/atmos/Amon/r1i1p1/v20130207/tas_Amon_inmcm4_historical_r1i1p1_185001-200512.nc]

Search for a CMIP6 dataset:

>>> find_files(
...     project='CMIP6',
...     mip='Amon',
...     short_name='tas',
...     dataset='CanESM5',
...     exp='historical',
...     ensemble='r1i1p1f1',
... )
[ESGFFile:CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc]

Search for a CORDEX dataset and limit the search results to files containing data for the years 1990-2000:

>>> find_files(
...     project='CORDEX',
...     frequency='mon',
...     dataset='COSMO-crCLIM-v1-1',
...     short_name='tas',
...     exp='historical',
...     ensemble='r1i1p1',
...     domain='EUR-11',
...     driver='MPI-M-MPI-ESM-LR',
...     timerange='1990/2000',
... )
[ESGFFile:cordex/output/EUR-11/CLMcom-ETH/MPI-M-MPI-ESM-LR/historical/r1i1p1/COSMO-crCLIM-v1-1/v1/mon/tas/v20191219/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_198101-199012.nc,
 ESGFFile:cordex/output/EUR-11/CLMcom-ETH/MPI-M-MPI-ESM-LR/historical/r1i1p1/COSMO-crCLIM-v1-1/v1/mon/tas/v20191219/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_199101-200012.nc]
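Both files are returned because each contains at least part of the requested period. The sketch below reproduces that reasoning with a simple overlap test on the years in the file names. It assumes year-only time ranges and file names ending in `_YYYYMM-YYYYMM.nc`; the helper names are hypothetical, and this is not necessarily how the module filters results.

```python
import re


def years_in_filename(filename):
    """Return the (start_year, end_year) encoded at the end of a filename."""
    match = re.search(r"_(\d{4})\d{2}-(\d{4})\d{2}\.nc$", filename)
    if match is None:
        raise ValueError(f"No date range found in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def overlaps(filename, timerange):
    """Check whether a file's years overlap a 'start/end' range of years."""
    start, end = (int(year) for year in timerange.split("/"))
    file_start, file_end = years_in_filename(filename)
    return file_start <= end and file_end >= start


names = [
    "tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_"
    "CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_198101-199012.nc",
    "tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_"
    "CLMcom-ETH-COSMO-crCLIM-v1-1_v1_mon_199101-200012.nc",
]
print([overlaps(name, "1990/2000") for name in names])
```

The first file ends in 1990 and the second starts in 1991, so both overlap the range 1990/2000.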

Search for an obs4MIPs dataset:

>>> find_files(
...     project='obs4MIPs',
...     frequency='mon',
...     dataset='CERES-EBAF',
...     short_name='rsutcs',
... )
[ESGFFile:obs4MIPs/NASA-LaRC/CERES-EBAF/atmos/mon/v20160610/rsutcs_CERES-EBAF_L3B_Ed2-8_200003-201404.nc]

Search for any ensemble member:

>>> find_files(
...     project='CMIP6',
...     mip='Amon',
...     short_name='tas',
...     dataset='BCC-CSM2-MR',
...     exp='historical',
...     ensemble='*',
... )
[ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r1i1p1f1/Amon/tas/gn/v20181126/tas_Amon_BCC-CSM2-MR_historical_r1i1p1f1_gn_185001-201412.nc,
 ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r2i1p1f1/Amon/tas/gn/v20181115/tas_Amon_BCC-CSM2-MR_historical_r2i1p1f1_gn_185001-201412.nc,
 ESGFFile:CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r3i1p1f1/Amon/tas/gn/v20181119/tas_Amon_BCC-CSM2-MR_historical_r3i1p1f1_gn_185001-201412.nc]

Search for all available versions of a file:

>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='CCSM4',
...     exp='historical',
...     ensemble='r1i1p1',
...     version='*',
... )
[ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20121031/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc,
 ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20130425/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc,
 ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20160829/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc]
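To illustrate what "latest version" means for these results, the sketch below picks the most recent version string from the three identifiers shown above. The helper `version_of` is hypothetical and the comparison relies on the `vYYYYMMDD` naming seen in these identifiers; it does not describe how the search itself selects versions.

```python
import re

# Identifiers copied from the search results shown above.
identifiers = [
    "cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20121031/"
    "tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc",
    "cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20130425/"
    "tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc",
    "cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20160829/"
    "tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc",
]


def version_of(identifier):
    """Extract the date part of a version such as 'v20130425'."""
    match = re.search(r"/v(\d{8})/", identifier)
    if match is None:
        raise ValueError(f"No version found in {identifier!r}")
    return match.group(1)


# Date strings in YYYYMMDD form sort chronologically.
latest = max(identifiers, key=version_of)
print(version_of(latest))
```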

Search for a specific version of a file:

>>> find_files(
...     project='CMIP5',
...     mip='Amon',
...     short_name='tas',
...     dataset='CCSM4',
...     exp='historical',
...     ensemble='r1i1p1',
...     version='v20130425',
... )
[ESGFFile:cmip5/output1/NCAR/CCSM4/historical/mon/atmos/Amon/r1i1p1/v20130425/tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc]

Returns:

A list of files that have been found.

Return type:

list of ESGFFile

esmvalcore.esgf.facets#

Module containing mappings from our names to ESGF names.

Data:

DATASET_MAP

Cache for the mapping between recipe/filesystem and ESGF dataset names.

FACETS

Mapping between the recipe and ESGF facet names.

Functions:

create_dataset_map()

Create the DATASET_MAP from recipe datasets to ESGF dataset names.

esmvalcore.esgf.facets.DATASET_MAP = {'CMIP3': {}, 'CMIP5': {'ACCESS1-0': 'ACCESS1.0', 'ACCESS1-3': 'ACCESS1.3', 'CESM1-BGC': 'CESM1(BGC)', 'CESM1-CAM5': 'CESM1(CAM5)', 'CESM1-CAM5-1-FV2': 'CESM1(CAM5.1,FV2)', 'CESM1-FASTCHEM': 'CESM1(FASTCHEM)', 'CESM1-WACCM': 'CESM1(WACCM)', 'CSIRO-Mk3-6-0': 'CSIRO-Mk3.6.0', 'GFDL-CM2p1': 'GFDL-CM2.1', 'MRI-AGCM3-2H': 'MRI-AGCM3.2H', 'MRI-AGCM3-2S': 'MRI-AGCM3.2S', 'bcc-csm1-1': 'BCC-CSM1.1', 'bcc-csm1-1-m': 'BCC-CSM1.1(m)', 'fio-esm': 'FIO-ESM', 'inmcm4': 'INM-CM4'}, 'CMIP6': {}, 'CORDEX': {}, 'obs4MIPs': {}}#

Cache for the mapping between recipe/filesystem and ESGF dataset names.
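A lookup in this mapping can be sketched as below, using a subset of the CMIP5 entries shown above. The helper `esgf_dataset_name` is hypothetical, and falling back to the recipe name when no entry exists is an assumption made for this sketch.

```python
# Subset of the DATASET_MAP entries shown above.
dataset_map = {
    "CMIP5": {
        "ACCESS1-0": "ACCESS1.0",
        "bcc-csm1-1": "BCC-CSM1.1",
        "inmcm4": "INM-CM4",
    },
    "CMIP6": {},
}


def esgf_dataset_name(project, dataset):
    """Return the ESGF name, or the recipe name if there is no entry."""
    # Assumption: unmapped names are used unchanged.
    return dataset_map.get(project, {}).get(dataset, dataset)


print(esgf_dataset_name("CMIP5", "inmcm4"))
print(esgf_dataset_name("CMIP6", "CanESM5"))
```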

esmvalcore.esgf.facets.FACETS = {'CMIP3': {'dataset': 'model', 'ensemble': 'ensemble', 'exp': 'experiment', 'frequency': 'time_frequency', 'short_name': 'variable'}, 'CMIP5': {'dataset': 'model', 'ensemble': 'ensemble', 'exp': 'experiment', 'frequency': 'time_frequency', 'institute': 'institute', 'mip': 'cmor_table', 'product': 'product', 'short_name': 'variable'}, 'CMIP6': {'activity': 'activity_drs', 'dataset': 'source_id', 'ensemble': 'member_id', 'exp': 'experiment_id', 'grid': 'grid_label', 'institute': 'institution_id', 'mip': 'table_id', 'short_name': 'variable'}, 'CORDEX': {'dataset': 'rcm_name', 'domain': 'domain', 'driver': 'driving_model', 'ensemble': 'ensemble', 'exp': 'experiment', 'frequency': 'time_frequency', 'institute': 'institute', 'product': 'product', 'short_name': 'variable'}, 'obs4MIPs': {'dataset': 'source_id', 'frequency': 'time_frequency', 'institute': 'institute', 'short_name': 'variable'}}#

Mapping between the recipe and ESGF facet names.
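The following sketch shows how such a mapping could be used to rename recipe facets to ESGF facet names for CMIP6. The helper `to_esgf_facets` is hypothetical, and skipping facets that have no entry in the mapping is an assumption made for this illustration.

```python
# CMIP6 entry of FACETS, copied from the value shown above.
cmip6_facets = {
    "activity": "activity_drs",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "grid": "grid_label",
    "institute": "institution_id",
    "mip": "table_id",
    "short_name": "variable",
}


def to_esgf_facets(recipe_facets, mapping):
    """Rename recipe facets to ESGF facets, skipping unmapped names."""
    return {
        mapping[name]: value
        for name, value in recipe_facets.items()
        if name in mapping
    }


recipe_facets = {
    "mip": "Amon",
    "short_name": "tas",
    "dataset": "CanESM5",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
}
query = to_esgf_facets(recipe_facets, cmip6_facets)
print(query)
```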

esmvalcore.esgf.facets.create_dataset_map()[source]#

Create the DATASET_MAP from recipe datasets to ESGF dataset names.

Run python -m esmvalcore.esgf.facets to print an up-to-date map.