Rainfall Runoff datasets

This section includes datasets which can be used for rainfall-runoff modeling. They all contain observed streamflow and meteorological data as time series, which are referred to as dynamic features. The physical catchment properties are included as static features in tabular form, where each row corresponds to one catchment and each column to one static feature.

In addition to published datasets, this package introduces 10 new datasets for rainfall-runoff modeling. These datasets have not yet been published but follow the CAMELS dataset series convention. They include Ireland, Finland, Italy, Poland, Portugal, Japan, Thailand, Arcticnet, Spain, and the USGS. The observed streamflow data are sourced from the national meteorological or hydrological websites of the respective countries. Catchment boundaries and meteorological data for Ireland, Finland, Italy, Poland, and Portugal are obtained from EStreams (Nascimento et al., 2024), and similarly for Japan, Thailand, Arcticnet, and Spain from GSHA (Peirong et al., 2023). For USGS, the catchment boundaries are sourced from HYSETS (Arsenault et al., 2020).

Although each data source has a dedicated class, all datasets listed in Table List of datasets are accessible via the aqua_fetch.rr.RainfallRunoff class, which provides a unified and consistent interface to every dataset. The class provides several methods to access static features, dynamic features, and catchment boundaries. Although the raw data files of each dataset may come in different formats, the methods for accessing these features through the aqua_fetch.rr.RainfallRunoff class remain the same. Individual classes for each dataset are also available and may offer users more control over specific datasets. In most cases, however, the aqua_fetch.rr.RainfallRunoff class will suffice.

The naming and units of dynamic features in each dataset may vary. However, we have standardized these features using the formula name_unit_specifier for each dynamic feature across all datasets. In this formula, the specifier can indicate the source (such as ERA5 or MSWEP for precipitation), the method used to calculate the feature (like makkink or penman for evapotranspiration), or the aggregation type (min, max, mean). For example, a precipitation dynamic feature from MSWEP would be labeled as pcp_mm_mswep. This approach ensures that feature names are representative and understandable. Dynamic features for which this method is inapplicable retain their original names.
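As an illustration of the name_unit_specifier convention, the following sketch splits a standardized label into its three parts. The helper split_feature_name and the FeatureName tuple are hypothetical and are not part of AquaFetch. The sketch also assumes that the name and the unit never contain an underscore, so everything after the second underscore is treated as the specifier.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    """Parts of a standardized dynamic feature label (hypothetical helper type)."""
    name: str
    unit: Optional[str]
    specifier: Optional[str]


def split_feature_name(feature: str) -> FeatureName:
    """Split a label such as 'pcp_mm_mswep' into name, unit and specifier."""
    # Split on the first two underscores only, so a specifier that itself
    # contains underscores (e.g. 'silo_max') stays in one piece.
    parts = feature.split("_", 2)
    name = parts[0]
    unit = parts[1] if len(parts) > 1 else None
    specifier = parts[2] if len(parts) > 2 else None
    return FeatureName(name, unit, specifier)


print(split_feature_name("pcp_mm_mswep"))
print(split_feature_name("airtemp_C_silo_max"))
print(split_feature_name("pcp_mm"))
```

Features that retain their original names do not follow the convention, so the parts returned for them by such a helper would not be meaningful.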

Another feature of AquaFetch is the optional inclusion of static and dynamic features from EStreams and GSHA for all datasets listed in Table List of datasets. This is beneficial because EStreams and GSHA include several static and dynamic features calculated for the catchments which are not included in other datasets. For instance, EStreams provides information on annual variation in land use for all European catchments, a feature not available in CAMELS-GB (Coxon et al., 2020) or other European datasets. This step is optional because it initiates the download of the GSHA and EStreams datasets, which can be time-consuming and may not always be necessary.

Certain datasets in this package feature overlapping stations from the same region. For example, both the aqua_fetch.Bull and Spain datasets cover Spain. However, the Bull dataset was introduced by Aparicio et al., 2024, whereas the Spain dataset was introduced in this work. The Spain dataset contains more stations, totaling 889, while the Bull dataset includes 484 stations. Similarly, both the CABra (Almagro et al., 2021) and CAMELS_BR (Chagas et al., 2020) datasets cover Brazil and have been published in peer-reviewed journals. However, they differ in their temporal coverage and the number of static and dynamic features. Furthermore, Denmark is covered by two datasets, Caravan_DK (Koch 2022) and CAMELS_DK (Liu et al., 2024), which differ in temporal coverage and the number of static and dynamic features. The HYSETS dataset (Arsenault et al., 2020) covers Mexico, the US, and Canada. However, we identified issues with the observed streamflow data for the US in HYSETS. As a result, we introduced the USGS dataset, which focuses specifically on the US region. The catchment boundaries, static features, and meteorological data for USGS, however, are still obtained from HYSETS.

List of datasets

Stations per Source

Source Name | Class | Number of Daily Stations | Number of Hourly Stations | Dynamic features | Static features | Temporal Coverage | Spatial Coverage | Reference
--- | --- | --- | --- | --- | --- | --- | --- | ---
Arcticnet | aqua_fetch.rr.Arcticnet | 106 | | 27 | 35 | 1979 - 2003 | Arctic (Russia) | R-Arcticnet
Bull | aqua_fetch.Bull | 484 | | 55 | 214 | 1990 - 2020 | Spain | Aparicio et al., 2024
CABra | aqua_fetch.rr.CABra | 735 | | 12 | 97 | 1980 - 2010 | Brazil | Almagro et al., 2021
CAMELSH | aqua_fetch.rr.CAMELSH | 5667 | | 13 | 799 | 1900 - 2018 | United States of America | Tran et al., 2025
CAMELS_AUS | aqua_fetch.rr.CAMELS_AUS | 222, 561 | | 26 | 166, 187 | 1900 - 2018 | Australia | Fowler et al., 2021; Fowler et al., 2024
CAMELS_BR | aqua_fetch.rr.CAMELS_BR | 897 | | 10 | 67 | 1920 - 2019 | Brazil | Chagas et al., 2020
CAMELS_CH | aqua_fetch.rr.CAMELS_CH | 331 | | 9 | 209 | 1981 - 2020 | Switzerland | Hoege et al., 2023
CAMELS_CL | aqua_fetch.rr.CAMELS_CL | 516 | | 12 | 104 | 1913 - 2018 | Chile | Alvarez-Garreton et al., 2018
CAMELS_COL | aqua_fetch.rr.CAMELS_COL | 347 | | 6 | 255 | 1981 - 2022 | Colombia | Jimenez et al., 2025
CAMELS_DE | aqua_fetch.rr.CAMELS_DE | 1555 | | 21 | 111 | 1951 - 2020 | Germany | Loritz et al., 2024
CAMELS_DK | aqua_fetch.rr.CAMELS_DK | 304 | | 13 | 119 | 1989 - 2023 | Denmark | Liu et al., 2024
CAMELS_FI | aqua_fetch.rr.CAMELS_FI | 320 | | 16 | 111 | 1963 - 2023 | Finland | Seppä et al., 2025
CAMELS_FR | aqua_fetch.rr.CAMELS_FR | 654 | | 22 | 344 | 1970 - 2021 | France | Delaigue et al., 2024
CAMELS_GB | aqua_fetch.rr.CAMELS_GB | 671 | | 10 | 145 | 1970 - 2015 | Britain | Coxon et al., 2020
CAMELS_IND | aqua_fetch.rr.CAMELS_IND | 472 | | 20 | 210 | 1980 - 2020 | Republic of India | Mangukiya et al., 2024
CAMELS_LUX | aqua_fetch.rr.CAMELS_LUX | 56 | 56 | 25 | 61 | 2004 - 2021 | Luxembourg | Nijzink et al., 2025
CAMELS_NZ | aqua_fetch.rr.CAMELS_NZ | 369 | | 5 | 40 | 1972 - 2024 | New Zealand | Harrigan et al., 2025
CAMELS_SE | aqua_fetch.rr.CAMELS_SE | 50 | | 4 | 76 | 1961 - 2020 | Sweden | Teutschbein et al., 2024
CAMELS_SK | aqua_fetch.rr.CAMELS_SK | 178 | | 17 | 215 | 2000 - 2019 | South Korea | Kim et al., 2025
CAMELS_US | aqua_fetch.rr.CAMELS_US | 671 | | 8 | 59 | 1980 - 2014 | United States | Newman et al., 2014
Caravan_DK | aqua_fetch.rr.Caravan_DK | 304 | | 38 | 211 | 1981 - 2020 | Denmark | Koch 2022
CCAM | aqua_fetch.rr.CCAM | 111 | | 16 | 124 | 1990 - 2020 | China | Hao et al., 2021
Finland | aqua_fetch.rr.Finland | 669 | | 27 | 35 | 2012 - 2023 | Finland | ymparisto.fi
GRDCCaravan | aqua_fetch.rr.GRDCCaravan | 5357 | | 39 | 211 | 1950 - 2023 | Global | Faerber et al., 2023
HYPE | aqua_fetch.rr.HYPE | 561 | | | | | | Arciniega-Esparza and Birkel, 2020
HYSETS | aqua_fetch.rr.HYSETS | 14425 | | 5 | 28 | 1950 - 2018 | North America (Mexico, Canada, USA) | Arsenault et al., 2020
Ireland | aqua_fetch.rr.Ireland | 464 | | 27 | 35 | 1992 - 2020 | Ireland | EPA Ireland
Italy | aqua_fetch.rr.Italy | 294 | | 37 | 35 | 1992 - 2020 | Italy | EPA Ireland
Japan | aqua_fetch.rr.Japan | 751 | 696 | 27 | 35 | 1979 - 2022 | Japan | river.go.jp
LamaHCE | aqua_fetch.rr.LamaHCE | 859 | 859 | 22 | 80 | 1981 - 2019 | Central Europe | Klingler et al., 2021
LamaHIce | aqua_fetch.rr.LamaHIce | 111 | 111 | 36 | 154 | 1950 - 2021 | Iceland | Helgason and Nijssen 2024
NPCTR Catchments | aqua_fetch.rr.NPCTRCatchments | 7 | | 14 | 14 | 2013 - 2019 | Canada | Korver et al., 2024
Poland | aqua_fetch.rr.Poland | 1287 | | 27 | 35 | 1992 - 2020 | Poland | imgw.pl
Portugal | aqua_fetch.rr.Portugal | 280 | | 27 | 35 | 1992 - 2020 | Portugal | snirh
RRLuleaSweden | aqua_fetch.RRLuleaSweden | 1 | | 2 | 0 | 2016 - 2019 | Lulea (Sweden) | Broekhuizen et al., 2020
Simbi | aqua_fetch.rr.Simbi | 24 | | 3 | 232 | 1920 - 1940 | Haiti | Bathelemy et al., 2024
Slovenia | aqua_fetch.rr.Slovenia | 117 | | 3 | 10 | 1950 - 2023 | Slovenia | vode.arso.gov.si
Spain | aqua_fetch.rr.Spain | 889 | | 27 | 35 | 1979 - 2020 | Spain | ceh-flumen64
Thailand | aqua_fetch.rr.Thailand | 73 | | 27 | 35 | 1980 - 1999 | Thailand | RID project
USGS | aqua_fetch.rr.USGS | 12004 | | 5 | 27 | 1950 - 2018 | United States | USGS nwis
WaterBenchIowa | aqua_fetch.rr.WaterBenchIowa | 125 | | 3 | 7 | 2011 - 2018 | Iowa (USA) | Demir et al., 2022

Duplicate Datasets

For some regions/countries, multiple datasets are available. These datasets may differ in the number of stations, temporal coverage, and static and dynamic features. The following table lists the duplicate datasets available in AquaFetch.

Duplicate Datasets

Country/Region | First Dataset | Second Dataset | Third Dataset
--- | --- | --- | ---
USA | aqua_fetch.rr.CAMELS_US | aqua_fetch.rr.HYSETS | aqua_fetch.rr.USGS
Denmark | aqua_fetch.rr.CAMELS_DK | aqua_fetch.rr.Caravan_DK |
Brazil | aqua_fetch.rr.CAMELS_BR | aqua_fetch.rr.CABra |
Spain | aqua_fetch.Bull | aqua_fetch.rr.Spain |

High Level API

The aqua_fetch.rr.RainfallRunoff class is the high-level API. It provides a unified and easy-to-use interface to all the datasets and is the recommended way to access them.

class aqua_fetch.rr.RainfallRunoff(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)

Bases: object

This class provides access to all the rainfall-runoff datasets. For simplicity and reusability, use this class instead of the individual dataset classes.

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')  # instead of CAMELS_SE, you can provide any other dataset name
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(21915, 4)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   50
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5)
   5
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(21915, 4), (21915, 4), (21915, 4), (21915, 4), (21915, 4)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['5'].shape
   (21915, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
... # get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 76), 1, (21915, 4))
...
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)   # -> xarray.core.dataset.Dataset
...
>>> dynamic.dims   # -> FrozenMappingWarningOnValuesAccess({'time': 21915, 'dynamic_features': 4})
...
>>> len(dynamic.data_vars)   # -> 10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
    68.035599       21.9758
>>> dataset.stn_coords(['5', '736'])  # returns coordinates of two stations
...
... # get area of a single station
>>> dataset.area('5')
... # get area of two stations
>>> dataset.area(['5', '736'])
...
... # if the fiona library is installed, the boundary is returned as a fiona Geometry
>>> dataset.get_boundary('5')
...

See sphx_glr_auto_examples_camels_australia.py for a more comprehensive usage example.

__init__(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)

Rainfall Runoff datasets

Parameters:
  • dataset (str) –

    dataset name. This must be one of the following:

    • Arcticnet

    • Bull

    • CABra

    • CCAM

    • CAMELSH

    • CAMELS_AUS

    • CAMELS_BR

    • CAMELS_CH

    • CAMELS_CL

    • CAMELS_COL

    • CAMELS_DE

    • CAMELS_DK0

    • CAMELS_DK

    • CAMELS_FI

    • CAMELS_FR

    • CAMELS_GB

    • CAMELS_IND

    • CAMELS_LUX

    • CAMELS_NZ

    • CAMELS_SE

    • CAMELS_SK

    • CAMELS_US

    • EStreams

    • Finland

    • GRDCCaravan

    • GSHA

    • HYSETS

    • HYPE

    • Ireland

    • Italy

    • Japan

    • LamaHCE

    • LamaHIce

    • Poland

    • Portugal

    • RRLuleaSweden

    • Simbi

    • Slovenia

    • Spain

    • Thailand

    • USGS

    • WaterBenchIowa

  • path (str) – path to directory inside which data is located/downloaded. If provided and the path/dataset exists, then the data will be read from this path. If provided and the path/dataset does not exist, then the data will be downloaded at this path. If not provided, then the data will be downloaded in the default path which is .../aqua_fetch/data/.

  • overwrite (bool) – If the data is already downloaded then you can set it to True, to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch and similar methods, but requires the netCDF4 package as well as xarray.

  • verbosity (int) – 0: no message will be printed

  • kwargs – additional keyword arguments for the underlying dataset class, for example version for aqua_fetch.rr.CAMELS_AUS, timestep for aqua_fetch.rr.LamaHCE, or met_src for aqua_fetch.rr.CAMELS_BR.
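The interplay of the path and overwrite parameters described above can be sketched as a small decision function. This is only an illustration of the documented behaviour, not the library's implementation: plan_data_location is a hypothetical helper, and the sketch assumes that each dataset lives in a sub-directory named after the dataset.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def plan_data_location(
    dataset: str,
    path: Optional[str],
    default_root: str,
    overwrite: bool = False,
) -> Tuple[Path, str]:
    """Return the dataset directory and whether data would be read or downloaded."""
    # Fall back to the default directory when no path is given.
    root = Path(path) if path is not None else Path(default_root)
    target = root / dataset  # assumption: one sub-directory per dataset
    # Existing data is reused unless a fresh download is requested.
    action = "download" if overwrite or not target.is_dir() else "read"
    return target, action


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CAMELS_SE").mkdir()  # pretend this dataset is already on disk
    existing = plan_data_location("CAMELS_SE", tmp, default_root=tmp)[1]
    missing = plan_data_location("CAMELS_CH", tmp, default_root=tmp)[1]
    forced = plan_data_location("CAMELS_SE", tmp, default_root=tmp, overwrite=True)[1]

print(existing, missing, forced)  # read download download
```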

area(stations: str | List[str] = 'all') -> Series

Returns the area (km²) of all or selected catchments as a pandas.Series.

Parameters:

stations (str/list, default='all') – name/names of stations. Default is all, which will return area of all stations. For names of stations, see stations().

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations

property dynamic_features: List[str]

returns names of dynamic features as python list of strings

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.dynamic_features

property end: str

returns end date of data

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.end

fetch(stations: str | List[str] | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) -> tuple[DataFrame, Dict[str, DataFrame] | Dataset]

Fetches the features of one or more stations.

Parameters:
  • stations

    It can have following values:

    • int : number of (randomly selected) stations to fetch

    • float : fraction of (randomly selected) stations to fetch

    • str : name/id of station to fetch. However, if all is provided, then all stations will be fetched. For names of stations, see stations().

    • list : list of names/ids of stations to fetch

  • dynamic_features (default='all') –

    It can have following values:

    • str : name of dynamic feature to fetch. If all is provided, then all dynamic features will be fetched. For names of dynamic features, see dynamic_features().

    • list : list of dynamic features to fetch.

    • None : No dynamic feature will be fetched. The second returned value will be None.

  • static_features (default=None) –

    It can have following values:

    • str : name of static feature to fetch. If all is provided, then all static features will be fetched. For names of static features, see static_features().

    • list : list of static features to fetch.

    • None : No static feature will be fetched. The first returned value will be None.

  • st – starting date of data to be returned. If None, the data will be returned from where it is available.

  • en – end date of data to be returned. If None, then the data will be returned till the date data is available.

  • as_dataframe – whether to return dynamic features as pandas.DataFrame or as xarray.Dataset. If the xarray library is not installed, this parameter is ignored and the data is returned as pandas.DataFrame.

  • kwargs – keyword arguments

Returns:

A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features’ DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or a python dictionary whose keys are station names and values are pandas.DataFrame. It depends upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.

Return type:

tuple
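The way the stations argument is interpreted (int, float, str or list) can be sketched in plain Python. The helper resolve_stations below is hypothetical and only mirrors the documented rules; in particular, truncating the fraction with int() and the seed argument are assumptions of this sketch, and the library may round or sample differently.

```python
import random
from typing import List, Optional, Sequence, Union

StationArg = Union[str, int, float, Sequence[str]]


def resolve_stations(
    stations: StationArg,
    all_stations: Sequence[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Turn the `stations` argument into an explicit list of station ids."""
    pool = list(all_stations)
    rng = random.Random(seed)
    if isinstance(stations, str):
        # 'all' selects every station, any other string is a single id
        return pool if stations == "all" else [stations]
    if isinstance(stations, int):
        # number of randomly selected stations
        return rng.sample(pool, stations)
    if isinstance(stations, float):
        # fraction of randomly selected stations (truncation is an assumption)
        return rng.sample(pool, int(len(pool) * stations))
    # otherwise an explicit list of ids
    return list(stations)


all_ids = [str(i) for i in range(50)]
print(len(resolve_stations(0.1, all_ids, seed=0)))   # 5
print(len(resolve_stations(10, all_ids, seed=0)))    # 10
print(resolve_stations("5", all_ids))                # ['5']
```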

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
...
>>> # get data of 10% of stations
>>> _, dynamic = dataset.fetch(stations=0.1, as_dataframe=True)  # dynamic is a dictionary
...
...  # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
...
... # fetch data of 2 selected stations
>>> _, two_selec_stn_data = dataset.fetch(stations=['912101A','912105A'], as_dataframe=True)
...
... # fetch data of a single station
>>> _, single_stn_data = dataset.fetch(stations='912101A', as_dataframe=True)
...
... # get both static and dynamic features of one (randomly selected) station
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)  # dynamic is a dict
>>> dynamic
...
... # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='912101A',
...     dynamic_features=['q_cms_obs', 'pcp_mm_silo'], as_dataframe=True)
...
... # fetch data between selected periods
>>> _, data = dataset.fetch(stations='912101A', st="20010101", en="20101231", as_dataframe=True)

fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) -> DataFrame | Dataset

Fetches all or selected dynamic attributes of one station.

Parameters:
  • station (str) – name/id of station of which to extract the data. For names of stations see stations()

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned. For names of dynamic features, see dynamic_features()

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time until which to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas.DataFrame otherwise it is xarray.Dataset

Returns:

a pandas.DataFrame or xarray.Dataset depending upon the value of as_dataframe and whether xarray is installed or not.

Return type:

pd.DataFrame or xr.Dataset
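The st and en arguments restrict the returned time series to a period. As a rough sketch of that idea, the code below clips a small daily series stored in a plain dictionary. The helpers parse_day and clip_series are hypothetical; the sketch assumes compact YYYYMMDD strings, as used in the examples of this section, and assumes that both bounds are inclusive.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Optional


def parse_day(text: str) -> date:
    """Parse a compact date string such as '20010101'."""
    return datetime.strptime(text, "%Y%m%d").date()


def clip_series(
    series: Dict[date, float],
    st: Optional[str] = None,
    en: Optional[str] = None,
) -> Dict[date, float]:
    """Keep only the values between st and en (inclusive in this sketch)."""
    # A missing bound falls back to the first/last available day.
    start = parse_day(st) if st is not None else min(series)
    end = parse_day(en) if en is not None else max(series)
    return {day: value for day, value in series.items() if start <= day <= end}


first_day = date(2000, 12, 30)
series = {first_day + timedelta(days=i): float(i) for i in range(5)}  # 2000-12-30 to 2001-01-03

from_new_year = clip_series(series, st="20010101")
around_new_year = clip_series(series, st="20001231", en="20010101")
print(len(from_new_year), len(around_new_year))  # 3 2
```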

Examples

>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
... dynamic_features=['airtemp_C_silo_max', 'vp_hpa_silo', 'q_cms_obs'],
... as_dataframe=True)

fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') -> DataFrame

Fetches all or selected static attributes of one or more stations.

Parameters:
  • stations (str) – name/id of station of which to extract the data. For names of stations see stations().

  • static_features (list/str, optional (default="all")) – The name/names of static features to fetch. By default, all available static features are returned. For names of static features, see static_features().

Returns:

a pandas.DataFrame

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])

fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) -> tuple[DataFrame, DataFrame]

Fetches static and dynamic features for one station.

Parameters:
  • station (str) – station id/gauge id for which the data is to be fetched. For names of stations, see stations()

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch. For names of dynamic features, check the output of dynamic_features()

  • static_features – names of static features/attributes to be fetched. For names of static features, check the output of static_features()

  • st (str,optional) – starting point from which the data to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last date for which it is available.

Returns:

A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will be of single row while the dynamic features will be of shape (time, dynamic features).

Return type:

tuple

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> static, dynamic = dataset.fetch_station_features('912101A')
>>> static.shape
...
>>> dynamic.shape

fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] | None = 'all', static_features: str | List[str] | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) -> tuple[DataFrame, Dict[str, DataFrame] | Dataset]

Reads features of more than one station.

Parameters:
  • stations – name/ids of stations for which data is to be fetched. For names of stations, see stations().

  • dynamic_features – list of dynamic features to be fetched. For names of dynamic features, see dynamic_features(). If all, then all dynamic features will be fetched. If None, then no dynamic feature will be fetched and the second returned value will be None.

  • static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched. For names of static features, see static_features().

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe (bool) – whether to return the data as pandas.DataFrame. The default is an xarray.Dataset object.

  • kwargs (dict) – additional keyword arguments

Returns:

A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features’ DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or a python dictionary whose keys are names of stations and values are pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.

Return type:

tuple

Raises:

ValueError – if both dynamic_features and static_features are None
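The handling of the dynamic_features and static_features arguments ('all', a single name, a list, or None) and the ValueError raised when both are None can be sketched as follows. The helpers select_features and check_request are hypothetical and only restate the documented rules; they are not the library's code.

```python
from typing import List, Optional, Sequence, Union

FeatureArg = Union[str, Sequence[str], None]


def select_features(requested: FeatureArg, available: Sequence[str]) -> Optional[List[str]]:
    """Expand a feature argument into an explicit list (or None)."""
    if requested is None:
        return None  # nothing of this kind is fetched
    if isinstance(requested, str):
        # 'all' expands to every available feature, otherwise a single name
        return list(available) if requested == "all" else [requested]
    return list(requested)


def check_request(dynamic_features: FeatureArg, static_features: FeatureArg) -> None:
    """Reject a request that asks for neither dynamic nor static features."""
    if dynamic_features is None and static_features is None:
        raise ValueError("dynamic_features and static_features cannot both be None")


available = ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]
print(select_features("all", available))        # every available feature
print(select_features("q_cms_obs", available))  # ['q_cms_obs']
print(select_features(None, available))         # None

try:
    check_request(None, None)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```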

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> static, dynamic = dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)

get_boundary(station: str)

returns boundary of a catchment as fiona.Geometry object.

Parameters:

station (str) – name/id of catchment. For names of catchments, see stations().

Returns:

a fiona.Geometry object representing the boundary of the catchment.

Return type:

fiona.Geometry

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')
>>> dataset.get_boundary(dataset.stations()[0])

property name: str

returns name of dataset

num_dynamic() -> int

number of dynamic features associated with the dataset

num_static() -> int

number of static features associated with the dataset

property path: str

returns path where the data is stored. The default path is ~../aqua_fetch/data

plot_catchment(station: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)

plots catchment boundaries

Parameters:
  • station (str) – name/id of station. For names of stations, see stations()

  • show_outlet (bool, optional (default=False)) – if True, then outlet of the catchment will be plotted as a red dot

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()

plot_num_observations(stations: str | List[str] = 'all', dynamic_features: str | List[str] = 'all', start: str | Timestamp = None, end: str | Timestamp = None, show_constant: bool = False, figsize: Tuple[float, float] = None, ax=None, show: bool = True)

Plots the number of observations available for different dynamic features as cumulative distribution function (CDF). This plot is not plotted if all stations have same number of observations for a dynamic feature.

Parameters:
  • stations (Union[str, List[str]]) – The stations to include in the plot. If ‘all’, all stations will be included.

  • dynamic_features (Union[str, List[str]]) – The dynamic features to include in the plot. If ‘all’, all dynamic features will be included.

  • start (Union[str, pd.Timestamp], optional) – The start date for the data to consider. If None, the start date of the dataset will be used.

  • end (Union[str, pd.Timestamp], optional) – The end date for the data to consider. If None, the end date of the dataset will be used.

  • show_constant (bool, optional) – Whether to show features with constant number of observations across stations. If True, these features will be included in the plot as well.

  • figsize (Tuple[float, float], optional) – The size of the figure to create. If None, a default size will be used.

  • ax (plt.Axes, optional) – The matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool, optional) – Whether to display the plot immediately.

Returns:

The matplotlib axes containing the plot.

Return type:

plt.Axes
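To make the idea behind this plot concrete, the sketch below computes an empirical CDF of the number of observations per station for one dynamic feature, and skips the feature when every station has the same count. The helper observation_count_cdf and the station counts are made up for illustration; how the library computes and draws the curve is not shown here.

```python
from typing import Dict, List, Optional, Tuple


def observation_count_cdf(counts: Dict[str, int]) -> Optional[List[Tuple[int, float]]]:
    """Return (count, cumulative fraction of stations) pairs, or None if constant."""
    values = sorted(counts.values())
    if len(set(values)) <= 1:
        # all stations have the same number of observations: nothing to plot
        return None
    n = len(values)
    # the i-th smallest count is reached by (i + 1) / n of the stations
    return [(value, (rank + 1) / n) for rank, value in enumerate(values)]


# hypothetical number of streamflow observations at four stations
counts = {"A": 3650, "B": 7300, "C": 5475, "D": 7300}
cdf = observation_count_cdf(counts)
print(cdf)  # [(3650, 0.25), (5475, 0.5), (7300, 0.75), (7300, 1.0)]

constant = observation_count_cdf({"A": 100, "B": 100})
print(constant)  # None
```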

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_FI, RainfallRunoff
>>> dataset = CAMELS_FI()
>>> dataset.plot_num_observations()
... # plot the number of observations for different periods
>>> dataset = RainfallRunoff('CAMELS_COL')
>>> _, ax = plt.subplots()
>>> periods = [("19810101", "19901231"), ("19910101", "20001231"), ("20010101", "20101231")]
>>> for idx, (start, end) in enumerate(periods):
...     ax = dataset.plot_num_observations(
...         dynamic_features=['q_cms_obs'],
...         ax=ax,
...         start=start, end=end, show=False)
...     ax.lines[idx].set_label(f'{start} to {end}')
>>> assert isinstance(ax, plt.Axes)
>>> ax.legend()
>>> plt.show()

plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) -> Axes

plots coordinates of stations

Parameters:
  • stations – name/names of stations. If not given, all stations will be plotted. For names of stations, see stations().

  • marker – marker to use.

  • color (str, optional) – name of static feature to use as color.

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
... # using area as color
>>> dataset.plot_stations(color='area_km2')

q_mm(stations: str | List[str] = 'all') -> DataFrame

Returns streamflow in units of millimeters per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q by the catchment area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations. For names of stations, see stations().

Returns:

a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
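The unit conversion behind q_mm can be written out explicitly. The sketch below assumes discharge in cubic meters per second (as suggested by names such as q_cms_obs) and catchment area in square kilometers (as returned by area()). The function cms_to_mm is a hypothetical helper, not the library's implementation.

```python
def cms_to_mm(q_cms: float, area_km2: float, seconds_per_step: int = 86400) -> float:
    """Convert discharge (m3/s) to runoff depth (mm per timestep)."""
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    volume_m3 = q_cms * seconds_per_step   # water volume passing in one timestep
    area_m2 = area_km2 * 1e6               # 1 km2 = 1e6 m2
    depth_m = volume_m3 / area_m2          # volume spread evenly over the catchment
    return depth_m * 1000.0                # meters to millimeters


# 1 m3/s over one day on an 86.4 km2 catchment corresponds to about 1 mm/day
daily_mm = cms_to_mm(1.0, 86.4)
# the same discharge over one hour on a 3.6 km2 catchment is about 1 mm/hour
hourly_mm = cms_to_mm(1.0, 3.6, seconds_per_step=3600)
print(round(daily_mm, 6), round(hourly_mm, 6))
```

For daily data this reduces to q_mm = 86.4 * q_cms / area_km2.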

property start: str

returns starting date of data

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.start

property static_features: List[str]

returns names of static features as python list of strings

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.static_features

stations() -> List[str]

Names/ids of stations/catchment/basins/gauges or whatever that would be used to index each catchment in the dataset. Every catchment has a unique name/id which can be used to fetch its data.

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stations()

stn_coords(stations: str | List[str] = 'all') -> DataFrame

returns coordinates of stations as pandas.DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned. For names of stations, see stations().

Returns:

pandas.DataFrame with long and lat columns. The length of the dataframe equals the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations

Low Level API

The low-level API provides access to the individual dataset classes, which offer more control over each dataset.

class aqua_fetch.rr._RainfallRunoff(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)

Bases: Datasets

This is the parent class for individual rainfall-runoff datasets such as CAMELS-GB. It is not meant to be used directly; it is inherited by child classes which are specific to a dataset, such as CAMELS-GB or CAMELS-AUS. This class first downloads the dataset if it is not already downloaded. The selected features for a selected catchment/station are then fetched and provided to the user through the fetch method.

Attributes:

  • path (str/path) – directory of the dataset

  • dynamic_features (list) – tells which dynamic features are available in this dataset

  • static_features (list) – a list of static features

  • static_attribute_categories (list) – tells which kinds of static features are present in this dataset

Methods:

  • stations – returns the names/ids of stations for which data (dynamic features) exists, as a list of strings

  • fetch – fetches all features (both static and dynamic) of all stations/gauge ids or of a specified station. It can also be used to fetch all features of several stations, either by providing their gauge ids or by asking for a number of stations (e.g. 20), which are then chosen randomly.

  • fetch_dynamic_features – fetches the specified dynamic features of one specified station. If the dynamic features are not specified, all dynamic features are fetched for the specified station. If the station is not specified, the specified dynamic features are fetched for all stations.

  • fetch_static_features – works the same as fetch_dynamic_features but for static features. If the category is not specified, then static features of the specified station for all categories are returned.

__init__(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
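
The rules for the path argument can be sketched as follows. This is only an illustration of the behaviour described above, not the package's implementation: resolve_data_dir, DEFAULT_DIR and the needs_download flag are hypothetical names introduced here.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical default location; the package's real default directory may differ.
DEFAULT_DIR = Path(tempfile.gettempdir()) / "aqua_fetch_demo_default"


def resolve_data_dir(path: Optional[str] = None) -> Tuple[Path, bool]:
    """Return (directory, needs_download) following the rules described for ``path``."""
    if path is None:
        # No path given: fall back to the default directory.
        return DEFAULT_DIR, not DEFAULT_DIR.exists()
    directory = Path(path)
    if directory.exists():
        # Existing directory: the data is read from it, no download needed.
        return directory, False
    # Path given but missing: the data would be downloaded into it.
    return directory, True


existing = tempfile.mkdtemp()            # a directory that exists
missing = str(Path(existing) / "absent")  # a directory that does not exist
print(resolve_data_dir(existing))
print(resolve_data_dir(missing))
```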

area(stations: str | List[str] = 'all') Series[source]

Returns the area (km²) of all/selected catchments as a pandas.Series.

Parameters:

stations (str/list (default='all')) – name/names of stations. The default is 'all', which returns the area of all stations.

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
property boundary_id_map: str

Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.

property camels_dir

Directory where all CAMELS datasets will be saved. This will be under the datasets directory.

cms_to_mm(q_cms: Series) Series[source]

Converts streamflow from cms (cubic meters per second) to mm/timestep.
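
The arithmetic behind such a conversion can be written out explicitly. The sketch below assumes the catchment area is given in km² and the timestep length in seconds (86400 for daily data). The function name cms_to_mm_sketch and its extra arguments are illustrative and do not reproduce the signature of the method above.

```python
from typing import List

SECONDS_PER_DAY = 86_400


def cms_to_mm_sketch(q_cms: List[float], area_km2: float,
                     seconds_per_step: int = SECONDS_PER_DAY) -> List[float]:
    """Convert discharge (m3/s) to runoff depth (mm per timestep)."""
    area_m2 = area_km2 * 1e6  # km2 -> m2
    # volume per step (m3) / area (m2) gives depth in m; times 1000 gives mm
    return [q * seconds_per_step / area_m2 * 1000.0 for q in q_cms]


# 10 m3/s over a 864 km2 catchment for one day corresponds to 1 mm/day
print(cms_to_mm_sketch([10.0], area_km2=864.0))
```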

property dyn_fname: str | PathLike

Name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed and the file does not already exist. The creation of this file can take some time; however, it leads to faster I/O operations.

property dyn_fpath_exists: bool

checks if the .nc file which contains dynamic features exists

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
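
One way to cache such a property in a child class is functools.cached_property from the standard library. The class below is a hypothetical stand-in, not one of the package's dataset classes; only the caching pattern is of interest.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical child class; only the caching pattern matters here."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # Stand-in for an expensive operation such as parsing a file header.
        self.reads += 1
        return ["pcp_mm_mswep", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


dataset = DemoDataset()
dataset.dynamic_features
dataset.dynamic_features
print(dataset.reads)  # the expensive read ran only once
```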

property end: Timestamp

end of data

fetch(stations: str | list | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]

Fetches the features of one or more stations.

Parameters:
  • stations

    It can have following values:
    • int : number of (randomly selected) stations to fetch

    • float : fraction of (randomly selected) stations to fetch

    • str : name/id of station to fetch. However, if 'all' is provided, then all stations will be fetched.

    • list : list of names/ids of stations to fetch

  • dynamic_features – If not None, then it is the features to be fetched. If None, then all available features are fetched.

  • static_features – list of static features to be fetched. None means no static feature will be fetched.

  • st – starting date of the data to be returned. If None, the data will be returned from where it is available.

  • en – end date of the data to be returned. If None, the data will be returned till the date it is available.

  • as_dataframe – whether to return dynamic features as pandas.DataFrame or as xarray.Dataset.

  • kwargs – keyword arguments to read the files

Returns:

A tuple of static and dynamic features. Static features are always returned as a pandas DataFrame with shape (stations, static features). The index of the static features is the station/gauge ids while the columns are the static features. Dynamic features are returned either as an xarray Dataset or as a dictionary with station names as keys and pandas DataFrames as values. This depends upon whether as_dataframe is True or False and whether the xarray module is installed or not. If the dynamic features are an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions.

Return type:

tuple
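
The four accepted forms of the stations argument can be illustrated with a small resolver. This is a sketch under stated assumptions, not the package's code: select_stations and its seed argument are hypothetical, and the rounding of a fractional request (truncation towards zero) is an assumption.

```python
import random
from typing import List, Optional, Union


def select_stations(all_stations: List[str],
                    stations: Union[str, list, int, float] = "all",
                    seed: Optional[int] = None) -> List[str]:
    """Resolve a ``stations`` argument into a concrete list of station ids."""
    rng = random.Random(seed)
    if isinstance(stations, str):
        # 'all' selects everything, any other string is a single station id
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, int):
        # an integer is a number of randomly chosen stations
        return rng.sample(all_stations, stations)
    if isinstance(stations, float):
        # a float is a fraction of the available stations
        return rng.sample(all_stations, int(len(all_stations) * stations))
    return list(stations)  # already a list of ids


ids = ["912101A", "912105A", "915011A", "318076"]
print(select_stations(ids, 2, seed=0))    # two randomly chosen ids
print(select_stations(ids, 0.5, seed=0))  # half of the ids
print(select_stations(ids, "318076"))     # a single id
```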

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> _, dynamic = dataset.fetch(stations=0.1, as_dataframe=True)  # dynamic is a dictionary
...  # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> _, three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single stations
>>> _, single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> dynamic
... # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='318076',
...     dynamic_features=['q_mm_obs', 'solrad_wm2_silo'], as_dataframe=True)
... # fetch data between selected periods
>>> _, data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) DataFrame | Dataset[source]

Fetches all or selected dynamic features of one station.

Parameters:
  • station (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time until where to fetch the data

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray.Dataset

Returns:

A pandas DataFrame or xarray Dataset of dynamic features. If as_dataframe is True, then the returned data is a pandas DataFrame whose index is time and whose columns are dynamic_features. If as_dataframe is False and the xarray module is installed, then the returned data is an xarray Dataset with data_vars equal to the number of stations, and time and dynamic_features as dimensions.

Return type:

pd.DataFrame/xr.Dataset

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
... dynamic_features=['airtemp_C_awap_max', 'vp_hpa_awap', 'q_cms_obs'],
... as_dataframe=True)
fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Fetches all or selected static features of one or more stations.

Parameters:
  • stations (str/list) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas.DataFrame

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
... static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
for CAMELS_FR
>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    654
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (472, 210)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
   (1, 210)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
   (472, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
   (1, 2)
fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]

Fetches features for one station.

Parameters:
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched till the date it is available.

Returns:

A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will be of single row while the dynamic features will be of shape (time, dynamic features).

Return type:

tuple

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> static, dynamic = dataset.fetch_station_features('912101A')
>>> static.shape, dynamic.shape
fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]

Reads features of more than one station.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic features to be fetched. if all, then all dynamic features will be fetched.

  • static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static feature will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the dynamic data as pandas dataframe. default is xarray.Dataset object

  • kwargs (dict) – additional keyword arguments

Returns:

A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray.Dataset or a dict with station names as keys and pandas.DataFrame as values, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If the dynamic features are an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions.

Return type:

tuple

Raises:

ValueError – if both dynamic_features and static_features are None

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'])
... # get data of selected stations as dictionary of pandas DataFrame
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)
... # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... dynamic_features=['q_mm_obs', 'airtemp_C_mean_silo'], static_features=['elev_mean'])
get_boundary(catchment_id: str, to_wgs84: bool = True)[source]

returns boundary of a catchment in a required format

Parameters:
  • catchment_id (str) – name/id of catchment

  • to_wgs84 (bool, optional (default=True)) – if True, then the boundary will be transformed to WGS84 (EPSG:4326) if it is not already in WGS84.

Returns:

geometry

Return type:

fiona.Geometry

Examples

>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> dataset.get_boundary(dataset.stations()[0])
static mean_temp(tmin: Series, tmax: Series) Series[source]

calculates mean temperature from tmin and tmax

mm_to_cms(q_mm: Series) Series[source]

converts discharge from mm/timestep to cms
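
The arithmetic of such a conversion is sketched below, assuming the catchment area is given in km² and the timestep length in seconds. The name mm_to_cms_sketch and its extra arguments are illustrative only; the method above takes a single Series.

```python
from typing import List


def mm_to_cms_sketch(q_mm: List[float], area_km2: float,
                     seconds_per_step: int = 86_400) -> List[float]:
    """Convert runoff depth (mm per timestep) to discharge (m3/s)."""
    area_m2 = area_km2 * 1e6  # km2 -> m2
    # depth (m) * area (m2) gives volume per step (m3); divide by seconds for m3/s
    return [(q / 1000.0) * area_m2 / seconds_per_step for q in q_mm]


# 1 mm/day over a 864 km2 catchment corresponds to 10 m3/s
print(mm_to_cms_sketch([1.0], area_km2=864.0))
```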

plot_catchment(catchment_id: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)[source]

plots catchment boundaries

Parameters:
  • catchment_id (str) – name/id of catchment to plot

  • show_outlet (bool, optional (default=False)) – if True, then outlet of the catchment will be plotted as a red dot

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundary")
>>> plt.show()
>>> # show the outlet as well
>>> dataset.plot_catchment('912101A', show_outlet=True)
plot_num_observations(stations: str | List[str] = 'all', dynamic_features: str | List[str] = 'all', start: str | Timestamp = None, end: str | Timestamp = None, show_constant: bool = False, figsize: Tuple[float, float] = None, ax=None, show: bool = True)[source]

Plots the number of observations available for different dynamic features as a cumulative distribution function (CDF). A dynamic feature is not plotted if all stations have the same number of observations for it, unless show_constant is True.

Parameters:
  • stations (Union[str, List[str]]) – The stations to include in the plot. If ‘all’, all stations will be included.

  • dynamic_features (Union[str, List[str]]) – The dynamic features to include in the plot. If ‘all’, all dynamic features will be included.

  • start (Union[str, pd.Timestamp], optional) – The start date for the data to consider. If None, the start date of the dataset will be used.

  • end (Union[str, pd.Timestamp], optional) – The end date for the data to consider. If None, the end date of the dataset will be used.

  • show_constant (bool, optional) – Whether to show features with constant number of observations across stations. If True, these features will be included in the plot as well.

  • figsize (Tuple[float, float], optional) – The size of the figure to create. If None, a default size will be used.

  • ax (plt.Axes, optional) – The matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool, optional) – Whether to display the plot immediately.

Returns:

The matplotlib axes containing the plot.

Return type:

plt.Axes
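
The quantity behind such a plot can be computed without any plotting. The sketch below counts the non-missing values per station and builds an empirical CDF of those counts. The station records and both function names are made up for illustration, and the exact axes used by the method are an assumption.

```python
import math
from typing import Dict, List, Tuple


def observation_counts(series: Dict[str, List[float]]) -> Dict[str, int]:
    """Count non-missing (non-NaN) values per station."""
    return {stn: sum(not math.isnan(v) for v in values)
            for stn, values in series.items()}


def empirical_cdf(counts: List[int]) -> List[Tuple[int, float]]:
    """Return the sorted counts paired with their cumulative fraction of stations."""
    ordered = sorted(counts)
    n = len(ordered)
    return [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]


# Made-up streamflow records for three stations; NaN marks a missing value.
nan = float("nan")
records = {
    "A": [1.2, nan, 0.8, 0.9],
    "B": [nan, nan, 2.1, 2.0],
    "C": [0.4, 0.5, 0.6, 0.7],
}
counts = observation_counts(records)
cdf = empirical_cdf(list(counts.values()))
print(cdf)
```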

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_FI, RainfallRunoff
>>> dataset = CAMELS_FI()
>>> dataset.plot_num_observations()
>>> # plotting for different time periods
>>> dataset = RainfallRunoff('CAMELS_COL')
>>> _, ax = plt.subplots()
>>> periods = [("19810101", "19901231"), ("19910101", "20001231"), ("20010101", "20101231")]
>>> for idx, (start, end) in enumerate(periods):
...     ax = dataset.plot_num_observations(
...         dynamic_features=['q_cms_obs'],
...         ax=ax,
...         start=start, end=end, show=False)
...     ax.lines[idx].set_label(f'{start} to {end}')
...     assert isinstance(ax, plt.Axes)
>>> ax.legend()
>>> plt.show()
plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) Axes[source]

plots coordinates of stations

Parameters:
  • stations – name/names of stations. If not given, all stations will be plotted

  • marker – marker to use.

  • color (str, optional) – name of static feature to use as color.

  • ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.

  • show (bool)

  • **kwargs

Return type:

plt.Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
>>> # using area as color
>>> dataset.plot_stations(color='area_km2')
q_mm(stations: str | List[str] = 'all') DataFrame[source]

Returns streamflow in units of millimeters per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q by the catchment area.

Parameters:

stations (str/list) – name/names of stations. Default is all, which will return q_mm of all stations

Returns:

a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame

property static_factors: Dict[str, str]

A dictionary that maps static features to the factors by which they need to be multiplied to get the actual value.
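
How such a dictionary might be applied is sketched below. The feature names, values and numeric factors are hypothetical, and apply_factors is not a function of the package.

```python
from typing import Dict


def apply_factors(raw: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply each feature by its factor; features without a factor are unchanged."""
    return {name: value * factors.get(name, 1.0) for name, value in raw.items()}


# Hypothetical raw values and factors, chosen only for illustration.
raw_values = {"area_km2": 1250.0, "forest_frac": 45.0}
factors = {"forest_frac": 0.01}  # e.g. percent -> fraction
scaled = apply_factors(raw_values, factors)
print(scaled)
```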

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns the names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
transform_boundary(xyz: ndarray) ndarray[source]

transforms boundary coordinates from projected to geographic

Must be implemented in child classes.

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from geographic to projected

Must be implemented in child classes.

class aqua_fetch.rr._gsha._GSHA(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

Parent class for those datasets which use static and dynamic features from the GSHA dataset. Dataset classes based on this class include Arcticnet and Japan.

__init__(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None) DataFrame[source]

Returns static attributes of one or multiple stations.

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

  • st

  • en

Examples

>>> from aqua_fetch import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
   (1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
   (12004, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]

returns features of multiple stations

Examples

>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
Returns:

A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray Dataset or pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If the dynamic features are an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions. If the dynamic features are returned as a pandas DataFrame, then the first index is time and the second index is dynamic_features.

Return type:

tuple

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

stations() List[str][source]

Returns the names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr._estreams._EStreams(path: str | PathLike = None, estreams_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

Parent/helper class for those datasets which use static and dynamic data from EStreams. Specifically, it handles the following classes:

  • aqua_fetch.Finland

  • aqua_fetch.Ireland

  • aqua_fetch.Italy

  • aqua_fetch.Poland

  • aqua_fetch.Portugal

  • aqua_fetch.Slovenia

__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns static attributes of one or multiple stations.

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
   (1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
   (12004, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

returns features of multiple stations

Returns:

A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features’ DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or a python dictionary whose keys are station names and values are pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.

Return type:

tuple

Examples

>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
gauge_id_basin_id_map() dict[source]

Returns a dictionary that maps gauge ids to basin ids. For example, for Portugal the gauge id ‘03J/02H’ maps to the basin id ‘PT000001’:

‘03J/02H’ -> ‘PT000001’

For Slovenia, the gauge id ‘1060’ maps to the basin id ‘SI000001’:

‘1060’ -> ‘SI000001’
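
A mapping of this shape can be used in both directions, as sketched below. The two pairs are the ones quoted in the text; putting both countries in one dictionary is done only for brevity, and the inversion assumes that basin ids are unique.

```python
from typing import Dict

# Pairs taken from the examples in the text; a real map holds one entry per gauge.
gauge_to_basin: Dict[str, str] = {
    "03J/02H": "PT000001",  # Portugal
    "1060": "SI000001",     # Slovenia
}

# Inverting the map gives the gauge id for a basin id (ids assumed unique).
basin_to_gauge = {basin: gauge for gauge, basin in gauge_to_basin.items()}

print(gauge_to_basin["03J/02H"])
print(basin_to_gauge["SI000001"])
```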

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data again and again, or to implement the property more efficiently in the child class.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]
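The caching advice above could be followed, for instance, with functools.cached_property. The sketch below uses a hypothetical stand-in class and made-up feature names; it is not how aqua_fetch implements the property, only one way a child class might avoid repeated reads.

```python
from functools import cached_property
from typing import List


class HypotheticalDataset:
    """Stand-in for a child class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.read_count = 0  # counts how often the expensive read happens

    def _read_static_names(self) -> List[str]:
        # Placeholder for an expensive read of the static-feature table.
        self.read_count += 1
        return ["area_km2", "elev_m"]  # hypothetical feature names

    @cached_property
    def static_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_static_names()


ds = HypotheticalDataset()
first = ds.static_features
second = ds.static_features
print(first, ds.read_count)
```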

stations() List[str][source]

Returns a list of all station names. Note that the basin_id column is used as the station name.

class aqua_fetch.Arcticnet(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 106 catchments of the Arctic region from the r-arcticnet project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1979-01-01 to 2003-12-31, although the observed streamflow (q_cms_obs) for some stations is available from as early as 1913-01-01.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
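The rule described for the path parameter can be sketched as a small helper. The function name resolve_data_dir and the default_dir argument are hypothetical, and the sketch assumes that an existing directory is read from while a missing one becomes the download target; the library's actual logic may differ.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_data_dir(path: Optional[str], default_dir: str) -> Tuple[Path, bool]:
    """Return (directory, needs_download) following the rule described above.

    Hypothetical helper for illustration; not part of aqua_fetch.
    """
    # No path given: fall back to the default directory.
    target = Path(default_dir) if path is None else Path(path)
    # An existing directory is read from; a missing one is a download target.
    needs_download = not target.is_dir()
    return target, needs_download


with tempfile.TemporaryDirectory() as tmp:
    existing = resolve_data_dir(tmp, default_dir="unused")
    missing = resolve_data_dir(str(Path(tmp) / "not_there"), default_dir="unused")
    fallback = resolve_data_dir(None, default_dir=tmp)

print(existing[1], missing[1], fallback[1])
```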

property end: Timestamp

end of data

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.Bull(path, overwrite=False, **kwargs)[source]

Bases: _RainfallRunoff

Following the works of Aparicio et al., 2024. The data is taken from the Zenodo repository. This dataset contains 484 stations with 55 dynamic (time series) features and 214 static features. The dynamic features span from 1951 to 2021.

Examples

>>> from aqua_fetch import Bull
>>> dataset = Bull()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='BULL_9007', as_dataframe=True)
>>> df = dynamic['BULL_9007']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(25932, 55)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   484
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (48 out of 484)
   48
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(25932, 55), (25932, 55), (25932, 55),... (25932, 55), (25932, 55)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('BULL_9007', as_dataframe=True,
...  dynamic_features=['pet_mm_AEMET',  'airtemp_C_mean_AEMET', 'pcp_mm_ERA5Land', 'q_cms_obs'])
>>> dynamic['BULL_9007'].shape
   (25932, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='BULL_9007', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['BULL_9007'].shape
((1, 214), 1, (25932, 55))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 25932, 'dynamic_features': 55})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (484, 2)
>>> dataset.stn_coords('BULL_9007')  # returns coordinates of station whose id is BULL_9007
    41.298  -1.967
>>> dataset.stn_coords(['BULL_9007', 'BULL_8083'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('BULL_9007')
# get area of two stations
>>> dataset.area(['BULL_9007', 'BULL_8083'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('BULL_9007')
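The examples above pass a station id, a float or an integer as the stations argument. The sketch below shows one way such an argument could be resolved to a list of ids. The helper resolve_stations is hypothetical, the rounding-down rule is inferred from the reported count (48 out of 484), and the seeded random sampling is an assumption, so this is not the library's implementation.

```python
import random
from typing import List, Union


def resolve_stations(
    stations: Union[str, int, float, List[str]],
    all_stations: List[str],
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper mimicking the behaviour shown in the examples."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    if isinstance(stations, float):
        # A fraction such as 0.1 selects that share of stations (rounded down).
        return rng.sample(all_stations, int(stations * len(all_stations)))
    if isinstance(stations, int):
        # An integer selects that many randomly chosen stations.
        return rng.sample(all_stations, stations)
    if isinstance(stations, str):
        # A single id; "all" is treated as every station.
        return list(all_stations) if stations == "all" else [stations]
    return list(stations)


ids = [f"BULL_{i}" for i in range(484)]  # made-up ids, 484 as in the dataset
print(len(resolve_stations(0.1, ids)))
print(len(resolve_stations(10, ids)))
print(resolve_stations("BULL_7", ids))
```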
__init__(path, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

caravan_attributes() DataFrame[source]

a dataframe of shape (484, 10)

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

hydroatlas_attributes() DataFrame[source]

a dataframe of shape (484, 197)

other_attributes() DataFrame[source]

a dataframe of shape (484, 7)

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr.CABra(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]

Bases: _RainfallRunoff

Reads and fetches the CABra dataset, which is a catchment attribute dataset following the work of Almagro et al., 2021. This dataset consists of 87 static and 13 dynamic features of 735 Brazilian catchments. The temporal extent is from 1980 to 2020. The dynamic features consist of daily hydro-meteorological time series.

Examples

>>> from aqua_fetch import CABra
>>> dataset = CABra()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='92', as_dataframe=True)
>>> df = dynamic['92']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(10956, 13)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   735
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (73 out of 735)
   73
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(10956, 13), (10956, 13), (10956, 13),... (10956, 13), (10956, 13)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('92', as_dataframe=True,
...  dynamic_features=['pcp_mm_ens', 'airtemp_C_ens_max', 'pet_mm_pm', 'rh_%_ens', 'q_cms_obs'])
>>> dynamic['92'].shape
   (10956, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['92'].shape
((1, 87), 1, (10956, 13))

# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 10956, 'dynamic_features': 13})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (735, 2)

>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
    -2.509  -47.764
>>> dataset.stn_coords(['92', '5'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('92')
# get area of two stations
>>> dataset.area(['92', '5'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('92')
__init__(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded only once; subsequent calls to this class will not download it again unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

  • met_src (str) – source of meteorological data, must be one of ens, era5 or ref.
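Since met_src accepts only three values, a guard such as the following could be used before constructing the class. The function check_met_src is a hypothetical helper for illustration; aqua_fetch may perform this check differently or raise a different error.

```python
# Values listed in the parameter description above.
ALLOWED_MET_SRC = ("ens", "era5", "ref")


def check_met_src(met_src: str) -> str:
    """Hypothetical validator; not part of aqua_fetch."""
    if met_src not in ALLOWED_MET_SRC:
        raise ValueError(
            f"met_src must be one of {ALLOWED_MET_SRC}, got {met_src!r}"
        )
    return met_src


print(check_met_src("era5"))
```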

add_attrs() DataFrame[source]

Returns additional catchment attributes

property boundary_id_map: str

Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.

climate_attrs() DataFrame[source]

returns climate attributes for all catchments

property dyn_fname: str | PathLike

name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed and the file does not already exist. Creating this file can take some time; however, it leads to faster I/O operations.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

general_attrs() DataFrame[source]

returns general attributes for all catchments

geology_attrs() DataFrame[source]

returns geological attributes for all catchments

gw_attrs() DataFrame[source]

returns groundwater attributes for all catchments

hydro_distrub_attrs() DataFrame[source]

returns hydrologic disturbance attributes for all catchments

lc_attrs() DataFrame[source]

returns land cover attributes for all catchments

q_attrs() DataFrame[source]

returns streamflow attributes for all catchments

soil_attrs() DataFrame[source]

returns soil attributes for all catchments

property static_features: List[str]

names of static features

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

topology_attrs() DataFrame[source]

returns topology attributes for all catchments

class aqua_fetch.rr.CAMELSH(path=None, overwrite=False, timestep='H', **kwargs)[source]

Bases: _RainfallRunoff

Hourly data of 5,767 catchments from the United States of America with 13 dynamic features and 779 static features for each catchment. For more details on the data, see Tran et al. (2025). The dynamic features span from 1980-01-01 to 2024-12-31. The data is downloaded from Zenodo.

Please note that usage of this dataset requires xarray and netCDF4 libraries.

Examples

>>> from aqua_fetch import CAMELSH
>>> dataset = CAMELSH()
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   5767
... # get data by station id/name
>>> _, dynamic = dataset.fetch(stations='02342070', as_dataframe=True)
>>> df = dynamic['02342070']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(394488, 13)
...
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (576 out of 5767)
   576
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(394488, 13), (394488, 13), (394488, 13),... (394488, 13), (394488, 13)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('02342070', as_dataframe=True,
...  dynamic_features=['SWdown', 'pcp_mm', 'pet_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['02342070'].shape
   (394488, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='02342070', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['02342070'].shape
((1, 779), 1, (394488, 13))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 394488, 'dynamic_features': 13})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (5767, 2)
>>> dataset.stn_coords('02342070')  # returns coordinates of station whose id is 02342070
    32.37431        -84.957993
>>> dataset.stn_coords(['02342070', '14316700'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('02342070')
# get area of two stations
>>> dataset.area(['02342070', '14316700'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('02342070')
__init__(path=None, overwrite=False, timestep='H', **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property boundary_id_map: str

Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.
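The role of this attribute name can be sketched with plain GeoJSON-like records. The helper build_boundary_map, the attribute names and the geometries below are all made up for illustration; the sketch only shows the idea of keying geometries by one attribute and falling back to the first attribute when none is given.

```python
from typing import Any, Dict, List, Optional


def build_boundary_map(
    records: List[Dict[str, Any]], id_attribute: Optional[str] = None
) -> Dict[str, Any]:
    """Map station id -> geometry from GeoJSON-like feature records.

    Hypothetical helper; `records` mimics features read from a boundary file.
    """
    if not records:
        return {}
    if id_attribute is None:
        # Fall back to the first attribute of the first record.
        id_attribute = next(iter(records[0]["properties"]))
    return {
        str(rec["properties"][id_attribute]): rec["geometry"] for rec in records
    }


# Two made-up features with made-up attribute names and coordinates.
features = [
    {"properties": {"gauge": "A1", "name": "x"},
     "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
    {"properties": {"gauge": "B2", "name": "y"},
     "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
]
boundaries = build_boundary_map(features)
print(sorted(boundaries))
```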

collate_forcing_data()[source]

Collate forcing data of all stations into a single NetCDF file using multiprocessing.

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.
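A minimal sketch of how such a mapping might be applied is given below. The raw column names, the values and the direction of the mapping (raw name to standardized name) are assumptions made for illustration; names without an entry are kept unchanged, in line with the convention that features which cannot be standardized retain their original names.

```python
from typing import Dict

# Hypothetical mapping from raw column names to standardized names.
dyn_map: Dict[str, str] = {
    "prcp": "pcp_mm",
    "streamflow": "q_cms_obs",
}

raw_row = {"prcp": 1.2, "streamflow": 3.4, "SWdown": 250.0}  # made-up values

# Rename mapped keys; keep unmapped keys (here SWdown) under their original name.
standardized = {dyn_map.get(name, name): value for name, value in raw_row.items()}
print(standardized)
```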

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

fetch_q(stations: List[str] = 'all')[source]

Since fetching q from other methods can be slower because of merging with other dynamic (forcing) features, this method fetches only observed streamflow data for given stations using multiprocessing.

Returns:

xarray Dataset whose data variables are station names and dimensions are time and dynamic features

Return type:

xr.Dataset

fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]

Reads features of more than one station.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic features to be fetched. if all, then all dynamic features will be fetched.

  • static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static feature will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the dynamic data as pandas dataframe. default is xarray.Dataset object

  • kwargs (dict) – additional keyword arguments

Returns:

  • tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray.Dataset or a dict with keys as station names and values as pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions.

  • Raises – ValueError, if both dynamic_features and static_features are None

Examples

>>> from aqua_fetch import CAMELSH
>>> dataset = CAMELSH()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'])
... # get data of selected stations as dictionary of pandas DataFrame
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'],
...  as_dataframe=True)
... # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'],
... dynamic_features=['q_mm_obs', 'air_temp_C', 'pcp_mm'], static_features=['elev_catch_m'])
q_mm(stations: str | List[str] = 'all', as_dataframe: bool = True) DataFrame[source]

returns streamflow in units of millimeters per timestep (mm/hour). This is obtained by dividing q by the catchment area.

Parameters:
  • stations (str/list) – name/names of stations. Default is all, which will return q_mm of all stations

  • as_dataframe (bool) – whether to return the data as pandas DataFrame. Default is True. Setting it to False will return xarray Dataset and can be faster.

Returns:

a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame or xr.Dataset
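The unit conversion behind q_mm can be written out explicitly. The sketch below assumes discharge in m³/s, catchment area in km² and an hourly timestep; the function name q_cms_to_mm is hypothetical and the formula is the generic volume-over-area conversion, not necessarily the exact code used by the library.

```python
def q_cms_to_mm(q_cms: float, area_km2: float, seconds_per_step: int = 3600) -> float:
    """Convert discharge (m3/s) to runoff depth (mm per timestep).

    Illustrative formula; not the library's implementation.
    """
    volume_m3 = q_cms * seconds_per_step      # water volume in one timestep
    depth_m = volume_m3 / (area_km2 * 1e6)    # spread over the catchment area
    return depth_m * 1000.0                   # metres to millimetres


# 10 m3/s over a 36 km2 catchment for one hour gives about 1 mm/hour.
print(q_cms_to_mm(10.0, 36.0))
```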

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_AUS(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 561 Australian catchments with 187 static features and 28 dynamic features for each catchment. The dynamic features are time series from 1950-01-01 to 2022-03-31. By default, this class reads version 2 of the CAMELS-AUS dataset following Fowler et al., 2024.

If version is 1, then this class reads data following Fowler et al., 2021, which is a dataset of 222 Australian catchments with 161 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1957-01-01 to 2018-12-31.
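As a quick sanity check, the period stated for version 2 can be converted to a number of daily timesteps, which agrees with the 26388 rows reported in the examples below. The helper n_daily_steps is illustrative only and assumes both end dates are inclusive.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily timesteps between two dates, both inclusive."""
    return (end - start).days + 1


# Period stated above for version 2 of CAMELS-AUS.
print(n_daily_steps(date(1950, 1, 1), date(2022, 3, 31)))  # 26388
```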

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='912101A', as_dataframe=True)
>>> df = dynamic['912101A']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26388, 28)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   561
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (56 out of 561)
   56
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(26388, 28), (26388, 28), (26388, 28),... (26388, 28), (26388, 28)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('912101A', as_dataframe=True,
...  dynamic_features=['airtemp_C_awap_max', 'pcp_mm_awap', 'et_morton_actual_SILO', 'q_cms_obs'])
>>> dynamic['912101A'].shape
   (26388, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='912101A', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['912101A'].shape
((1, 187), 1, (26388, 28))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26388, 'dynamic_features': 28})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (561, 2)
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
    -38.214199      -71.8283
>>> dataset.stn_coords(['912101A', '912105A'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('912101A')
# get area of two stations
>>> dataset.area(['912101A', '912105A'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('912101A')
...
# Version 1 of CAMELS_AUS can be accessed as below
>>> dataset = CAMELS_AUS(version=1)
>>> len(dataset.stations())
222
>>> _, dynamic = dataset.fetch(stations='912101A', as_dataframe=True)
>>> dynamic['912101A'].shape
(23376, 26)
__init__(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the CAMELS_AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • version – version of the dataset to download. Allowed values are 1 and 2.

  • to_netcdf

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: list

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations(as_list=True) list[source]

Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_BR(path=None, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 897 Brazilian catchments with 67 static features and 10 dynamic features for each catchment. The dynamic features are time series from 1920-01-01 to 2019-02-28. This class downloads and processes the CAMELS dataset of Brazil as provided by Chagas et al., 2020. The simulated streamflow of 593 stations and the raw streamflow of 3679 stations shipped with this data are not included in the dynamic features. Both can be fetched through the fetch_simulated_streamflow and fetch_raw_streamflow methods.

Examples

>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='46035000', as_dataframe=True)
>>> df = dynamic['46035000']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14245, 10)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   593
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (59 out of 593)
   59
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(14245, 10), (14245, 10), (14245, 10),... (14245, 10), (14245, 10)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('46035000', as_dataframe=True,
...  dynamic_features=['pcp_mm_cpc', 'aet_mm_mgb', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['46035000'].shape
   (14245, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='46035000', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['46035000'].shape
((1, 67), 1, (14245, 10))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14245, 'dynamic_features': 10})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (593, 2)
>>> dataset.stn_coords('46035000')  # returns coordinates of station whose id is 46035000
    -12.8686        -43.3797
>>> dataset.stn_coords(['46035000', '57170000'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('46035000')
# get area of two stations
>>> dataset.area(['46035000', '57170000'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('46035000')
__init__(path=None, verbosity: int = 1, **kwargs)[source]
Parameters:

path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.

all_stations(feature: str) List[str][source]

Returns the ids of all stations for which data of a specific feature is available.

area(stations: str | List[str] = 'all', source: str = 'gsim') Series[source]

Returns the area (km²) of catchments as a pandas.Series.

Parameters:
  • stations (str/list) – name/names of stations. Default is 'all', which returns the area of all stations

  • source (str) – source of area calculation. It should be either gsim or ana

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('65100000')  # returns area of station whose id is 65100000
>>> dataset.area(['65100000', '64075000'])  # returns area of two stations
property boundary_id_map: str

Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
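
The caching advice above can be illustrated with a minimal sketch. It is not the implementation used by aqua_fetch; DemoDataset and _read_feature_names are hypothetical names, and the standard-library functools.cached_property is assumed as one possible caching mechanism.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # Placeholder for an expensive step such as parsing a file header.
        self.reads += 1
        return ["pcp_mm_cpc", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


ds = DemoDataset()
first = ds.dynamic_features
second = ds.dynamic_features  # served from the cache, no second read
print(first, ds.reads)
```

With this pattern the expensive read runs once per instance, however often the property is accessed.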

property end

end of data

fetch_raw_streamflow(stations: str = None) DataFrame[source]

returns raw streamflow data for one or more stations.

Example

>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_raw_streamflow('10500000')
... # fetch raw streamflow of all stations
>>> x = dataset.fetch_raw_streamflow(dataset.all_stations())
fetch_simulated_streamflow(stations: str = None) DataFrame[source]

returns simulated streamflow data for one or more stations.

Example

>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_simulated_streamflow('10500000')
... # fetch simulated streamflow of all stations
>>> x = dataset.fetch_simulated_streamflow(dataset.all_stations())
q_mm(stations: str | List[str] = 'all') DataFrame[source]

returns streamflow in units of millimeters per day. The name of the original time series is streamflow_mm.

Parameters:

stations (str/list) – name/names of stations. Default is 'all', which returns streamflow of all stations

Returns:

a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
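
For readers who want to relate these values to discharge in cubic meters per second (as in q_cms_obs), the standard unit conversion is sketched below. This is only the generic hydrological relation, not necessarily how the dataset or the library derives streamflow_mm; the function name cms_to_mm_per_day and the sample numbers are made up for illustration.

```python
def cms_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert discharge (m3/s) to runoff depth (mm/day) over a catchment."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    seconds_per_day = 86_400
    volume_m3 = q_cms * seconds_per_day     # volume discharged in one day
    depth_m = volume_m3 / (area_km2 * 1e6)  # spread over the area in m2
    return depth_m * 1000.0                 # meters to millimeters


# Illustrative numbers: 10 m3/s from an 864 km2 catchment
# corresponds to roughly 1 mm/day.
example = cms_to_mm_per_day(10.0, 864.0)
print(example)
```

The relation reduces to q_mm = q_cms * 86.4 / area_km2, which makes explicit why the catchment area is needed for the conversion.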

property static_features

Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns a list of station ids.

Example

>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as pandas.DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas.DataFrame with long and lat columns. The length of the DataFrame equals the number of stations whose coordinates are fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = CAMELS_BR()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('65100000')  # returns coordinates of station whose id is 65100000
>>> dataset.stn_coords(['65100000', '64075000'])  # returns coordinates of two stations
class aqua_fetch.rr.CAMELS_CH(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]

Bases: _RainfallRunoff

Data of 331 Swiss catchments from Hoege et al., 2023. The dataset consists of 209 static catchment features and 9 dynamic features. The dynamic features span from 19810101 to 20201231 with daily timestep. For hourly (H) timestep, only streamflow is available, for 170 Swiss catchments. The hourly (H) streamflow data is obtained from Kauzlaric et al., 2023.
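
The number of rows to expect in each daily DataFrame follows from the date range stated above. The short check below, a sketch using only the standard library, counts the days from 1981-01-01 to 2020-12-31 inclusive; n_daily_steps is a hypothetical helper name, and treating both end dates as included is an assumption.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps from start to end, both included."""
    return (end - start).days + 1


n_ch = n_daily_steps(date(1981, 1, 1), date(2020, 12, 31))
print(n_ch)  # 40 years with 10 leap days: 40 * 365 + 10 = 14610
```

This matches the first dimension, 14610, of the shapes shown in the examples for this class.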

Examples

>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='2004', as_dataframe=True)
>>> df = dynamic['2004'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14610, 9)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   331
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (33 out of 331)
   33
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(14610, 9), (14610, 9), (14610, 9),... (14610, 9), (14610, 9)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('2004', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['2004'].shape
   (14610, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2004', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2004'].shape
((1, 209), 1, (14610, 9))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14610, 'dynamic_features': 9})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (331, 2)
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
    47.925221       8.191595
>>> dataset.stn_coords(['2004', '2007'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('2004')
# get area of two stations
>>> dataset.area(['2004', '2007'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('2004')
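
In the examples above, the stations argument is a station id, an integer count, or a fraction such as 0.1. The sketch below shows one way such an argument could be turned into a random subset. It is an assumption inferred from the example outputs (0.1 of 331 stations gives 33), not the library's actual code; pick_stations and the generated ids are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Union


def pick_stations(
    all_ids: Sequence[str],
    stations: Union[int, float],
    seed: Optional[int] = None,
) -> List[str]:
    """Pick a random subset of station ids (hypothetical helper)."""
    if isinstance(stations, float):
        if not 0.0 < stations <= 1.0:
            raise ValueError("a fractional value must lie in (0, 1]")
        n = int(stations * len(all_ids))  # rounds down, e.g. 33.1 becomes 33
    else:
        n = int(stations)
    # sample() draws without replacement, so no station is repeated
    return random.Random(seed).sample(list(all_ids), n)


ids = [str(i) for i in range(331)]  # made-up ids, same count as CAMELS_CH
tenth = pick_stations(ids, 0.1, seed=0)
ten = pick_stations(ids, 10, seed=0)
one = pick_stations(ids, 1, seed=0)
print(len(tenth), len(ten), len(one))
```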
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

all_hourly_stations() List[str][source]

Names of all stations which have hourly data

climate_attrs() DataFrame[source]

returns 14 climate attributes of catchments.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

foen_stations() List[str][source]

Returns all the stations in the FOEN folder

geol_attrs() DataFrame[source]

15 geological features

glacier_attrs() DataFrame[source]
returns a dataframe with four columns
  • ‘glac_area’

  • ‘glac_vol’

  • ‘glac_mass’

  • ‘glac_area_neighbours’

hourly_stations() List[str][source]

IDs of those stations which have hourly data and which are also part of CAMELS-CH dataset

human_inf_attrs() DataFrame[source]

14 anthropogenic factors

hydrogeol_attrs() DataFrame[source]

10 hydrogeological factors

hydrol_attrs() DataFrame[source]

14 hydrological parameters + 2 useful infos

landcolover_attrs() DataFrame[source]

13 landcover parameters

soil_attrs() DataFrame[source]

80 soil parameters

property static_features

Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns station ids for catchments

supp_geol_attrs() DataFrame[source]

supplementary geological features

topo_attrs() DataFrame[source]

topographic parameters

class aqua_fetch.rr.CAMELS_CL(path: str = None, **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 516 Chilean catchments with 104 static features and 12 dynamic features for each catchment. The dynamic features are time series from 1913-02-15 to 2018-03-09. This class downloads and processes the CAMELS dataset of Chile following the work of Alvarez-Garreton et al., 2018.

Examples

>>> from aqua_fetch import CAMELS_CL
>>> dataset = CAMELS_CL()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='8350001', as_dataframe=True)
>>> df = dynamic['8350001'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(38374, 12)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   516
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (51 out of 516)
   51
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(38374, 12), (38374, 12), (38374, 12),... (38374, 12), (38374, 12)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('8350001', as_dataframe=True,
...  dynamic_features=['pet_mm_hargreaves', 'pcp_mm_mswep', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['8350001'].shape
   (38374, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='8350001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['8350001'].shape
((1, 104), 1, (38374, 12))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 38374, 'dynamic_features': 12})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (516, 2)
>>> dataset.stn_coords('8350001')  # returns coordinates of station whose id is 8350001
    -38.214199      -71.8283
>>> dataset.stn_coords(['8350001', '3820003'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('8350001')
# get area of two stations
>>> dataset.area(['8350001', '3820003'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('8350001')
__init__(path: str = None, **kwargs)[source]
Parameters:

path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() list[source]

Returns the ids of all stations for which data of a specific attribute is available.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas.DataFrame with long and lat columns. The length of the DataFrame equals the number of stations whose coordinates are fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = CAMELS_CL()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('12872001')  # returns coordinates of station whose id is 12872001
>>> dataset.stn_coords(['12872001', '12876004'])  # returns coordinates of two stations
class aqua_fetch.rr.CAMELS_COL(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 347 catchments from Colombia following the works of Jimenez et al., 2025. The dataset consists of 255 static catchment features and 6 dynamic features. The dynamic features span from 19810101 to 20221231 with daily timestep. The data is downloaded from Zenodo.

Examples

>>> from aqua_fetch import CAMELS_COL
>>> dataset = CAMELS_COL()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='35067040', as_dataframe=True)
>>> df = dynamic['35067040'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(15340, 6)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   347
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (34 out of 347)
   34
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(15340, 6), (15340, 6), (15340, 6),... (15340, 6), (15340, 6)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('35067040', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['35067040'].shape
   (15340, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='35067040', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['35067040'].shape
((1, 255), 1, (15340, 6))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 15340, 'dynamic_features': 6})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (347, 2)
>>> dataset.stn_coords('35067040')  # returns coordinates of station whose id is 35067040
    4.746433        -73.587807
>>> dataset.stn_coords(['35067040', '21187030'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('35067040')
# get area of two stations
>>> dataset.area(['35067040', '21187030'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('35067040')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
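
The interplay of path and overwrite described above can be summarized in a small decision function. This is only a sketch of the documented behavior under stated assumptions, not the package's code; resolve_data_dir and the directory names are hypothetical, and the real class performs the download itself.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_data_dir(
    path: Optional[str],
    default_dir: Path,
    overwrite: bool = False,
) -> Tuple[Path, bool]:
    """Return (directory to use, whether a download is needed)."""
    # No path given: fall back to the default directory.
    target = Path(path) if path is not None else default_dir
    # Download when forced, or when the directory does not exist yet.
    needs_download = overwrite or not target.is_dir()
    return target, needs_download


with tempfile.TemporaryDirectory() as tmp:
    default = Path(tmp) / "default"
    existing = resolve_data_dir(tmp, default)                    # read as is
    missing = resolve_data_dir(str(Path(tmp) / "new"), default)  # download
    forced = resolve_data_dir(tmp, default, overwrite=True)      # re-download
    fallback = resolve_data_dir(None, default)                   # default dir

print(existing[1], missing[1], forced[1], fallback[1])
```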

property dyn_map: Dict[str, str]

dynamic features map for CAMELS-COL catchments

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

property static_features: List[str]

returns static features for Colombia catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_DE(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

This is the data from 1582 German catchments following the work of Loritz et al., 2024. The data is downloaded from Zenodo. This data consists of 111 static and 21 dynamic features. The dynamic features span from 1951-01-01 to 2020-12-31 with daily timestep.

Examples

>>> from aqua_fetch import CAMELS_DE
>>> dataset = CAMELS_DE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='DE110260', as_dataframe=True)
>>> df = dynamic['DE110260'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(25568, 21)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   1582
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (155 out of 1582)
   155
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(25568, 21), (25568, 21), (25568, 21),... (25568, 21), (25568, 21)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('DE110260', as_dataframe=True,
...  dynamic_features=['airtemp_C_mean', 'rh_%', 'pcp_mm_mean', 'q_cms_obs'])
>>> dynamic['DE110260'].shape
   (25568, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='DE110260', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['DE110260'].shape
((1, 111), 1, (25568, 21))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 25568, 'dynamic_features': 21})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (1582, 2)
>>> dataset.stn_coords('DE110260')  # returns coordinates of station whose id is DE110260
    47.925221       8.191595
>>> dataset.stn_coords(['DE110260', 'DE110250'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('DE110260')
# get area of two stations
>>> dataset.area(['DE110260', 'DE110250'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('DE110260')
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
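
The idea behind to_netcdf, paying a one-time conversion cost so that later reads are fast, is a convert-once cache. The sketch below illustrates that pattern only: it uses a JSON file from the standard library as a stand-in for the netCDF file, and load_with_cache, slow_loader and the sample values are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Data = Dict[str, List[float]]
calls: List[int] = []  # records how often the slow path runs


def slow_loader() -> Data:
    # Placeholder for parsing many raw files; values are made up.
    calls.append(1)
    return {"DE110260": [1.2, 0.8, 0.5]}


def load_with_cache(cache_file: Path, loader: Callable[[], Data]) -> Data:
    """Read the consolidated file if present, else build and store it."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    data = loader()
    cache_file.write_text(json.dumps(data))
    return data


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "all_data.json"  # stand-in for the netCDF file
    first = load_with_cache(cache, slow_loader)   # slow: builds the cache
    second = load_with_cache(cache, slow_loader)  # fast: reads the cache

print(len(calls), first == second)
```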

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value rather than read the data again on every access; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

This is an updated version of the aqua_fetch.rr.Caravan_DK dataset. This dataset was presented by Liu et al., 2024 and is available at dataverse. This dataset consists of 119 static and 13 dynamic features from 3330 Danish catchments. The dynamic (time series) features span from 1989-01-02 to 2023-12-31 with daily timestep. However, the streamflow observations are available for only 304 catchments.

Examples

>>> from aqua_fetch import CAMELS_DK
>>> dataset = CAMELS_DK()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='54130033', as_dataframe=True)
>>> df = dynamic['54130033'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12782, 13)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   304
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (30 out of 304)
   30
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(12782, 13), (12782, 13), (12782, 13),... (12782, 13), (12782, 13)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('54130033', as_dataframe=True,
...  dynamic_features=['Abstraction', 'pet_mm', 'airtemp_C_mean', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['54130033'].shape
   (12782, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='54130033', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['54130033'].shape
((1, 119), 1, (12782, 13))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 13})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (304, 2)
>>> dataset.stn_coords('54130033')  # returns coordinates of station whose id is 54130033
    55.325242       9.93079
>>> dataset.stn_coords(['54130033', '13210113'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('54130033')
# get area of two stations
>>> dataset.area(['54130033', '13210113'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('54130033')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

returns names of dynamic features
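
The returned names follow the package-wide name_unit_specifier convention (for example pcp_mm_mswep), while features that do not fit it, such as Abstraction, keep their original names. The sketch below splits a standardized name into its parts. It is an illustrative helper only; split_feature_name and FeatureName are hypothetical and not part of aqua_fetch, and a non-standardized name that happens to contain underscores would be split incorrectly.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    name: str
    unit: Optional[str]
    specifier: Optional[str]


def split_feature_name(feature: str) -> FeatureName:
    """Split 'name_unit_specifier' into parts; missing parts become None."""
    parts = feature.split("_", 2)  # at most three pieces
    name = parts[0]
    unit = parts[1] if len(parts) > 1 else None
    specifier = parts[2] if len(parts) > 2 else None
    return FeatureName(name, unit, specifier)


for feat in ["pcp_mm_mswep", "airtemp_C_mean", "pet_mm", "Abstraction"]:
    print(feat, split_feature_name(feat))
```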

property end: Timestamp

end of data

property static_features: List[str]

returns static features for Denmark catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of the stations (catchments or gauges) used to index each station in the dataset.

transform_boundary(boundary)[source]

Transforms the coordinates to the required format.

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from geographic to projected

must be implemented in base classes

class aqua_fetch.rr.CAMELS_FI(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 320 Finnish catchments with 16 dynamic features and 106 static features. The dynamic features span from 19610101 to 20231231 with daily timestep. The data is downloaded from Zenodo.

Examples

>>> from aqua_fetch import CAMELS_FI
>>> dataset = CAMELS_FI()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='1156', as_dataframe=True)
>>> df = dynamic['1156'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(23010, 16)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   320
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (32)
   32
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(23010, 16), (23010, 16), (23010, 16),... (23010, 16), (23010, 16)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('1156', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'snowdepth_m', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['1156'].shape
   (23010, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='1156', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['1156'].shape
((1, 106), 1, (23010, 16))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 23010, 'dynamic_features': 16})
...
>>> len(dynamic.data_vars)   # -> 10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (320, 2)
>>> dataset.stn_coords('1156')  # returns coordinates of station whose id is 1156
    62.253101       24.444099
>>> dataset.stn_coords(['1156', '1116'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('1156')
# get area of two stations
>>> dataset.area(['1156', '1116'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('1156')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
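
The verbosity levels listed above can be illustrated with a small sketch. The report helper and the level assigned to each message are hypothetical and only show the thresholding idea; the package may organize its messages differently.

```python
def report(message: str, level: int, verbosity: int) -> bool:
    """Print message when verbosity is at least level; return whether it printed."""
    if verbosity >= level:
        print(message)
        return True
    return False


# verbosity=0 silences everything, including important messages (level 1).
printed_quiet = report("downloading data ...", level=1, verbosity=0)
# verbosity=1 shows important messages only.
printed_default = report("downloading data ...", level=1, verbosity=1)
# A detailed message (level 2) needs verbosity greater than 1.
printed_detail = report("parsing file 3 of 10", level=2, verbosity=1)
```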

property dyn_map: Dict[str, str]

dynamic features map for CAMELS-FI catchments

property end: Timestamp

end of data

property start: Timestamp

start of data

property static_factors: Dict[str, float]

static factors for CAMELS-FI catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/IDs of the stations (catchments or gauges) used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_FR(path=None, overwrite=False, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 654 catchments from France following the works of Delaigue et al., 2024. The dataset consists of 344 static catchment features and 22 dynamic features. The dynamic features span from 19700101 to 20211231 with daily timestep.

Examples

>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='J421191001', as_dataframe=True)
>>> df = dynamic['J421191001'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12782, 22)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   654
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (65 out of 654)
   65
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(12782, 22), (12782, 22), (12782, 22),... (12782, 22), (12782, 22)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('J421191001', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'spechum_gkg', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> dynamic['J421191001'].shape
   (12782, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='J421191001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['J421191001'].shape
((1, 344), 1, (12782, 22))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 22})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (654, 2)
>>> dataset.stn_coords('J421191001')  # returns coordinates of station whose id is J421191001
    48.006298   -4.063848
>>> dataset.stn_coords(['J421191001', '802'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('J421191001')
# get area of two stations
>>> dataset.area(['J421191001', '802'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('J421191001')
__init__(path=None, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
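
The three cases described for the path parameter can be sketched as a small decision function. This is an illustration only: resolve_path and DEFAULT_DIR are hypothetical names, and the actual default directory used by the package is not stated here.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Assumed default location, used only for this sketch.
DEFAULT_DIR = Path(tempfile.gettempdir()) / "aqua_fetch_sketch"


def resolve_path(path: Optional[str]) -> Tuple[Path, str]:
    """Return the data directory and the action implied by the path argument."""
    if path is None:
        return DEFAULT_DIR, "download to default directory"
    target = Path(path)
    if target.is_dir():
        return target, "read from existing directory"
    return target, "download to given directory"


with tempfile.TemporaryDirectory() as tmp:
    existing = resolve_path(tmp)                          # directory exists
    missing = resolve_path(str(Path(tmp) / "not_there"))  # directory does not exist
default = resolve_path(None)                              # no path given

print(existing[1], missing[1], default[1], sep="\n")
```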

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

returns names of dynamic features

property end: Timestamp

end of data

static_attrs() DataFrame[source]

combination of topographic + soil + landuse + geology + climate + hydro + climate + anthropogenic features

Returns:

a pandas.DataFrame of static features of all catchments of shape (654, xxxx)

Return type:

pd.DataFrame

property static_features: List[str]

returns static features for CAMELS-FR catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/IDs of the stations (catchments or gauges) used to index each station in the dataset.
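
The examples above pass either a fraction (0.1) or an integer (10) instead of explicit station names. A minimal sketch of how such a selection could be drawn from the list of station names is given below. The pick_stations helper and the truncation of the fractional count are assumptions, chosen because truncation reproduces the 65 out of 654 stations shown in the example.

```python
import random
from typing import List, Optional, Union


def pick_stations(station_ids: List[str],
                  how_many: Union[int, float],
                  seed: Optional[int] = None) -> List[str]:
    """Randomly pick stations by absolute count (int) or by fraction (float)."""
    rng = random.Random(seed)
    if isinstance(how_many, float):
        count = int(how_many * len(station_ids))  # fraction, truncated
    else:
        count = how_many
    return rng.sample(station_ids, count)


all_ids = [f"stn_{i}" for i in range(654)]  # hypothetical station names
subset = pick_stations(all_ids, 0.1, seed=0)
ten = pick_stations(all_ids, 10, seed=0)
print(len(subset), len(ten))
```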

ts_attrs() DataFrame[source]

daily_timeseries statistics of all catchments

Returns:

a pandas.DataFrame of static features of all catchments of shape (654, xxxx)

Return type:

pd.DataFrame

class aqua_fetch.rr.CAMELS_GB(path=None, **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 671 catchments with 145 static features and 10 dynamic features for each catchment following the work of Coxon et al., 2020. The dynamic features are time series from 1970-10-01 to 2015-09-30. The data is downloaded from the CEH website.

Examples

>>> from aqua_fetch import CAMELS_GB
>>> dataset = CAMELS_GB()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='38017', as_dataframe=True)
>>> df = dynamic['38017'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26388, 28)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   671
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (67 out of 671)
   67
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(26388, 28), (26388, 28), (26388, 28),... (26388, 28), (26388, 28)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('38017', as_dataframe=True,
...  dynamic_features=['windspeed_mps', 'airtemp_C_mean', 'pet_mm', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['38017'].shape
   (26388, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='38017', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['38017'].shape
((1, 145), 1, (26388, 28))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26388, 'dynamic_features': 28})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (671, 2)
>>> dataset.stn_coords('38017')  # returns coordinates of station whose id is 38017
    51.880001       -0.28
>>> dataset.stn_coords(['38017', '42001'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('38017')
# get area of two stations
>>> dataset.area(['38017', '42001'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('38017')
__init__(path=None, **kwargs)[source]
Parameters:

path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the child class to implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
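
The caching recommended for this property can be sketched with functools.cached_property from the standard library (Python 3.8 or later). DatasetSketch and its feature names are hypothetical; the sketch only shows that the expensive read happens once.

```python
from functools import cached_property
from typing import List


class DatasetSketch:
    """Hypothetical dataset class that counts how often the data is read."""

    def __init__(self) -> None:
        self.reads = 0

    def _read_feature_names(self) -> List[str]:
        # Stand-in for an expensive file read.
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


ds = DatasetSketch()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.reads)
```

The second access returns the stored list without calling the reader again.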

property end

end of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the child class to implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations(to_exclude=None)[source]

Names/IDs of the stations (catchments or gauges) used to index each station in the dataset.

class aqua_fetch.CAMELS_IND(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 472 catchments from Republic of India following the works of Mangukiya et al., 2024. The dataset consists of 210 static catchment features and 20 dynamic features. The dynamic features span from 19800101 to 20201231 with daily timestep.

Examples

>>> from aqua_fetch import CAMELS_IND
>>> dataset = CAMELS_IND()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='3001', as_dataframe=True)
>>> df = dynamic['3001'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14976, 20)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   472
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (47 out of 472)
   47
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(14976, 20), (14976, 20), (14976, 20),... (14976, 20), (14976, 20)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('3001', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['3001'].shape
   (14976, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10

# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='3001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['3001'].shape
((1, 210), 1, (14976, 20))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14976, 'dynamic_features': 20})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (472, 2)

>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
    48.006298   -4.063848
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('3001')
# get area of two stations
>>> dataset.area(['3001', '17021'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('3001')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.
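
Standardized dynamic feature names follow the name_unit_specifier convention, for example airtemp_C_mean or q_cms_obs in the examples above. A minimal sketch of splitting such a name into its parts is shown below. The split_feature helper is hypothetical, and names that do not follow the convention would not be split meaningfully.

```python
from typing import Optional, Tuple


def split_feature(name: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name_unit_specifier' into (name, unit, specifier); missing parts are None."""
    parts = name.split("_", 2)           # at most three parts
    parts += [None] * (3 - len(parts))   # pad when unit or specifier is absent
    return parts[0], parts[1], parts[2]


for feature in ["pcp_mm", "rh_%", "airtemp_C_mean", "q_cms_obs"]:
    print(feature, "->", split_feature(feature))
```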

property dynamic_features: List[str]

returns names of dynamic features

property end: Timestamp

end of data

property static_features: List[str]

returns static features for CAMELS-IND catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

returns names of stations as a list

Note: leading zeros are omitted from the station names, which means 03001 is returned as 3001
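
When station identifiers come from another source with leading zeros, they need to be brought into the same form before being passed to this class. A minimal sketch, using a hypothetical helper normalize_station_id:

```python
def normalize_station_id(raw_id: str) -> str:
    """Drop leading zeros from a station id, keeping a lone '0' intact."""
    stripped = raw_id.lstrip("0")
    return stripped or "0"


print(normalize_station_id("03001"))
print(normalize_station_id("17021"))
```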

class aqua_fetch.rr.CAMELS_LUX(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 56 catchments from Luxembourg following the work of Nijzink et al., 2025. The dataset consists of 61 static catchment features and 25 dynamic features. The dynamic features span from 20040101 to 20211231 with daily, hourly, and 15-minute timesteps. The data is downloaded from Zenodo.

Examples

>>> from aqua_fetch import CAMELS_LUX
>>> dataset = CAMELS_LUX()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(6209, 25)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   56
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5)
   5
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(6209, 25), (6209, 25), (6209, 25),... (6209, 25), (6209, 25)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('ID_02', as_dataframe=True,
...  dynamic_features=['pcp_mm_station', 'rh_%', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> dynamic['ID_02'].shape
   (6209, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ID_02', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['ID_02'].shape
((1, 61), 1, (6209, 25))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 6209, 'dynamic_features': 25})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (56, 2)
>>> dataset.stn_coords('ID_02')  # returns coordinates of station whose id is ID_02
    49.586288       6.14908
>>> dataset.stn_coords(['ID_02', 'ID_01'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('ID_02')
# get area of two stations
>>> dataset.area(['ID_02', 'ID_01'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('ID_02')
...
# if we want to get hourly data we can do as below
>>> dataset = CAMELS_LUX(timestep='H')
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02']
>>> df.shape
(149016, 25)
...
# if we want to get 15Min data we can do as below
>>> dataset = CAMELS_LUX(timestep='15Min')
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02']
>>> df.shape
(596061, 25)
__init__(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
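
The effect of the timestep argument on the number of records can be estimated with a short calculation. The timestep codes 'D', 'H' and '15Min' are taken from the examples above; the STEP table and the steps_per_day helper are hypothetical. For a gap-free record of 6209 days, the hourly count agrees with the 149016 rows shown in the hourly example.

```python
from datetime import timedelta

# Length of one step for each timestep code used in the examples.
STEP = {
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "15Min": timedelta(minutes=15),
}


def steps_per_day(timestep: str) -> int:
    """Number of records per day for the given timestep code."""
    return int(timedelta(days=1) / STEP[timestep])


n_days = 6209  # daily record length from the example above
for code in STEP:
    print(code, n_days * steps_per_day(code))
```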

property dyn_map: Dict[str, str]

dynamic features map for CAMELS-LUX catchments

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the child class to implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

property static_factors: Dict[str, float]

static factors for CAMELS-LUX catchments

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the child class to implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

returns names of stations as a list

class aqua_fetch.rr.CAMELS_NZ(path: str | PathLike = None, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 369 catchments from New Zealand following the works of Harrigan et al., 2025. The dataset consists of 40 static catchment features and 5 dynamic features. The dynamic features span from 19720101 to 20240802. The data is downloaded from figshare. The data is available at daily and hourly timesteps; either can be selected by setting the timestep argument to D or H respectively during initialization.

Examples

>>> from aqua_fetch import CAMELS_NZ
>>> dataset = CAMELS_NZ()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='74321', as_dataframe=True)
>>> df = dynamic['74321'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(19208, 5)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   347
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (34 out of 347)
   34
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(19208, 5), (19208, 5), (19208, 5),... (19208, 5), (19208, 5)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('74321', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['74321'].shape
   (19208, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='74321', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['74321'].shape
((1, 40), 1, (19208, 5))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 19208, 'dynamic_features': 5})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (347, 2)
>>> dataset.stn_coords('74321')  # returns coordinates of station whose id is 74321
    -45.945599      170.101486
>>> dataset.stn_coords(['74321', '802'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('74321')
# get area of two stations
>>> dataset.area(['74321', '802'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('74321')
# The hourly data can be accessed by setting the timestep to 'H'
>>> dataset = CAMELS_NZ(timestep='H')
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='74321', as_dataframe=True)
>>> df = dynamic['74321'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(460928, 5)
__init__(path: str | PathLike = None, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

returns names of dynamic features

property end: Timestamp

end of data

property static_features: List[str]

returns static features for New Zealand catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/IDs of the stations (catchments or gauges) used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_SE(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 50 Swedish catchments following the works of Teutschbein et al., 2024. The data is downloaded from the Swedish National Data Service website. The dataset consists of 76 static catchment features and 4 dynamic features. The dynamic features span from 19610101 to 20201231 with daily timestep.

Examples

>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(21915, 4)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   50
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5 out of 50)
   5
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(21915, 4), (21915, 4), (21915, 4),... (21915, 4), (21915, 4)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...  dynamic_features=['q_cms_obs', 'q_mm_obs', 'pcp_mm', 'airtemp_C_mean'])
>>> dynamic['5'].shape
   (21915, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 76), 1, (21915, 4))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 21915, 'dynamic_features': 4})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
    68.0356 21.9758
>>> dataset.stn_coords(['5', '200'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('5')
# get area of two stations
>>> dataset.area(['5', '200'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('5')
__init__(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the CAMELS_SE dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • to_netcdf

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
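The caching advice above can be sketched with functools.cached_property from the standard library. ToyDataset and _read_feature_names are hypothetical stand-ins, not aqua_fetch classes or methods; the feature names are the ones used in the CAMELS_SE examples.

```python
from functools import cached_property
from typing import List


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not an aqua_fetch class."""

    def __init__(self) -> None:
        self.n_reads = 0  # counts how often the (pretend) raw file is read

    def _read_feature_names(self) -> List[str]:
        # stands in for an expensive read of the raw data files
        self.n_reads += 1
        return ['q_cms_obs', 'q_mm_obs', 'pcp_mm', 'airtemp_C_mean']

    @cached_property
    def dynamic_features(self) -> List[str]:
        # computed on first access, then stored on the instance
        return self._read_feature_names()


ds = ToyDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.n_reads)  # 1
```

Whether the library itself caches this way is not stated in the documentation; the sketch only illustrates the pattern the docstring recommends.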

property end

end of data

property static_features

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns the names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr.CAMELS_SK(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset of 178 catchments from South Korea following the work of Kim et al., 2025. The dataset consists of 215 static catchment features and 17 dynamic features. The dynamic features span from 2000-01-01 to 2019-12-31 with an hourly timestep.

Examples

>>> from aqua_fetch import CAMELS_SK
>>> dataset = CAMELS_SK()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='2013615', as_dataframe=True)
>>> df = dynamic['2013615']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(175320, 17)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   178
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (17 out of 178)
   17
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(175320, 17), (175320, 17), (175320, 17),... (175320, 17), (175320, 17)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('2013615', as_dataframe=True,
...  dynamic_features=['total_precipitation', 'snow_depth', 'air_temp_obs', 'potential_evaporation', 'q_cms_obs'])
>>> dynamic['2013615'].shape
   (175320, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2013615', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2013615'].shape
((1, 215), 1, (175320, 17))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 175320, 'dynamic_features': 17})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (178, 2)
>>> dataset.stn_coords('2013615')  # returns coordinates of station whose id is 2013615
    35.880798       128.173096
>>> dataset.stn_coords(['2013615', '2017620'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('2013615')
# get area of two stations
>>> dataset.area(['2013615', '2017620'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('2013615')
__init__(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
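The verbosity levels listed above can be read as a simple threshold. The following is a minimal sketch under that assumption; the helper emit and its level argument are hypothetical and do not describe how aqua_fetch logs internally.

```python
def emit(message: str, level: int, verbosity: int = 1) -> bool:
    """Print message when verbosity is at least level (hypothetical helper).

    level 1 marks an important message, level 2 or more a detailed one.
    Returns True if the message was printed.
    """
    if verbosity >= level:
        print(message)
        return True
    return False


emit("downloading data", level=1, verbosity=1)     # printed
emit("unzipping file 3 of 5", level=2, verbosity=1)  # suppressed
emit("downloading data", level=1, verbosity=0)     # suppressed
```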

property dyn_map: Dict[str, str]

dynamic features map for CAMELS-SK catchments

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

property start: Timestamp

start of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

returns names of stations as a list

class aqua_fetch.rr.CAMELS_US(path: str | PathLike = None, data_source: str = 'daymet', **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 671 US catchments with 59 static catchment features and 8 catchment-averaged dynamic features for each catchment. The dynamic features are daily time series from 1980-01-01 to 2014-12-31. The data is downloaded from its Zenodo repository. For more details on the data, refer to Newman et al., 2015, Newman et al., 2022 and Addor et al., 2017.

Please note that this data is also known as “CAMELS”; however, we have named it CAMELS_US to differentiate it from other CAMELS-like datasets from other parts of the world.

Examples

>>> from aqua_fetch import CAMELS_US
>>> dataset = CAMELS_US()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='11478500', as_dataframe=True)
>>> df = dynamic['11478500']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12784, 8)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   671
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (67 out of 671)
   67
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(12784, 8), (12784, 8), (12784, 8),... (12784, 8), (12784, 8)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('11478500', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'solrad_wm2', 'airtemp_C_max', 'airtemp_C_min', 'q_cms_obs'])
>>> dynamic['11478500'].shape
   (12784, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='11478500', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['11478500'].shape
((1, 59), 1, (12784, 8))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12784, 'dynamic_features': 8})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (671, 2)
>>> dataset.stn_coords('11478500')  # returns coordinates of station whose id is 11478500
    40.480419       -123.890877
>>> dataset.stn_coords(['11478500', '14020000'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('11478500')
# get area of two stations
>>> dataset.area(['11478500', '14020000'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('11478500')
__init__(path: str | PathLike = None, data_source: str = 'daymet', **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • data_source (str) –

    source of meteorological timeseries data. Allowed values are

    • daymet

    • maurer

    • nldas

    • v1p15_daymet

    • v1p15_nldas
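The allowed values listed above can be checked before constructing the dataset. This is a minimal sketch; check_data_source is a hypothetical helper, and whether CAMELS_US itself raises a ValueError for an unknown source is an assumption.

```python
# allowed values of data_source as listed in the parameter description
ALLOWED_SOURCES = ('daymet', 'maurer', 'nldas', 'v1p15_daymet', 'v1p15_nldas')


def check_data_source(data_source: str = 'daymet') -> str:
    """Return data_source unchanged if it is allowed, else raise ValueError."""
    if data_source not in ALLOWED_SOURCES:
        raise ValueError(
            f"data_source must be one of {ALLOWED_SOURCES} but got {data_source!r}"
        )
    return data_source


source = check_data_source('maurer')
```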

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.
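To illustrate what such a map is for, the sketch below renames raw column names to standardized ones with a plain dictionary. The raw names and the direction of the mapping (raw name to standardized name) are assumptions made for illustration; the actual contents of dyn_map may differ. The standardized names are the ones used in the CAMELS_US examples.

```python
from typing import Dict, List

# Illustrative raw-to-standard mapping; the real dyn_map contents may differ.
dyn_map: Dict[str, str] = {
    'prcp(mm/day)': 'pcp_mm',
    'srad(W/m2)': 'solrad_wm2',
    'tmax(C)': 'airtemp_C_max',
    'tmin(C)': 'airtemp_C_min',
}


def standardize(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename known columns; unknown ones keep their original name."""
    return [mapping.get(col, col) for col in columns]


renamed = standardize(['prcp(mm/day)', 'tmax(C)', 'swe(mm)'], dyn_map)
print(renamed)  # ['pcp_mm', 'airtemp_C_max', 'swe(mm)']
```

Columns without an entry in the map keep their original names, in line with the naming convention described at the top of this section.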

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value rather than reading the data again and again; alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() list[source]

Returns the names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.

class aqua_fetch.rr.Caravan_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Reads Caravan extension Denmark - Danish dataset for large-sample hydrology following the work of Koch and Schneider 2022. The dataset is downloaded from Zenodo. This dataset consists of static and dynamic features from 308 Danish catchments. There are 38 dynamic (time series) features from 1981-01-02 to 2020-12-31 with a daily timestep and 211 static features for each of the 308 catchments.

Please note that there is an updated version of this dataset following the work of Liu et al., 2024. This dataset is associated with the aqua_fetch.CAMELS_DK class which can be imported as follows:

>>> from aqua_fetch import CAMELS_DK

Examples

>>> from aqua_fetch import Caravan_DK
>>> dataset = Caravan_DK()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='80001', as_dataframe=True)
>>> df = dynamic['80001']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14609, 39)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   308
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (31 out of 308)
   31
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(14609, 39), (14609, 39), (14609, 39),... (14609, 39), (14609, 39)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('80001', as_dataframe=True,
...  dynamic_features=['snow_depth_water_equivalent_mean', 'temperature_2m_mean',  'q_cms_obs'])
>>> dynamic['80001'].shape
   (14609, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='80001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['80001'].shape
((1, 211), 1, (14609, 39))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14609, 'dynamic_features': 39})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (308, 2)
>>> dataset.stn_coords('80001')  # returns coordinates of station whose id is 80001
    57.10371        10.3516
>>> dataset.stn_coords(['80001', '240001'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('80001')
# get area of two stations
>>> dataset.area(['80001', '240001'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('80001')
__init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

property boundary_id_map: str

Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.
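The fallback rule described above (use the first attribute when none is given) can be sketched as follows. The function pick_id_attribute and the attribute names are hypothetical and only illustrate the rule; they are not taken from aqua_fetch or from the Caravan_DK boundary file.

```python
from typing import Optional, Sequence


def pick_id_attribute(attributes: Sequence[str],
                      boundary_id_map: Optional[str] = None) -> str:
    """Choose the attribute that links a geometry to a station id."""
    if not attributes:
        raise ValueError("the boundary file has no attributes")
    if boundary_id_map is None:
        # fall back to the first attribute in the boundary file
        return attributes[0]
    if boundary_id_map not in attributes:
        raise KeyError(f"{boundary_id_map!r} is not an attribute of the boundary file")
    return boundary_id_map


attrs = ['catch_id', 'area_km2']          # hypothetical attribute names
default_choice = pick_id_attribute(attrs)  # no name given: first attribute
explicit_choice = pick_id_attribute(attrs, 'area_km2')
```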

property caravan_attr_fpath

returns path to attributes_caravan_camelsdk.csv file

caravan_static_attributes(stations='all') DataFrame[source]
Return type:

a pandas.DataFrame of shape (308, 10)

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

returns names of dynamic features

property end: Timestamp

end of data

hyd_atlas_attributes(stations='all') DataFrame[source]
Return type:

a pandas.DataFrame of shape (308, 196)

property other_attr_fpath

returns path to attributes_other_camelsdk.csv file

other_static_attributes(stations='all') DataFrame[source]
Return type:

a pandas.DataFrame of shape (308, 5)

property static_features: List[str]

returns static features for Denmark catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns the names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> dataset = Caravan_DK()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('100010')  # returns coordinates of station whose id is 100010
>>> dataset.stn_coords(['100010', '210062'])  # returns coordinates of two stations
class aqua_fetch.rr.CCAM(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Dataset for Yellow River (China) catchments. The CCAM dataset was published by Hao et al., 2021 and has two sets. One set consists of catchment attributes, meteorological data and catchment boundaries of over 4000 catchments. However, this set does not have streamflow data. The second set consists of streamflow, catchment attributes, catchment boundaries and meteorological data for 102 catchments of the Yellow River. Since this second set conforms to the norms of CAMELS, this class uses this second set. Therefore, the fetch, stations and other methods/attributes of this class return data of only the Yellow River catchments and not for the whole of China. However, the first set of data can also be fetched using the fetch_meteo method of this class. The temporal extent of both sets is from 1999 to 2020. However, the streamflow time series in first set has a very large number of missing values. The data of the Yellow River consists of 16 dynamic features (time series) and 124 static features (catchment attributes).

Examples

>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='0010', as_dataframe=True)
>>> df = dynamic['0010']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(8035, 16)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   102
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (10 out of 102)
   10
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(8035, 16), (8035, 16), (8035, 16),... (8035, 16), (8035, 16)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('0010', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'airtemp_C_mean', 'evap_mm', 'rh_%', 'q_cms_obs'])
>>> dynamic['0010'].shape
   (8035, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='0010', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['0010'].shape
((1, 124), 1, (8035, 16))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 8035, 'dynamic_features': 16})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (102, 2)
>>> dataset.stn_coords('0010')  # returns coordinates of station whose id is 0010
    36.059803       112.3638
>>> dataset.stn_coords(['0010', '0104'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('0010')
# get area of two stations
>>> dataset.area(['0010', '0104'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('0010')
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.

  • to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

names of hydro-meteorological time series data for Yellow River catchments

property end

end of data

fetch_meteo(station: str | List[str] = 'all', features: str | List[str] = 'all', st='1990-01-01', en='2021-03-31', as_dataframe: bool = True)[source]

Fetches meteorological data of 4902 Chinese catchments.

Examples

>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
>>> features = ['PRE', 'TEM', 'PRS', 'RHU', 'EVP', 'WIN', 'PET']
>>> st = '1999-01-01'
>>> en = '2020-03-31'
>>> xds = dataset.fetch_meteo(features=features, st=st, en=en)
property meteo_path

path where daily meteorological data of stations is present

property static_features: List[str]

names of static features for Yellow River catchments

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations()[source]

Returns station ids for catchments on Yellow River

class aqua_fetch.rr.Finland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 669 catchments of Finland. The observed streamflow data is downloaded from https://wwwi3.ymparisto.fi. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 2012-01-01 to 2023-06-30.

Examples

>>> from aqua_fetch import Finland
>>> dataset = Finland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 4199, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
    66
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
669
# get data by station id
>>> _, data = dataset.fetch(stations='FI000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='FI000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (669, 2)
>>> dataset.stn_coords('FI000001')  # returns coordinates of station whose id is FI000001
    64.226288       27.736528
>>> dataset.stn_coords(['FI000001', 'FI000002'])  # returns coordinates of two stations
FI000001    64.226288       27.736528
FI000002    64.226288       27.736528
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

fetch_q(as_dataframe: bool = True, overwrite: bool = False)[source]

Downloads (if not already downloaded) and returns the daily streamflow data of Finland, either as a pandas.DataFrame or as an xarray Dataset.

gauge_id_basin_id_map() dict[source]

Maps gauge ids to basin ids. For example, for Portugal the gauge_id ‘03J/02H’ corresponds to the basin_id ‘PT000001’, i.e. ‘03J/02H’ -> ‘PT000001’.

For Slovenia, the gauge_id ‘1060’ corresponds to the basin_id ‘SI000001’, i.e. ‘1060’ -> ‘SI000001’.
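The two examples above can be written out as a plain dictionary. This is only an illustration built from those two pairs; the dictionary returned by the method for Finland will contain Finnish gauge ids, which are not listed here.

```python
# gauge_id -> basin_id, using the two pairs quoted in the description
gauge_to_basin = {
    '03J/02H': 'PT000001',  # Portugal
    '1060': 'SI000001',     # Slovenia
}

# the reverse lookup (basin_id -> gauge_id) is a one-line inversion,
# valid as long as every basin id appears only once
basin_to_gauge = {basin: gauge for gauge, basin in gauge_to_basin.items()}

print(gauge_to_basin['1060'])      # SI000001
print(basin_to_gauge['PT000001'])  # 03J/02H
```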

class aqua_fetch.rr.GRDCCaravan(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

This is a dataset of 5357 catchments from around the globe following the work of Faerber et al., 2023. The dataset consists of 39 dynamic (time series) features and 211 static features. The dynamic (time series) data spans from 1950-01-02 to 2019-05-19.

If the xarray and netCDF4 packages are installed, then netCDF files will be downloaded; otherwise csv files will be downloaded and used.

Examples

>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='GRDC_3664802', as_dataframe=True)
>>> df = dynamic['GRDC_3664802']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26801, 39)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   5357
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (535 out of 5357)
   535
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(26801, 39), (26801, 39), (26801, 39),... (26801, 39), (26801, 39)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('GRDC_3664802', as_dataframe=True,
...  dynamic_features=['total_precipitation_sum', 'potential_evaporation_sum', 'temperature_2m_mean', 'q_cms_obs'])
>>> dynamic['GRDC_3664802'].shape
   (26801, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='GRDC_3664802', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['GRDC_3664802'].shape
((1, 211), 1, (26801, 39))
...
# If we don't set as_dataframe=True and xarray is installed, the returned data will be an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26801, 'dynamic_features': 39})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (5357, 2)
>>> dataset.stn_coords('GRDC_3664802')  # returns coordinates of station whose id is GRDC_3664802
    -26.2271        -51.0771
>>> dataset.stn_coords(['GRDC_3664802', 'GRDC_1159337'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('GRDC_3664802')
# get area of two stations
>>> dataset.area(['GRDC_3664802', 'GRDC_1159337'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('GRDC_3664802')
...
__init__(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
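
The three cases described for path can be sketched as below. This is a minimal illustration under stated assumptions, not the package's implementation: resolve_data_dir and default_dir are hypothetical names, and the fallback directory is made up.

```python
import os
import tempfile


def resolve_data_dir(path=None, default_dir=None):
    """Return (directory, needs_download) following the rules described for ``path``.

    Hypothetical helper for illustration only; it is not part of aqua_fetch.
    """
    if path is None:
        # no path given: fall back to a default directory (made-up location)
        if default_dir is None:
            default_dir = os.path.join(tempfile.gettempdir(), "aqua_fetch_data")
        return default_dir, not os.path.isdir(default_dir)
    if os.path.isdir(path):
        # directory exists: the data would be read from it
        return path, False
    # directory does not exist: the data would be downloaded into it
    return path, True


existing = tempfile.mkdtemp()
missing = os.path.join(existing, "not_there")
print(resolve_data_dir(existing))  # (existing, False)
print(resolve_data_dir(missing))   # (missing, True)
```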

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
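
The caching recommended above can be sketched with functools.cached_property. FeatureCatalog and _read_feature_names are hypothetical stand-ins for a dataset class and its expensive file read; how aqua_fetch itself caches the list is not shown here.

```python
from functools import cached_property


class FeatureCatalog:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self, names):
        self._names = list(names)
        self.reads = 0  # counts how often the expensive read is performed

    def _read_feature_names(self):
        # stands in for an expensive read of the raw data files
        self.reads += 1
        return list(self._names)

    @cached_property
    def dynamic_features(self):
        # computed on first access, then stored on the instance
        return self._read_feature_names()


catalog = FeatureCatalog(["pcp_mm", "q_cms_obs"])
first = catalog.dynamic_features
second = catalog.dynamic_features
print(catalog.reads)  # 1
```

The second access returns the stored list, so the underlying read happens only once per instance.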

property end

end of data

fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]

Fetches features for one station.

Parameters:
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • st (str,optional) – starting point from which the data to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last point for which it is available.

Returns:

A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will be of single row while the dynamic features will be of shape (time, dynamic features).

Return type:

tuple

Examples

>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> dataset.fetch_station_features('GRDC_3664802')
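
The effect of st and en can be illustrated with a small standard-library sketch. slice_by_dates is a hypothetical helper; ISO date strings and inclusive bounds on both sides are assumptions made for the illustration, not confirmed behaviour of fetch_station_features.

```python
from datetime import date, timedelta


def slice_by_dates(series, st=None, en=None):
    """Keep (day, value) pairs with st <= day <= en; None leaves that side open.

    Hypothetical helper; ISO date strings are assumed for st and en.
    """
    start = date.fromisoformat(st) if st is not None else None
    end = date.fromisoformat(en) if en is not None else None
    return [
        (day, value)
        for day, value in series
        if (start is None or day >= start) and (end is None or day <= end)
    ]


# ten days of made-up values starting on 2000-01-01
series = [(date(2000, 1, 1) + timedelta(days=i), float(i)) for i in range(10)]
print(len(slice_by_dates(series)))                    # 10
print(len(slice_by_dates(series, st="2000-01-04")))   # 7
print(len(slice_by_dates(series, st="2000-01-04", en="2000-01-06")))  # 3
```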
property static_features

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.

class aqua_fetch.rr.HYSETS(path: str, sources: Dict[str, str] = None, **kwargs)[source]

Bases: _RainfallRunoff

Database for hydrometeorological modeling of 14,425 North American watersheds from 1950 to 2023, following the work of Arsenault et al., 2020. This dataset has 20 dynamic features and 30 static features. Most of the dynamic features have more than one source. The data is available in netCDF format; therefore, this package requires xarray and netCDF4 to be installed.

The following data sources are available.

sources           dynamic_features
SNODAS_SWE        discharge, swe
SCDNA             discharge, pr, tasmin, tasmax
nonQC_stations    discharge, pr, tasmin, tasmax
Livneh            discharge, pr, tasmin, tasmax
ERA5              discharge, pr, tasmax, tasmin
ERA5Land_SWE      discharge, swe
ERA5Land          discharge, pr, tasmax, tasmin

All sources contain one or more of the following dynamic_features, with the following shapes.

dynamic_features              shape
time                          (25202,)
watershedID                   (14425,)
drainage_area                 (14425,)
drainage_area_GSIM            (14425,)
flag_GSIM_boundaries          (14425,)
flag_artificial_boundaries    (14425,)
centroid_lat                  (14425,)
centroid_lon                  (14425,)
elevation                     (14425,)
slope                         (14425,)
discharge                     (14425, 25202)
pr                            (14425, 25202)
tasmax                        (14425, 25202)
tasmin                        (14425, 25202)

Examples

>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(27028, 20)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   14425
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (1442 out of 14425)
   1442
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(27028, 20), (27028, 20), (27028, 20),... (27028, 20), (27028, 20)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...  dynamic_features=['evap_mm', 'pcp_mm', 'snowmelt_mm', 'swe_mm', 'q_cms_obs'])
>>> dynamic['5'].shape
   (27028, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 30), 1, (27028, 20))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 20})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (14425, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
    47.091389       -67.731392
>>> dataset.stn_coords(['5', '12'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('5')
# get area of two stations
>>> dataset.area(['5', '12'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('5')
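
The examples above pass a fraction, an integer, or a station id as the stations argument. The sketch below shows one way such an argument could be resolved. select_stations is a hypothetical helper, and the truncation of the fraction is an assumption that merely agrees with the counts shown in the examples (1442 out of 14425); it is not taken from the package source.

```python
import random


def select_stations(all_stations, stations, seed=None):
    """Hypothetical sketch of how a ``stations`` argument could be resolved.

    float in (0, 1) -> that fraction of stations, chosen at random
    int            -> that many stations, chosen at random
    str            -> the single station with that id
    list           -> the listed stations
    """
    rng = random.Random(seed)
    if isinstance(stations, float):
        count = int(stations * len(all_stations))  # truncates, e.g. 1442.5 -> 1442
        return rng.sample(all_stations, count)
    if isinstance(stations, int):
        return rng.sample(all_stations, stations)
    if isinstance(stations, str):
        return [stations]
    return list(stations)


ids = [str(i) for i in range(1, 14426)]  # 14425 made-up ids
print(len(select_stations(ids, 0.1, seed=0)))  # 1442
print(len(select_stations(ids, 10, seed=0)))   # 10
print(select_stations(ids, "5"))               # ['5']
```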
__init__(path: str, sources: Dict[str, str] = None, **kwargs)[source]
Parameters:
  • path (str) – The path under which the data is to be saved or is saved already. If the data is already downloaded, then provide the path under which HYSETS data is located. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • sources (dict) –

    sources for each dynamic feature. The keys should be dynamic features and values should be sources. Available sources for the dynamic features are as below

    • 10m_u_component_of_wind: [‘ERA5’, ‘ERA5Land’]

    • 10m_v_component_of_wind: [‘ERA5’, ‘ERA5Land’]

    • 2m_dewpoint: [‘ERA5’, ‘ERA5Land’]

    • 2m_tasmax: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]

    • 2m_tasmin: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]

    • discharge: [‘NRCAN’, ‘ERA5’, ‘ERA5Land’, ‘Livneh’, ‘nonQC_stations’, ‘SCDNA’, ‘SNODAS’, ‘QC_stations’]

    • evaporation: [‘ERA5’, ‘ERA5Land’]

    • snow_density: [‘ERA5’, ‘ERA5Land’]

    • snow_evaporation: [‘ERA5’, ‘ERA5Land’]

    • snow_water_equivalent: [‘ERA5’, ‘ERA5Land’, ‘SNODAS’]

    • snowfall: [‘ERA5’, ‘ERA5Land’]

    • snowmelt: [‘ERA5’, ‘ERA5Land’]

    • surface_downwards_solar_radiation: [‘ERA5’, ‘ERA5Land’]

    • surface_downwards_thermal_radiation: [‘ERA5’, ‘ERA5Land’]

    • surface_net_solar_radiation: [‘ERA5’, ‘ERA5Land’]

    • surface_net_thermal_radiation: [‘ERA5’, ‘ERA5Land’]

    • surface_pressure: [‘ERA5’, ‘ERA5Land’]

    • surface_runoff: [‘ERA5’, ‘ERA5Land’]

    • total_cloud_cover: [‘ERA5’]

    • total_precipitation: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]

  • kwargs – arguments for _RainfallRunoff base class
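
A sources dictionary can be checked against the list above before it is passed to the class. The sketch below is illustrative only: check_sources and AVAILABLE_SOURCES are hypothetical names, and the table holds just three of the features listed above.

```python
# subset of the sources listed above, copied from the parameter description
AVAILABLE_SOURCES = {
    "total_precipitation": ["NRCAN", "Livneh", "QC_stations", "ERA5",
                            "nonQC_stations", "ERA5Land", "SCDNA"],
    "snow_water_equivalent": ["ERA5", "ERA5Land", "SNODAS"],
    "total_cloud_cover": ["ERA5"],
}


def check_sources(sources):
    """Raise ValueError if a feature or its source is not in AVAILABLE_SOURCES.

    Hypothetical helper for illustration; not part of aqua_fetch.
    """
    for feature, source in sources.items():
        if feature not in AVAILABLE_SOURCES:
            raise ValueError(f"unknown dynamic feature: {feature}")
        if source not in AVAILABLE_SOURCES[feature]:
            raise ValueError(
                f"{source} is not available for {feature}; "
                f"choose one of {AVAILABLE_SOURCES[feature]}"
            )
    return sources


sources = check_sources({"total_precipitation": "ERA5Land",
                         "snow_water_equivalent": "SNODAS"})
```

The checked dictionary would then be supplied through the sources argument of HYSETS.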

property OfficialID_WatershedID_map

A dictionary mapping Official_ID to Watershed_ID. For example ‘1’: ‘01AD002’

property WatershedID_OfficialID_map

A dictionary mapping Watershed_ID to Official_ID. For example ‘01AD002’: ‘1’
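
The two properties are inverses of each other. A minimal sketch of deriving one mapping from the other is given below; invert_mapping is a hypothetical helper, and the second pair in the excerpt is made up for illustration.

```python
def invert_mapping(mapping):
    """Swap keys and values; raise if the values are not unique."""
    inverted = {value: key for key, value in mapping.items()}
    if len(inverted) != len(mapping):
        raise ValueError("values are not unique, mapping cannot be inverted")
    return inverted


# made-up excerpt; only the first pair appears in the documentation above
forward = {"1": "01AD002", "2": "01AD003"}
backward = invert_mapping(forward)
print(backward["01AD002"])  # 1
```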

area(stations: str | List[str] = 'all', source: str = 'other') Series[source]

Returns area_gov (Km2) of all catchments as pandas.Series

Parameters:
  • stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations

  • source (str) – source of area calculation. It should be either gsim or other

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('92')  # returns area of station whose id is 92
>>> dataset.area(['92', '142'])  # returns area of two stations
property boundary_id_map: str

Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

fetch_dynamic_features(station, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]

Fetches dynamic features of one station.

Examples

>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]

Returns features of multiple stations.

Examples

>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns a list of station names. The Watershed_ID of the station is used as station name instead of Official_ID. This is because in .nc files Watershed_ID is used for stations instead of Official_ID. Watershed_ID starts with 1, 2, 3 and so on, while Official_ID is a code from the meteorological agency, such as 01AD002 for station 1.

Returns:

a list of ids of stations

Return type:

list

Examples

>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
... # get name of all stations as list
>>> dataset.stations()
usgs_stations() List[str][source]

Returns the names of stations which are taken from USGS as list

class aqua_fetch.rr.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]

Bases: _RainfallRunoff

Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2]. This is a rainfall-runoff dataset of Costa Rica with 564 stations from 1985 to 2019 at daily, monthly and yearly time steps.

Examples

>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='564', as_dataframe=True)
>>> df = dynamic['564'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12783, 9)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   564
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (56 out of 564)
   56
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(12783, 9), (12783, 9), (12783, 9),... (12783, 9), (12783, 9)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('564', as_dataframe=True,
...  dynamic_features=['AET_mm', 'Prec_mm',  'Streamflow_mm'])
>>> dynamic['564'].shape
   (12783, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='564', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['564'].shape
((1, 59), 1, (12783, 9))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12783, 'dynamic_features': 9})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (564, 2)
>>> dataset.stn_coords('564')  # returns coordinates of station whose id is 564
    40.480419       -123.890877
>>> dataset.stn_coords(['564', '563'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('564')
# get area of two stations
>>> dataset.area(['564', '563'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('564')
__init__(time_step: str = 'daily', path=None, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • time_step (str) – one of daily, month or year

  • **kwargs – key word arguments

area(stations: str | List[str] = 'all') Series[source]

Returns area (Km2) of all catchments as pandas.Series

Parameters:

stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2')  # returns area of station whose id is 2
>>> dataset.area(['2', '605'])  # returns area of two stations
property end

end of data

fetch_static_features(station, static_features=None)[source]

static data for HYPE is not available.

property static_features

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

stations() list[source]

Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Examples

>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('2')  # returns coordinates of station whose id is 2
>>> dataset.stn_coords(['2', '605'])  # returns coordinates of two stations
class aqua_fetch.Ireland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 464 catchments of Ireland. Out of these 464 catchments, 280 are from OPW and 184 are from EPA. The observed streamflow data for EPA stations is downloaded from https://epawebapp.epa.ie/Hydronet/#Flow while the observed streamflow for OPW stations is downloaded from https://waterlevel.ie/hydro-data/#/overview/Waterlevel. It should be noted that out of 280 OPW stations, streamflow data is available for only 129 stations. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1992-01-01 to 2020-06-30.

Examples

>>> from aqua_fetch import Ireland
>>> dataset = Ireland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
    46
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
464
# get data by station id
>>> _, data = dataset.fetch(stations='IEEP0281')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='IEEP0281', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (464, 2)
>>> dataset.stn_coords('IEEP0281')  # returns coordinates of station whose id is IEEP0281
    52.217434       -8.494649
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])  # returns coordinates of two stations
IEEP0281    52.217434       -8.494649
IEEP0282    54.284546       -6.921607
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

download_epa_data_seq()[source]

Examples

>>> epa_df = download_epa_data()
download_opw_data_seq()[source]

Examples

>>> opw_df = download_opw_data()
gauge_id_basin_id_map() dict[source]

A dictionary whose keys are gauge_id and values are basin_id. For example, if the gauge_id is ‘18118’ and the basin_id is ‘IEEP0281’, then ‘18118’ -> ‘IEEP0281’.

class aqua_fetch.rr.Italy(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 294 catchments of Italy. The observed streamflow data is downloaded from http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1992-01-01 to 2020-06-30.

Examples

>>> from aqua_fetch import Italy
>>> dataset = Italy()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
    29
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
294
# get data by station id
>>> _, data = dataset.fetch(stations='ITIS0001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ITIS0001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (294, 2)
>>> dataset.stn_coords('ITIS0001')  # returns coordinates of station whose id is ITIS0001
    42.835835       13.919167
>>> dataset.stn_coords(['ITIS0001', 'ITIS0002'])  # returns coordinates of two stations
ITIS0001    42.835835       13.919167
ITIS0002    42.783890       13.905833
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

gauge_id_basin_id_map() dict[source]

A dictionary whose keys are gauge_id and values are basin_id. For example, for Portugal the gauge_id is ‘03J/02H’ and the basin_id is ‘PT000001’, so ‘03J/02H’ -> ‘PT000001’.

For Slovenia, the gauge_id is ‘1060’ and the basin_id is ‘SI000001’, so ‘1060’ -> ‘SI000001’.

class aqua_fetch.Japan(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 694 catchments of Japan from the river.go.jp website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2022-12-31.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

fetch_q(as_dataframe: bool = True) DataFrame[source]

reads daily streamflow for all stations and puts them in a single file named data.csv. If data.csv is already present, then it is read and its contents are returned as dataframe.
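
The read-or-build behaviour described for data.csv follows a common file-cache pattern, sketched below with the standard library. load_or_build and build_rows are hypothetical names and the two rows are made-up values; this is not the code used by fetch_q.

```python
import csv
import os
import tempfile


def load_or_build(csv_path, build_rows):
    """Read rows from csv_path if present; otherwise build them and write the file.

    Hypothetical helper illustrating the caching pattern; not aqua_fetch code.
    """
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            return list(csv.reader(f))
    rows = build_rows()
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows


calls = []


def build_rows():
    # stands in for reading every station's streamflow file
    calls.append(1)
    return [["date", "q_cms_obs"], ["1979-01-01", "1.5"]]


csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
first = load_or_build(csv_path, build_rows)   # builds and writes data.csv
second = load_or_build(csv_path, build_rows)  # reads data.csv back
print(len(calls))  # 1
```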

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

class aqua_fetch.rr.LamaHCE(path=None, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, overwrite=False, **kwargs)[source]

Bases: _RainfallRunoff

Large-Sample Data for Hydrology and Environmental Sciences for Central Europe (mainly Austria). The dataset is downloaded from Zenodo following the work of Klingler et al., 2021. For total_upstrm data, there are 859 stations with 61 static features and 17 dynamic features. The temporal extent of data is from 1981-01-01 to 2019-12-31.

__init__(path=None, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, overwrite=False, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • timestep – possible values are D for daily or H for hourly timestep

  • data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp

Examples

>>> from aqua_fetch import LamaHCE
# by default the timestep is daily and data_type is 'total_upstrm'
>>> dataset = LamaHCE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='826', as_dataframe=True)
>>> df = dynamic['826'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14244, 22)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   859
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (85 out of 859)
   85
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(14244, 22), (14244, 22), (14244, 22),... (14244, 22), (14244, 22)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('826', as_dataframe=True,
...  dynamic_features=['airtemp_C_mean', 'total_et', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['826'].shape
   (14244, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='826', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['826'].shape
((1, 84), 1, (14244, 22))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14244, 'dynamic_features': 22})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (859, 2)
>>> dataset.stn_coords('826')  # returns coordinates of station whose id is 826
    2995596.0       4811891.0
>>> dataset.stn_coords(['826', '819'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('826')
# get area of two stations
>>> dataset.area(['826', '819'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('826')
...
# the data_type can also be 'intermediate_all'
>>> dataset = LamaHCE(data_type='intermediate_all')
...
# or 'intermediate_lowimp'
>>> dataset = LamaHCE(data_type='intermediate_lowimp')
>>> len(dataset.stations())
454
...
# the timestep can also be hourly i.e. 'H'
>>> dataset = LamaHCE(timestep='H')
>>> _, dynamic = dataset.fetch(stations='79', as_dataframe=True)
>>> dynamic['79'].shape
(341856, 16)  # there are 16 dynamic features for hourly data
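
The examples above pass the stations argument in several forms: a single id ('826'), an integer (that many randomly selected stations) or a fraction (0.1 for 10 % of the stations). A minimal sketch of how such an argument could be resolved is given below. The helper resolve_stations, its seed parameter and the made-up ids are assumptions for illustration and are not part of aqua_fetch; rounding the fraction down is inferred from the 85 out of 859 stations shown above.

```python
import random
from typing import List, Optional, Union


def resolve_stations(
    all_stations: List[str],
    stations: Union[str, int, float, List[str]] = "all",
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: turn a ``stations`` argument into a list of ids."""
    rng = random.Random(seed)
    if isinstance(stations, str):
        if stations == "all":
            return list(all_stations)
        if stations not in all_stations:
            raise ValueError(f"unknown station id: {stations}")
        return [stations]
    if isinstance(stations, int):
        # an integer n means n randomly selected stations
        return rng.sample(all_stations, stations)
    if isinstance(stations, float):
        # a fraction means that share of all stations, rounded down
        n = int(stations * len(all_stations))
        return rng.sample(all_stations, n)
    # otherwise assume an iterable of explicit station ids
    return [str(s) for s in stations]


ids = [str(i) for i in range(1, 860)]  # 859 made-up station ids
print(len(resolve_stations(ids, 0.1, seed=0)))  # 85
print(len(resolve_stations(ids, 10, seed=0)))   # 10
print(resolve_stations(ids, "826"))             # ['826']
```

Passing a seed makes the random selection reproducible in this sketch; whether the package exposes a comparable option is not stated in the documentation above.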
property dyn_fname: str | PathLike

name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed and the file does not already exist. Creating this file can take some time; however, it leads to faster I/O operations.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
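
The caching recommended above can be sketched with functools.cached_property from the standard library. ToyDataset, its reads counter and the hard-coded feature names are hypothetical and only illustrate the pattern; this is not how aqua_fetch itself is implemented.

```python
from functools import cached_property
from typing import List


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # placeholder for an expensive read of the raw data files
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # computed on first access, then stored on the instance
        return self._read_feature_names()


ds = ToyDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.reads)  # 1, the raw data was read only once
```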

property end

end of data

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = None) DataFrame[source]

static features of LamaHCE

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
...  # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
...     static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
fetch_stations_features(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one station.

This function first checks whether the .nc files exist. If they do not, the files are prepared and saved, and the data is then read from them. On subsequent calls, the existing .nc files are used for reading the data.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as pandas dataframe. default is xarray.Dataset object

  • kwargs (dict) – additional keyword arguments

Returns:

  • tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features’ DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or a dictionary with keys as station names and values as pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.

  • Raises – ValueError, if both dynamic_features and static_features are None

Examples

>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)
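
The prepare-once, reuse-afterwards behaviour described for the .nc files follows a common caching pattern. The sketch below illustrates it with the standard library only, using a JSON file as a stand-in for netCDF. The names load_with_cache and build_from_raw and the sample values are assumptions for illustration, not functions or data of aqua_fetch.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Data = Dict[str, List[float]]


def load_with_cache(cache_file: Path, build: Callable[[], Data]) -> Data:
    """Read from the cache file if it exists, otherwise build and save it."""
    if cache_file.exists():
        # fast path: reuse the file written by an earlier call
        return json.loads(cache_file.read_text())
    data = build()  # slow path: parse the raw files
    cache_file.write_text(json.dumps(data))
    return data


calls = []  # records how often the slow path runs


def build_from_raw() -> Data:
    calls.append(1)
    return {"912101A": [0.5, 0.7], "912105A": [1.2, 1.1]}  # made-up values


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "dynamic.json"
    first = load_with_cache(cache, build_from_raw)   # builds and saves
    second = load_with_cache(cache, build_from_raw)  # reads the saved file

print(len(calls))  # 1
```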
static_data() DataFrame[source]

returns all static attributes of LamaHCE dataset

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() list[source]

Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from EPSG:3035 (LAEA Europe) to projected

class aqua_fetch.rr.LamaHIce(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, **kwargs)[source]

Bases: LamaHCE

Daily and hourly hydro-meteorological time series data of river basins of Iceland following Helgason et al., 2024. The daily data cover 1950 to 2021 for 111 catchments, and the hourly data cover 1976 to 2023. The average record length is 33 years for daily data and 11 years for hourly data. The dataset is available on HydroShare.

Examples

>>> from aqua_fetch import LamaHIce
# by default the timestep is daily and data_type is 'total_upstrm'
>>> dataset = LamaHIce()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='92', as_dataframe=True)
>>> df = dynamic['92']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26298, 36)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   111
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (11 out of 111)
   11
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(26298, 36), (26298, 36), (26298, 36),... (26298, 36), (26298, 36)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('92', as_dataframe=True,
...  dynamic_features=['swe', 'pet_mm', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['92'].shape
   (26298, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['92'].shape
((1, 154), 1, (26298, 36))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26298, 'dynamic_features': 36})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (111, 2)
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
    571777.0        309737.0
>>> dataset.stn_coords(['92', '5'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('92')
# get area of two stations
>>> dataset.area(['92', '5'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('92')
...
# the data_type can also be 'intermediate_all'
>>> dataset = LamaHIce(data_type='intermediate_all')
...
# or 'intermediate_lowimp'
>>> dataset = LamaHIce(data_type='intermediate_lowimp')
>>> len(dataset.stations())
86
...
# the timestep can also be 'H'
>>> dataset = LamaHIce(timestep='H')
>>> _, dynamic = dataset.fetch(stations='79', as_dataframe=True)
>>> dynamic['79'].shape
(412848, 28)  # there are 28 dynamic features for hourly data
__init__(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, **kwargs)[source]
Parameters:
  • path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.

  • timestep – possible values are D for daily or H for hourly timestep

  • data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp

basin_attributes() DataFrame[source]

returns basin attributes which are catchment attributes, water balance all attributes and water balance filtered attributes

Returns:

a dataframe of shape (111, 104) where 104 are the static catchment/basin attributes

Return type:

pd.DataFrame

catchment_attributes() DataFrame[source]

returns catchment attributes as DataFrame with 90 columns

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property end

end of data

fetch_clim_features(stations: str | List[str] = None)[source]

Returns climate time series data for one or more stations

Return type:

pd.DataFrame

fetch_q(stations: str | List[str] = None, qc_flag: int = None)[source]

returns streamflow for one or more stations

Parameters:
  • stations (str/List[str]) – name or names of stations for which streamflow is to be fetched

  • qc_flag (int) – the following flags are available: 40 (good), 80 (fair), 100 (estimated), 120 (suspect), 200 (unchecked), 250 (missing)

Returns:

a pandas.DataFrame whose index is the time and columns are names of stations. For daily timestep, the dataframe has shape of 32630 rows and 111 columns

Return type:

pd.DataFrame
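
The quality-control codes listed for qc_flag can be kept in a small lookup table. The sketch below masks every value whose flag differs from the requested one. Treating qc_flag as an exact match is an assumption (the documentation above does not say whether it acts as a filter, a threshold or something else), and select_by_flag together with the sample records is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# codes and labels as listed in the qc_flag parameter description
QC_FLAGS: Dict[int, str] = {
    40: "good",
    80: "fair",
    100: "estimated",
    120: "suspect",
    200: "unchecked",
    250: "missing",
}

Record = Tuple[str, float, int]  # (date, streamflow, flag)


def select_by_flag(
    records: List[Record], qc_flag: Optional[int] = None
) -> List[Tuple[str, Optional[float]]]:
    """Hypothetical helper: keep values with the given flag, mask the rest."""
    if qc_flag is None:
        return [(t, q) for t, q, _ in records]
    if qc_flag not in QC_FLAGS:
        raise ValueError(f"unknown qc_flag: {qc_flag}")
    # assumption: exact match on the flag; other values become None
    return [(t, q if flag == qc_flag else None) for t, q, flag in records]


records = [  # made-up sample values
    ("2000-01-01", 1.2, 40),
    ("2000-01-02", 1.5, 100),
    ("2000-01-03", 0.9, 40),
]
print(select_by_flag(records, qc_flag=40))
# [('2000-01-01', 1.2), ('2000-01-02', None), ('2000-01-03', 0.9)]
```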

fetch_static_features(stations: str | list = 'all', static_features: str | list = None) DataFrame[source]

fetches static features of one or more stations

fetch_stn_meteo(stn: str, nrows: int = None) DataFrame[source]

returns climate/meteorological time series data for one station

Returns:

a pandas.DataFrame with 23 columns

Return type:

pd.DataFrame

fetch_stn_q(stn: str, qc_flag: int = None) Series[source]

returns streamflow for single station

gauge_attributes() DataFrame[source]

returns gauge attributes from following two files

  • Gauge_attributes.csv

  • hydro_indices_1981_2018.csv

Returns:

a dataframe of shape (111, 28)

Return type:

pd.DataFrame

property gauges_path

returns the path where gauge data files are located

property q_dir

returns the path where q files are located

q_mm(stations: str | List[str] = None) DataFrame[source]

returns streamflow in the units of millimeter per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q_cms by area

Parameters:

stations (str/list) – name/names of stations. Default is None, which will return streamflow of all stations

Returns:

a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.

Return type:

pd.DataFrame
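
Dividing streamflow by catchment area also requires a unit conversion, which can be sketched as follows. The function cms_to_mm is hypothetical, and the sketch assumes that streamflow is given in cubic metres per second and area in square kilometres; the units used internally by the package are not stated above.

```python
# seconds contained in one timestep
SECONDS_PER_STEP = {"D": 86_400, "H": 3_600}


def cms_to_mm(q_cms: float, area_km2: float, timestep: str = "D") -> float:
    """Convert streamflow in m3/s to runoff depth in mm per timestep."""
    volume_m3 = q_cms * SECONDS_PER_STEP[timestep]  # water volume per timestep
    area_m2 = area_km2 * 1e6                         # km2 -> m2
    return volume_m3 / area_m2 * 1000.0              # m -> mm


# 10 m3/s from an 864 km2 catchment is 1 mm/day
print(round(cms_to_mm(10.0, 864.0), 6))  # 1.0
```

For daily data this reduces to q_cms * 86.4 / area_km2, and for hourly data to q_cms * 3.6 / area_km2.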

property q_path

path where all q files are located

static_data() DataFrame[source]

returns static data of all stations

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

returns names of stations as a list

transform_stn_coords(df: DataFrame) DataFrame[source]

transforms coordinates from EPSG:3057 (Lambert 1993) to EPSG:4326 (WGS84)

wat_bal_attrs() DataFrame[source]

water balance attributes

wat_bal_unfiltered() DataFrame[source]

water balance attributes from unfiltered q

class aqua_fetch.rr.NPCTRCatchments(path=None, timestep: str = 'Hourly', qflag=['AV', 'EV'], **kwargs)[source]

Bases: _RainfallRunoff

High-resolution streamflow and weather data (2013–2019) for seven small coastal watersheds in the northeast Pacific coastal temperate rainforest, Canada following Korver et al., 2022. The data include 8 dynamic features at hourly and 5 min timestep and 14 static features. The dynamic features include streamflow, precipitation, temperature, relative humidity, wind speed, wind direction, and solar radiation.

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ds.stations()
['626', '693', '703', '708', '819', '844', '1015']
>>> len(ds.static_features)
12
>>> area = ds.area()
>>> area.shape
(7,)
>>> coords = ds.stn_coords()
>>> coords.shape
(7, 2)
__init__(path=None, timestep: str = 'Hourly', qflag=['AV', 'EV'], **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is created for the first time. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

all_stn_coords() DataFrame[source]

Using coordinate information of Stream Sensor Nodes, assuming that stream sensors would be closer to the stream gauge. The values are taken from Table A1 of paper

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end

end of data

fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]

Fetches all or selected static features of one or more stations.

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas.DataFrame

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> dataset = NPCTRCatchments()
>>> dataset.fetch_static_features('626')
>>> dataset.static_features
>>> dataset.fetch_static_features('626',
... static_features=['area_km2', 'elev_catch_m', 'slope_%'])
read_pcp()[source]

Examples

>>> ds = NPCTRCatchments()
>>> pcp = ds.read_pcp()
>>> pcp.shape
(849472, 5)
>>> pcp['Site'].nunique()
15
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-09 21:00:00'), Timestamp('2019-10-01 00:00:00'))
# A is accepted and E is estimated
>>> pcp['Qflags'].unique()
[nan, 'AV', 'EV', 'EV: Sensor malfunction due to wolf bite']
>>> ds = NPCTRCatchments(timestep='5min')
>>> pcp = ds.read_pcp()
>>> pcp.shape
(8712098, 5)
>>> pcp['Site'].nunique()
14
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-05 00:00:00'), Timestamp('2019-10-01 00:00:00'))
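
The Qflags values shown above include plain codes ('AV', 'EV') as well as codes followed by a free-text remark. A minimal sketch of matching such values against the qflag list passed to the class is given below. The helper flag_ok is hypothetical, and treating a missing flag as not accepted is an assumption; the documentation above does not describe how the package handles either case.

```python
from typing import Optional, Sequence


def flag_ok(flag: Optional[str], accepted: Sequence[str] = ("AV", "EV")) -> bool:
    """Hypothetical helper: True if the code before any ':' is accepted."""
    if flag is None:
        # assumption: observations without a flag are not accepted
        return False
    code = flag.split(":", 1)[0].strip()  # drop the free-text remark
    return code in accepted


flags = [None, "AV", "EV", "EV: Sensor malfunction due to wolf bite"]
print([flag_ok(f) for f in flags])                    # [False, True, True, True]
print([flag_ok(f, accepted=("AV",)) for f in flags])  # [False, True, False, False]
```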
read_rel_hum()[source]

Examples

>>> ds = NPCTRCatchments()
>>> rh = ds.read_rel_hum()
>>> rh.shape
(849472, 4)
>>> rh['Site'].nunique()
15
>>> rh.index[0], rh.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
... # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> rh_5m = ds.read_rel_hum()
>>> rh_5m.shape
(8281767, 3)
>>> rh_5m['Site'].nunique()
13
>>> rh_5m.index[0], rh_5m.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
>>> rh_5m['Qlevel'].unique()
['1', '2', '3', nan]
read_snow_depth()[source]

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
... # get 5min timestep data
>>> ds = NPCTRCatchments(timestep='5min')
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
read_sol_rad()[source]

Solar radiation is common among all stations so no ‘Site’ column is present in the dataframe.

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(53072, 3)
>>> solrad['Qflags_SolarRad'].unique()
['AV', 'EV']
>>> ds = NPCTRCatchments(timestep='5min')
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(637108, 3)
>>> solrad['SolarRadQ_flags'].nunique()
4
read_temp()[source]

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> temp = ds.read_temp()
>>> temp.shape
(745836, 4)
>>> temp['Site'].nunique()
14
>>> temp['Qflag'].unique()
[nan, 'AV', 'EV']
>>> temp['Qlevel'].unique()
[nan,  2.,  3.,  1.]
>>> ds = NPCTRCatchments(timestep='5min')
>>> temp_5m = ds.read_temp()
>>> temp_5m.shape
(8957388, 3)
>>> temp_5m['Site'].nunique()
14
>>> temp_5m['Qlevel'].unique()
[1, 2]
>>> temp_5m['Qflags'].nunique()
5344
read_wind_dir()[source]
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(371651, 4)
>>> winddir['Site'].nunique()
7
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015',
'BuxtonEast', 'RefStn']
... # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(5096864, 4)
>>> winddir['Site'].nunique()
8
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015',
'BuxtonEast', 'Hecate', 'RefStn']

read_wind_speed()[source]

Examples

>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ws = ds.read_wind_speed()
>>> ws.shape
(424744, 4)
>>> ws['Site'].nunique()
8
>>> ws['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'Hecate', 'RefStn']
>>> ws.index[0], ws.index[-1]
(Timestamp('2013-09-09 20:00:00'), Timestamp('2019-10-01 00:00:00'))
... # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> ws = ds.read_wind_speed()
>>> ws.shape
(5096864, 4)
>>> ws['Site'].nunique()
8
property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

stations() List[str][source]

Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.

stn_coords(stations='all', sensor='SSN') DataFrame[source]

By default uses coordinate information of Stream Sensor Nodes, assuming that stream sensors would be closer to the stream gauge. The values are taken from Table A1 of paper

class aqua_fetch.rr.Poland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 1287 catchments of Poland. The observed streamflow data is downloaded from https://danepubliczne.imgw.pl. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1951-01-01 to 2023-06-30.

Examples

>>> from aqua_fetch import Poland
>>> dataset = Poland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
    128
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
1287
# get data by station id
>>> _, data = dataset.fetch(stations='PL000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PL000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (1287, 2)
>>> dataset.stn_coords('PL000001')  # returns coordinates of station whose id is PL000001
    49.921848       18.327913
>>> dataset.stn_coords(['PL000001', 'PL000002'])  # returns coordinates of two stations
PL000001    49.921848       18.327913
PL000002    49.954769       18.326323
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is created for the first time. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property csv_files_dir: str

path where csv (obtained after extracting zip files) files will be stored

gauge_id_basin_id_map() dict[source]

Returns a dictionary which maps gauge ids to basin ids. For example, for Portugal the gauge_id ‘03J/02H’ maps to the basin_id ‘PT000001’ (‘03J/02H’ -> ‘PT000001’).

For Slovenia, the gauge_id ‘1060’ maps to the basin_id ‘SI000001’ (‘1060’ -> ‘SI000001’).
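
A mapping of this kind is typically used to relabel data that is keyed by the national gauge id. The sketch below uses the two id pairs quoted above; the relabel helper and the streamflow values are made up for illustration and are not part of aqua_fetch.

```python
from typing import Dict, List

# the two id pairs quoted in the description above
gauge_to_basin: Dict[str, str] = {"03J/02H": "PT000001", "1060": "SI000001"}

# the inverse mapping is useful for looking up the national gauge id
basin_to_gauge: Dict[str, str] = {b: g for g, b in gauge_to_basin.items()}


def relabel(
    series_by_gauge: Dict[str, List[float]], mapping: Dict[str, str]
) -> Dict[str, List[float]]:
    """Hypothetical helper: rename keys from gauge ids to basin ids."""
    # gauges without a known basin id are dropped in this sketch
    return {mapping[g]: q for g, q in series_by_gauge.items() if g in mapping}


raw = {"03J/02H": [1.0, 1.4], "unknown": [0.2, 0.3]}  # made-up streamflow
print(relabel(raw, gauge_to_basin))  # {'PT000001': [1.0, 1.4]}
print(basin_to_gauge["SI000001"])    # 1060
```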

property zip_files_dir: str

path where zip files will be stored

class aqua_fetch.rr.Portugal(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 280 catchments of Portugal. The observed streamflow data is downloaded from https://snirh.apambiente.pt. The meteorological data, static catchment features and catchment boundaries for the 280 catchments are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1972-01-01 to 2022-12-31.

Examples

>>> from aqua_fetch import Portugal
>>> dataset = Portugal()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 18628, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
    28
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
280
# get data by station id
>>> _, data = dataset.fetch(stations='PT000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PT000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (280, 2)
>>> dataset.stn_coords('PT000001')  # returns coordinates of station whose id is PT000001
    41.794998       -7.969
>>> dataset.stn_coords(['PT000001', 'PT000002'])  # returns coordinates of two stations
PT000001    41.794998       -7.969
PT000002    39.679001       -8.437
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is created for the first time. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any higher value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

download_q_data_parallel(cpus: int = 4)[source]

downloads q data in parallel

download_q_data_seq()[source]

downloads q data sequentially
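
The difference between the sequential and the parallel download can be sketched with the standard library. This sketch uses a thread pool, which suits network-bound work; the cpus argument above suggests that the package may use processes instead, so this is an assumption. The function download_one is a hypothetical stand-in which performs no network request, and the second station id is made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def download_one(station: str) -> int:
    """Hypothetical stand-in for one request; returns a fake record count."""
    return len(station)


def download_seq(stations: List[str]) -> Dict[str, int]:
    # one station after the other
    return {stn: download_one(stn) for stn in stations}


def download_parallel(stations: List[str], workers: int = 4) -> Dict[str, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the results in the order of the inputs
        results = list(pool.map(download_one, stations))
    return dict(zip(stations, results))


stations = ["03J/02H", "04K/01H"]
print(download_parallel(stations) == download_seq(stations))  # True
```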

property end: Timestamp

end of data

fetch_q(as_dataframe: bool = True)[source]

returns the streamflow data of Portugal as xarray.Dataset or pandas.DataFrame

Returns:

xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns a pandas.DataFrame with station codes as columns and time as index. If as_dataframe is False, returns an xarray.Dataset with station codes as variables and time as dimension.

gauge_id_basin_id_map() dict[source]

Returns a dictionary which maps gauge ids to basin ids. For example, for Portugal the gauge_id ‘03J/02H’ maps to the basin_id ‘PT000001’ (‘03J/02H’ -> ‘PT000001’).

For Slovenia, the gauge_id ‘1060’ maps to the basin_id ‘SI000001’ (‘1060’ -> ‘SI000001’).

class aqua_fetch.RRLuleaSweden(path=None, **kwargs)[source]

Bases: Datasets

Rainfall runoff data for an urban catchment from 2016-2019 following the work of Broekhuizen et al., 2020.

__init__(path=None, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

  • processes – int number of processes to use for parallel processing

  • verbosity – int determines the amount of information to be printed

  • remove_zip – bool whether to remove the zip files after unzipping

fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None)[source]

fetches rainfall runoff data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41

fetch_flow(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]

fetches flow data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00

Returns:

a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
fetch_pcp(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]

fetches precipitation data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00

  • en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00

Returns:

a dataframe of shape (967_080, 1)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
class aqua_fetch.rr.ShyftNorway(*args, **kwargs)[source]

Bases: _RainfallRunoff

The dataset contains observed streamflow data from 111 Norwegian catchments, as well as catchment boundaries and some catchment specific static data. For more information on this data see Silantyeva et al., 2025. Note that currently only streamflow data is included, other dynamic features may be added in future releases. Also note that observed streamflow data may slightly differ from the data from seriekart.nve.no since data at seriekart is updated regularly based upon updated rating curves.

Examples

>>> from aqua_fetch import ShyftNorway
>>> dataset = ShyftNorway()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='2.11.0', as_dataframe=True)
>>> df = dynamic['2.11.0']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(23376, 1)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   111
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (11 out of 111)
   11
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(23376, 1), (23376, 1), (23376, 1),... (23376, 1), (23376, 1)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
    ['observed_streamflow_cms']
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2.11.0', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2.11.0'].shape
((1, 10), 1, (23376, 1))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 23376, 'dynamic_features': 1})
...
>>> len(dynamic.data_vars)
10
# get area of a single station
>>> dataset.area('2.11.0')
# get area of two stations
>>> dataset.area(['2.11.0', '2.28.0'])
...
>>> dataset.get_boundary('2.11.0')
__init__(*args, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
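The three cases described for path can be sketched as follows. This is only an illustration under stated assumptions: resolve_data_dir is a hypothetical helper, not part of aqua_fetch, and it assumes that the default directory is handled with the same exists/does-not-exist rule as a user-supplied one.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_data_dir(path: Optional[str], default_dir: str) -> Tuple[Path, bool]:
    """Return (directory, needs_download); hypothetical, not aqua_fetch code."""
    target = Path(path) if path is not None else Path(default_dir)
    if target.is_dir():
        # directory already exists: read the data from it
        return target, False
    # directory is missing: create it and download the data into it
    target.mkdir(parents=True)
    return target, True


tmp = tempfile.mkdtemp()  # stands in for the default directory
custom = str(Path(tmp) / "ShyftNorway")
first = resolve_data_dir(custom, tmp)    # directory missing -> download
second = resolve_data_dir(custom, tmp)   # directory exists -> read
default = resolve_data_dir(None, tmp)    # no path -> default directory
print(first[1], second[1], default[1])   # True False False
shutil.rmtree(tmp)
```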

property boundary_id_map

Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.

fetch_q(as_dataframe: bool = True)[source]

returns the streamflow data of Norway as xarray.Dataset or pandas.DataFrame

Returns:

xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns a pandas.DataFrame with station codes as columns and time as index. If as_dataframe is False, returns an xarray.Dataset with station codes as variables and time as dimension.

stations()[source]

Names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.
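The examples for this class pass stations to fetch() as a station id, an integer count or a fraction of all stations. A minimal sketch of how such an argument could be turned into a list of ids is given below. The helper resolve_stations, its seed parameter and the station ids are hypothetical and not part of aqua_fetch; truncating the fraction to a whole number is an assumption that agrees with the "11 out of 111" example.

```python
import random
from typing import List, Union


def resolve_stations(
    all_stations: List[str],
    stations: Union[str, int, float, List[str]],
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper mirroring how the examples use ``stations``."""
    if isinstance(stations, str):
        # a single station id, or the keyword 'all'
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # a fraction of all stations, truncated to a whole number
        n = int(stations * len(all_stations))
        return random.Random(seed).sample(all_stations, n)
    if isinstance(stations, int):
        # that many randomly selected stations
        return random.Random(seed).sample(all_stations, stations)
    # otherwise assume an explicit list of ids
    return list(stations)


all_ids = [f"stn_{i}" for i in range(111)]  # made-up ids
print(len(resolve_stations(all_ids, 0.1)))   # 11
print(len(resolve_stations(all_ids, 10)))    # 10
print(resolve_stations(all_ids, "stn_5"))    # ['stn_5']
```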

class aqua_fetch.rr.Simbi(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

Monthly rainfall from 1905 to 2005, daily rainfall from 1920 to 1940, 70 daily streamflow series, and 23 monthly temperature series for 24 catchments of Haiti.

The data is obtained from Bathelemy et al., 2023, while the related publication is Bathelemy et al., 2024.

Examples

>>> from aqua_fetch import Simbi
>>> simbi = Simbi()
>>> len(simbi.stations())
24
__init__(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path – path where the Simbi dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

  • to_netcdf

all_stations() List[str][source]

Not all stations have all data.

aquifer_class() DataFrame[source]

Read the aquifer class values.

property boundary_id_map: str

Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.

boundary_stations() List[str][source]

Returns names/IDs of 24 stations with boundary data.

carb_sed_magma() DataFrame[source]

Read the carbonated sedimentary and magmatic values.

clim_sigs() DataFrame[source]

Read the climate signatures.

daily_bsi() DataFrame[source]

Read the daily BSI values.

daily_clim_sigs() DataFrame[source]

Read the daily climate signatures.

daily_high_q_dur() DataFrame[source]

Read the daily high flow duration values.

daily_high_q_freq() DataFrame[source]

Read the daily high flow frequency values.

daily_low_q_dur() DataFrame[source]

Read the daily low flow duration values.

daily_low_q_freq() DataFrame[source]

Read the daily low flow frequency values.

daily_q_mean() DataFrame[source]

Read the daily mean flow values.

daily_quantile_5() DataFrame[source]

Read the daily 5th quantile flow values.

daily_quantile_95() DataFrame[source]

Read the daily 95th quantile flow values.

property dyn_map: Dict[str, str]

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
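The caching recommended for this property can be sketched with functools.cached_property from the standard library. ToyDataset and its feature names are made up for illustration; whether aqua_fetch itself caches in this way is not stated here.

```python
from functools import cached_property
from typing import List


class ToyDataset:
    """Made-up stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # pretend this parses a large file on disk
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # computed on first access, then stored on the instance
        return self._read_feature_names()


ds = ToyDataset()
for _ in range(3):
    names = ds.dynamic_features
print(names, ds.reads)  # the read happens only once
```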

property end

end of data

hypsometric_curve() DataFrame[source]

Read the hypsometric curve values.

monthly_QMNA5() DataFrame[source]

Read the monthly QMNA5 flow values.

monthly_QMXA10() DataFrame[source]

Read the monthly QMXA10 flow values.

monthly_aridity_runoff() DataFrame[source]

Read the monthly aridity runoff values.

monthly_average() DataFrame[source]

Read the monthly average flow values.

monthly_clim_sigs() DataFrame[source]

Read the monthly climate signatures.

monthly_quantile_5() DataFrame[source]

Read the monthly 5th quantile flow values.

monthly_quantile_95() DataFrame[source]

Read the monthly 95th quantile flow values.

other_attributes() DataFrame[source]

Read the other attributes.

pcp_stations() List[str][source]

Returns IDs of 74 stations with daily rainfall data.

percent_geology() DataFrame[source]

Read the geology percentage values.

percent_lc_95() DataFrame[source]

Read the 95th land cover percentage values.

percent_lc_98() DataFrame[source]

Read the land cover percentage values.

q_stations() List[str][source]

Returns names/IDs of 70 stations with daily streamflow data.

read_stn_pcp(stn: str) DataFrame[source]

Read the daily rainfall data for a station.

read_stn_q(stn: str) DataFrame[source]

Read the daily streamflow data for a station.

read_stn_temp(stn: str) DataFrame[source]

Read the daily temperature data for a station.

static_data_stations() List[str][source]

Returns names/IDs of 24 stations with static data.

property static_features

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns names/IDs of 24 stations which have all (boundary, streamflow, static features) data. Although there are 70 stations which have daily streamflow data, only 24 of them have static + boundary data.
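The relation between the different station lists can be pictured as a set intersection. The sketch below uses made-up ids; in practice the lists would come from q_stations(), boundary_stations() and static_data_stations(). That stations() is computed exactly this way is an assumption.

```python
# Made-up ids standing in for the output of q_stations(),
# boundary_stations() and static_data_stations().
q_ids = ["A", "B", "C", "D", "E"]
boundary_ids = ["B", "C", "E", "F"]
static_ids = ["B", "C", "E", "G"]

# ids that have both boundary and static data
with_attrs = set(boundary_ids) & set(static_ids)

# keep only streamflow stations that also have boundary and static data,
# preserving the order of q_ids
complete = [s for s in q_ids if s in with_attrs]
print(complete)  # ['B', 'C', 'E']
```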

stream_density() DataFrame[source]

Read the stream density values.

temp_stations() List[str][source]

Returns names/IDs of 21 stations with daily temperature data.

topography() DataFrame[source]

Read the topography values.

class aqua_fetch.rr.Slovenia(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _EStreams

Data of 117 catchments of Slovenia. The observed streamflow data is downloaded from https://vode.arso.gov.si . The meteorological data, static catchment features and catchment boundaries for the 117 catchments are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. Therefore, the number of static features is 214 and the number of dynamic features is 10, and the data is available from 1950-01-01 to 2023-12-31.

Examples

>>> from aqua_fetch import Slovenia
>>> dataset = Slovenia()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
    xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 10})
>>> len(data.data_vars)
    10
>>> _, df = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
117
# get data by station id
>>> _, data = dataset.fetch(stations='SI000090')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='SI000090', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (117, 2)
>>> dataset.stn_coords('SI000090')  # returns coordinates of station whose id is SI000090
    45.865093       15.460184
>>> dataset.stn_coords(['SI000090', 'SI000002'])  # returns coordinates of two stations
SI000090    45.865093       15.460184
SI000002    46.648823       16.059244
__init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property end: Timestamp

end of data

fetch_q(as_dataframe: bool = True)[source]

returns the streamflow data of Slovenia as xarray.Dataset or pandas.DataFrame

Returns:

xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns a pandas.DataFrame with station codes as columns and time as index. If as_dataframe is False, returns an xarray.Dataset with station codes as variables and time as dimension.

class aqua_fetch.rr.Spain(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 889 catchments of Spain from the ceh-es website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27, and the data is available from 1979-01-01 to 2020-09-30.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

daily_q_all_areas() DataFrame[source]

Daily data of river gauging stations from all areas.

Returns:

a DataFrame with 16_806_305 rows and 3 columns

daily_q_area(area: str) DataFrame[source]

Reads the daily data of river gauging stations for the given area from the afliq.csv file.

property end: Timestamp

end of data

fetch_q(as_dataframe: bool = True)[source]

returns daily q of all stations

Returns:

a pandas.DataFrame of shape (39721, 1447)

Return type:

pd.DataFrame

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

class aqua_fetch.Thailand(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]

Bases: _GSHA

Data of 73 catchments of Thailand from the RID project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27, and the data is available from 1980-01-01 to 1999-12-31.

__init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property end: Timestamp

end of data

fetch_q(as_dataframe: bool = True)[source]

reads the observed streamflow (q) data

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

class aqua_fetch.USGS(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]

Bases: _RainfallRunoff

This class handles the hydrometeorological data for the USA. The daily and hourly discharge data is downloaded from the usgs/nwis website. The data is optionally stored in a netCDF file if xarray is available. Currently the data is downloaded for only those sites/catchments that are in the HYSETS database. This is because the catchment boundaries are taken from the HYSETS database using aqua_fetch.HYSETS.

For the hourly timestep, the “iv” service is used to download the instantaneous data, which is then resampled to hourly data. Only data with the flags "A, [92]", "A, [91]", "A, [93]", "A, e" or "A" is used. For daily streamflow, the “dv” service is used to download the data. In this case, only data with the flags "A" and "A, e" is used.
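The flag filtering and hourly resampling described for the “iv” service can be sketched with the standard library alone. The records, the rejected "P" qualifier and the choice of the mean as the hourly aggregate are assumptions made for illustration; the aggregation used by the class is not stated here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# qualifiers accepted for instantaneous ("iv") data, as listed above
ACCEPTED_IV_FLAGS = {"A", "A, e", "A, [91]", "A, [92]", "A, [93]"}

# (timestamp, discharge, qualifier) records; values are made up
records = [
    (datetime(2020, 1, 1, 0, 0), 10.0, "A"),
    (datetime(2020, 1, 1, 0, 15), 12.0, "A, e"),
    (datetime(2020, 1, 1, 0, 30), 99.0, "P"),  # not in the accepted set: dropped
    (datetime(2020, 1, 1, 1, 0), 20.0, "A, [91]"),
    (datetime(2020, 1, 1, 1, 30), 22.0, "A"),
]

# group accepted values by the hour in which they were observed
hourly = defaultdict(list)
for ts, q, flag in records:
    if flag in ACCEPTED_IV_FLAGS:
        hourly[ts.replace(minute=0, second=0, microsecond=0)].append(q)

# assumed aggregation: mean of the accepted values within each hour
hourly_mean = {hour: mean(vals) for hour, vals in sorted(hourly.items())}
for hour, value in hourly_mean.items():
    print(hour, value)
```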

Examples

>>> from aqua_fetch import USGS
>>> dataset = USGS()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='01010000', as_dataframe=True)
>>> df = dynamic['01010000'] # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(27028, 20)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
   12004
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (1200 out of 12004)
   1200
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
    [(27028, 20), (27028, 20), (27028, 20),... (27028, 20), (27028, 20)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
    1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('01010000', as_dataframe=True,
...  dynamic_features=['pcp_mm', 'snowmelt_mm', 'airtemp_C_2m_min', 'swe_mm', 'q_cms_obs'])
>>> dynamic['01010000'].shape
   (27028, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
   10
...
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='01010000', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['01010000'].shape
((1, 29), 1, (27028, 20))
...
# If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 20})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
    (12004, 2)
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
    -69.715556      46.700556
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
...
# get area of a single station
>>> dataset.area('01010000')
# get area of two stations
>>> dataset.area(['01010000', '01010070'])
...
# if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('01010000')
__init__(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Parameters:

path (str) – Path to store the data

area(stations: str | List[str] = 'all') Series[source]

Returns the area (area_gov, in km2) of the selected catchments as a pandas.Series

Parameters:

stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations

Returns:

a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.

Return type:

pd.Series

Examples

>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('01010000')  # returns area of station whose id is 01010000
>>> dataset.area(['01010000', '01010070'])  # returns area of two stations
property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: str

end of data

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]

returns static attributes of one or multiple stations

Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch import USGS
>>> dataset = USGS()
# get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    12004
# get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (12004, 27)
# get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
   (1, 27)
# get the names of static features
>>> dataset.static_features
# get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area_km2', 'Elevation_m'])
>>> static_data.shape
   (12004, 2)
fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]

returns features of multiple stations

Examples

>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.

stn_coords(stations: str | List[str] = 'all') DataFrame[source]

returns coordinates of stations as DataFrame with long and lat as columns.

Parameters:

stations – name/names of stations. If not given, coordinates of all stations will be returned.

Returns:

pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
class aqua_fetch.rr.WaterBenchIowa(path=None, **kwargs)[source]

Bases: _RainfallRunoff

Rainfall run-off dataset for Iowa (US) following the work of Demir et al., 2022. This is an hourly dataset of 125 catchments with 7 static features and 3 dynamic features (pcp, et, discharge) for each catchment. The dynamic features are timeseries from 2011-10-01 12:00 to 2018-09-30 11:00.

Note: Currently the coordinates and catchment boundary files are not available for this dataset.

Examples

>>> from aqua_fetch import WaterBenchIowa
>>> ds = WaterBenchIowa()
... # fetch static and dynamic features of 5 stations
>>> static, dynamic = ds.fetch(5, static_features='all', as_dataframe=True)
>>> len(dynamic)  # it is a dictionary with DataFrame
5
... # keys of dynamic are station names and values are DataFrames
>>> data = dynamic.popitem()[1]
>>> data.shape
(61344, 3)
>>> static.shape
(5, 7)
...
... # using another method
>>> dynamic = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> dynamic['644'].shape
(61344, 3)
...
>>> static, dynamic = ds.fetch(stations='644', static_features="all", as_dataframe=True)
>>> static.shape, dynamic['644'].shape
((1, 7), (61344, 3))
__init__(path=None, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: any value greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property end

end of data

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
Parameters:
  • stations (str) – name/id of station of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from aqua_fetch import WaterBenchIowa
>>> dataset = WaterBenchIowa()
# get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    125
# get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (125, 7)
# get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
   (1, 7)
# get the names of static features
>>> dataset.static_features
# get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area_km2'])
>>> static_data.shape
   (125, 2)
>>> data = dataset.fetch_static_features('592', static_features=['slope', 'area_km2'])
>>> data.shape
   (1, 2)
property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Names/IDs of the stations (catchments or gauges) that are used to index each station in the dataset.

The following datasets are very similar to the RainfallRunoff datasets, but they do not have observed streamflow data. They are used to provide static and dynamic features to other datasets.

class aqua_fetch.GSHA(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]

Bases: _RainfallRunoff

Global streamflow characteristics, hydrometeorology and catchment attributes following Peirong et al., 2023. The data is downloaded from its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 21568 stations, 26 dynamic (meteorological + storage) features with daily timestep, 21 dynamic features (landcover + streamflow indices + reservoir) with yearly timestep and 35 static features.

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> len(dataset.stations())
21568
>>> dataset.agencies
['arcticnet', 'AFD', 'GRDC', 'IWRIS', 'MLIT', 'HYDAT', 'ANA', 'BOM', 'CCRR', 'China', 'CHP', 'RID', 'USGS']
>>> dataset.start
Timestamp('1979-01-01 00:00:00')
>>> dataset.end
Timestamp('2022-12-31 00:00:00')
>>> dataset.static_features
['ele_mt_uav', 'slp_dg_uav', 'lat', 'long', 'area_km2', 'agency', ...]
>>> len(dataset.dynamic_features)
26
>>> len(dataset.daily_dynamic_features)
26
>>> len(dataset.yearly_dynamic_features)
21
>>> dataset.fetch_static_features('1001_arcticnet')
# fetch static features for all stations of arcticnet agency
>>> dataset.fetch_static_features(agency='arcticnet')
# fetch dynamic features for all stations of arcticnet agency
>>> dataset.fetch_dynamic_features(agency='arcticnet')
__init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Parameters:

to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.

property agencies: List[str]

returns the names of agencies as list

  • arcticnet : Arctic

  • AFD : Spain

  • GRDC : Global

  • IWRIS : India

  • MLIT : Japan

  • HYDAT : Canada

  • ANA: Brazil

  • BOM : Australia

  • CCRR : Chile

  • China

  • CHP : China

  • RID : Thailand

  • USGS

agency_of_stn(stn: str) str[source]

find the agency to which a station belongs

agency_stations(agency: str) List[str][source]

returns the station ids from a particular agency
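The station ids used in the examples, such as '1001_arcticnet', end with the agency name. Under the assumption that every id follows this '<number>_<agency>' pattern, the mapping between stations and agencies can be sketched as below. agency_from_id is a hypothetical helper and the id '42_RID' is made up; neither is taken from aqua_fetch.

```python
from collections import defaultdict
from typing import Dict, List


def agency_from_id(stn: str) -> str:
    """Assumes ids look like '<number>_<agency>', as in '1001_arcticnet'."""
    return stn.rsplit("_", 1)[1]


ids = ["1001_arcticnet", "10062_arcticnet", "42_RID"]  # '42_RID' is made up

# group station ids by the agency encoded in their suffix
by_agency: Dict[str, List[str]] = defaultdict(list)
for stn in ids:
    by_agency[agency_from_id(stn)].append(stn)

print(agency_from_id("1001_arcticnet"))  # arcticnet
print(by_agency["arcticnet"])            # ['1001_arcticnet', '10062_arcticnet']
```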

area(stations: List[str] = 'all', agency: List[str] = 'all') Series[source]

area of catchments

atlas(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

The link table between GSHA watershed IDs and RiverATLAS river reach IDs, as well as the selected static attributes

Returns:

a pandas.DataFrame of shape (n, 24) where n is the number of stations

Return type:

pd.DataFrame

property boundary_id_map: str

Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]
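
The caching advice above can be sketched with functools.cached_property from the standard library. This is only an illustration of the pattern, not the package's implementation: DemoDataset and _read_feature_names are hypothetical stand-ins, and the feature names are borrowed from the examples on this page.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive lookup runs

    def _read_feature_names(self) -> List[str]:
        # Stand-in for an expensive read of the underlying data files.
        self.reads += 1
        return ["airtemp_C_mean_era5", "pcp_mm_mswep"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


demo = DemoDataset()
first = demo.dynamic_features
second = demo.dynamic_features  # served from the cache, no second read
```

Because the cached value is stored per instance, a child class that overrides the property keeps its own cache.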

property end: Timestamp

end of data

fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, agency: List[str] = 'all') Dataset[source]

Fetches all or selected dynamic features of one or more stations.

Parameters:
  • stations (str/list, optional (default="all")) – name/id of the station or stations for which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from which to fetch the data.

  • en (Optional (default=None)) – end time until which to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_dynamic_features('1001_arcticnet', as_dataframe=True)
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> stns = ['1001_arcticnet', '10062_arcticnet']
>>> data = dataset.fetch_dynamic_features(stns,
... dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
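
The feature names requested above follow the name_unit_specifier pattern used for standardized dynamic features. As a minimal sketch of how such a name breaks down, assuming the first two underscore-separated parts are the name and the unit and everything after them is the specifier, one could write the following. FeatureName and split_feature are hypothetical helpers, not part of aqua_fetch, and features that keep their original names will not fit this pattern.

```python
from typing import NamedTuple


class FeatureName(NamedTuple):
    name: str
    unit: str
    specifier: str


def split_feature(feature: str) -> FeatureName:
    """Split 'name_unit_specifier'; the specifier may itself contain underscores."""
    parts = feature.split("_", 2)
    if len(parts) < 2:
        raise ValueError(f"{feature!r} does not look like name_unit[_specifier]")
    specifier = parts[2] if len(parts) == 3 else ""
    return FeatureName(parts[0], parts[1], specifier)


pcp = split_feature("pcp_mm_mswep")
temp = split_feature("airtemp_C_mean_era5")
```

Here the specifier of the temperature feature combines the aggregation type and the source.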
fetch_lai(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Leaf Area Index timeseries for one or more than one station either as xarray.Dataset or pandas.DataFrame. The data has daily timestep.

fetch_meteo_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Meteorological variables from 1979-01-01 to 2022-12-31 for one or more than one station either as xarray.Dataset or dictionary. The data has daily timestep.

fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

Returns static features of one or more stations.

Parameters:
  • stations (str) – name/id of station/stations of which to extract the data

  • static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Returns:

a pandas.DataFrame of shape (stations, features)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
21568
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(21568, 35)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('1001_arcticnet')
>>> static_data.shape
(1, 35)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['ele_mt_uav', 'slp_dg_uav'])
>>> static_data.shape
(21568, 2)
>>> data = dataset.fetch_static_features('1001_arcticnet', static_features=['ele_mt_uav', 'slp_dg_uav'])
>>> data.shape
(1, 2)
>>> # get static features of all stations of the arcticnet agency
>>> out = dataset.fetch_static_features(agency='arcticnet')
>>> out.shape
(106, 35)
fetch_stn_dynamic_features(station: str, dynamic_features='all', st: str | Timestamp = None, en: str | Timestamp = None) DataFrame[source]

Fetches all or selected dynamic features of one station.

Parameters:
  • station (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

Returns:

a pandas.DataFrame of shape (n, features) where n is the number of days

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet')
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet',
... dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
>>> data.shape
(16071, 2)
fetch_storage_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Water storage term variables from 1979-01-01 to 2021-12-31 for one or more than one station either as xarray.Dataset or dictionary. The data has daily timestep.

lai_stn(stn: str) Series[source]

Daily leaf area index. As per the documentation, due to satellite data quality, some watersheds may have substantial amounts of missing data. The data is from 1981-01-01 to 2020-12-31.

Returns:

a pandas.Series of shape (14571,) where 14571 is the number of days

Return type:

pd.Series

lc_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Landcover variables for one or more than one station either as xarray.Dataset or dictionary. The data has yearly timestep.

lc_variables_stn(stn: str) DataFrame[source]

Landcover variables for a given station which have yearly timestep. Following three landcover variables are provided:

  • urban_fraction(%): Ratio of urban extent to the entire watershed area (percentage).

  • forest_fraction(%): Ratio of forest extent to the entire watershed area (percentage).

  • cropland_fraction(%): Ratio of cropland extent to the entire watershed area (percentage).

Returns:

a pandas.DataFrame of shape (n, 3) where n is the number of years

Return type:

pd.DataFrame

meteo_vars() List[str][source]

returns names of meteorological variables

meteo_vars_all_stns()[source]

Meteorological variables from 1979-01-01 to 2022-12-31 for all stations either as xarray.Dataset or dictionary. The data has daily timestep.

meteo_vars_stn(stn: str) DataFrame[source]

Daily meteorological variables from 1979-01-01 to 2022-12-31 for a given station.

Returns:

a pandas.DataFrame of shape (16071, 19) where 16071 is the number of days

Return type:

pd.DataFrame
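
The row counts quoted for the daily tables follow directly from the stated date ranges. A minimal check using only the standard library, where n_days is a hypothetical helper that counts both end points:

```python
from datetime import date


def n_days(start: str, end: str) -> int:
    """Number of daily timesteps from start to end, both inclusive."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1


# Meteorological variables: 1979-01-01 to 2022-12-31
meteo_days = n_days("1979-01-01", "2022-12-31")
# Water storage variables: 1979-01-01 to 2021-12-31
storage_days = n_days("1979-01-01", "2021-12-31")
```

These give 16071 and 15706 days respectively, matching the documented shapes of the meteorological and storage tables.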

reservoir_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Reservoir variables for one or more than one station either as xarray.Dataset or dictionary. The data has yearly timestep.

reservoir_variables_stn(stn: str) DataFrame[source]

Reservoir variables for a given station from 1979 to 2020 with yearly timestep. Following two reservoir variables are provided:

  • capacity: Reservoir capacity of the year in the watershed (m3). To avoid including too many missing values, we use the ICOLD capacity in the linked table of the GeoDAR dataset.

  • dor: Degree of regulation of the watershed (yearly reservoir capacity/yearly mean flow). If yearly mean flow is missing, the value is substituted with the average of all mean flow values.

Returns:

a pandas.DataFrame of shape (42, 2) where 42 is the number of years

Return type:

pd.DataFrame
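
The degree of regulation described above is a simple ratio with a fallback for missing flows. The sketch below illustrates that rule with made-up numbers. It assumes that the yearly mean flow is already expressed as a volume in m3, so that the ratio is dimensionless; the units actually used by the dataset are not stated here. degree_of_regulation is a hypothetical helper, not part of aqua_fetch.

```python
from typing import List, Optional


def degree_of_regulation(
    capacity: List[float], mean_flow: List[Optional[float]]
) -> List[float]:
    """Yearly capacity / yearly mean flow; missing flows use the mean of known flows."""
    known = [q for q in mean_flow if q is not None]
    if not known:
        raise ValueError("at least one mean flow value is required")
    fallback = sum(known) / len(known)
    return [
        cap / (q if q is not None else fallback)
        for cap, q in zip(capacity, mean_flow)
    ]


# Made-up values for three years; the second year has no mean flow.
capacity_m3 = [2.0e6, 2.0e6, 3.0e6]
mean_flow_m3 = [4.0e6, None, 6.0e6]
dor = degree_of_regulation(capacity_m3, mean_flow_m3)
```

For the second year the fallback flow is the average of the two known values (5.0e6), so its ratio is 0.4 rather than undefined.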

property static_features: List[str]

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations(agency: str = 'all') List[str][source]

returns names of stations as list

stn_coords(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

returns the latitude and longitude of stations

Returns:

a pandas.DataFrame of shape (n, 2) where n is the number of stations

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.stn_coords('1001_arcticnet')
>>> dataset.stn_coords(['1001_arcticnet', '1002_arcticnet'])
>>> # get coordinates for all stations of arcticnet agency
>>> dataset.stn_coords(agency='arcticnet')
storage_vars() List[str][source]

returns names of storage variables

storage_vars_all_stns()[source]

Water storage term variables from 1979-01-01 to 2021-12-31 for all stations either as xarray.Dataset or dictionary. The data has daily timestep.

storage_vars_stn(stn: str) DataFrame[source]

Daily water storage term variables from 1979-01-01 to 2021-12-31 for a given station.

  • SM_layer1: 0-7 cm soil moisture from ERA5 land soil water layer 1 (m3/m3) for 1979-2021.

  • SM_layer2: 7-28 cm soil moisture from ERA5 land soil water layer 2 (m3/m3) for 1979-2021.

  • SM_layer3: 28-100 cm soil moisture from ERA5 land soil water layer 3 (m3/m3) for 1979-2021.

  • SM_layer4: 100-289 cm soil moisture from ERA5 land soil water layer 4 (m3/m3) for 1979-2021.

  • SWDE: Snow water equivalent from ERA5 snow depth water equivalent (m of water equivalent) for 1979-2021.

  • groundwater(%): Groundwater percentage from GRACE-FO data assimilation (%) for 2003-2021 (weekly).

Returns:

a pandas.DataFrame of shape (15706, 6) where 15706 is the number of days

Return type:

pd.DataFrame

streamflow_indices(stations: List[str] = 'all', agency: List[str] = 'all')[source]

Streamflow indices for one or more than one station either as xarray.Dataset or dictionary. The data has yearly timestep.

streamflow_indices_stn(stn: str) DataFrame[source]

Streamflow indices for a given station which have yearly timestep.

Returns:

a pandas.DataFrame of shape (n, 16) where n is the number of years

Return type:

pd.DataFrame

uncertainty(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]

Uncertainty estimates of all meteorological variables over all watersheds

  • P_uncertainty (%) Precipitation uncertainty estimates (in percentage). Uncertainties are calculated from EM-Earth deterministic and MSWEP datasets.

  • T_uncertainty (%) Temperature uncertainty estimates (in percentage). Uncertainties are calculated from EUSTACE, MERRA-2, and ERA5 datasets.

  • EVP_uncertainty (%) Actual evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.

  • LRAD_uncertainty (%) Downward longwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • SRAD_uncertainty (%) Downward shortwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • wind_uncertainty (%) Wind speed uncertainty estimates (in percentage). The u- and v- components are aggregated on each grid to obtain wind speed. Uncertainties are calculated from MERRA-2 and ERA5-land datasets.

  • pet_uncertainty (%) Potential evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.

Returns:

a pandas.DataFrame of shape (n, 7) where n is the number of stations

Return type:

pd.DataFrame

class aqua_fetch.EStreams(path=None, **kwargs)[source]

Bases: _RainfallRunoff

Handles EStreams data following the work of Nascimento et al., 2024. The data is available at its Zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 17130 stations, 9 dynamic (meteorological) features with daily timestep, 27 dynamic features with yearly timestep and 214 static features. The dynamic features are from 1950-01-01 to 2023-06-30.

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
__init__(path=None, **kwargs)[source]
Parameters:
  • path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.

  • timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.

  • to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.

  • overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.

  • verbosity (int) –

    This parameter determines the level of verbosity for logging messages.

    • 0: no message will be printed

    • 1: only important messages will be printed

    • >1: values greater than 1 will result in more verbose output

  • kwargs – Any other keyword arguments for the parent Datasets class
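
The rule for the path parameter can be read as a small decision procedure. The following sketch is an interpretation of the description above, not the package's code. resolve_data_dir and DEFAULT_DIR are hypothetical, the default location used by the package may differ, and the sketch additionally assumes that the default directory is also checked for existing data.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical default location; the package's real default may differ.
DEFAULT_DIR = Path(tempfile.gettempdir()) / "demo_datasets"


def resolve_data_dir(path: Optional[str] = None) -> Tuple[Path, bool]:
    """Return (directory, needs_download) for a user-supplied path.

    - path given and directory exists: read from it, no download
    - path given but directory missing: download into it
    - path not given: fall back to the default directory
    """
    directory = Path(path) if path is not None else DEFAULT_DIR
    return directory, not directory.is_dir()
```

A caller would create the directory and trigger the download only when the second element of the returned tuple is True.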

area(stations: List[str] = 'all', countries: List[str] = 'all') Series[source]

area of catchments in km2

property countries: List[str]

returns the names of 39 countries covered by EStreams as list

country_of_stn(stn: str) str[source]

find the country to which a station belongs

country_stations(country: str) List[str][source]

returns the station ids from a particular country

property dyn_map

A dictionary that maps dynamic features to their names in the dataset.

property dynamic_features: List[str]

Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().

Return type:

List[str]

property end: Timestamp

end of data

fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, countries: str | List[str] = 'all')[source]

Fetches all or selected dynamic features of one or more stations.

Parameters:
  • stations (str/list, optional (default="all")) – name/id of the station or stations for which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from which to fetch the data.

  • en (Optional (default=None)) – end time until which to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset

Examples

>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_dynamic_features('IEEP0281', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean'],
... as_dataframe=True)
fetch_stn_dynamic_features(station: str, dynamic_features='all', st: str | Timestamp = None, en: str | Timestamp = None) DataFrame[source]

Fetches all or selected dynamic features of one station.

Parameters:
  • station (str) – name/id of station of which to extract the data

  • dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

Returns:

a pandas.DataFrame of shape (n, features) where n is the number of days

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_stn_dynamic_features('IEEP0281')
>>> camels.dynamic_features
>>> camels.fetch_stn_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean'])
gauge_stations() DataFrame[source]

reads the file estreams_gauging_stations.csv as dataframe

hydro_clim_sigs(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns the hydro-climatic signatures of one or more stations

Returns:

a pandas.DataFrame of hydro-climatic signatures of shape (stations, 31)

Return type:

pd.DataFrame

meteo_data(stations: str | List[str] = 'all', countries: List[str] | str = 'all')[source]

Returns the meteorological data of one or more stations either as dictionary of dataframes or xarray Dataset

meteo_data_station(station: str) DataFrame[source]

Returns the meteorological data of a single station.

Parameters:

station (str) – name/id of station of which to extract the data

Returns:

a pandas.DataFrame of meteorological data of shape (time, 9)

Return type:

pd.DataFrame

property static_features

Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.

Returns:

a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().

Return type:

List[str]

property static_map: Dict[str, str]

A dictionary that maps static features to their names in the dataset.

stations() List[str][source]

Returns a list of all station names. Note that the basin_id column is used as the station name.

stn_coords(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]

Returns the coordinates of one or more stations

Returns:

a pandas.DataFrame of shape (stations, 2)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.stn_coords('IEEP0281')
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])
>>> dataset.stn_coords(countries='IE')