Rainfall Runoff datasets
This section includes datasets which can be used for rainfall-runoff modeling.
They all contain observed streamflow and meteorological data as time series.
These are referred to as dynamic features. The physical catchment properties
are included as static features in tabular form, where each row corresponds
to one catchment and each column to one static feature.
In addition to published datasets, this package introduces 10 new datasets for rainfall-runoff modeling. These datasets have not yet been published but follow the CAMELS dataset series convention. They include Ireland, Finland, Italy, Poland, Portugal, Japan, Thailand, Arcticnet, Spain, and the USGS. The observed streamflow data are sourced from the national meteorological or hydrological websites of the respective countries. Catchment boundaries and meteorological data for Ireland, Finland, Italy, Poland, and Portugal are obtained from EStreams (Nascimento et al., 2024), and similarly for Japan, Thailand, Arcticnet, and Spain from GSHA (Peirong et al., 2023). For USGS, the catchment boundaries are sourced from HYSETS (Arsenault et al., 2020).
Although each data source has a dedicated class,
all datasets listed in Table List of datasets are accessible via the aqua_fetch.rr.RainfallRunoff
class, which allows for a unified and consistent approach to each dataset. The class
provides several methods to access static features, dynamic features, or catchment
boundaries. Although the raw data files for each dataset may come in different formats,
the methods to access these features through the aqua_fetch.rr.RainfallRunoff class remain the same.
Individual classes for each dataset are also available and may offer more control to
users over specific datasets. However, for most cases, the use of the aqua_fetch.rr.RainfallRunoff
class will suffice.
The naming and units of dynamic features in each dataset may vary. However, we have
standardized these features using the formula name_unit_specifier for each dynamic
feature across all datasets. In this formula, the specifier can indicate the source
(such as ERA5 or MSWEP for precipitation), the method used to calculate the feature
(like makkink or penman for evapotranspiration), or the aggregation type (min, max, mean).
For example, a precipitation dynamic feature from MSWEP would be labeled as pcp_mm_mswep.
This approach ensures that feature names are representative and understandable.
Dynamic features for which this method is inapplicable retain their original names.
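The naming convention can be illustrated with a short sketch. The helper `split_feature_name` below is hypothetical and not part of aqua_fetch. It assumes that neither the name nor the unit contains an underscore, so everything after the second underscore is treated as the specifier.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    name: str
    unit: str
    specifier: Optional[str]


def split_feature_name(feature: str) -> FeatureName:
    """Split a standardized dynamic feature name into (name, unit, specifier).

    Hypothetical helper for illustration only. Assumes that the name and
    the unit themselves contain no underscore.
    """
    parts = feature.split("_", 2)  # split on the first two underscores only
    if len(parts) < 2:
        raise ValueError(f"{feature!r} does not follow name_unit_specifier")
    specifier = parts[2] if len(parts) == 3 else None
    return FeatureName(parts[0], parts[1], specifier)


print(split_feature_name("pcp_mm_mswep"))
print(split_feature_name("airtemp_C_silo_max"))
```

For `pcp_mm_mswep` the specifier is the data source (`mswep`), while for `airtemp_C_silo_max` it combines a source and an aggregation type (`silo_max`).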
Another feature of AquaFetch is the optional inclusion of static and dynamic features from EStreams and GSHA for all datasets listed in Table List of datasets. This is beneficial because EStreams and GSHA include several static and dynamic features calculated for the catchments which are not included in other datasets. For instance, EStreams provides information on annual variation in land use for all European catchments, a feature not available in CAMELS-GB (Coxon et al., 2020) or other European datasets. This step is optional since it initiates the download of the GSHA and EStreams datasets, which can be time-consuming and may not always be necessary.
Certain datasets in this package feature overlapping stations from the same region.
For example, both the aqua_fetch.Bull and Spain datasets cover Spain.
However, the Bull dataset was introduced by Aparicio et al., 2024,
whereas the Spain dataset was introduced in this work. The Spain dataset contains
more stations, totaling 889, while the Bull dataset includes 484 stations.
Similarly, both the CABra (Almagro et al., 2021) and CAMELS_BR (Chagas et al., 2020) datasets
cover Brazil and have been published in peer-reviewed journals. However, they differ
in their temporal coverage and the number of static and dynamic features. Furthermore,
Denmark is covered by two datasets, Caravan_DK (Koch 2022) and CAMELS_DK (Liu et al., 2024),
which differ in temporal coverage and the number of static and dynamic features.
The HYSETS dataset (Arsenault et al., 2020) covers Mexico, the US, and Canada. However,
we identified issues with the observed streamflow data for the US in HYSETS. As a
result, we introduced the USGS dataset, which focuses specifically on the US region.
The catchment boundaries, static features, and meteorological data for USGS, however,
are still obtained from HYSETS.
List of datasets
| Source Name | Class | Number of Daily Stations | Number of Hourly Stations | Dynamic features | Static features | Temporal Coverage | Spatial Coverage | Reference |
|---|---|---|---|---|---|---|---|---|
|  |  | 106 |  | 27 | 35 | 1979 - 2003 | Arctic (Russia) |  |
|  |  | 484 |  | 55 | 214 | 1990 - 2020 | Spain |  |
|  |  | 735 |  | 12 | 97 | 1980 - 2010 | Brazil |  |
|  |  | 222, 561 |  | 26 | 166, 187 | 1900 - 2018 | Australia |  |
|  |  | 897 |  | 10 | 67 | 1920 - 2019 | Brazil |  |
|  |  | 331 |  | 9 | 209 | 1981 - 2020 | Switzerland |  |
|  |  | 516 |  | 12 | 104 | 1913 - 2018 | Chile |  |
|  |  | 347 |  | 6 | 255 | 1981 - 2022 | Colombia |  |
|  |  | 1555 |  | 21 | 111 | 1951 - 2020 | Germany |  |
|  |  | 304 |  | 13 | 119 | 1989 - 2023 | Denmark |  |
|  |  | 320 |  | 16 | 111 | 1963 - 2023 | Finland |  |
|  |  | 654 |  | 22 | 344 | 1970 - 2021 | France |  |
|  |  | 671 |  | 10 | 145 | 1970 - 2015 | Britain |  |
|  |  | 472 |  | 20 | 210 | 1980 - 2020 | Republic of India |  |
|  |  | 56 | 56 | 25 | 61 | 2004 - 2021 | Luxembourg |  |
|  |  | 369 |  | 5 | 39 | 1972 - 2024 | New Zealand |  |
|  |  | 50 |  | 4 | 76 | 1961 - 2020 | Sweden |  |
|  |  | 178 |  | 17 | 215 | 2000 - 2019 | South Korea |  |
|  |  | 671 |  | 8 | 59 | 1980 - 2014 | United States |  |
|  |  | 304 |  | 38 | 211 | 1981 - 2020 | Denmark |  |
|  |  | 111 |  | 16 | 124 | 1990 - 2020 | China |  |
|  |  | 669 |  | 27 | 35 | 2012 - 2023 | Finland |  |
|  |  | 5357 |  | 39 | 211 | 1950 - 2023 | Global |  |
|  |  | 561 |  |  |  |  |  |  |
|  |  | 14425 |  | 5 | 28 | 1950 - 2018 | North America (Mexico, Canada, USA) |  |
|  |  | 464 |  | 27 | 35 | 1992 - 2020 | Ireland |  |
|  |  | 294 |  | 37 | 35 | 1992 - 2020 | Italy |  |
|  |  | 751 | 696 | 27 | 35 | 1979 - 2022 | Japan |  |
|  |  | 859 | 859 | 22 | 80 | 1981 - 2019 | Central Europe |  |
|  |  | 111 | 111 | 36 | 154 | 1950 - 2021 | Iceland |  |
|  |  | 7 |  | 8 | 14 | 2013 - 2019 | Iceland |  |
|  |  | 1287 |  | 27 | 35 | 1992 - 2020 | Poland |  |
|  |  | 280 |  | 27 | 35 | 1992 - 2020 | Portugal |  |
|  |  | 1 |  | 2 | 0 | 2016 - 2019 | Lulea (Sweden) |  |
|  |  | 24 |  | 3 | 232 | 1920 - 1940 | Haiti |  |
|  |  | 117 |  | 3 | 10 | 1950 - 2023 | Slovenia |  |
|  |  | 889 |  | 27 | 35 | 1979 - 2020 | Spain |  |
|  |  | 73 |  | 27 | 35 | 1980 - 1999 | Thailand |  |
|  |  | 12004 |  | 5 | 27 | 1950 - 2018 | United States |  |
|  |  | 125 |  | 3 | 7 | 2011 - 2018 | Iowa (USA) |  |
High Level API
The aqua_fetch.rr.RainfallRunoff class is the high-level API.
It provides a unified and easy-to-use interface to access all the datasets.
It is recommended to use this class to access the datasets.
- class aqua_fetch.rr.RainfallRunoff(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Bases: object
This class provides access to all the rainfall-runoff datasets. For simplicity and reusability, use this class instead of the individual dataset classes.
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')  # instead of CAMELS_AUS, you can provide any other dataset name
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.columns = df.columns.get_level_values('dynamic_features')
>>> df.shape
(26388, 28)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
561
... # get data of 10 % of stations as dataframe
>>> _, df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(738864, 56)
... # The returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
... # get data by station id
>>> _, df = dataset.fetch(stations='912101A', as_dataframe=True)
>>> df.unstack().shape
(26388, 28)
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['airtemp_C_silo_max', 'pcp_mm_silo', 'aet_mm_silo_morton', 'q_cms_obs'])
>>> data.unstack().shape
(26388, 4)
... # get names of available static features
>>> dataset.static_features
... # get all static features of all stations
>>> df = dataset.fetch_static_features()
>>> df.shape
(561, 187)
... # get area of a single station
>>> area = dataset.area('912101A')
>>> type(area), area.shape
(pandas.core.series.Series, (1,))
... # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
(26388, 280)
... # If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='912101A', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 187), (26388, 28))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(561, 2)
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
-18.643612 139.253052
>>> dataset.stn_coords(['912101A', '912105A'])  # returns coordinates of two stations
... # get boundary of the catchment
>>> boundary = dataset.get_boundary('912101A')
>>> boundary.shape
(20086, 2)
See sphx_glr_auto_examples_camels_australia.py for more comprehensive usage example.
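The multi-indexed layout used in the examples (index levels `time` and `dynamic_features`, one column per station) can be reproduced with plain pandas. The sketch below builds a small synthetic frame with made-up values instead of downloading a dataset, so only the station id and feature names are borrowed from the examples. It shows what the `unstack()` and `get_level_values('dynamic_features')` steps do.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dynamic features of one station (values are made up).
times = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["pcp_mm_silo", "q_cms_obs"]
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)
long_df = pd.DataFrame({"912101A": np.arange(6, dtype=float)}, index=index)

# Move the inner index level (dynamic_features) into the columns ...
wide_df = long_df.unstack()
# ... and keep only the feature names as column labels.
wide_df.columns = wide_df.columns.get_level_values("dynamic_features")

print(long_df.shape)  # (6, 1): one row per (time, feature) pair
print(wide_df.shape)  # (3, 2): one row per time step, one column per feature
```

After unstacking, each row is a time step and each column a dynamic feature, which is usually the more convenient shape for a single station.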
- __init__(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Rainfall Runoff datasets
- Parameters:
dataset (str) –
dataset name. This must be one of the following:
Arcticnet, Bull, CABra, CCAM, CAMELS_AUS, CAMELS_BR, CAMELS_CH, CAMELS_CL, CAMELS_COL, CAMELS_DE, CAMELS_DK0, CAMELS_DK, CAMELS_FI, CAMELS_FR, CAMELS_GB, CAMELS_IND, CAMELS_LUX, CAMELS_NZ, CAMELS_SE, CAMELS_SK, CAMELS_US, EStreams, Finland, GRDCCaravan, GSHA, HYSETS, HYPE, Ireland, Italy, Japan, LamaHCE, LamaHIce, Poland, Portugal, RRLuleaSweden, Simbi, Slovenia, Spain, Thailand, USGS, WaterBenchIowa
path (str) – path to directory inside which data is located/downloaded. If provided and the path/dataset exists, then the data will be read from this path. If provided and the path/dataset does not exist, then the data will be downloaded at this path. If not provided, then the data will be downloaded in the default path which is .../aqua_fetch/data/.
overwrite (bool) – If the data is already downloaded, you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require netcdf5 package as well as xarray.
verbosity (int) – 0: no message will be printed
kwargs – additional keyword arguments for the underlying dataset class, for example version for aqua_fetch.rr.CAMELS_AUS, timestep for the aqua_fetch.rr.LamaHCE dataset, or met_src for aqua_fetch.rr.CAMELS_BR
- area(stations: str | List[str] = 'all') Series[source]
Returns area (km²) of all/selected catchments as pandas.Series
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations. For names of stations, see stations().
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property dynamic_features: List[str]
returns names of dynamic features as python list of strings
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.dynamic_features
- property end: str
returns end date of data
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.end  # end is a property, so it is accessed without parentheses
- fetch(stations: str | List[str] | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) tuple[DataFrame, DataFrame | Dataset][source]
Fetches the features of one or more stations.
- Parameters:
stations –
It can have following values:
int: number of (randomly selected) stations to fetch
float: fraction of (randomly selected) stations to fetch
str: name/id of station to fetch. However, if all is provided, then all stations will be fetched. For names of stations, see stations().
list: list of names/ids of stations to fetch
dynamic_features ((default=``all``)) –
It can have following values:
str: name of dynamic feature to fetch. If all is provided, then all dynamic features will be fetched. For names of dynamic features, see dynamic_features().
list: list of dynamic features to fetch.
None: No dynamic feature will be fetched. The second returned value will be None.
static_features ((default=None)) –
It can have following values:
str: name of static feature to fetch. If all is provided, then all static features will be fetched. For names of static features, see static_features().
list: list of static features to fetch.
None: No static feature will be fetched. The first returned value will be None.
st – starting date of data to be returned. If None, the data will be returned from where it is available.
en – end date of data to be returned. If None, then the data will be returned till the date data is available.
as_dataframe – whether to return dynamic attributes as pandas.DataFrame or as xarray.Dataset. If the xarray library is not installed, then this parameter will be ignored and the data will be returned as pandas.DataFrame.
kwargs – keyword arguments
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates. If dynamic features are returned as pandas.DataFrame, then the first index is time and the second index is dynamic_features.
- Return type:
tuple[DataFrame, DataFrame | Dataset]
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> # get data of 10% of stations
>>> _, df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
... # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> _, three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single station
>>> _, single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)
>>> dynamic
... # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='318076',
...  dynamic_features=['q_cms_obs', 'solrad_wm2_silo'], as_dataframe=True)
... # fetch data between selected periods
>>> _, data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
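The different types accepted by the stations argument can be summarized in a small sketch. The function `resolve_stations` is hypothetical and is not the library's implementation. In particular, truncating the fraction to an integer count is an assumption, chosen because it is consistent with the example shapes in this section (561 stations and a fraction of 0.1 giving 56 columns).

```python
import random
from typing import List, Optional, Sequence, Union


def resolve_stations(
    selector: Union[int, float, str, Sequence[str]],
    all_stations: Sequence[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Turn a `stations`-style selector into an explicit list of station ids.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    if isinstance(selector, bool):
        raise TypeError("a boolean is not a valid station selector")
    if isinstance(selector, int):
        # int: that many randomly selected stations
        return rng.sample(list(all_stations), selector)
    if isinstance(selector, float):
        # float: a fraction of all stations (truncation is an assumption)
        return rng.sample(list(all_stations), int(len(all_stations) * selector))
    if isinstance(selector, str):
        # str: either the keyword "all" or a single station id
        return list(all_stations) if selector == "all" else [selector]
    # any other sequence: explicit station ids
    return list(selector)


ids = [f"stn_{i}" for i in range(561)]  # made-up station ids
print(len(resolve_stations(0.1, ids, seed=0)))
print(resolve_stations("stn_7", ids))
```

Passing a seed makes the random selection repeatable in this sketch. Whether the library exposes a comparable option is not stated in this section.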
- fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) DataFrame | Dataset[source]
Fetches all or selected dynamic attributes of one station.
- Parameters:
station (str) – name/id of station of which to extract the data. For names of stations see stations()
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned. For names of dynamic features, see dynamic_features()
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas.DataFrame otherwise it is xarray.Dataset
- Returns:
a pandas.DataFrame or xarray.Dataset depending upon the value of as_dataframe and whether xarray is installed or not.
- Return type:
pd.DataFrame or xr.Dataset
Examples
>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
...  dynamic_features=['airtemp_C_silo_max', 'vp_hpa_silo', 'q_cms_obs'],
...  as_dataframe=True).unstack()
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static attributes of one or more stations.
- Parameters:
stations (str) – name/id of station of which to extract the data. For names of stations see stations().
static_features (list/str, optional (default="all")) – The name/names of static features to fetch. By default, all available static features are returned. For names of static features, see static_features().
- Returns:
a pandas.DataFrame
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
...  static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches static and dynamic features for one station.
- Parameters:
station (str) – station id/gauge id for which the data is to be fetched. For names of stations, see stations()
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch. For names of dynamic features, check the output of dynamic_features()
static_features – names of static features/attributes to be fetched. For names of static features, check the output of static_features()
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to the last date for which it is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will have a single row while the dynamic features will be of shape (time, dynamic features).
- Return type:
tuple[DataFrame, DataFrame]
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> static, dynamic = dataset.fetch_station_features('912101A')
>>> static.shape
>>> dynamic.shape
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] | None = 'all', static_features: str | List[str] | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) tuple[DataFrame, DataFrame | Dataset][source]
Reads features of more than one station.
- Parameters:
stations – name/ids of stations for which data is to be fetched. For names of stations, see stations().
dynamic_features – list of dynamic features to be fetched. For names of dynamic features, see dynamic_features(). If all, then all dynamic features will be fetched. If None, then no dynamic attribute will be fetched and the second returned value will be None.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched. For names of static features, see static_features().
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as pandas.DataFrame. The default is an xarray.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates. If dynamic features are returned as pandas.DataFrame, then the first index is time and the second index is dynamic_features.
- Return type:
tuple[DataFrame, DataFrame | Dataset]
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> static, dynamic = dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...  as_dataframe=True)
- get_boundary(station: str)[source]
returns boundary of a catchment as fiona.Geometry object.
- Parameters:
station (str) – name/id of catchment. For names of catchments, see stations().
- Returns:
a fiona.Geometry object representing the boundary of the catchment.
- Return type:
fiona.Geometry
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')
>>> dataset.get_boundary(dataset.stations()[0])
- plot_catchment(station: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)[source]
plots catchment boundaries
- Parameters:
station (str) – name/id of station. For names of stations, see stations()
show_outlet (bool, optional (default=False)) – if True, then outlet of the catchment will be plotted as a red dot
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()
- plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
stations – name/names of stations. If not given, all stations will be plotted. For names of stations, see stations().
marker – marker to use.
color (str, optional) – name of static feature to use as color.
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_stations()
>>> dataset.plot_stations(['912101A', '912105A', '915011A'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
... # using area as color
>>> dataset.plot_stations(color='area_km2')
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in units of millimeters per day. This is obtained by dividing the streamflow by the catchment area (q/area).
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations. For names of stations, see stations().
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
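The unit conversion behind q/area can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes streamflow is given in cubic meters per second (as the `q_cms_obs` name suggests) and area in km², and the function name `cms_to_mmd` and the station ids are made up. A flow of 1 m³/s over 1 km² for one day gives 86400 m³ / 10⁶ m² = 0.0864 m, i.e. 86.4 mm, hence the factor 86.4.

```python
import pandas as pd


def cms_to_mmd(q_cms: pd.DataFrame, area_km2: pd.Series) -> pd.DataFrame:
    """Convert streamflow from m3/s to mm/day.

    q_cms    : time steps as rows, station ids as columns (assumed m3/s)
    area_km2 : catchment areas indexed by the same station ids (assumed km2)
    """
    # 86400 s/day * 1000 mm/m / 1e6 m2/km2 = 86.4
    return q_cms.mul(86.4).div(area_km2, axis="columns")


# Made-up numbers for two hypothetical stations
times = pd.date_range("2000-01-01", periods=2, freq="D")
q = pd.DataFrame({"A": [10.0, 20.0], "B": [1.0, 0.0]}, index=times)
area = pd.Series({"A": 864.0, "B": 86.4})

print(cms_to_mmd(q, area))
```

Dividing with `axis="columns"` aligns each area with the column of the same station id, so the order of the stations in the two objects does not matter.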
- property start: str
returns starting date of data
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.start  # start is a property, so it is accessed without parentheses
- property static_features: List[str]
returns names of static features as python list of strings
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.static_features
- stations() List[str][source]
Names/ids of stations/catchment/basins/gauges or whatever that would be used to index each catchment in the dataset. Every catchment has a unique name/id which can be used to fetch its data.
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as pandas.DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned. For names of stations, see stations().
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
Low Level API
The low-level API provides access to the individual dataset classes. This provides more control over the datasets.
- class aqua_fetch.rr._RainfallRunoff(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Datasets
This is the parent class for individual rainfall-runoff datasets like CAMELS-GB etc. This class is not meant for direct use. It is inherited by the child classes which are specific to a dataset like CAMELS-GB, CAMELS-AUS etc. This class first downloads the dataset if it is not already downloaded. Then the selected features for a selected catchment/station are fetched and provided to the user using the method fetch.
- path (str/path) : directory of the dataset
- dynamic_features (list) : tells which dynamic features are available in this dataset
- static_features (list) : a list of static features.
- static_attribute_categories (list) : tells which kinds of static features are present in this category.
- stations : returns name/id of stations for which the data (dynamic features) exists as list of strings.
- fetch : fetches all features (both static and dynamic type) of all station/gauge_ids or a specified station. It can also be used to fetch all features of a number of station ids, either by providing their gauge_id or by just saying that we need data of 20 stations, which will then be chosen randomly.
- fetch_dynamic_features : fetches specified dynamic features of one specified station. If the dynamic attribute is not specified, all dynamic features will be fetched for the specified station. If station is not specified, the specified dynamic features will be fetched for all stations.
- fetch_static_features : works the same as fetch_dynamic_features but for static features. Here if the category is not specified then static features of the specified station for all categories are returned.
- __init__(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- area(stations: str | List[str] = 'all') Series[source]
Returns area (km²) of all/selected catchments as pandas.Series
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. if not given, then the first attribute in the boundary file will be used.
- property camels_dir
Directory where all camels datasets will be saved. This will be under the datasets directory.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better either to cache the result and return the cached result instead of reading the data again and again, or to implement it in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch(stations: str | list | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
Fetches the features of one or more stations.
- Parameters:
stations –
It can have following values:
int : number of (randomly selected) stations to fetch
float : fraction of (randomly selected) stations to fetch
str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.
list : list of names/ids of stations to fetch
dynamic_features – If not None, then it is the features to be fetched. If None, then all available features are fetched.
static_features – list of static features to be fetched. None means no static attribute will be fetched.
st – starting date of data to be returned. If None, the data will be returned from where it is available.
en – end date of data to be returned. If None, then the data will be returned till the date data is available.
as_dataframe – whether to return dynamic features as pandas.DataFrame or as xarray.Dataset.
kwargs – keyword arguments to read the files
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray Dataset or pandas DataFrame depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are xarray Dataset, then it consists of data_vars equal to the number of stations and time and dynamic_features as dimensions. If dynamic features are returned as pandas DataFrame, then the first index is time and the second index is dynamic_features.
- Return type:
Tuple[DataFrame, DataFrame | Dataset]
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations; the dynamic features are a multi-indexed dataframe
>>> _, df = dataset.fetch(stations=0.1, as_dataframe=True)
>>> # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
>>> # fetch data of 3 selected stations
>>> _, three_selec_stn_data = dataset.fetch(stations=['912101A', '912105A', '915011A'], as_dataframe=True)
>>> # fetch data of a single station
>>> _, single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
>>> # get both static and dynamic features
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)
>>> dynamic
>>> # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='318076',
...     dynamic_features=['q_mmd_obs', 'solrad_wm2_silo'], as_dataframe=True)
>>> # fetch data between selected periods
>>> _, data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
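The four accepted forms of the stations argument can be illustrated with a small sketch. The helper resolve_stations below is hypothetical and is not part of aqua_fetch; it only shows, assuming random selection without replacement and a fraction that is rounded down, how an int, a float, a str or a list might be turned into a list of station ids.

```python
import random
from typing import List, Sequence, Union


def resolve_stations(
    stations: Union[str, int, float, Sequence[str]],
    all_stations: List[str],
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper: turn a ``stations`` argument into a list of ids."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    # bool is a subclass of int, so reject it explicitly
    if isinstance(stations, bool):
        raise TypeError("stations must be str, int, float or list")
    if isinstance(stations, str):
        # the special string 'all' selects every station
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, int):
        # int: number of randomly selected stations
        return rng.sample(all_stations, stations)
    if isinstance(stations, float):
        # float: fraction of randomly selected stations (rounded down)
        return rng.sample(all_stations, int(len(all_stations) * stations))
    # list: explicit station ids
    return list(stations)


ids = [f"stn_{i}" for i in range(20)]      # made-up station ids
print(len(resolve_stations(0.1, ids)))     # 2 stations, i.e. 10% of 20
print(len(resolve_stations(5, ids)))       # 5 stations
print(resolve_stations("stn_3", ids))      # ['stn_3']
```

The actual selection logic inside the package may differ, for example in how the fraction is rounded or how the random stations are drawn.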
- fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) DataFrame | Dataset[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data.
as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas DataFrame, otherwise it is an xarray.Dataset
- Returns:
A pandas DataFrame or xarray Dataset of dynamic features. If as_dataframe is True, then the returned data is a pandas DataFrame with a multi-index. The first index is time and the second index is dynamic_features. If as_dataframe is False and the xarray module is installed, then the returned data is an xarray Dataset with data_vars equal to the number of stations, and time and dynamic_features as dimensions.
- Return type:
pd.DataFrame/xr.Dataset
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
...     dynamic_features=['airtemp_C_awap_max', 'vp_hpa_awap', 'q_cms_obs'],
...     as_dataframe=True).unstack()
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static features of one or more stations.
- Parameters:
stations (str/list) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
...     static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
>>> # for CAMELS_FR
>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
654
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(472, 210)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 210)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
(472, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
(1, 2)
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched till the date it is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will have a single row, while the dynamic features will be of shape (time, dynamic features).
- Return type:
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.fetch_station_features('912101A')
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
Reads features of more than one station.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the dynamic data as a pandas dataframe. The default is an xarray.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray.Dataset or pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are returned as an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions. If dynamic features are returned as a pandas DataFrame, then the first index is time and the second index is dynamic_features.
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # find out station ids
>>> dataset.stations()
>>> # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'])
>>> # get data of selected stations as pandas DataFrame
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
>>> # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     dynamic_features=['q_mmd_obs', 'airtemp_C_mean_silo'], static_features=['elev_mean'])
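The effect of the st and en arguments can be pictured as clipping each time series to a window. The following is only a sketch under stated assumptions: the helpers parse_day and clip_period are hypothetical, the dates are assumed to be 'YYYYMMDD' strings as in the fetch examples, and both bounds are assumed to be inclusive. The date formats actually accepted by the package may be broader.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Optional


def parse_day(value: Optional[str]) -> Optional[date]:
    """Parse a 'YYYYMMDD' string; None means that the bound is open."""
    return None if value is None else datetime.strptime(value, "%Y%m%d").date()


def clip_period(
    series: Dict[date, float],
    st: Optional[str] = None,
    en: Optional[str] = None,
) -> Dict[date, float]:
    """Keep only the days between st and en (both assumed inclusive)."""
    start, end = parse_day(st), parse_day(en)
    return {
        day: value
        for day, value in series.items()
        if (start is None or day >= start) and (end is None or day <= end)
    }


# toy daily series covering 2000-12-30 to 2001-01-03
first = date(2000, 12, 30)
series = {first + timedelta(days=i): float(i) for i in range(5)}

clipped = clip_period(series, st="20010101", en="20010102")
print(sorted(clipped))  # the two days 2001-01-01 and 2001-01-02
```

Leaving st or en as None keeps the corresponding end of the series open, which mirrors the documented default of returning data from where it is available.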
- get_boundary(catchment_id: str)[source]
returns boundary of a catchment in a required format
- Parameters:
catchment_id (str) – name/id of catchment
- Returns:
geometry
- Return type:
fiona.Geometry
Examples
>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> dataset.get_boundary(dataset.stations()[0])
- static mean_temp(tmin: Series, tmax: Series) Series[source]
calculates mean temperature from tmin and tmax
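As a minimal sketch, and assuming that the mean is the simple arithmetic average of the two series (the source does not state the formula), the calculation can be written with plain Python lists instead of pandas Series:

```python
from typing import List


def mean_temp(tmin: List[float], tmax: List[float]) -> List[float]:
    """Element-wise arithmetic mean of minimum and maximum temperature.

    Assumption: mean = (tmin + tmax) / 2 for every time step.
    """
    if len(tmin) != len(tmax):
        raise ValueError("tmin and tmax must have the same length")
    return [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]


daily_mean = mean_temp([10.0, 12.0, 8.0], [20.0, 24.0, 18.0])
print(daily_mean)  # [15.0, 18.0, 13.0]
```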
- plot_catchment(catchment_id: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)[source]
plots catchment boundaries
- Parameters:
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundary")
>>> plt.show()
>>> # show the outlet as well
>>> dataset.plot_catchment('912101A', show_outlet=True)
- plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
>>> # using area as color
>>> dataset.plot_stations(color='area_km2')
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day. This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
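The division of discharge by catchment area can be worked through as a unit conversion. The sketch below assumes that discharge is given in cubic meters per second (as the q_cms_obs feature name suggests) and area in square kilometers; the function cms_to_mmd is a hypothetical name used only for illustration.

```python
def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m3/s to runoff depth in mm/day.

    Assumptions: q_cms is in cubic meters per second and area_km2 is the
    catchment area in square kilometers.
    """
    seconds_per_day = 86_400
    volume_m3_per_day = q_cms * seconds_per_day   # m3 of water per day
    area_m2 = area_km2 * 1e6                      # km2 -> m2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000.0               # m -> mm


# 1 m3/s over a catchment of 86.4 km2 is approximately 1 mm/day
print(round(cms_to_mmd(1.0, 86.4), 6))
```

In compact form this is q_mmd = q_cms * 86.4 / area_km2.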
- property static_factors: Dict[str, str]
A dictionary that maps static features to the factors by which they need to be multiplied to get the actual value
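How such a mapping might be applied is sketched below. The feature names, values and factors are made up for illustration, the factors are treated as numbers, and apply_factors is a hypothetical helper rather than a function of the package.

```python
from typing import Dict


def apply_factors(raw: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply each raw static value by its factor (1.0 when none is given)."""
    return {name: value * factors.get(name, 1.0) for name, value in raw.items()}


raw = {"area_km2": 2.5, "elev_mean": 350.0}   # hypothetical raw values
factors = {"area_km2": 100.0}                 # hypothetical scaling factor
scaled = apply_factors(raw, factors)
print(scaled)  # {'area_km2': 250.0, 'elev_mean': 350.0}
```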
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
- class aqua_fetch.rr._gsha._GSHA(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Parent class for those datasets which use static and dynamic features from the GSHA dataset. The following dataset classes are based on this class:
- aqua_fetch.Japan
- aqua_fetch.Thailand
- aqua_fetch.Spain
- __init__(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None) DataFrame[source]
Returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
Examples
>>> from aqua_fetch import Japan
>>> dataset = Japan()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
returns features of multiple stations
Examples
>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either an xarray Dataset or pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are returned as an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions. If dynamic features are returned as a pandas DataFrame, then the first index is time and the second index is dynamic_features.
- Return type:
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- stations() List[str][source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- class aqua_fetch.Arcticnet(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 106 catchments of the Arctic region from the r-arcticnet project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27, and the data is available from 1979-01-01 to 2003-12-31, although the observed streamflow (q_cms_obs) for some stations is available from as early as 1913-01-01.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- class aqua_fetch.Bull(path, overwrite=False, **kwargs)[source]
Bases: _RainfallRunoff
Following the works of Aparicio et al., 2024. The data is taken from the Zenodo repository. This dataset contains 484 stations with 55 dynamic (time series) features and 214 static features. The dynamic features span from 1951 to 2021.
Examples
>>> from aqua_fetch import Bull
>>> dataset = Bull()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(1426260, 48)  # 48 represents number of stations
>>> # since data is a multi-index dataframe, we can get data of one station as below
>>> data['BULL_9007'].unstack().shape  # the name of station could be different
(25932, 13)
>>> # if we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 25932, 'dynamic_features': 55})
>>> len(data.data_vars)
48
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(25932, 55)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
484
>>> # get data by station id
>>> _, df = dataset.fetch(stations='BULL_9007', as_dataframe=True)
>>> df.unstack().shape
(25932, 55)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pet_mm_AEMET', 'airtemp_C_mean_AEMET', 'pcp_mm_ERA5Land', 'q_obs_cms'])
>>> df.unstack().shape
(25932, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(166166, 10)  # remember this is multi-indexed DataFrame
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='BULL_9007', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 214), (25932, 55))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(484, 2)
>>> dataset.stn_coords('BULL_9007')  # returns coordinates of station whose id is BULL_9007
41.298 -1.967
>>> dataset.stn_coords(['BULL_9007', 'BULL_8083'])  # returns coordinates of two stations
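The shapes reported in these examples follow from how the multi-indexed dataframe is laid out: every (time, dynamic_feature) pair is one row and every station is one column. As a small worked check, with hypothetical helper names, the row count of the stacked dataframe is the number of time steps multiplied by the number of dynamic features:

```python
from typing import Tuple


def stacked_shape(n_time: int, n_features: int, n_stations: int) -> Tuple[int, int]:
    """Shape of the multi-indexed dataframe: one row per (time, feature) pair."""
    return (n_time * n_features, n_stations)


def unstacked_shape(n_time: int, n_features: int) -> Tuple[int, int]:
    """Shape of a single station's data after unstacking the feature level."""
    return (n_time, n_features)


# 25932 daily steps, 55 dynamic features and 48 stations, as in the first example
print(stacked_shape(25932, 55, 48))   # (1426260, 48)
print(unstacked_shape(25932, 55))     # (25932, 55)
```

This is why calling unstack() on the data of one station turns the long layout back into a table with one row per time step and one column per dynamic feature.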
- __init__(path, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- class aqua_fetch.rr.CABra(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
Bases: _RainfallRunoff
Reads and fetches the CABra dataset, which is a catchment attribute dataset following the work of Almagro et al., 2021. This dataset consists of 87 static and 13 dynamic features of 735 Brazilian catchments. The temporal extent is from 1980 to 2020. The dynamic features consist of daily hydro-meteorological time series.
Examples
>>> from aqua_fetch import CABra
>>> dataset = CABra()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(131472, 73)  # 73 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(10956, 13)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
735
>>> # get data by station id
>>> _, df = dataset.fetch(stations='92', as_dataframe=True)
>>> df.unstack().shape
(10956, 13)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm_ens', 'airtemp_C_ens_max', 'pet_mm_pm', 'rh_%_ens', 'q_cms_obs'])
>>> df.unstack().shape
(10956, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(131472, 10)  # remember this is multi-indexed DataFrame
>>> # if we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 87), (10956, 13))
- __init__(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
met_src (str) – source of meteorological data, must be one of ens, era5 or ref.
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- class aqua_fetch.rr.CAMELS_AUS(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 561 Australian catchments with 187 static features and 28 dynamic features for each catchment. The dynamic features are time series from 1950-01-01 to 2022-03-31. This class reads the CAMELS-AUS dataset of Fowler et al., 2024.
If version is 1, then this class reads data following Fowler et al., 2021, which is a dataset of 222 Australian catchments with 161 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1957-01-01 to 2018-12-31.
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape  # if you are using version 1 then the shape will be (21184, 28)
(26388, 28)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
222
>>> # get data of 10 % of stations as dataframe
>>> _, df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(550784, 22)
>>> # the returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
>>> # get data by station id
>>> df = dataset.fetch(stations='912101A', as_dataframe=True)[1].unstack()
>>> df.shape  # if you are using version 1 then the shape will be (21184, 28)
(26388, 28)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, data = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['airtemp_C_awap_max', 'pcp_mm_awap', 'et_morton_actual_SILO', 'q_cms_obs'])
>>> data.unstack().shape  # if you are using version 1 then the shape will be (21184, 4)
(26388, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed dataframe
(26388, 260)
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='912101A', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 166), (26388, 28))
- __init__(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
version – version of the dataset to download. Allowed values are 1 and 2.
to_netcdf
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: list
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again, or for the user to implement it in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations(as_list=True) list[source]
Names/ids of stations/catchments/gauges, or whatever is used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- class aqua_fetch.rr.CAMELS_BR(path=None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 897 Brazilian catchments with 67 static features and 10 dynamic features for each catchment. The dynamic features are time series from 1920-01-01 to 2019-02-28. This class downloads and processes the CAMELS dataset of Brazil as provided by Chagas et al., 2020. The simulated streamflow of 593 stations and the raw streamflow of 3679 stations shipped with this data are not included in the dynamic features. Both can be fetched through the fetch_simulated_streamflow and fetch_raw_streamflow methods.
Examples
>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14245, 12)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
>>> # we can get data of 10% catchments as below
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(170940, 59)
>>> # the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
>>> # get data by station id
>>> _, df = dataset.fetch(stations='46035000', as_dataframe=True)
>>> df.unstack().shape
(14245, 12)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm_cpc', 'aet_mm_mgb', 'airtemp_C_mean', 'q_cms_obs'])
>>> df.unstack().shape
(14245, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(170940, 10)  # remember this is multi-indexed DataFrame
>>> # if we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='46035000', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 67), (14245, 12))
- __init__(path=None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
- all_stations(feature: str) List[str][source]
Returns the ids of all stations for which data of a specific feature is available.
- area(stations: str | List[str] = 'all', source: str = 'gsim') Series[source]
Returns area (km²) of all catchments as a pandas.Series.
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations.
source (str) – source of area calculation. It should be either gsim or ana.
- Returns:
a pandas.Series whose indices are catchment ids and whose values are the areas of the corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> dataset.area() # returns area of all stations
>>> dataset.area('65100000') # returns area of station whose id is 65100000
>>> dataset.area(['65100000', '64075000']) # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
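The caching recommended for this property can be sketched with functools.cached_property from the standard library. The class, the helper method, and the feature names below are hypothetical and are not part of aqua_fetch. The point is only that the expensive read runs once per instance.

```python
from functools import cached_property
from typing import List


class ExampleDataset:
    """Hypothetical dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # Stand-in for parsing file headers on disk.
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Evaluated on first access, then stored on the instance.
        return self._read_feature_names()


ds = ExampleDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.reads)  # 1
```

Because the cached value is stored per instance, a new dataset object reads the names again, which is usually the desired behaviour when the underlying path changes.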
- property end
end of data
- fetch_raw_streamflow(stations: str = None) DataFrame[source]
returns raw streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_raw_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_raw_streamflow(dataset.all_stations())
- fetch_simulated_streamflow(stations: str = None) DataFrame[source]
returns simulated streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_simulated_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_simulated_streamflow(dataset.all_stations())
- q_mmd(stations: str | List[str] = 'all') DataFrame[source]
Returns streamflow in units of millimeters per day. The name of the original time series is streamflow_mm.
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return the streamflow of all stations.
- Returns:
a pandas.DataFrame whose indices are time steps and whose columns are catchment/station ids.
- Return type:
pd.DataFrame
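Streamflow in mm/day is a depth over the catchment, so it relates to discharge in m³/s through the catchment area. The sketch below shows that arithmetic. It is an illustration of the unit conversion, not the library's code, and the function name is an assumption.

```python
def cms_to_mm_per_day(q_cms: float, area_km2: float) -> float:
    """Convert discharge (m^3/s) to runoff depth (mm/day) over a catchment."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    seconds_per_day = 86_400
    volume_m3 = q_cms * seconds_per_day  # m^3 discharged in one day
    area_m2 = area_km2 * 1_000_000       # km^2 -> m^2
    return volume_m3 / area_m2 * 1_000   # depth in m -> mm


# 1 m^3/s over 86.4 km^2 corresponds to about 1 mm/day.
depth = cms_to_mm_per_day(q_cms=1.0, area_km2=86.4)
print(depth)
```

The compact form of the same relation is q_mmd = q_cms × 86.4 / area_km2.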
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
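dyn_map and static_map are described as dictionaries that relate feature names to the names used in the dataset. A minimal sketch of how such a mapping can be applied when renaming columns follows. The raw names and the direction of the mapping (raw name to standardized name) are assumptions made for illustration, not taken from the library.

```python
from typing import Dict, List

# Hypothetical mapping: raw column name -> standardized name.
example_map: Dict[str, str] = {
    "precipitation": "pcp_mm",
    "temperature_mean": "airtemp_C_mean",
    "streamflow": "q_cms_obs",
}


def standardize(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename known columns and leave unknown ones untouched."""
    return [mapping.get(name, name) for name in columns]


raw_columns = ["precipitation", "streamflow", "wind"]
print(standardize(raw_columns, example_map))
# ['pcp_mm', 'q_cms_obs', 'wind']
```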
- stations() List[str][source]
Returns a list of station ids.
Example
>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a pandas.DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = CAMELS_BR()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('65100000') # returns coordinates of station whose id is 65100000
>>> dataset.stn_coords(['65100000', '64075000']) # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_CH(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
Bases:
_RainfallRunoff
Data of 331 Swiss catchments from Hoege et al., 2023. The dataset consists of 209 static catchment features and 9 dynamic features. The dynamic features span from 19810101 to 20201231 with daily timestep. For daily (D) timestep, only streamflow is available for 170 Swiss catchments. The hourly (H) streamflow data is obtained from Kauzlaric et al., 2023.
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8036, 9)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
331
# get data by station id
>>> _, df = dataset.fetch(stations='2004', as_dataframe=True)
>>> df.unstack().shape
(8036, 9)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True, dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> df.unstack().shape
(8036, 3)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(72324, 10) # remember this is multi-indexed DataFrame
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2004', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 209), (8036, 9))
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- glacier_attrs() DataFrame[source]
- returns a dataframe with four columns
‘glac_area’
‘glac_vol’
‘glac_mass’
‘glac_area_neighbours’
- hourly_stations() List[str][source]
IDs of those stations which have hourly data and which are also part of CAMELS-CH dataset
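The description above amounts to an intersection of two id collections: stations with hourly data and stations that belong to CAMELS-CH. A small sketch of that operation, with a hypothetical helper name and ids chosen only for illustration:

```python
from typing import Iterable, List


def common_stations(all_ids: Iterable[str], hourly_ids: Iterable[str]) -> List[str]:
    """Return ids present in both collections, keeping the order of all_ids."""
    hourly = set(hourly_ids)
    return [stn for stn in all_ids if stn in hourly]


# Hypothetical ids, for illustration only.
dataset_ids = ["2004", "2007", "2009", "2011"]
hourly_source_ids = ["2011", "9999", "2004"]
print(common_stations(dataset_ids, hourly_source_ids))  # ['2004', '2011']
```

Keeping the order of the dataset's own id list makes the result stable across runs, which a plain set intersection would not guarantee.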
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_CL(path: str = None, **kwargs)[source]
Bases:
_RainfallRunoff
This is a dataset of 516 Chilean catchments with 104 static features and 12 dynamic features for each catchment. The dynamic features are time series from 1913-02-15 to 2018-03-09. This class downloads and processes the CAMELS dataset of Chile following the work of Alvarez-Garreton et al., 2018.
Examples
>>> from aqua_fetch import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(38374, 12)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
516
# we can get data of 10% catchments as below
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(460488, 51)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> _, df = dataset.fetch(stations='8350001', as_dataframe=True)
>>> df.unstack().shape
(38374, 12)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pet_mm_hargreaves', 'pcp_mm_mswep', 'airtemp_C_mean', 'q_cms_obs'])
>>> df.unstack().shape
(38374, 4)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(460488, 10)
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='8350001', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 104), (38374, 12))
- __init__(path: str = None, **kwargs)[source]
- Parameters:
path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() list[source]
Returns the ids of all stations for which data of a specific attribute is available.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = CAMELS_CL()
>>> dataset.stn_coords() # returns coordinates of all stations
>>> dataset.stn_coords('12872001') # returns coordinates of station whose id is 12872001
>>> dataset.stn_coords(['12872001', '12876004']) # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_COL(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases:
_RainfallRunoff
Dataset of 347 catchments from Colombia following the work of Jimenez et al., 2025. The dataset consists of 255 static catchment features and 6 dynamic features. The dynamic features span from 19810101 to 20221231 with daily timestep. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_COL
>>> dataset = CAMELS_COL()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(92040, 34) # 34 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['35067040'].unstack().shape
(15340, 5)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 15340, 'dynamic_features': 6})
>>> len(data.data_vars)
34
>>> _, df = dataset.fetch(stations=1, as_dataframe=True) # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(15340, 6)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
347
# get data by station id
>>> _, df = dataset.fetch(stations='35067040', as_dataframe=True)
>>> df.unstack().shape
(15340, 6)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> df.unstack().shape
(15340, 6)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(2304640, 10) # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='35067040', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 255), (15340, 6))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(347, 2)
>>> dataset.stn_coords('35067040') # returns coordinates of station whose id is 35067040
4.746433 -73.587807
>>> dataset.stn_coords(['35067040', '21187030']) # returns coordinates of two stations
35067040 4.746433 -73.587807
21187030 4.203826 -75.092720
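The time dimension of 15340 and the 92040 rows of the long frame can be checked against the stated date range with the standard library alone. The helper name below is illustrative.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps between two dates, both inclusive."""
    return (end - start).days + 1


n_time = n_daily_steps(date(1981, 1, 1), date(2022, 12, 31))
n_features = 6
print(n_time)               # 15340
print(n_time * n_features)  # 92040
```

The same check is a quick way to spot a truncated download: a frame with fewer rows than the date range implies is missing time steps.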
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- class aqua_fetch.rr.CAMELS_DE(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
Bases:
_RainfallRunoff
This is the data from 1555 German catchments following the work of Loritz et al., 2024. The data is downloaded from Zenodo. This data consists of 155 static and 21 dynamic features. The dynamic features span from 1951-01-01 to 2020-12-31 with daily timestep.
Examples
>>> from aqua_fetch import CAMELS_DE
>>> dataset = CAMELS_DE()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(25568, 21)
get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
1555
get data of 10 % of stations as dataframe
>>> _, df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(536928, 155)
The returned dataframe is multi-indexed
>>> df.index.names == ['time', 'dynamic_features']
True
get data by station id
>>> _, df = dataset.fetch(stations='DE110260', as_dataframe=True)
>>> df.unstack().shape
(25568, 21)
get names of available dynamic features
>>> dataset.dynamic_features
get only selected dynamic features
>>> _, data = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['airtemp_C_mean', 'rh_%', 'pcp_mm_mean', 'q_cms_obs'])
>>> data.unstack().shape
(25568, 4)
get names of available static features
>>> dataset.static_features
get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape # remember this is a multi-indexed dataframe
(536928, 10)
If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='DE110260', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 111), (25568, 21))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(1555, 2)
>>> dataset.stn_coords('DE110250') # returns coordinates of station whose id is DE110250
47.925221 8.191595
>>> dataset.stn_coords(['DE110250', 'DE110260']) # returns coordinates of two stations
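In the example above, fetching a fraction of 0.1 from 1555 stations yields a frame with 155 columns, which is consistent with truncating the product of the fraction and the station count. The sketch below reproduces that arithmetic. It is inferred from the documented shapes, not taken from the library's source, and the function name is an assumption.

```python
def n_selected(fraction: float, n_stations: int) -> int:
    """Number of stations implied by a fraction, truncating toward zero."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(fraction * n_stations)


print(n_selected(0.1, 1555))  # 155
```

Note that an integer argument such as 10 is treated differently in the examples: it requests that many random stations rather than a share of them.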
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF5 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. A child class may also implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- class aqua_fetch.rr.CAMELS_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases:
_RainfallRunoff
This is an updated version of the aqua_fetch.rr.Caravan_DK dataset. This dataset was presented by Liu et al., 2024 and is available at dataverse. It consists of 119 static and 13 dynamic features from 3330 Danish catchments. The dynamic (time series) features span from 1989-01-02 to 2023-12-31 with daily timestep. However, streamflow observations are available for only 304 catchments.
Examples
>>> from aqua_fetch import CAMELS_DK
>>> dataset = CAMELS_DK()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(166166, 30) # 30 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['54130033'].unstack().shape
(12782, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 13})
>>> len(data.data_vars)
30
>>> _, df = dataset.fetch(stations=1, as_dataframe=True) # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12782, 13)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
304
# get data by station id
>>> _, df = dataset.fetch(stations='54130033', as_dataframe=True)
>>> df.unstack().shape
(12782, 13)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['Abstraction', 'pet_mm', 'airtemp_C_mean', 'pcp_mm', 'q_cms_obs'])
>>> df.unstack().shape
(12782, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(166166, 10) # remember this is multi-indexed DataFrame
# If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='54130033', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 119), (12782, 13))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(304, 2)
>>> dataset.stn_coords('54130033') # returns coordinates of station whose id is 54130033
6131379.493 559057.7232
>>> dataset.stn_coords(['54130033', '13210113']) # returns coordinates of two stations
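Streamflow observations exist for only 304 of the 3330 catchments, which is why stations() reports 304 ids in the example. The sketch below shows one way to separate gauged from ungauged catchments in a table of time series using plain pandas. The data are synthetic, the ids are hypothetical, and this is not the library's implementation.

```python
import numpy as np
import pandas as pd

# Synthetic streamflow table: rows are time steps, columns are station ids.
q = pd.DataFrame(
    {
        "stn_1": [0.5, 0.7, np.nan],
        "stn_2": [np.nan, np.nan, np.nan],  # ungauged: no observations at all
        "stn_3": [1.2, np.nan, 1.0],
    },
    index=pd.date_range("2000-01-01", periods=3, freq="D"),
)

# Drop columns that are entirely missing; partial gaps are kept.
gauged = q.dropna(axis=1, how="all").columns.tolist()
print(gauged)  # ['stn_1', 'stn_3']
```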
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- class aqua_fetch.rr.CAMELS_FI(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases:
_RainfallRunoff
Dataset of 320 Finnish catchments with 16 dynamic features and 106 static features. The dynamic features span from 19610101 to 20231231 with daily timestep. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_FI
>>> dataset = CAMELS_FI()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(368160, 32) # 32 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['1156'].unstack().shape
(23010, 16)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 23010, 'dynamic_features': 16})
>>> len(data.data_vars)
32
>>> _, df = dataset.fetch(stations=1, as_dataframe=True) # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(23010, 16)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
320
# get data by station id
>>> _, df = dataset.fetch(stations='1156', as_dataframe=True)
>>> df.unstack().shape
(23010, 16)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'snowdepth_m', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> df.unstack().shape
(23010, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(368160, 10) # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='1156', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 106), (23010, 16))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(320, 2)
>>> dataset.stn_coords('1156') # returns coordinates of station whose id is 1156
62.253101 24.444099
>>> dataset.stn_coords(['1156', '1116']) # returns coordinates of two stations
1156 62.253101 24.444099
1116 60.385201 22.355301
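Station coordinates are often used to compute distances between gauges. The sketch below applies the haversine formula to the two coordinate pairs printed above for stations '1156' and '1116'. It assumes the first printed value is latitude and the second longitude, both in decimal degrees, and it uses a spherical Earth with a mean radius of 6371 km. The function is illustrative and not part of aqua_fetch.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Values read as (lat, long); this ordering is an assumption.
d = haversine_km(62.253101, 24.444099, 60.385201, 22.355301)
print(round(d))
```

This only applies to geographic coordinates; datasets that report projected coordinates in metres need a Euclidean distance or a reprojection instead.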
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- class aqua_fetch.rr.CAMELS_FR(path=None, overwrite=False, **kwargs)[source]
Bases:
_RainfallRunoff
Dataset of 654 catchments from France following the work of Delaigue et al., 2024. The dataset consists of 344 static catchment features and 22 dynamic features. The dynamic features span from 19700101 to 20211231 with daily timestep.
Examples
>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(166166, 30) # 30 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['J421191001'].unstack().shape
(12782, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 460928, 'dynamic_features': 13})
>>> len(data.data_vars)
36
>>> _, df = dataset.fetch(stations=1, as_dataframe=True) # get data of only one random station
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(18993, 22)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
654
# get data by station id
>>> _, df = dataset.fetch(stations='J421191001', as_dataframe=True)
>>> df.unstack().shape
(18993, 22)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'spechum_gkg', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> df.unstack().shape
(18993, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(417846, 10) # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='J421191001', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 334), (18993, 22))
>>> coords = dataset.stn_coords() # returns coordinates of all stations
>>> coords.shape
(654, 2)
>>> dataset.stn_coords('J421191001') # returns coordinates of station whose id is J421191001
48.006298 -4.063848
>>> dataset.stn_coords(['J421191001', 'U104401001']) # returns coordinates of two stations
J421191001 48.006298 -4.063848
U104401001 173.170761 -34.918594
- __init__(path=None, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- static_attrs() DataFrame[source]
combination of topographic + soil + landuse + geology + climate + hydro + climate + anthropogenic features
- Returns:
a pandas.DataFrame of static features of all catchments of shape (654, xxxx)
- Return type:
pd.DataFrame
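static_attrs is described as a combination of several attribute groups. A minimal sketch of such a combination with pandas follows, assuming each group is a table indexed by station id. The station ids, column names, and values are hypothetical, and this is not the library's implementation.

```python
import pandas as pd

stations = ["stn_a", "stn_b"]  # hypothetical station ids

# Hypothetical attribute groups, each indexed by station id.
topo = pd.DataFrame({"area_km2": [120.0, 45.5]}, index=stations)
soil = pd.DataFrame({"clay_frac": [0.21, 0.35]}, index=stations)
climate = pd.DataFrame({"pcp_mean_mm": [2.4, 3.1]}, index=stations)

# Column-wise concatenation aligns rows on the shared station index.
static = pd.concat([topo, soil, climate], axis=1)
print(static.shape)  # (2, 3)
```

The result has one row per catchment and one column per static feature, which is the tabular layout used for static features throughout this section.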
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- ts_attrs() DataFrame[source]
daily_timeseries statistics of all catchments
- Returns:
a pandas.DataFrame of static features of all catchments of shape (654, xxxx)
- Return type:
pd.DataFrame
- class aqua_fetch.rr.CAMELS_GB(path=None, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 671 catchments with 145 static features and 10 dynamic features for each catchment, following the work of Coxon et al., 2020. The dynamic features are time series from 1970-10-01 to 2015-09-30. The data is downloaded from the CEH website.
Examples
>>> from aqua_fetch import CAMELS_GB
>>> dataset = CAMELS_GB()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(164360, 67)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(16436, 10)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# get data by station id
>>> _, df = dataset.fetch(stations='97002', as_dataframe=True)
>>> df.unstack().shape
(16436, 10)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['windspeed_mps', 'airtemp_C_mean', 'pet_mm', 'pcp_mm', 'q_cms_obs'])
>>> df.unstack().shape
(16436, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(164360, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='97002', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.shape
((1, 290), (164360, 1))
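The examples pass the stations argument as a fraction (0.1), an integer (1 or 10) or a station id ('97002'). The sketch below shows one way these forms could be interpreted. select_stations is a hypothetical helper, not the package's code. Truncating the fraction is inferred from the shapes shown in the examples (0.1 of 671 stations gives 67 columns), and random sampling for fractions is an assumption.

```python
import random
from typing import List, Optional, Sequence, Union


def select_stations(
    all_stations: Sequence[str],
    stations: Union[str, int, float, Sequence[str]] = "all",
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper mimicking how the ``stations`` argument is interpreted."""
    rng = random.Random(seed)
    if isinstance(stations, str):
        # "all" means every station, any other string is a single station id
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # fraction of all stations; truncation matches 0.1 * 671 -> 67
        return rng.sample(list(all_stations), int(stations * len(all_stations)))
    if isinstance(stations, int):
        # that many randomly chosen stations
        return rng.sample(list(all_stations), stations)
    return list(stations)  # explicit list of ids


ids = [str(i) for i in range(671)]  # placeholder ids, not real CAMELS_GB ids
print(len(select_stations(ids, 0.1)))  # 67
print(len(select_stations(ids, 10)))   # 10
print(select_stations(ids, "97"))      # ['97']
```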
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations(to_exclude=None)[source]
Names/IDs of the stations (catchments or gauges) used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data on every call. Child classes are encouraged to override this method with a more efficient implementation.
- class aqua_fetch.CAMELS_IND(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 472 catchments from the Republic of India following the work of Mangukiya et al., 2024. The dataset consists of 210 static catchment features and 20 dynamic features. The dynamic features span from 19800101 to 20201231 with a daily timestep.
Examples
>>> from aqua_fetch import CAMELS_IND
>>> dataset = CAMELS_IND()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(299520, 47)  # 47 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['17015'].unstack().shape
(14976, 20)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 14976, 'dynamic_features': 20})
>>> len(data.data_vars)
47
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14976, 20)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
472
# get data by station id
>>> _, df = dataset.fetch(stations='3001', as_dataframe=True)
>>> df.unstack().shape
(14976, 20)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> df.unstack().shape
(14976, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(299520, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='3001', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 210), (14976, 20))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
18.3861 80.3917
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
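The shapes in these examples follow from the stacked layout: each row is one (time, dynamic_feature) pair and each column is one station, so 14976 time steps and 20 features give 299520 rows. The sketch below reproduces this layout with plain dictionaries on a tiny scale. It is a stand-in for the multi-indexed DataFrame, not the package's data structure, and all values are dummies.

```python
from datetime import date, timedelta
from itertools import product

times = [date(1980, 1, 1) + timedelta(days=i) for i in range(3)]
features = ["pcp_mm", "q_cms_obs"]
stations = ["3001", "17015"]

# Stacked layout: one row per (time, dynamic_feature) pair, one column per station.
index = list(product(times, features))
stacked = {stn: {key: 0.0 for key in index} for stn in stations}  # dummy values


def unstack(column):
    """Turn one station's {(time, feature): value} into {time: {feature: value}}."""
    wide = {}
    for (time, feature), value in column.items():
        wide.setdefault(time, {})[feature] = value
    return wide


one_station = unstack(stacked["3001"])
print((len(index), len(stations)))                     # (6, 2)
print((len(one_station), len(one_station[times[0]])))  # (3, 2)
```

Unstacking one station therefore yields a table with one row per time step and one column per dynamic feature, which is why df.unstack().shape is (14976, 20) above.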
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- class aqua_fetch.rr.CAMELS_LUX(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 56 catchments from Luxembourg following the work of Nijzink et al., 2025. The dataset consists of 61 static catchment features and 25 dynamic features. The dynamic features span from 20040101 to 20211231 with daily, hourly, and 15-minute timesteps. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_LUX
>>> dataset = CAMELS_LUX()
>>> _, data = dataset.fetch(0.5, as_dataframe=True)
>>> data.shape
(155225, 28)  # 28 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['ID_02'].unstack().shape
(6209, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 6209, 'dynamic_features': 25})
>>> len(data.data_vars)
5
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(6209, 25)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
56
# get data by station id
>>> _, df = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df.unstack().shape
(6209, 25)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm_station', 'rh_%', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> df.unstack().shape
(6209, 25)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(155225, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ID_02', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 61), (6209, 25))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(56, 2)
>>> dataset.stn_coords('ID_02')  # returns coordinates of station whose id is ID_02
49.586288 6.14908
>>> dataset.stn_coords(['ID_01', 'ID_02'])  # returns coordinates of two stations
ID_01 49.526478 6.114957
ID_02 49.586288 6.14908
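The stn_coords calls above accept no argument, a single id, or a list of ids. A minimal sketch of that argument handling is given below, using the two coordinate pairs printed in the example. The function is a hypothetical stand-in that returns plain dictionaries rather than a DataFrame, and the lat/long order is read off the example output.

```python
from typing import Dict, List, Union

# (lat, long) pairs copied from the CAMELS_LUX example output above.
COORDS = {
    "ID_01": (49.526478, 6.114957),
    "ID_02": (49.586288, 6.14908),
}


def stn_coords(stations: Union[str, List[str]] = "all") -> Dict[str, Dict[str, float]]:
    """Hypothetical stand-in: look up coordinates for all, one, or several stations."""
    if stations == "all":
        ids = list(COORDS)
    elif isinstance(stations, str):
        ids = [stations]  # single station id
    else:
        ids = list(stations)  # several station ids
    return {stn: {"lat": COORDS[stn][0], "long": COORDS[stn][1]} for stn in ids}


print(stn_coords("ID_02"))
# {'ID_02': {'lat': 49.586288, 'long': 6.14908}}
print(len(stn_coords()))  # 2
```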
- __init__(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_NZ(path: str | PathLike = None, timestep='H', **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 369 catchments from New Zealand following the work of Harrigan et al., 2025. The dataset consists of 39 static catchment features and 5 dynamic features. The dynamic features span from 19720101 to 20240802 with an hourly timestep. The data is downloaded from figshare.
Examples
>>> from aqua_fetch import CAMELS_NZ
>>> dataset = CAMELS_NZ()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(2304640, 36)  # 36 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['74321'].unstack().shape
(460928, 5)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 460928, 'dynamic_features': 5})
>>> len(data.data_vars)
36
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(460928, 5)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
369
# get data by station id
>>> _, df = dataset.fetch(stations='74321', as_dataframe=True)
>>> df.unstack().shape
(460928, 5)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> df.unstack().shape
(460928, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(2304640, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='74321', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 39), (460928, 5))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(369, 2)
>>> dataset.stn_coords('74321')  # returns coordinates of station whose id is 74321
-45.945599 170.101486
>>> dataset.stn_coords(['74321', '802'])  # returns coordinates of two stations
74321 -45.945599 170.101486
802 -34.918594 173.170761
- __init__(path: str | PathLike = None, timestep='H', **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/IDs of the stations (catchments or gauges) used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data on every call. Child classes are encouraged to override this method with a more efficient implementation.
- class aqua_fetch.rr.CAMELS_SE(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 50 Swedish catchments following the work of Teutschbein et al., 2024. The data is downloaded from the Swedish National Data Service website. The dataset consists of 76 static catchment features and 4 dynamic features. The dynamic features span from 19610101 to 20201231 with a daily timestep.
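The number of rows to expect for a single station follows from the date range and the timestep. The helper below is a small sketch for checking this (n_steps is a hypothetical name, and treating the end date as inclusive is an assumption). For the daily range 19610101 to 20201231 it gives 21915 time steps.

```python
from datetime import datetime, timedelta


def n_steps(start: str, end: str, step: timedelta) -> int:
    """Number of time steps from the start of ``start`` through the end of ``end``."""
    first = datetime.strptime(start, "%Y%m%d")
    last = datetime.strptime(end, "%Y%m%d") + timedelta(days=1)  # end day is inclusive
    return (last - first) // step


print(n_steps("19610101", "20201231", timedelta(days=1)))  # 21915
```

Passing timedelta(hours=1) as step gives the corresponding count for an hourly dataset.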
Examples
>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(21915, 4)
Get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
50
Get data of 10 % of stations as dataframe
>>> _, df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(87660, 5)
The returned dataframe is multi-indexed
>>> df.index.names == ['time', 'dynamic_features']
True
Get data by station id
>>> _, df = dataset.fetch(stations='5', as_dataframe=True)
>>> df.unstack().shape
(21915, 4)
Get names of available dynamic features
>>> dataset.dynamic_features
Get only selected dynamic features
>>> _, data = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['q_cms_obs', 'q_mmd_obs', 'pcp_mm', 'airtemp_C_mean'])
>>> data.unstack().shape
(21915, 4)
Get names of available static features
>>> dataset.static_features
...
Get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed dataframe
(87660, 10)
If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 76), (21915, 4))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
68.0356 21.9758
>>> dataset.stn_coords(['5', '200'])  # returns coordinates of two stations
- __init__(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_SE dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/IDs of the stations (catchments or gauges) used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data on every call. Child classes are encouraged to override this method with a more efficient implementation.
- class aqua_fetch.rr.CAMELS_SK(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 178 catchments from South Korea following the work of Kim et al., 2025. The dataset consists of 215 static catchment features and 17 dynamic features. The dynamic features span from 20000101 to 20191231 with an hourly timestep.
Examples
>>> from aqua_fetch import CAMELS_SK
>>> dataset = CAMELS_SK()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(2980440, 17)  # 17 represents number of stations
Since data is a multi-index dataframe, we can get data of one station as below
>>> data['2013615'].unstack().shape
(175320, 13)
If we don't set as_dataframe=True, then the returned data will be an xarray Dataset
>>> _, data = dataset.fetch(0.1)
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 175320, 'dynamic_features': 17})
>>> len(data.data_vars)
17
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)  # get data of only one random station
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(175320, 17)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
178
# get data by station id
>>> _, df = dataset.fetch(stations='2013615', as_dataframe=True)
>>> df.unstack().shape
(175320, 17)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['total_precipitation', 'snow_depth', 'air_temp_obs', 'potential_evaporation', 'q_cms_obs'])
>>> df.unstack().shape
(175320, 17)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(155225, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2013615', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 215), (175320, 17))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(178, 2)
>>> dataset.stn_coords('2013615')  # returns coordinates of station whose id is 2013615
35.880798 128.173096
>>> dataset.stn_coords(['2013615', '2017620'])  # returns coordinates of two stations
2013615 35.880798 128.173096
2017620 35.527500 128.359207
- __init__(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_US(path: str | PathLike = None, data_source: str = 'basin_mean_daymet', **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 671 US catchments with 59 static features and 8 dynamic features for each catchment. The dynamic features are time series from 1980-01-01 to 2014-12-31. This class downloads and processes the CAMELS dataset of 671 catchments from ucar.edu, following Newman et al., 2015, Newman et al., 2022 and Addor et al., 2017.
Examples
>>> from aqua_fetch import CAMELS_US
>>> dataset = CAMELS_US()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12784, 8)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
# we can get data of 10% catchments as below
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(460488, 51)
# the data is multi-index with ``time`` and ``dynamic_features`` as indices
>>> data.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> _, df = dataset.fetch(stations='11478500', as_dataframe=True)
>>> df.unstack().shape
(12784, 8)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'solrad_wm2', 'airtemp_C_max', 'airtemp_C_min', 'q_cms_obs'])
>>> df.unstack().shape
(12784, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(102272, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='11478500', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 59), (12784, 8))
- __init__(path: str | PathLike = None, data_source: str = 'basin_mean_daymet', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
data_source (str) – allowed values are
basin_mean_daymet
basin_mean_maurer
basin_mean_nldas
basin_mean_v1p15_daymet
basin_mean_v1p15_nldas
elev_bands
hru
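A value for data_source can be checked against the list above before constructing the class. The helper below is a hypothetical sketch; whether and how the package itself validates the argument is not stated here.

```python
# Allowed values copied from the list above.
ALLOWED_DATA_SOURCES = (
    "basin_mean_daymet",
    "basin_mean_maurer",
    "basin_mean_nldas",
    "basin_mean_v1p15_daymet",
    "basin_mean_v1p15_nldas",
    "elev_bands",
    "hru",
)


def check_data_source(data_source: str = "basin_mean_daymet") -> str:
    """Hypothetical check: return the value if allowed, otherwise raise ValueError."""
    if data_source not in ALLOWED_DATA_SOURCES:
        allowed = ", ".join(ALLOWED_DATA_SOURCES)
        raise ValueError(f"data_source must be one of: {allowed}; got {data_source!r}")
    return data_source


print(check_data_source("basin_mean_nldas"))  # basin_mean_nldas
```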
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data on every access. Child classes may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() list[source]
Names/IDs of the stations (catchments or gauges) used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data on every call. Child classes are encouraged to override this method with a more efficient implementation.
- class aqua_fetch.rr.Caravan_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Reads the Caravan extension Denmark, a Danish dataset for large-sample hydrology, following the work of Koch and Schneider 2022. The dataset is downloaded from Zenodo. This dataset consists of static and dynamic features from 308 Danish catchments. There are 38 dynamic (time series) features from 1981-01-02 to 2020-12-31 with a daily timestep and 211 static features for each of the 308 catchments.
Please note that there is an updated version of this dataset following the work of Liu et al., 2024. That dataset is associated with the aqua_fetch.CAMELS_DK class, which can be imported as follows:
>>> from aqua_fetch import CAMELS_DK
Examples
>>> from aqua_fetch import Caravan_DK
>>> dataset = Caravan_DK()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(569751, 30)  # 30 represents number of stations
>>> data.index.names == ['time', 'dynamic_features']
True
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14609, 39)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
308
# get data by station id
>>> _, df = dataset.fetch(stations='80001', as_dataframe=True)
>>> df.unstack().shape
(14609, 39)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['snow_depth_water_equivalent_mean', 'temperature_2m_mean',
...     'potential_evaporation_sum', 'total_precipitation_sum', 'q_cms_obs'])
>>> df.unstack().shape
(14609, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(569751, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='80001', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 211), (14609, 39))
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netcdf5 package as well as xarray.
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, the first attribute in the boundary file will be used.
- property caravan_attr_fpath
returns path to attributes_caravan_camelsdk.csv file
- caravan_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 10)
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- hyd_atlas_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 196)
- property other_attr_fpath
returns path to attributes_other_camelsdk.csv file
- other_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 5)
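The three attribute tables above have 10, 196 and 5 columns, which together account for the 211 static features reported for this dataset. The sketch below illustrates that column-wise combination with plain dictionaries. That the class actually builds its static features this way is an inference from these shapes, and the feature names are placeholders.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # station id -> {feature name: value}


def make_table(prefix: str, n_features: int, stations: List[str]) -> Table:
    """Build a dummy table with ``n_features`` placeholder columns."""
    return {s: {f"{prefix}_{i}": 0.0 for i in range(n_features)} for s in stations}


def combine(*tables: Table) -> Table:
    """Column-wise merge of tables that share the same station ids."""
    merged: Table = {}
    for table in tables:
        for station, row in table.items():
            merged.setdefault(station, {}).update(row)
    return merged


stations = ["100010", "210062"]  # two ids that appear in the examples of this class
static = combine(
    make_table("caravan", 10, stations),
    make_table("hyd_atlas", 196, stations),
    make_table("other", 5, stations),
)
print(len(static), len(static["100010"]))  # 2 211
```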
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/IDs of the stations (catchments or gauges) used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data on every call. Child classes are encouraged to override this method with a more efficient implementation.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = Caravan_DK()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('100010')  # returns coordinates of station whose id is 100010
>>> dataset.stn_coords(['100010', '210062'])  # returns coordinates of two stations
- class aqua_fetch.rr.CCAM(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset for Chinese catchments. The CCAM dataset, published by Hao et al., 2021, has two sets. The first set consists of catchment attributes, meteorological data and catchment boundaries of over 4000 catchments. However, this set does not have streamflow data. The second set consists of streamflow, catchment attributes, catchment boundaries and meteorological data for 102 catchments of the Yellow River. Since the second set conforms to the norms of CAMELS, this class uses the second set. Therefore, the fetch, stations and other methods/attributes of this class return data of only the Yellow River catchments and not of the whole of China. However, the first set can also be fetched using the fetch_meteo method of this class. The temporal extent of both sets is from 1999 to 2020. However, the streamflow time series in the first set has a very large number of missing values. The data of the Yellow River consists of 16 dynamic features (time series) and 124 static features (catchment attributes).
Examples
>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
>>> _, data = dataset.fetch(0.1, as_dataframe=True)
>>> data.shape
(128560, 10)
>>> data.index.names == ['time', 'dynamic_features']
True
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(8035, 16)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
102
# get data by station id
>>> _, df = dataset.fetch(stations='0010', as_dataframe=True)
>>> df.unstack().shape
(8035, 16)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'evap_mm', 'rh_%', 'q_cms_obs'])
>>> df.unstack().shape
(8035, 5)
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(128560, 10)  # remember this is multi-indexed DataFrame
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='0010', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 124), (8035, 16))
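The multi-indexed layout used in the examples above can be reproduced on a toy frame to see what unstack does. This is a sketch with made-up values, assuming only pandas; it does not call the package, and the station id and feature names are borrowed from the examples for illustration.

```python
import pandas as pd

# Three daily time steps and two dynamic features (toy values).
times = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["pcp_mm", "q_cms_obs"]
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)

# Long layout: one row per (time, feature) pair, one column per station.
long_df = pd.DataFrame({"0010": [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]}, index=index)

# Wide layout for one station: one row per time step, one column per feature.
wide_df = long_df["0010"].unstack("dynamic_features")

print(long_df.shape)
print(wide_df.shape)
```

In the long layout the number of rows is the number of time steps multiplied by the number of features, which matches the shapes quoted above (8035 x 16 = 128560).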
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
names of hydro-meteorological time series data for Yellow River catchments
- property end
end of data
- fetch_meteo(station: str | List[str] = 'all', features: str | List[str] = 'all', st='1990-01-01', en='2021-03-31', as_dataframe: bool = True)[source]
fetches meteorological data of 4902 Chinese catchments
Examples
>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
>>> features = ['PRE', 'TEM', 'PRS', 'RHU', 'EVP', 'WIN', 'PET']
>>> st = '1999-01-01'
>>> en = '2020-03-31'
>>> xds = dataset.fetch_meteo(features=features, st=st, en=en)
- property meteo_path
path where daily meteorological data of stations is present
- class aqua_fetch.rr.Finland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 669 catchments of Finland. The observed streamflow data is downloaded from https://wwwi3.ymparisto.fi . The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the works of Nascimento et al., 2024. Therefore, the number of static features is 214, the number of dynamic features is 10, and the data is available from 2012-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import Finland
>>> dataset = Finland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 4199, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
66
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
669
# get data by station id
>>> _, data = dataset.fetch(stations='FI000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='FI000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(669, 2)
>>> dataset.stn_coords('FI000001')  # returns coordinates of station whose id is FI000001
64.226288 27.736528
>>> dataset.stn_coords(['FI000001', 'FI000002'])  # returns coordinates of two stations
FI000001 64.226288 27.736528
FI000002 64.226288 27.736528
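In the examples, the first argument of fetch is either a number of stations (an integer) or a fraction of all stations (a float). The sketch below shows one way such a selection could work. It is an assumption and not the package's code: select_stations is a hypothetical helper, and truncating the fraction towards zero is inferred only from the counts quoted in the examples (0.1 of 669 stations gives 66).

```python
import random
from typing import List, Union


def select_stations(
    all_stations: List[str], stations: Union[int, float], seed: int = 0
) -> List[str]:
    """Hypothetical helper: pick a random subset by count (int) or fraction (float)."""
    if isinstance(stations, float):
        # Truncation is an assumption that matches the documented counts.
        n = int(stations * len(all_stations))
    else:
        n = stations
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.sample(all_stations, n)


# 669 ids in the FI000001 ... FI000669 pattern used in the examples.
finland = [f"FI{i:06d}" for i in range(1, 670)]

subset = select_stations(finland, 0.1)
print(len(subset))  # 66
print(len(select_stations(finland, 10)))  # 10
```

The same truncation rule reproduces the other counts shown in this section, for example 46 of 464 stations for Ireland and 29 of 294 for Italy.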
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- gauge_id_basin_id_map() dict[source]
Maps gauge ids to basin ids. For example, for Portugal the gauge id ‘03J/02H’ corresponds to the basin id ‘PT000001’, i.e. ‘03J/02H’ -> ‘PT000001’.
For Slovenia, the gauge id 1060 corresponds to the basin id SI000001, i.e. ‘1060’ -> ‘SI000001’.
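A mapping of this kind is an ordinary dictionary, so it can be queried and inverted directly. The sketch below uses only the two pairs quoted above. The variable names are hypothetical, and putting a Portuguese and a Slovenian entry in one dictionary is done purely for illustration.

```python
# The two gauge id -> basin id pairs quoted in the documentation above.
gauge_to_basin = {
    "03J/02H": "PT000001",  # Portugal
    "1060": "SI000001",     # Slovenia
}

# Invert the mapping to look up the national gauge id from a basin id.
# This assumes the mapping is one-to-one.
basin_to_gauge = {basin: gauge for gauge, basin in gauge_to_basin.items()}

print(gauge_to_basin["03J/02H"])   # PT000001
print(basin_to_gauge["SI000001"])  # 1060
```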
- get_q(as_dataframe: bool = True, overwrite: bool = False)[source]
downloads (if not already downloaded) and returns the daily streamflow data of Finland, either as pandas.DataFrame or as xarray dataset.
- class aqua_fetch.rr.GRDCCaravan(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 5357 catchments from around the globe following the works of Faerber et al., 2023. The dataset consists of 39 dynamic (timeseries) features and 211 static features. The dynamic (timeseries) data spans from 1950-01-02 to 2019-05-19.
If the xarray and netCDF4 packages are installed, then netcdf files will be downloaded; otherwise csv files will be downloaded and used.
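The choice between netcdf and csv files depends on which optional packages are importable. A check of that kind can be sketched with the standard library as below. preferred_format is a hypothetical helper written for illustration and is not part of the package.

```python
from importlib.util import find_spec


def preferred_format() -> str:
    """Hypothetical helper: 'netcdf' if xarray and netCDF4 are importable, else 'csv'."""
    has_netcdf_stack = all(
        find_spec(name) is not None for name in ("xarray", "netCDF4")
    )
    return "netcdf" if has_netcdf_stack else "csv"


print(preferred_format())
```

find_spec only looks a top-level module up without importing it, so the check stays cheap even when the packages are large.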
Examples
>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> _, df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(26801, 39)
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
5357
# get data of 10 % of stations as dataframe
>>> _, df = dataset.fetch(0.1, as_dataframe=True)
>>> df.shape
(1045239, 535)
# The returned dataframe is a multi-indexed dataframe
>>> df.index.names == ['time', 'dynamic_features']
True
# get data by station id
>>> _, df = dataset.fetch(stations='GRDC_3664802', as_dataframe=True)
>>> df.unstack().shape
(26800, 39)
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['total_precipitation_sum', 'potential_evaporation_sum', 'temperature_2m_mean', 'q_cms_obs'])
>>> data.unstack().shape
(26800, 4)
# get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multiindexed dataframe
(1045239, 10)
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='GRDC_3664802', static_features="all", as_dataframe=True)
>>> static.shape, dynamic.unstack().shape
((1, 211), (26800, 39))
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(5357, 2)
>>> dataset.stn_coords('GRDC_3664802')  # returns coordinates of station whose id is GRDC_3664802
-26.2271 -51.0771
>>> dataset.stn_coords(['GRDC_3664802', 'GRDC_1159337'])  # returns coordinates of two stations
- __init__(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched up to where it is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will have a single row while the dynamic features will be of shape (time, dynamic features).
- Return type:
tuple[DataFrame, DataFrame]
Examples
>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> dataset.fetch_station_features('GRDC_3664802')
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations (catchments/gauges) that are used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- class aqua_fetch.rr.HYSETS(path: str, sources: Dict[str, str] = None, **kwargs)[source]
Bases: _RainfallRunoff
Database for hydrometeorological modeling of 14,425 North American watersheds from 1950 to 2023, following the work of Arsenault et al., 2020. This data has 20 dynamic features and 30 static features. Most of the dynamic features have more than one source. The data is available in netcdf format; therefore, this package requires xarray and netCDF4 to be installed.
The following data sources are available.
sources           dynamic_features
SNODAS_SWE        discharge, swe
SCDNA             discharge, pr, tasmin, tasmax
nonQC_stations    discharge, pr, tasmin, tasmax
Livneh            discharge, pr, tasmin, tasmax
ERA5              discharge, pr, tasmax, tasmin
ERA5Land_SWE      discharge, swe
ERA5Land          discharge, pr, tasmax, tasmin
All sources contain one or more of the following dynamic_features with the following shapes.
dynamic_features              shape
time                          (25202,)
watershedID                   (14425,)
drainage_area                 (14425,)
drainage_area_GSIM            (14425,)
flag_GSIM_boundaries          (14425,)
flag_artificial_boundaries    (14425,)
centroid_lat                  (14425,)
centroid_lon                  (14425,)
elevation                     (14425,)
slope                         (14425,)
discharge                     (14425, 25202)
pr                            (14425, 25202)
tasmax                        (14425, 25202)
tasmin                        (14425, 25202)
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # fetch data of a random station
>>> _, df = dataset.fetch(1, as_dataframe=True)
>>> df.shape
(27028, 20)
>>> stations = dataset.stations()
>>> len(stations)
14425
>>> _, df = dataset.fetch('999', as_dataframe=True)
>>> df.unstack().shape
(27028, 20)
- __init__(path: str, sources: Dict[str, str] = None, **kwargs)[source]
- Parameters:
path (str) – The path under which the data is to be saved or is saved already. If the data is already downloaded then provide the path under which HYSETS data is located. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
sources (dict) – sources for each dynamic feature. The keys should be dynamic features and the values should be sources. Available sources for the dynamic features are listed below.
10m_u_component_of_wind: [‘ERA5’, ‘ERA5Land’]
10m_v_component_of_wind: [‘ERA5’, ‘ERA5Land’]
2m_dewpoint: [‘ERA5’, ‘ERA5Land’]
2m_tasmax: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
2m_tasmin: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
discharge: [‘NRCAN’, ‘ERA5’, ‘ERA5Land’, ‘Livneh’, ‘nonQC_stations’, ‘SCDNA’, ‘SNODAS’, ‘QC_stations’]
evaporation: [‘ERA5’, ‘ERA5Land’]
snow_density: [‘ERA5’, ‘ERA5Land’]
snow_evaporation: [‘ERA5’, ‘ERA5Land’]
snow_water_equivalent: [‘ERA5’, ‘ERA5Land’, ‘SNODAS’]
snowfall: [‘ERA5’, ‘ERA5Land’]
snowmelt: [‘ERA5’, ‘ERA5Land’]
surface_downwards_solar_radiation: [‘ERA5’, ‘ERA5Land’]
surface_downwards_thermal_radiation: [‘ERA5’, ‘ERA5Land’]
surface_net_solar_radiation: [‘ERA5’, ‘ERA5Land’]
surface_net_thermal_radiation: [‘ERA5’, ‘ERA5Land’]
surface_pressure: [‘ERA5’, ‘ERA5Land’]
surface_runoff: [‘ERA5’, ‘ERA5Land’]
total_cloud_cover: [‘ERA5’]
total_precipitation: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
kwargs – arguments for the _RainfallRunoff base class
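Because sources maps each dynamic feature to exactly one source, a dictionary can be checked against the list above before it is passed to the class. The sketch below copies a few entries from that list. AVAILABLE_SOURCES and check_sources are hypothetical names used for illustration and are not part of the package.

```python
from typing import Dict, List

# A subset of the feature -> allowed sources list documented above.
AVAILABLE_SOURCES: Dict[str, List[str]] = {
    "discharge": ["NRCAN", "ERA5", "ERA5Land", "Livneh", "nonQC_stations",
                  "SCDNA", "SNODAS", "QC_stations"],
    "total_precipitation": ["NRCAN", "Livneh", "QC_stations", "ERA5",
                            "nonQC_stations", "ERA5Land", "SCDNA"],
    "snow_water_equivalent": ["ERA5", "ERA5Land", "SNODAS"],
    "total_cloud_cover": ["ERA5"],
}


def check_sources(sources: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if all pairs are valid."""
    problems = []
    for feature, source in sources.items():
        allowed = AVAILABLE_SOURCES.get(feature)
        if allowed is None:
            problems.append(f"unknown feature: {feature}")
        elif source not in allowed:
            problems.append(f"{source} is not available for {feature}")
    return problems


good = {"discharge": "ERA5", "snow_water_equivalent": "SNODAS"}
bad = {"total_cloud_cover": "ERA5Land"}

print(check_sources(good))  # []
print(check_sources(bad))   # ['ERA5Land is not available for total_cloud_cover']
```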
- property OfficialID_WatershedID_map
A dictionary mapping Official_ID to Watershed_ID. For example ‘1’: ‘01AD002’
- property WatershedID_OfficialID_map
A dictionary mapping Watershed_ID to Official_ID. For example ‘01AD002’: ‘1’
- area(stations: str | List[str] = 'all', source: str = 'other') Series[source]
Returns area_gov (Km2) of all catchments as pandas.Series
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return area of all stations
source (str) – source of area calculation. It should be either gsim or other
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dataset.area()  # returns area of all stations
>>> dataset.area('92')  # returns area of station whose id is 92
>>> dataset.area(['92', '142'])  # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_dynamic_features(station, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]
Fetches dynamic features of one station.
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None) DataFrame[source]
returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
# get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
14425
# get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(14425, 28)
# get static data of one station only
>>> static_data = dataset.fetch_static_features('991')
>>> static_data.shape
(1, 28)
# get the names of static features
>>> dataset.static_features
# get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area_km2', 'Elevation_m'])
>>> static_data.shape
(14425, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
returns features of multiple stations
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- read_static_data(usecols=None, nrows=None)[source]
reads the HYSETS_watershed_properties.txt file while using Watershed_ID as index instead of Official_ID. Watershed_ID starts with 1, 2, 3 and so on while Official_ID is a code from the meteo agency, such as 01AD002 for station 1.
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
returns a list of station names. The Watershed_ID of the station is used as station name instead of Official_ID. This is because in .nc files Watershed_ID is used for stations instead of Official_ID. Watershed_ID starts with 1, 2, 3 and so on while Official_ID is a code from the meteo agency, such as 01AD002 for station 1.
- Returns:
a list of ids of stations
- Return type:
List[str]
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # get name of all stations as list
>>> dataset.stations()
- class aqua_fetch.rr.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]
Bases: _RainfallRunoff
Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2]. This is a rainfall-runoff dataset of Costa Rica with 564 stations from 1985 to 2019 at daily, monthly and yearly time steps.
Examples
>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
... # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True)  # returns a multiindex dataframe
>>> df.shape
(115047, 28)
... # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> df.shape
(115047, 5)
# fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564','563','562'], as_dataframe=True)
>>> df.shape
(115047, 3)
... # fetch data of a single station
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(115047, 1)
# get only selected dynamic features
>>> df = dataset.fetch(stations='501',
...     dynamic_features=['AET_mm', 'Prec_mm', 'Streamflow_mm'], as_dataframe=True)
# fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
>>> df.shape
(32868, 1)
... # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(3780, 1)
- __init__(time_step: str = 'daily', path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
time_step (str) – one of daily, month or year
**kwargs – key word arguments
- area(stations: str | List[str] = 'all') Series[source]
Returns area (Km2) of all catchments as pandas.Series
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return area of all stations
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2')  # returns area of station whose id is 2
>>> dataset.area(['2', '605'])  # returns area of two stations
- property end
end of data
- fetch_static_features(station, static_features=None)[source]
static data for HYPE is not available.
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- stations() list[source]
Names/ids of the stations (catchments/gauges) that are used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Child classes are encouraged to implement this method in a more efficient way.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
Examples
>>> dataset = HYPE()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2')  # returns coordinates of station whose id is 2
>>> dataset.stn_coords(['2', '605'])  # returns coordinates of two stations
- class aqua_fetch.Ireland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 464 catchments of Ireland. Out of these 464 catchments, 280 are from OPW and 184 are from EPA. The observed streamflow data for EPA stations is downloaded from https://epawebapp.epa.ie/Hydronet/#Flow while the observed streamflow for OPW stations is downloaded from https://waterlevel.ie/hydro-data/#/overview/Waterlevel. It should be noted that out of 280 OPW stations, streamflow data is available for only 129 stations. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the works of Nascimento et al., 2024. Therefore, the number of static features is 214, the number of dynamic features is 10, and the data is available from 1992-01-01 to 2020-06-31.
Examples
>>> from aqua_fetch import Ireland
>>> dataset = Ireland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
46
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
464
# get data by station id
>>> _, data = dataset.fetch(stations='IEEP0281')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='IEEP0281', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(464, 2)
>>> dataset.stn_coords('IEEP0281')  # returns coordinates of station whose id is IEEP0281
52.217434 -8.494649
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])  # returns coordinates of two stations
IEEP0281 52.217434 -8.494649
IEEP0282 54.284546 -6.921607
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.Italy(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 294 catchments of Italy. The observed streamflow data is downloaded from http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient . The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the works of Nascimento et al., 2024. Therefore, the number of static features is 214, the number of dynamic features is 10, and the data is available from 1992-01-01 to 2020-06-31.
Examples
>>> from aqua_fetch import Italy
>>> dataset = Italy()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
29
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
294
# get data by station id
>>> _, data = dataset.fetch(stations='ITIS0001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ITIS0001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(294, 2)
>>> dataset.stn_coords('ITIS0001')  # returns coordinates of station whose id is ITIS0001
42.835835 13.919167
>>> dataset.stn_coords(['ITIS0001', 'ITIS0002'])  # returns coordinates of two stations
ITIS0001 42.835835 13.919167
ITIS0002 42.783890 13.905833
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.Japan(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 694 catchments of Japan from the river.go.jp website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35, the number of dynamic features is 27, and the data is available from 1979-01-01 to 2022-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.rr.LamaHCE(*, timestep: str = 'D', data_type: str = 'total_upstrm', path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]
Bases:
_RainfallRunoffLarge-Sample Data for Hydrology and Environmental Sciences for Central Europe (mainly Austria). The dataset is downloaded from zenodo following the work of Klingler et al., 2021 . For
total_upstrmdata, there are 859 stations with 61 static features and 17 dynamic features. The temporal extent of data is from 1981-01-01 to 2019-12-31.- __init__(*, timestep: str = 'D', data_type: str = 'total_upstrm', path=None, to_netcdf: bool = True, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp
Examples
>>> from aqua_fetch import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
# The daily dataset is from 859 with 80 static and 22 dynamic features
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 22)
>>> df = dataset.fetch(3, as_dataframe=True)
>>> df.shape
(313368, 3)
>>> dataset = LamaHCE(timestep='H', data_type='total_upstrm')
>>> len(dataset.stations()), len(dataset.static_features), len(dataset.dynamic_features)
(859, 80, 17)
>>> dataset.fetch_dynamic_features('1', features = ['q_cms_obs'])
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
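The caching advice given for this property can be illustrated with functools.cached_property from the standard library. This is only a sketch of one possible approach: DemoDataset and _read_feature_names are hypothetical names, not part of aqua_fetch, and the library itself may cache differently.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # Placeholder for an expensive operation such as parsing a data file.
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Evaluated on first access only; later accesses reuse the stored list.
        return self._read_feature_names()


ds = DemoDataset()
first = ds.dynamic_features
second = ds.dynamic_features  # served from the cache, no second read
```

After both accesses, ds.reads is 1 because the underlying read ran only once.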
- property end
end of data
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = None) DataFrame[source]
static features of LamaHCE
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
... # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
...     static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
- fetch_stations_features(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
Reads attributes of more than one station.
This function checks whether .nc files exist. If they do, they are not prepared and saved again; otherwise the nc files are prepared first and the data is then read from them. Upon subsequent calls, the nc files are used for reading the data.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.
static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as pandas dataframe. default is xarray.Dataset object
kwargs (dict) – additional keyword arguments
- Returns:
tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates. If dynamic features are returned as pandas.DataFrame, then the first index is time and the second index is dynamic_features.
- Raises:
ValueError, if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
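The "prepare once, then read from the cached file" behaviour described for fetch_stations_features can be sketched with a generic helper. This is an illustration under stated assumptions: load_with_cache and prepare_from_raw are hypothetical names, a JSON file stands in for the .nc files, and the values are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Data = Dict[str, List[float]]


def load_with_cache(cache_file: Path, prepare: Callable[[], Data]) -> Data:
    """Read data from cache_file, preparing and saving it first if it is missing."""
    if not cache_file.exists():
        # First call: build the data from the raw files and store it.
        cache_file.write_text(json.dumps(prepare()))
    # Every call, including the first, reads from the cache file.
    return json.loads(cache_file.read_text())


calls = []


def prepare_from_raw() -> Data:
    # Stand-in for the slow step of parsing raw files (illustrative values).
    calls.append(1)
    return {"912101A": [1.2, 0.8], "912105A": [3.4, 2.9]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "dynamic.json"
    first = load_with_cache(cache, prepare_from_raw)   # prepares, saves, then reads
    second = load_with_cache(cache, prepare_from_raw)  # reads the cached file only
```

The slow preparation step runs once; the second call is served entirely from the cached file.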
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() list[source]
Names/ids of the stations, catchments or gauges that are used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data repeatedly. Users are encouraged to override this method in the child class with a more efficient implementation.
- class aqua_fetch.rr.LamaHIce(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]
Bases:
LamaHCE
Daily and hourly hydro-meteorological time series data of 111 river basins of Iceland following Helgason et al., 2024. The total period of the dataset is from 1950 to 2021 for the daily timestep and from 1976 to 2023 for the hourly timestep. The average length of daily data is 33 years, while that of hourly data is 11 years. The dataset is available on HydroShare.
- __init__(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once, and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp
- basin_attributes() DataFrame[source]
returns basin attributes which are catchment attributes, water balance all attributes and water balance filtered attributes
- Returns:
a dataframe of shape (111, 104) where 104 are the static catchment/basin attributes
- Return type:
pd.DataFrame
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property end
end of data
- fetch_clim_features(stations: str | List[str] = None)[source]
Returns climate time series data for one or more stations
- Return type:
pd.DataFrame
- fetch_q(stations: str | List[str] = None, qc_flag: int = None)[source]
returns streamflow for one or more stations
- Parameters:
- Returns:
a pandas.DataFrame whose index is the time and columns are names of stations. For daily timestep, the dataframe has a shape of 32630 rows and 111 columns.
- Return type:
pd.DataFrame
- fetch_static_features(stations: str | list = 'all', static_features: str | list = None) DataFrame[source]
fetches static features of one or more stations
- fetch_stn_meteo(stn: str, nrows: int = None) DataFrame[source]
returns climate/meteorological time series data for one station
- Returns:
a pandas.DataFrame with 23 columns
- Return type:
pd.DataFrame
- gauge_attributes() DataFrame[source]
returns gauge attributes from following two files
Gauge_attributes.csv
hydro_indices_1981_2018.csv
- Returns:
a dataframe of shape (111, 28)
- Return type:
pd.DataFrame
- property gauges_path
returns the path where gauge data files are located
- property q_dir
returns the path where q files are located
- q_mmd(stations: str | List[str] = None) DataFrame[source]
returns streamflow in units of millimeters per day. This is obtained by dividing q_cms by the catchment area.
- Parameters:
stations (str/list) – name/names of stations. Default is None, which will return data of all stations
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
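The conversion from discharge to runoff depth can be written out as a worked example. This sketch assumes that q_cms is in cubic meters per second and that the catchment area is in square kilometers; the function name cms_to_mmd is hypothetical and the library's own implementation may differ in detail.

```python
def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m3/s to runoff depth in mm/day for an area in km2."""
    seconds_per_day = 86_400
    volume_m3_per_day = q_cms * seconds_per_day   # m3/s -> m3/day
    area_m2 = area_km2 * 1e6                      # km2 -> m2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000.0               # m -> mm


# 10 m3/s over 864 km2 is 864000 m3/day spread over 8.64e8 m2, i.e. 1 mm/day.
runoff = cms_to_mmd(q_cms=10.0, area_km2=864.0)
```

Under these unit assumptions the whole conversion reduces to q_mmd = q_cms * 86.4 / area_km2.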
- property q_path
path where all q files are located
- read_ts_of_station(station: str) DataFrame[source]
Reads daily dynamic (meteorological + streamflow) data for one catchment and returns as DataFrame
- class aqua_fetch.rr.NPCTRCatchments(path=None, timestep: str = 'Hourly', **kwargs)[source]
Bases:
_RainfallRunoff
High-resolution streamflow and weather data (2013–2019) for seven small coastal watersheds in the northeast Pacific coastal temperate rainforest, Canada, following Korver et al., 2022. The data include 8 dynamic features at hourly and 5 min timestep and 14 static features. The dynamic features include streamflow, precipitation, temperature, relative humidity, wind speed, wind direction, and solar radiation.
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ds.stations()
['626', '693', '703', '708', '819', '844', '1015']
>>> len(ds.static_features)
12
>>> area = ds.area()
>>> area.shape
(7,)
>>> coords = ds.stn_coords()
>>> coords.shape
(7, 2)
- __init__(path=None, timestep: str = 'Hourly', **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- all_stn_coords() DataFrame[source]
Uses the coordinate information of the Stream Sensor Nodes, assuming that the stream sensors are closer to the stream gauge. The values are taken from Table A1 of the paper.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static features of one or more stations.
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> dataset = NPCTRCatchments()
>>> dataset.fetch_static_features('626')
>>> dataset.static_features
>>> dataset.fetch_static_features('626',
...     static_features=['area_km2', 'elev_catch_m', 'slope_%'])
- read_pcp()[source]
Examples
>>> ds = NPCTRCatchments()
>>> pcp = ds.read_pcp()
>>> pcp.shape
(849472, 5)
>>> pcp['Site'].nunique()
15
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-09 21:00:00'), Timestamp('2019-10-01 00:00:00'))
# A is accepted and E is estimated
>>> pcp['Qflags'].unique()
[nan, 'AV', 'EV', 'EV: Sensor malfunction due to wolf bite']
>>> ds = NPCTRCatchments(timestep='5min')
>>> pcp = ds.read_pcp()
>>> pcp.shape
(8712098, 5)
>>> pcp['Site'].nunique()
14
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-05 00:00:00'), Timestamp('2019-10-01 00:00:00'))
- read_rel_hum()[source]
Examples
>>> ds = NPCTRCatchments()
>>> rh = ds.read_rel_hum()
>>> rh.shape
(849472, 4)
>>> rh['Site'].nunique()
15
>>> rh.index[0], rh.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
... getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> rh_5m = ds.read_rel_hum()
>>> rh_5m.shape
(8281767, 3)
>>> rh_5m['Site'].nunique()
13
>>> rh_5m.index[0], rh.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
>>> rh_5m['Qlevel'].unique()
['1', '2', '3', nan]
- read_snow_depth()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
... get 5min timestep data
>>> ds = NPCTRCatchments(timestep='5min')
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
- read_sol_rad()[source]
Solar radiation is common among all stations so no ‘Site’ column is present in the dataframe.
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(53072, 3)
>>> solrad['Qflags_SolarRad'].unique()
['AV', 'EV']
>>> ds = NPCTRCatchments(timestep='5min')
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(637108, 3)
>>> solrad['SolarRadQ_flags'].nunique()
4
- read_temp()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> temp = ds.read_temp()
>>> temp.shape
(745836, 4)
>>> temp['Site'].nunique()
14
>>> temp['Qflag'].unique()
[nan, 'AV', 'EV']
>>> temp['Qlevel'].unique()
[nan, 2., 3., 1.]
>>> ds = NPCTRCatchments(timestep='5min')
>>> temp_5m = ds.read_temp()
>>> temp_5m.shape
(8957388, 3)
>>> temp_5m['Site'].nunique()
14
>>> temp_5m['Qlevel'].unique()
[1, 2]
>>> temp_5m['Qflags'].nunique()
5344
- read_wind_dir()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(371651, 4)
>>> winddir['Site'].nunique()
7
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'RefStn']
... getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(5096864, 4)
>>> winddir['Site'].nunique()
8
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'Hecate', 'RefStn']
- read_wind_speed()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ws = ds.read_wind_speed()
>>> ws.shape
(424744, 4)
>>> ws['Site'].nunique()
8
>>> ws['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'Hecate', 'RefStn']
>>> ws.index[0], ws.index[-1]
(Timestamp('2013-09-09 20:00:00'), Timestamp('2019-10-01 00:00:00'))
... getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> ws = ds.read_wind_speed()
>>> ws.shape
(5096864, 4)
>>> ws['Site'].nunique()
8
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- stations() List[str][source]
Names/ids of the stations, catchments or gauges that are used to index each station in the dataset. Because this method is called many times, it is better to cache the result and return the cached value instead of reading the data repeatedly. Users are encouraged to override this method in the child class with a more efficient implementation.
- stn_coords(stations='all', sensor='SSN') DataFrame[source]
By default, uses the coordinate information of the Stream Sensor Nodes, assuming that the stream sensors are closer to the stream gauge. The values are taken from Table A1 of the paper.
- class aqua_fetch.rr.Poland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_EStreams
Data of 1287 catchments of Poland. The observed streamflow data is downloaded from https://danepubliczne.imgw.pl . The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1951-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import Poland
>>> dataset = Poland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
128
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
1287
# get data by station id
>>> _, data = dataset.fetch(stations='PL000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PL000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(1287, 2)
>>> dataset.stn_coords('PL000001')  # returns coordinates of station whose id is PL000001
49.921848 18.327913
>>> dataset.stn_coords(['PL000001', 'PL000002'])  # returns coordinates of two stations
PL000001 49.921848 18.327913
PL000002 49.954769 18.326323
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property csv_files_dir: str
path where csv (obtained after extracting zip files) files will be stored
- class aqua_fetch.rr.Portugal(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_EStreams
Data of 280 catchments of Portugal. The observed streamflow data is downloaded from https://snirh.apambiente.pt . The meteorological data, static catchment features and catchment boundaries for the 280 catchments are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1972-01-01 to 2022-12-31.
Examples
>>> from aqua_fetch import Portugal
>>> dataset = Portugal()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 18628, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
28
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
280
# get data by station id
>>> _, data = dataset.fetch(stations='PT000001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PT000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(280, 2)
>>> dataset.stn_coords('PT000001')  # returns coordinates of station whose id is PT000001
41.794998 -7.969
>>> dataset.stn_coords(['PT000001', 'PT000002'])  # returns coordinates of two stations
PT000001 41.794998 -7.969
PT000002 39.679001 -8.437
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- gauge_id_basin_id_map() dict[source]
Returns a dictionary that maps gauge ids to basin ids. For example, for Portugal the gauge id '03J/02H' maps to the basin id 'PT000001' ('03J/02H' -> 'PT000001'),
while for Slovenia the gauge id '1060' maps to the basin id 'SI000001' ('1060' -> 'SI000001').
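A gauge-id to basin-id dictionary of this kind is typically used to re-key records from the agency's identifiers to the basin identifiers. The sketch below uses the two id pairs quoted above (combined into one dictionary purely for illustration) together with made-up streamflow values; the variable names are hypothetical.

```python
from typing import Dict, List

# Gauge-id to basin-id pairs taken from the examples above.
gauge_to_basin: Dict[str, str] = {"03J/02H": "PT000001", "1060": "SI000001"}

# Hypothetical streamflow records keyed by the agency's gauge id.
q_by_gauge: Dict[str, List[float]] = {"03J/02H": [4.1, 3.8], "1060": [12.5, 11.9]}

# Re-key the records so that they are indexed by basin id instead.
q_by_basin = {gauge_to_basin[g]: series for g, series in q_by_gauge.items()}

# The reverse lookup (basin id -> gauge id) is a simple inversion of the map.
basin_to_gauge = {basin: gauge for gauge, basin in gauge_to_basin.items()}
```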
- get_q(as_dataframe: bool = True)[source]
returns the streamflow data of Portugal as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
- class aqua_fetch.RRLuleaSweden(path=None, **kwargs)[source]
Bases:
Datasets
Rainfall runoff data for an urban catchment from 2016 to 2019 following the work of Broekhuizen et al., 2020.
- __init__(path=None, **kwargs)[source]
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unz
- fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None)[source]
fetches rainfall runoff data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41
- fetch_flow(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches flow data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00
- Returns:
a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
- fetch_pcp(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches precipitation data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00
en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00
- Returns:
a dataframe of shape (967_080, 1)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
- class aqua_fetch.rr.Simbi(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases:
_RainfallRunoff
Monthly rainfall from 1905 to 2005, daily rainfall from 1920 to 1940, 70 daily streamflow series, and 23 monthly temperature series for 24 catchments of Haiti.
The data is obtained from Bathelemy et al., 2023, while the related publication is Bathelemy et al., 2024.
Examples
>>> from aqua_fetch import Simbi
>>> simbi = Simbi()
- __init__(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the Simbi dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. if not given, then the first attribute in the boundary file will be used.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stations (str/list) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas.DataFrame of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import Simbi
>>> dataset = Simbi()
# get all static data of all stations
>>> stns = dataset.static_data_stations()
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(24, 232)
# get static data of one station only
>>> static_data = dataset.fetch_static_features('001')
>>> static_data.shape
(1, 232)
# get the names of static features
>>> dataset.static_features
# get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['stream_density', 'pcp', 'Forest_lc_98'])
>>> static_data.shape
(24, 3)
>>> data = dataset.fetch_static_features('001', static_features=['stream_density', 'pcp', 'Forest_lc_98'])
>>> data.shape
(1, 3)
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value instead of reading the data repeatedly, or to override it in the child class with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.Slovenia(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_EStreams
Data of 117 catchments of Slovenia. The observed streamflow data is downloaded from https://vode.arso.gov.si . The meteorological data, static catchment features and catchment boundaries for the 117 catchments are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1950-01-01 to 2023-12-31.
Examples
>>> from aqua_fetch import Slovenia
>>> dataset = Slovenia()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 10})
>>> len(data.data_vars)
10
>>> _, df = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
117
# get data by station id
>>> _, data = dataset.fetch(stations='SI000090')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='SI000090', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(117, 2)
>>> dataset.stn_coords('SI000090')  # returns coordinates of station whose id is SI000090
45.865093 15.460184
>>> dataset.stn_coords(['SI000090', 'SI000002'])  # returns coordinates of two stations
SI000090 45.865093 15.460184
SI000002 46.648823 16.059244
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- get_q(as_dataframe: bool = True)[source]
returns the streamflow data of Slovenia as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
- class aqua_fetch.rr.Spain(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases:
_GSHA
Data of 889 catchments of Spain from the ceh-es website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2020-09-30.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- daily_q_all_areas() DataFrame[source]
Daily data of gauging stations in river from all areas
- Returns:
a dataframe with 16_806_305 rows and 3 columns
- daily_q_area(area: str) DataFrame[source]
Reads the daily data of river gauging stations from the afliq.csv file
- get_q(as_dataframe: bool = True)[source]
returns daily q of all stations
- Returns:
a pandas.DataFrame of shape (39721, 1447)
- Return type:
pd.DataFrame
- class aqua_fetch.Thailand(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 73 catchments of Thailand from the RID project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27, and the data is available from 1980-01-01 to 1999-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- class aqua_fetch.USGS(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This class handles the hydrometeorological data for the USA. The daily and hourly discharge data is downloaded from the usgs/nwis website. The data is optionally stored in a netCDF file if xarray is available. Currently the data is downloaded only for those sites/catchments that are in the HYSETS database. This is because the catchment boundaries are taken from the HYSETS database using aqua_fetch.HYSETS.
For hourly timestep, the “iv” service is used to download the instantaneous data, which is then resampled to hourly data. Only data with the flags “A, [92]”, “A, [91]”, “A, [93]”, “A, e” and “A” is used. For daily streamflow, the “dv” service is used to download the data. In this case, only data with the “A” and “A, e” flags is used.
- __init__(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – Path to store the data
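The flag filtering and hourly resampling described in the class summary can be sketched in plain Python as follows. The record layout, the sample values, the "P" flag used as a rejected example and the choice of the hourly mean as aggregation are assumptions; the description only states that instantaneous data is resampled to hourly data.

```python
from collections import defaultdict
from datetime import datetime

# Flags accepted for instantaneous ("iv") data, as listed in the class summary
ACCEPTED_IV_FLAGS = {"A, [92]", "A, [91]", "A, [93]", "A, e", "A"}


def hourly_means(records):
    """Average accepted (timestamp, discharge, flag) records per clock hour."""
    buckets = defaultdict(list)
    for ts, q, flag in records:
        if flag not in ACCEPTED_IV_FLAGS:
            continue  # skip values whose flag is not in the accepted set
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(q)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


# Hypothetical 15-minute records
records = [
    (datetime(2020, 1, 1, 0, 0), 10.0, "A"),
    (datetime(2020, 1, 1, 0, 15), 12.0, "A, e"),
    (datetime(2020, 1, 1, 0, 30), 99.0, "P"),  # not accepted, dropped
    (datetime(2020, 1, 1, 1, 0), 8.0, "A, [91]"),
]
hourly = hourly_means(records)
```

Filtering before aggregation keeps rejected values from influencing the hourly result.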
- area(stations: str | List[str] = 'all') Series[source]
Returns area_gov (km2) of all catchments as a pandas.Series.
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations.
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('912101A')  # returns area of station whose id is 912101A
>>> dataset.area(['912101A', '12388200'])  # returns area of two stations
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
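One way to follow the caching advice for such a property is functools.cached_property from the standard library. The sketch below uses a hypothetical DemoDataset class and a stand-in _read_feature_names method, neither of which belongs to aqua_fetch; the two feature names are borrowed from the GSHA examples in this documentation.

```python
from functools import cached_property


class DemoDataset:
    """Hypothetical stand-in for a dataset class (not part of aqua_fetch)."""

    def __init__(self):
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self):
        # stands in for parsing files on disk
        self.reads += 1
        return ["airtemp_C_mean_era5", "pcp_mm_mswep"]

    @cached_property
    def dynamic_features(self):
        # computed on first access, then stored on the instance
        return self._read_feature_names()


ds = DemoDataset()
first = ds.dynamic_features
second = ds.dynamic_features
```

After the first access the value lives in the instance dictionary, so later accesses do not call _read_feature_names again.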
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None) DataFrame[source]
Returns static attributes of one or multiple stations.
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
returns features of multiple stations
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges that are used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
- class aqua_fetch.rr.WaterBenchIowa(path=None, **kwargs)[source]
Bases: _RainfallRunoff
Rainfall run-off dataset for Iowa (US) following the work of Demir et al., 2022. This is an hourly dataset of 125 catchments with 7 static features and 3 dynamic features (pcp, et, discharge) for each catchment. The dynamic features are time series from 2011-10-01 12:00 to 2018-09-30 11:00.
Examples
>>> from aqua_fetch import WaterBenchIowa
>>> ds = WaterBenchIowa()
... # fetch dynamic features of 5 stations
>>> data = ds.fetch(5, as_dataframe=True)
>>> data.shape  # it is a multi-indexed DataFrame
(184032, 5)
... # fetch both static and dynamic features of 5 stations
>>> data = ds.fetch(5, static_features="all", as_dataframe=True)
>>> data.keys()
dict_keys(['dynamic', 'static'])
>>> data['static'].shape
(5, 7)
>>> data['dynamic']  # returns a xarray DataSet
... # using another method
>>> data = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> data.unstack().shape
(61344, 3)
>>> # when we get both static and dynamic data, the returned data is a dictionary
>>> # with ``static`` and ``dynamic`` keys.
>>> data = ds.fetch(stations='644', static_features="all", as_dataframe=True)
>>> data['static'].shape, data['dynamic'].shape
((1, 7), (184032, 1))
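The shapes in the examples above are consistent with the documented time range. As a quick check, the sketch below counts the hourly time steps between the stated start and end (both inclusive) and multiplies by the 3 dynamic features, assuming that the multi-indexed DataFrame stacks time steps and features along its rows with one column per station.

```python
from datetime import datetime, timedelta

start = datetime(2011, 10, 1, 12)
end = datetime(2018, 9, 30, 11)

# number of hourly time steps, counting both end points
n_steps = (end - start) // timedelta(hours=1) + 1

n_dynamic = 3  # pcp, et, discharge
rows_long = n_steps * n_dynamic  # rows when time and feature are stacked
```

Here n_steps corresponds to the 61344 rows of the unstacked frame and rows_long to the 184032 rows of the multi-indexed one.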
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_static_features(stations: str | List[str], static_features: str | List[str] = 'all') DataFrame[source]
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import WaterBenchIowa
>>> dataset = WaterBenchIowa()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
125
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(125, 7)
get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
(1, 7)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area_km2'])
>>> static_data.shape
(125, 2)
>>> data = dataset.fetch_static_features('592', static_features=['slope', 'area_km2'])
>>> data.shape
(1, 2)
- fetch_station_attributes(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) DataFrame[source]
Examples
>>> from aqua_fetch import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> data = dataset.fetch_station_attributes('666')
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations/catchments/gauges that are used to index each station in the dataset. Since this method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. The user is recommended to implement this method in the child class in a more efficient way.
The following datasets are very similar to the RainfallRunoff datasets, but they do not contain observed streamflow data. They are used to provide static and dynamic features to other datasets.
- class aqua_fetch.GSHA(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Global streamflow characteristics, hydrometeorology and catchment attributes following Peirong et al., 2023. The data is downloaded from its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 21568 stations, 26 dynamic (meteorological + storage) features with daily timestep, 21 dynamic features (landcover + streamflow indices + reservoir) with yearly timestep and 35 static features.
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> len(dataset.stations())
21568
>>> dataset.agencies
['arcticnet', 'AFD', 'GRDC', 'IWRIS', 'MLIT', 'HYDAT', 'ANA', 'BOM', 'CCRR', 'China', 'CHP', 'RID', 'USGS']
>>> dataset.start
Timestamp('1979-01-01 00:00:00')
>>> dataset.end
Timestamp('2022-12-31 00:00:00')
>>> dataset.static_features
['ele_mt_uav', 'slp_dg_uav', 'lat', 'long', 'area_km2', 'agency', ...]
>>> len(dataset.dynamic_features)
26
>>> len(dataset.daily_dynamic_features)
26
>>> len(dataset.yearly_dynamic_features)
21
>>> dataset.fetch_static_features('1001_arcticnet')
fetch static features for all stations of arcticnet agency
>>> dataset.fetch_static_features(agency='arcticnet')
fetch dynamic features for all stations of arcticnet agency
>>> dataset.fetch_dynamic_features(agency='arcticnet')
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
to_netcdf (bool) – whether to convert all the data into one netcdf file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.
- property agencies: List[str]
returns the names of agencies as list
arcticnet: Antarctica
AFD: Spain
GRDC: Global
IWRIS: India
MLIT: Japan
HYDAT: Canada
ANA: Brazil
BOM: Australia
CCRR: Chile
China
CHP: China
RID: Thailand
USGS
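Station ids in the examples of this class carry the agency name as a suffix (for instance 1001_arcticnet). Assuming this convention holds for all ids, selecting the stations of one agency can be sketched as below. The helper stations_of_agency and the id 4001_GRDC are hypothetical and not part of aqua_fetch; the class itself exposes the agency keyword for this purpose.

```python
def stations_of_agency(station_ids, agency):
    """Keep ids whose part after the last underscore equals the agency."""
    return [s for s in station_ids if s.rsplit("_", 1)[-1] == agency]


# the GRDC id is made up for illustration
ids = ["1001_arcticnet", "10062_arcticnet", "4001_GRDC"]
arctic = stations_of_agency(ids, "arcticnet")
```

Splitting on the last underscore keeps the sketch robust should the numeric part of an id ever contain an underscore.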
- atlas(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
The link table between GSHA watershed IDs and RiverATLAS river reach IDs, as well as the selected static attributes
- Returns:
a pandas.DataFrame of shape (n, 24) where n is the number of stations
- Return type:
pd.DataFrame
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, agency: List[str] = 'all') Dataset[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str) – name/id of the station or stations of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_dynamic_features('1001_arcticnet', as_dataframe=True)
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> stns = ['1001_arcticnet', '10062_arcticnet']
>>> data = dataset.fetch_dynamic_features(stns,
... dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
- fetch_lai(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Leaf Area Index time series for one or more stations, either as xarray.Dataset or pandas.DataFrame. The data has daily timestep.
- fetch_meteo_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for one or more stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stations (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas.DataFrame of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
21568
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(21568, 35)
get static data of one station only
>>> static_data = dataset.fetch_static_features('1001_arcticnet')
>>> static_data.shape
(1, 35)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['ele_mt_uav', 'slp_dg_uav'])
>>> static_data.shape
(21568, 2)
>>> data = dataset.fetch_static_features('1001_arcticnet', static_features=['ele_mt_uav', 'slp_dg_uav'])
>>> data.shape
(1, 2)
>>> out = dataset.fetch_static_features(agency='arcticnet')
>>> out.shape
(106, 35)
- fetch_stn_dynamic_features(station: str, dynamic_features='all') DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas.DataFrame of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet')
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet',
... dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
>>> data.shape
(16071, 2)
- fetch_storage_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for one or more stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- lai_stn(stn: str) Series[source]
Daily leaf area index. As per the documentation, due to satellite data quality, some watersheds might have relatively serious missing-data issues. The data is from 1981-01-01 to 2020-12-31.
- Returns:
a pandas.Series of shape (14571,) where 14571 is the number of days
- Return type:
pd.Series
- lc_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Landcover variables for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- lc_variables_stn(stn: str) DataFrame[source]
Landcover variables for a given station which have yearly timestep. Following three landcover variables are provided:
urban_fraction(%): Ratio of urban extent to the entire watershed area (percentage).
forest_fraction(%): Ratio of forest extent to the entire watershed area (percentage).
cropland_fraction(%): Ratio of cropland extent to the entire watershed area (percentage).
- Returns:
a pandas.DataFrame of shape (n, 3) where n is the number of years
- Return type:
pd.DataFrame
- meteo_vars_all_stns()[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for all stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- meteo_vars_stn(stn: str) DataFrame[source]
Daily meteorological variables from 1979-01-01 to 2022-12-31 for a given station.
- Returns:
a pandas.DataFrame of shape (16071, 19) where 16071 is the number of days
- Return type:
pd.DataFrame
- reservoir_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Reservoir variables for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- reservoir_variables_stn(stn: str) DataFrame[source]
Reservoir variables for a given station from 1979 to 2020 with yearly timestep. Following two reservoir variables are provided:
capacity: Reservoir capacity of the year in the watershed (m3). To avoid including too many missing values, we use the ICOLD capacity in the linked table of the GeoDAR dataset.
dor: Degree of regulation of the watershed (yearly reservoir capacity/yearly mean flow). If yearly mean flow is missing, the value is substituted with the average of all mean flow values.
- Returns:
a pandas.DataFrame of shape (42, 2) where 42 is the number of years
- Return type:
pd.DataFrame
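The definition of dor given above (yearly reservoir capacity divided by yearly mean flow, with a missing mean flow replaced by the average of the available mean flow values) can be sketched as follows. This is one reading of the description, not the code used to build GSHA; the function name and the sample numbers are hypothetical, and capacity and flow are assumed to be in mutually consistent units.

```python
import math


def degree_of_regulation(capacity, mean_flow):
    """Yearly capacity / yearly mean flow; NaN flows use the overall mean."""
    valid = [q for q in mean_flow if not math.isnan(q)]
    fallback = sum(valid) / len(valid)  # average of all available mean flows
    return [
        c / (fallback if math.isnan(q) else q)
        for c, q in zip(capacity, mean_flow)
    ]


# Hypothetical yearly values; the second year has no mean flow
capacity = [100.0, 100.0, 100.0]
mean_flow = [50.0, float("nan"), 150.0]
dor = degree_of_regulation(capacity, mean_flow)
```

For the year with missing flow, the average of 50 and 150 is used as the denominator.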
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stn_coords(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
returns the latitude and longitude of stations
- Returns:
a pandas.DataFrame of shape (n, 2) where n is the number of stations
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.stn_coords('1001_arcticnet')
>>> dataset.stn_coords(['1001_arcticnet', '1002_arcticnet'])
get coordinates for all stations of arcticnet agency
>>> dataset.stn_coords(agency='arcticnet')
- storage_vars_all_stns()[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for all stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- storage_vars_stn(stn: str) DataFrame[source]
Daily Water storage term variables from 1979-01-01 to 2021-12-31 for a given station.
SM_layer1: 0-7 cm soil moisture from ERA5 land soil water layer 1 (m3/m3) for 1979-2021.
SM_layer2: 7-28 cm soil moisture from ERA5 land soil water layer 2 (m3/m3) for 1979-2021.
SM_layer3: 28-100 cm soil moisture from ERA5 land soil water layer 3 (m3/m3) for 1979-2021.
SM_layer4: 100-289 cm soil moisture from ERA5 land soil water layer 4 (m3/m3) for 1979-2021.
SWDE: Snow water equivalent from ERA5 snow depth water equivalent (m of water equivalent) for 1979-2021.
groundwater(%): Groundwater percentage from GRACE-FO data assimilation (%) for 2003-2021 (weekly).
- Returns:
a pandas.DataFrame of shape (15706, 6) where 15706 is the number of days
- Return type:
pd.DataFrame
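The row counts quoted for meteo_vars_stn (16071), storage_vars_stn (15706) and reservoir_variables_stn (42) follow directly from the documented date ranges. A short check with the standard library (the helper n_days is only for illustration):

```python
from datetime import date


def n_days(start, end):
    """Number of daily time steps from start to end, both inclusive."""
    return (end - start).days + 1


meteo_days = n_days(date(1979, 1, 1), date(2022, 12, 31))
storage_days = n_days(date(1979, 1, 1), date(2021, 12, 31))
reservoir_years = 2020 - 1979 + 1  # yearly steps from 1979 to 2020
```

The two daily counts differ by exactly the 365 days of 2022, the year that is covered by the meteorological variables but not by the storage variables.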
- streamflow_indices(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Streamflow indices for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- streamflow_indices_stn(stn: str) DataFrame[source]
Streamflow indices for a given station which have yearly timestep.
- Returns:
a pandas.DataFrame of shape (n, 16) where n is the number of years
- Return type:
pd.DataFrame
- uncertainty(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Uncertainty estimates of all meteorological variables over all watersheds
P_uncertainty (%) Precipitation uncertainty estimates (in percentage). Uncertainties are calculated from EM-Earth deterministic and MSWEP datasets.
T_uncertainty (%) Temperature uncertainty estimates (in percentage). Uncertainties are calculated from EUSTACE, MERRA-2, and ERA5 datasets.
EVP_uncertainty (%) Actual evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
LRAD_uncertainty (%) Downward longwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
SRAD_uncertainty (%) Downward shortwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
wind_uncertainty (%) Wind speed uncertainty estimates (in percentage). The u- and v- components are aggregated on each grid to obtain wind speed. Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
pet_uncertainty (%) Potential evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
- Returns:
a pandas.DataFrame of shape (n, 7) where n is the number of stations
- Return type:
pd.DataFrame
- class aqua_fetch.EStreams(path=None, **kwargs)[source]
Bases: _RainfallRunoff
Handles EStreams data following the work of Nascimento et al., 2024. The data is available at its zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 17130 stations, 9 dynamic (meteorological) features with daily timestep, 27 dynamic features with yearly timestep and 214 static features. The dynamic features are from 1950-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str)
verbosity (int) – 0: no message will be printed
kwargs (dict) – Any other keyword arguments for the Datasets class
- area(stations: List[str] = 'all', countries: List[str] = 'all') Series[source]
Area of catchments in km2.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, countries: str | List[str] = 'all')[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str) – name/id of the station or stations of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_dynamic_features('IEEP0281', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean'],
... as_dataframe=True).unstack()
- fetch_stn_dynamic_features(station: str, dynamic_features='all') DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas.DataFrame of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_stn_dynamic_features('IEEP0281').unstack()
>>> camels.dynamic_features
>>> camels.fetch_stn_dynamic_features('IEEP0281',
... dynamic_features=['p_mean', 't_mean', 'pet_mean']).unstack()
- hydro_clim_sigs(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the hydro-climatic signatures of one or more stations
- Returns:
a pandas.DataFrame of hydro-climatic signatures of shape (stations, 31)
- Return type:
pd.DataFrame
- meteo_data(stations: str | List[str] = 'all', countries: List[str] | str = 'all')[source]
Returns the meteorological data of one or more stations either as dictionary of dataframes or xarray Dataset
- meteo_data_station(station: str) DataFrame[source]
Returns the meteorological data of a station
- Returns:
a pandas.DataFrame of meteorological data of shape (time, 9)
- Return type:
pd.DataFrame
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached result instead of reading the data again and again. Alternatively, the user can implement this property in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Returns a list of all station names. Note that the basin_id column is used as the station name.
- stn_coords(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the coordinates of one or more stations
- Returns:
a pandas.DataFrame of shape (stations, 2)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.stn_coords('IEEP0281')
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])
>>> dataset.stn_coords(countries='IE')