Rainfall Runoff datasets
This section includes datasets that can be used for rainfall-runoff modeling.
They all contain observed streamflow and meteorological data as time series.
These are referred to as dynamic features. The physical catchment properties
are included as static features as tabular data, where each row corresponds
to one catchment and each column to one static feature.
In addition to published datasets, this package introduces 10 new datasets for rainfall-runoff modeling. These datasets have not yet been published but follow the CAMELS dataset series convention. They include Ireland, Finland, Italy, Poland, Portugal, Japan, Thailand, Arcticnet, Spain, and the USGS. The observed streamflow data are sourced from the national meteorological or hydrological websites of the respective countries. Catchment boundaries and meteorological data for Ireland, Finland, Italy, Poland, and Portugal are obtained from EStreams (Nascimento et al., 2024), and similarly for Japan, Thailand, Arcticnet, and Spain from GSHA (Peirong et al., 2023). For USGS, the catchment boundaries are sourced from HYSETS (Arsenault et al., 2020).
Although each data source has a dedicated class,
all datasets listed in Table List of datasets are accessible via the aqua_fetch.rr.RainfallRunoff
class, which allows for a unified and consistent approach to each dataset. The class
provides several methods to access static features, dynamic features, or catchment
boundaries. Although the raw data files for each dataset may come in different formats,
the methods to access these features through the aqua_fetch.rr.RainfallRunoff class remain the same.
Individual classes for each dataset are also available and may offer more control to
users over specific datasets. However, for most cases, the use of the aqua_fetch.rr.RainfallRunoff
class will suffice.
The naming and units of dynamic features in each dataset may vary. However, we have
standardized these features using the formula name_unit_specifier for each dynamic
feature across all datasets. In this formula, the specifier can indicate the source
(such as ERA5 or MSWEP for precipitation), the method used to calculate the feature
(like makkink or penman for evapotranspiration), or the aggregation type (min, max, mean).
For example, a precipitation dynamic feature from MSWEP would be labeled as pcp_mm_mswep.
This approach ensures that feature names are representative and understandable.
Dynamic features for which this method is inapplicable retain their original names.
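The name_unit_specifier convention can be taken apart mechanically. The sketch below is only an illustration: `split_feature_name` and `FeatureName` are hypothetical names that are not part of the package, and the sketch assumes the name and the unit never contain an underscore. Features that retain their original names will not split meaningfully.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    """Parts of a standardized dynamic feature label (hypothetical type)."""
    name: str
    unit: Optional[str]
    specifier: Optional[str]


def split_feature_name(feature: str) -> FeatureName:
    """Split a label such as 'pcp_mm_mswep' into name, unit and specifier.

    Hypothetical helper for illustration; not part of aqua_fetch.
    """
    # Split on the first two underscores only, so that a specifier which
    # itself contains underscores (e.g. 'silo_max') stays in one piece.
    parts = feature.split("_", 2)
    name = parts[0]
    unit = parts[1] if len(parts) > 1 else None
    specifier = parts[2] if len(parts) > 2 else None
    return FeatureName(name, unit, specifier)


print(split_feature_name("pcp_mm_mswep"))
print(split_feature_name("airtemp_C_silo_max"))
print(split_feature_name("pcp_mm"))
```

A label without a specifier, such as pcp_mm, yields `None` for the third part.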
Another feature of AquaFetch is the optional inclusion of static and dynamic features from EStreams and GSHA for all datasets listed in Table List of datasets. This is beneficial because EStreams and GSHA include several static and dynamic features calculated for the catchments that are not included in other datasets. For instance, EStreams provides information on annual variation in land use for all European catchments, a feature not available in CAMELS-GB (Coxon et al., 2020) or other European datasets. This step is optional since it initiates the download of the GSHA and EStreams datasets, which can be time-consuming and may not always be necessary.
Certain datasets in this package feature overlapping stations from the same region.
For example, both the aqua_fetch.Bull and Spain datasets cover Spain.
However, the Bull dataset was introduced by Aparicio et al. (2024),
whereas the Spain dataset was introduced in this work. The Spain dataset contains
more stations, totaling 889, while the Bull dataset includes 484 stations.
Similarly, both the CABra (Almagro et al., 2021) and CAMELS_BR (Chagas et al., 2020) datasets
cover Brazil and have been published in peer-reviewed journals. However, they differ
in their temporal coverage and the number of static and dynamic features. Furthermore,
Denmark is covered by two datasets, Caravan_DK (Koch, 2022) and CAMELS_DK (Liu et al., 2024),
which differ in temporal coverage and the number of static and dynamic features.
The HYSETS dataset (Arsenault et al., 2020) covers Mexico, the US, and Canada. However,
we identified issues with the observed streamflow data for the US in HYSETS. As a
result, we introduced the USGS dataset, which focuses specifically on the US region.
The catchment boundaries, static features, and meteorological data for USGS, however,
are still obtained from HYSETS.
List of datasets
| Source Name | Class | Number of Stations (daily / hourly) | Dynamic features | Static features | Temporal Coverage | Spatial Coverage | Reference |
|---|---|---|---|---|---|---|---|
| | | 106 | 27 | 35 | 1979 - 2003 | Arctic (Russia) | |
| | | 484 | 55 | 214 | 1990 - 2020 | Spain | |
| | | 735 | 12 | 97 | 1980 - 2010 | Brazil | |
| | | 5667 | 13 | 799 | 1900 - 2018 | United States of America | |
| | | 222, 561 | 26 | 166, 187 | 1900 - 2018 | Australia | |
| | | 897 | 10 | 67 | 1920 - 2019 | Brazil | |
| | | 331 | 9 | 209 | 1981 - 2020 | Switzerland | |
| | | 516 | 12 | 104 | 1913 - 2018 | Chile | |
| | | 347 | 6 | 255 | 1981 - 2022 | Colombia | |
| | | 1555 | 21 | 111 | 1951 - 2020 | Germany | |
| | | 304 | 13 | 119 | 1989 - 2023 | Denmark | |
| | | 320 | 16 | 111 | 1963 - 2023 | Finland | |
| | | 654 | 22 | 344 | 1970 - 2021 | France | |
| | | 671 | 10 | 145 | 1970 - 2015 | Britain | |
| | | 472 | 20 | 210 | 1980 - 2020 | Republic of India | |
| | | 56 / 56 | 25 | 61 | 2004 - 2021 | Luxembourg | |
| | | 369 | 5 | 40 | 1972 - 2024 | New Zealand | |
| | | 50 | 4 | 76 | 1961 - 2020 | Sweden | |
| | | 178 | 17 | 215 | 2000 - 2019 | South Korea | |
| | | 671 | 8 | 59 | 1980 - 2014 | United States | |
| | | 304 | 38 | 211 | 1981 - 2020 | Denmark | |
| | | 111 | 16 | 124 | 1990 - 2020 | China | |
| | | 669 | 27 | 35 | 2012 - 2023 | Finland | |
| | | 5357 | 39 | 211 | 1950 - 2023 | Global | |
| | | 561 | | | | | |
| | | 14425 | 5 | 28 | 1950 - 2018 | North America (Mexico, Canada, USA) | |
| | | 464 | 27 | 35 | 1992 - 2020 | Ireland | |
| | | 294 | 37 | 35 | 1992 - 2020 | Italy | |
| | | 751 / 696 | 27 | 35 | 1979 - 2022 | Japan | |
| | | 859 / 859 | 22 | 80 | 1981 - 2019 | Central Europe | |
| | | 111 / 111 | 36 | 154 | 1950 - 2021 | Iceland | |
| | | 7 | 14 | 14 | 2013 - 2019 | Canada | |
| | | 1287 | 27 | 35 | 1992 - 2020 | Poland | |
| | | 280 | 27 | 35 | 1992 - 2020 | Portugal | |
| | | 1 | 2 | 0 | 2016 - 2019 | Lulea (Sweden) | |
| | | 24 | 3 | 232 | 1920 - 1940 | Haiti | |
| | | 117 | 3 | 10 | 1950 - 2023 | Slovenia | |
| | | 889 | 27 | 35 | 1979 - 2020 | Spain | |
| | | 73 | 27 | 35 | 1980 - 1999 | Thailand | |
| | | 12004 | 5 | 27 | 1950 - 2018 | United States | |
| | | 125 | 3 | 7 | 2011 - 2018 | Iowa (USA) | |
Duplicate Datasets
For some regions/countries, multiple datasets are available. These datasets may differ in the number of stations, temporal coverage, and static and dynamic features. The following table lists the duplicate datasets available in AquaFetch.
| Country/Region | First Dataset | Second Dataset | Third Dataset |
|---|---|---|---|
High Level API
The aqua_fetch.rr.RainfallRunoff class is the high-level API,
which provides a unified and easy-to-use interface to access all the datasets.
It is recommended to use this class to access the datasets.
- class aqua_fetch.rr.RainfallRunoff(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Bases: object
This class provides access to all the rainfall-runoff datasets. For simplicity and reusability, use this class instead of using the individual dataset classes.
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')  # instead of CAMELS_SE, you can provide any other dataset name
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(21915, 4)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
50
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5)
5
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(21915, 4), (21915, 4), (21915, 4), (21915, 4), (21915, 4)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['5'].shape
(21915, 3)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 76), 1, (21915, 4))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)  # -> xarray.core.dataset.Dataset
>>> dynamic.dims  # -> FrozenMappingWarningOnValuesAccess({'time': 21915, 'dynamic_features': 4})
>>> len(dynamic.data_vars)  # -> 10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
68.035599 21.9758
>>> dataset.stn_coords(['5', '736'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('5')
... # get area of two stations
>>> dataset.area(['5', '736'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('5')
See sphx_glr_auto_examples_camels_australia.py for a more comprehensive usage example.
- __init__(dataset: str, path: str | PathLike = None, overwrite: bool = False, to_netcdf: bool = True, processes: int = None, remove_zip: bool = True, verbosity: int = 1, **kwargs)[source]
Rainfall Runoff datasets
- Parameters:
dataset (str) –
dataset name. This must be one of the following:
Arcticnet, Bull, CABra, CCAM, CAMELSH, CAMELS_AUS, CAMELS_BR, CAMELS_CH, CAMELS_CL, CAMELS_COL, CAMELS_DE, CAMELS_DK0, CAMELS_DK, CAMELS_FI, CAMELS_FR, CAMELS_GB, CAMELS_IND, CAMELS_LUX, CAMELS_NZ, CAMELS_SE, CAMELS_SK, CAMELS_US, EStreams, Finland, GRDCCaravan, GSHA, HYSETS, HYPE, Ireland, Italy, Japan, LamaHCE, LamaHIce, Poland, Portugal, RRLuleaSweden, Simbi, Slovenia, Spain, Thailand, USGS, WaterBenchIowa
path (str) – path to the directory inside which the data is located or downloaded. If provided and the path/dataset exists, then the data will be read from this path. If provided and the path/dataset does not exist, then the data will be downloaded at this path. If not provided, then the data will be downloaded in the default path, which is .../aqua_fetch/data/.
overwrite (bool) – If the data is already downloaded, you can set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
verbosity (int) – 0: no message will be printed
kwargs – additional keyword arguments for the underlying dataset class, for example version for aqua_fetch.rr.CAMELS_AUS, timestep for aqua_fetch.rr.LamaHCE, or met_src for aqua_fetch.rr.CAMELS_BR
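The rules for path and overwrite can be summarized in a short sketch. This is not the package's implementation: `resolve_dataset_dir` and `DEFAULT_ROOT` are hypothetical, and the sketch assumes that "path/dataset" means a sub-directory named after the dataset inside path.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple, Union

# Stand-in for the package default (".../aqua_fetch/data/"); the real
# location depends on where the package is installed.
DEFAULT_ROOT = Path.home() / "aqua_fetch_example_data"


def resolve_dataset_dir(
    dataset: str,
    path: Optional[Union[str, Path]] = None,
    overwrite: bool = False,
) -> Tuple[Path, bool]:
    """Return (directory, needs_download) for a dataset. Hypothetical helper."""
    root = Path(path) if path is not None else DEFAULT_ROOT
    target = root / dataset
    # Download when nothing is on disk yet, or when a fresh copy is requested.
    needs_download = overwrite or not target.exists()
    return target, needs_download


with tempfile.TemporaryDirectory() as tmp:
    print(resolve_dataset_dir("CAMELS_SE", tmp))        # needs_download is True
    (Path(tmp) / "CAMELS_SE").mkdir()
    print(resolve_dataset_dir("CAMELS_SE", tmp))        # needs_download is False
    print(resolve_dataset_dir("CAMELS_SE", tmp, True))  # needs_download is True again
```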
- area(stations: str | List[str] = 'all') Series[source]
Returns area (Km2) of all/selected catchments as pandas.Series.
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations. For names of stations, see stations().
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property dynamic_features: List[str]
returns names of dynamic features as python list of strings
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.dynamic_features
- property end: str
returns end date of data
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.end  # end is a property, so it is accessed without parentheses
- fetch(stations: str | List[str] | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
Fetches the features of one or more stations.
- Parameters:
stations –
It can have following values:
int: number of (randomly selected) stations to fetch
float: fraction of (randomly selected) stations to fetch
str: name/id of station to fetch. However, if all is provided, then all stations will be fetched. For names of stations, see stations().
list: list of names/ids of stations to fetch
dynamic_features ((default=``all``)) –
It can have following values:
str: name of dynamic feature to fetch. If all is provided, then all dynamic features will be fetched. For names of dynamic features, see dynamic_features().
list: list of dynamic features to fetch.
None: No dynamic feature will be fetched. The second returned value will be None.
static_features ((default=None)) –
It can have following values:
str: name of static feature to fetch. If all is provided, then all static features will be fetched. For names of static features, see static_features().
list: list of static features to fetch.
None: No static feature will be fetched. The first returned value will be None.
st – starting date of data to be returned. If None, the data will be returned from where it is available.
en – end date of data to be returned. If None, then the data will be returned till the date data is available.
as_dataframe – whether to return dynamic attributes as pandas.DataFrame or as xarray.Dataset. If the xarray library is not installed, then this parameter will be ignored and the data will be returned as pandas.DataFrame.
kwargs – keyword arguments
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or as a python dictionary whose keys are station names and values are pandas.DataFrame. It depends upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations, with station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.
- Return type:
tuple[DataFrame, Dict[str, DataFrame] | Dataset]
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # get data of 10% of stations
>>> _, dynamic = dataset.fetch(stations=0.1, as_dataframe=True)  # dynamic is a dictionary
... # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 2 selected stations
>>> _, two_selec_stn_data = dataset.fetch(stations=['912101A','912105A'], as_dataframe=True)
... # fetch data of a single station
>>> _, single_stn_data = dataset.fetch(stations='912101A', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> dynamic
... # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='912101A',
...     dynamic_features=['q_cms_obs', 'pcp_mm_silo'], as_dataframe=True)
... # fetch data between selected periods
>>> _, data = dataset.fetch(stations='912101A', st="20010101", en="20101231", as_dataframe=True)
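The four accepted forms of the stations argument can be read as a small dispatch on type. The sketch below illustrates that reading and should not be taken as the library's code: `resolve_stations` is a hypothetical helper, and both the seeding and the rounding of a fraction (truncation towards zero) are assumptions.

```python
import random
from typing import List, Sequence, Union


def resolve_stations(
    stations: Union[str, int, float, Sequence[str]],
    available: Sequence[str],
    seed: int = 0,
) -> List[str]:
    """Map a ``stations`` argument to a list of station ids. Illustrative only."""
    rng = random.Random(seed)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(stations, bool):
        raise TypeError("stations must not be a bool")
    if isinstance(stations, str):
        # 'all' selects everything, any other string is a single station id.
        return list(available) if stations == "all" else [stations]
    if isinstance(stations, int):
        # An integer is a number of randomly selected stations.
        return rng.sample(list(available), stations)
    if isinstance(stations, float):
        # A float is a fraction of randomly selected stations.
        return rng.sample(list(available), int(len(available) * stations))
    # Anything else is treated as an explicit list of ids.
    return list(stations)


ids = [str(i) for i in range(50)]
print(len(resolve_stations(0.1, ids)))    # 5
print(len(resolve_stations(10, ids)))     # 10
print(resolve_stations("5", ids))         # ['5']
print(resolve_stations(["5", "7"], ids))  # ['5', '7']
```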
- fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) DataFrame | Dataset[source]
Fetches all or selected dynamic attributes of one station.
- Parameters:
station (str) – name/id of station of which to extract the data. For names of stations see stations()
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned. For names of dynamic features, see dynamic_features()
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas.DataFrame, otherwise it is xarray.Dataset
- Returns:
a pandas.DataFrame or xarray.Dataset depending upon the value of as_dataframe and whether xarray is installed or not.
- Return type:
pd.DataFrame or xr.Dataset
Examples
>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
...     dynamic_features=['airtemp_C_silo_max', 'vp_hpa_silo', 'q_cms_obs'],
...     as_dataframe=True)
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static attributes of one or more stations.
- Parameters:
stations (str) – name/id of station of which to extract the data. For names of stations see stations().
static_features (list/str, optional (default="all")) – The name/names of static features to fetch. By default, all available static features are returned. For names of static features, see static_features().
- Returns:
a pandas.DataFrame
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RainfallRunoff
>>> camels = RainfallRunoff('CAMELS_AUS')
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
...     static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches static and dynamic features for one station.
- Parameters:
station (str) – station id/gauge id for which the data is to be fetched. For names of stations, see stations()
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch. For names of dynamic features, check the output of dynamic_features()
static_features – names of static features/attributes to be fetched. For names of static features, check the output of static_features()
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched till where it is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will have a single row while the dynamic features will be of shape (time, dynamic features).
- Return type:
tuple[DataFrame, DataFrame]
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> static, dynamic = dataset.fetch_station_features('912101A')
>>> static.shape
>>> dynamic.shape
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] | None = 'all', static_features: str | List[str] | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
Reads attributes of more than one stations.
- Parameters:
stations – name/ids of stations for which data is to be fetched. For names of stations, see stations().
dynamic_features – list of dynamic features to be fetched. For names of dynamic features, see dynamic_features(). If all, then all dynamic features will be fetched. If None, then no dynamic attribute will be fetched and the second returned value will be None.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched. For names of static features, see static_features().
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as pandas.DataFrame. The default is an xarray.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or as a python dictionary whose keys are names of stations and values are pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations, with station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.
- Return type:
tuple[DataFrame, Dict[str, DataFrame] | Dataset]
- Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> static, dynamic = dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
- get_boundary(station: str)[source]
returns boundary of a catchment as fiona.Geometry object.
- Parameters:
station (str) – name/id of catchment. For names of catchments, see stations().
- Returns:
a fiona.Geometry object representing the boundary of the catchment.
- Return type:
fiona.Geometry
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_SE')
>>> dataset.get_boundary(dataset.stations()[0])
- plot_catchment(station: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)[source]
plots catchment boundaries
- Parameters:
station (str) – name/id of station. For names of stations, see stations()
show_outlet (bool, optional (default=False)) – if True, then the outlet of the catchment will be plotted as a red dot
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
... # station is a required argument, here the id 912101A is used
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundaries")
>>> plt.show()
- plot_num_observations(stations: str | List[str] = 'all', dynamic_features: str | List[str] = 'all', start: str | Timestamp = None, end: str | Timestamp = None, show_constant: bool = False, figsize: Tuple[float, float] = None, ax=None, show: bool = True)[source]
Plots the number of observations available for different dynamic features as a cumulative distribution function (CDF). By default, a dynamic feature is not plotted if all stations have the same number of observations for it.
- Parameters:
stations (Union[str, List[str]]) – The stations to include in the plot. If ‘all’, all stations will be included.
dynamic_features (Union[str, List[str]]) – The dynamic features to include in the plot. If ‘all’, all dynamic features will be included.
start (Union[str, pd.Timestamp], optional) – The start date for the data to consider. If None, the start date of the dataset will be used.
end (Union[str, pd.Timestamp], optional) – The end date for the data to consider. If None, the end date of the dataset will be used.
show_constant (bool, optional) – Whether to show features with constant number of observations across stations. If True, these features will be included in the plot as well.
figsize (Tuple[float, float], optional) – The size of the figure to create. If None, a default size will be used.
ax (plt.Axes, optional) – The matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool, optional) – Whether to display the plot immediately.
- Returns:
The matplotlib axes containing the plot.
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_FI, RainfallRunoff
>>> dataset = CAMELS_FI()
>>> dataset.plot_num_observations()
... # plot number of observations for different periods
>>> dataset = RainfallRunoff('CAMELS_COL')
>>> _, ax = plt.subplots()
>>> periods = [("19810101", "19901231"), ("19910101", "20001231"), ("20010101", "20101231")]
>>> for idx, (start, end) in enumerate(periods):
...     ax = dataset.plot_num_observations(
...         dynamic_features=['q_cms_obs'],
...         ax=ax,
...         start=start, end=end, show=False)
...     ax.lines[idx].set_label(f'{start} to {end}')
>>> assert isinstance(ax, plt.Axes)
>>> ax.legend()
>>> plt.show()
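To make the CDF concrete, the following sketch computes an empirical CDF from per-station observation counts and checks whether the counts are constant, which is the case the method skips by default. Both function names and the sample counts are hypothetical, and the way the library computes its curve may differ.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def observation_count_cdf(counts: Dict[str, int]) -> List[Tuple[int, float]]:
    """Empirical CDF of per-station observation counts. Illustrative only."""
    ordered = sorted(counts.values())
    n = len(ordered)
    # Each point is (count, fraction of stations with at most that count).
    return [(value, bisect_right(ordered, value) / n) for value in sorted(set(ordered))]


def is_constant(counts: Dict[str, int]) -> bool:
    """True when every station has the same number of observations."""
    return len(set(counts.values())) <= 1


# Made-up counts for four stations of one dynamic feature.
counts = {"a": 10, "b": 30, "c": 20, "d": 20}
print(observation_count_cdf(counts))  # [(10, 0.25), (20, 0.75), (30, 1.0)]
print(is_constant(counts))            # False
```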
- plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
stations – name/names of stations. If not given, all stations will be plotted. For names of stations, see stations().
marker – marker to use.
color (str, optional) – name of static feature to use as color.
ax (plt.Axes) – matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool)
**kwargs
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
... # using area as color
>>> dataset.plot_stations(color='area_km2')
- q_mm(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in the units of millimeter per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations. For names of stations, see stations().
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
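The division of q by area implies a unit conversion. The sketch below works it out for one value, assuming that streamflow is in cubic meters per second (as the q_cms_obs label suggests) and area is in square kilometers. `cms_to_mm` and `SECONDS_PER_STEP` are hypothetical names and not the library's implementation.

```python
# Seconds in one timestep; "D" for daily and "H" for hourly data.
SECONDS_PER_STEP = {"D": 86_400, "H": 3_600}


def cms_to_mm(q_cms: float, area_km2: float, timestep: str = "D") -> float:
    """Convert streamflow from m3/s to mm per timestep. Illustrative only."""
    volume_m3 = q_cms * SECONDS_PER_STEP[timestep]  # water volume in one step
    area_m2 = area_km2 * 1e6                        # km2 -> m2
    return volume_m3 / area_m2 * 1000.0             # depth in m -> mm


# 1 m3/s over a catchment of 86.4 km2 is approximately 1 mm/day.
print(round(cms_to_mm(1.0, 86.4), 6))
```

For daily data this reduces to q_cms * 86.4 / area_km2.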
- property start: str
returns starting date of data
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.start  # start is a property, so it is accessed without parentheses
- property static_features: List[str]
returns names of static features as python list of strings
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.static_features
- stations() List[str][source]
Names/ids of the stations, catchments, basins, or gauges that are used to index each catchment in the dataset. Every catchment has a unique name/id which can be used to fetch its data.
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as pandas.DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned. For names of stations, see stations().
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_CH')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import RainfallRunoff
>>> dataset = RainfallRunoff('CAMELS_AUS')
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
Low Level API
The low-level API provides access to the individual dataset classes. This provides more control over the datasets.
- class aqua_fetch.rr._RainfallRunoff(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: Datasets
This is the parent class for individual rainfall-runoff datasets like CAMELS-GB etc. This class is not meant for direct use. It is inherited by the child classes which are specific to a dataset like CAMELS-GB, CAMELS-AUS etc. This class first downloads the dataset if it is not already downloaded. Then the selected features for a selected catchment/station are fetched and provided to the user using the method fetch.
- path (str/path) : directory of the dataset
- dynamic_features (list) : tells which dynamic features are available in this dataset
- static_features (list) : a list of static features.
- static_attribute_categories (list) : tells which kinds of static features are present in this category.
- stations : returns name/id of stations for which the data (dynamic features) exists as list of strings.
- fetch : fetches all features (both static and dynamic type) of all station/gauge_ids or a specified station. It can also be used to fetch all features of a number of stations either by providing their gauge_id or by just saying that we need data of 20 stations, which will then be chosen randomly.
- fetch_dynamic_features : fetches specified dynamic features of one specified station. If the dynamic attribute is not specified, all dynamic features will be fetched for the specified station. If station is not specified, the specified dynamic features will be fetched for all stations.
- fetch_static_features : works same as fetch_dynamic_features but for static features. Here if the category is not specified then static features of the specified station for all categories are returned.
- __init__(path: str = None, timestep: str = 'D', to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- area(stations: str | List[str] = 'all') Series[source]
Returns area (Km2) of all/selected catchments as pandas.Series.
- Parameters:
stations (str/list (default=``all``)) – name/names of stations. Default is all, which will return area of all stations
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2004')  # returns area of station whose id is 2004
>>> dataset.area(['2004', '6004'])  # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. if not given, then the first attribute in the boundary file will be used.
- property camels_dir
Directory where all camels datasets will be saved. This will be under the datasets directory.
- property dyn_fname: str | PathLike
name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed and the file does not already exist. The creation of this file can take some time; however, it leads to faster I/O operations.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
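As a rough illustration of how such a mapping might be applied, the sketch below renames raw column names to standardized ones. The raw names (streamflow, tmax), the name airtemp_C_max and the helper rename_features are hypothetical and are not taken from any dataset. The direction of the mapping (raw name to standardized name) is also an assumption.

```python
# Hypothetical mapping (assumed direction): raw name in the data files -> standardized name.
dyn_map = {"streamflow": "q_cms_obs", "tmax": "airtemp_C_max"}


def rename_features(record: dict, mapping: dict) -> dict:
    """Return a copy of `record` with keys renamed according to `mapping`.

    Keys that are absent from `mapping` are kept unchanged.
    """
    return {mapping.get(name, name): value for name, value in record.items()}


raw = {"streamflow": 1.2, "tmax": 25.0, "date": "2001-01-01"}
standardized = rename_features(raw, dyn_map)
print(standardized)
```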
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
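One way to follow the caching advice for this property is functools.cached_property from the standard library. The sketch below is only an illustration under that assumption: ToyDataset and _read_feature_names are hypothetical stand-ins and do not reflect how aqua_fetch implements the property.

```python
from functools import cached_property
from typing import List


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # Placeholder for an expensive read of the raw data files.
        self.reads += 1
        return ["q_cms_obs", "pcp_mm"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


ds = ToyDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.reads)
```

Repeated access returns the stored list, so the underlying read happens once per instance.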
- property end: Timestamp
end of data
- fetch(stations: str | list | int | float = 'all', dynamic_features: List[str] | str | None = 'all', static_features: str | List[str] | None = None, st: None | str = None, en: None | str = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
Fetches the features of one or more stations.
- Parameters:
stations –
- It can have the following values:
int : number of (randomly selected) stations to fetch
float : fraction of (randomly selected) stations to fetch
str : name/id of station to fetch. However, if all is provided, then all stations will be fetched.
list : list of names/ids of stations to fetch
dynamic_features – If not None, then it is the features to be fetched. If None, then all available features are fetched.
static_features – list of static features to be fetched. None means no static attribute will be fetched.
st – starting date of data to be returned. If None, the data will be returned from where it is available.
en – end date of data to be returned. If None, then the data will be returned till the date data is available.
as_dataframe – whether to return dynamic features as pandas.DataFrame or as xarray.Dataset.
kwargs – keyword arguments to read the files
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray Dataset or a dictionary with keys as station names and values as pandas DataFrame. This depends upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are xarray Dataset, then it consists of data_vars equal to the number of stations and time and dynamic_features as dimensions.
- Return type:
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> _, dynamic = dataset.fetch(stations=0.1, as_dataframe=True)  # dynamic is a dictionary
>>> # fetch data of 5 (randomly selected) stations
>>> _, five_random_stn_data = dataset.fetch(stations=5, as_dataframe=True)
>>> # fetch data of 3 selected stations
>>> _, three_selec_stn_data = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
>>> # fetch data of a single station
>>> _, single_stn_data = dataset.fetch(stations='318076', as_dataframe=True)
>>> # get both static and dynamic features as dictionary
>>> static, dynamic = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> dynamic
>>> # get only selected dynamic features
>>> _, sel_dyn_features = dataset.fetch(stations='318076',
...     dynamic_features=['q_mm_obs', 'solrad_wm2_silo'], as_dataframe=True)
>>> # fetch data between selected periods
>>> _, data = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
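The four accepted types of the stations argument can be pictured with a small standalone sketch. The helper select_stations below is hypothetical and is not the library's code. In particular, truncating the fraction with int() is an assumption, although it agrees with the Bull example in this document where 0.1 of 484 stations yields 48.

```python
import random
from typing import List, Union


def select_stations(all_stations: List[str],
                    stations: Union[str, int, float, List[str]] = "all",
                    seed: int = 0) -> List[str]:
    """Resolve a `stations` argument into a concrete list of station ids."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    if isinstance(stations, str):
        # 'all' means every station, any other string is a single id
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, int):
        # integer: that many randomly selected stations
        return rng.sample(all_stations, stations)
    if isinstance(stations, float):
        # float: that fraction of stations, truncated (assumption)
        return rng.sample(all_stations, int(len(all_stations) * stations))
    # otherwise: an explicit list of ids
    return list(stations)


ids = [f"stn_{i}" for i in range(20)]
print(len(select_stations(ids, 5)))
print(len(select_stations(ids, 0.25)))
print(select_stations(ids, "stn_3"))
```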
- fetch_dynamic_features(station: str, dynamic_features='all', st=None, en=None, as_dataframe=False) DataFrame | Dataset[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from where to fetch the data.
en (Optional (default=None)) – end time until where to fetch the data
as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray.Dataset
- Returns:
a pandas dataframe or xarray dataset of dynamic features. If as_dataframe is True, then the returned data is a pandas DataFrame whose index is time and the columns are dynamic_features. If as_dataframe is False, and xarray module is installed, then the returned data is xarray dataset with data_vars equal to the number of stations and time and dynamic_features as dimensions.
- Return type:
pd.DataFrame/xr.Dataset
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('912101A', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('912101A',
...     dynamic_features=['airtemp_C_awap_max', 'vp_hpa_awap', 'q_cms_obs'],
...     as_dataframe=True)
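The effect of the st and en arguments can be sketched without the library. The helper slice_period below is hypothetical. It assumes dates are given as "YYYYMMDD" strings (as in the fetch examples of this document) and that both bounds are inclusive, which the documentation does not state.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def slice_period(series: Dict[datetime, float],
                 st: Optional[str] = None,
                 en: Optional[str] = None) -> Dict[datetime, float]:
    """Keep entries whose timestamp lies in [st, en]; None means unbounded."""
    lo = datetime.strptime(st, "%Y%m%d") if st is not None else None
    hi = datetime.strptime(en, "%Y%m%d") if en is not None else None
    return {t: v for t, v in series.items()
            if (lo is None or t >= lo) and (hi is None or t <= hi)}


# five daily values from 2000-12-30 to 2001-01-03
first_day = datetime(2000, 12, 30)
daily = {first_day + timedelta(days=i): float(i) for i in range(5)}

subset = slice_period(daily, st="20010101")
print(sorted(subset))
```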
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static features of one or more stations.
- Parameters:
stations (str/list) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('912101A')
>>> camels.static_features
>>> camels.fetch_static_features('912101A',
...     static_features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
for CAMELS_FR
>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
654
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(472, 210)
get static data of one station only
>>> static_data = dataset.fetch_static_features('42600042')
>>> static_data.shape
(1, 210)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'aridity'])
>>> static_data.shape
(472, 2)
>>> data = dataset.fetch_static_features('42600042', static_features=['slope_mean', 'aridity'])
>>> data.shape
(1, 2)
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default, the data will be fetched till the date data is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will be of single row while the dynamic features will be of shape (time, dynamic features).
- Return type:
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> static, dynamic = dataset.fetch_station_features('912101A')
>>> static.shape, dynamic.shape
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
Reads features of more than one station.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the dynamic data as pandas dataframe. Default is xarray.Dataset object.
kwargs (dict) – additional keyword arguments
- Returns:
tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray.Dataset or a dict with keys as station names and values as pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are xarray Dataset, then it consists of data_vars equal to the number of stations and time and dynamic_features as dimensions.
- Raises:
ValueError, if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # find out station ids
>>> dataset.stations()
>>> # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'])
>>> # get data of selected stations as dictionary of pandas DataFrame
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
>>> # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
...     dynamic_features=['q_mm_obs', 'airtemp_C_mean_silo'], static_features=['elev_mean'])
- get_boundary(catchment_id: str, to_wgs84: bool = True)[source]
returns boundary of a catchment in a required format
- Parameters:
- Returns:
geometry
- Return type:
fiona.Geometry
Examples
>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> dataset.get_boundary(dataset.stations()[0])
- static mean_temp(tmin: Series, tmax: Series) Series[source]
calculates mean temperature from tmin and tmax
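The documentation does not say how the mean is computed. A common choice, assumed here, is the arithmetic mean of the daily minimum and maximum. The sketch below uses plain lists instead of pandas Series to stay dependency-free, and its mean_temp is a hypothetical re-implementation rather than the library's method.

```python
from typing import List


def mean_temp(tmin: List[float], tmax: List[float]) -> List[float]:
    """Element-wise arithmetic mean of tmin and tmax (assumed definition)."""
    if len(tmin) != len(tmax):
        raise ValueError("tmin and tmax must have the same length")
    return [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]


tmean = mean_temp([0.0, 5.0], [10.0, 15.0])
print(tmean)
```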
- plot_catchment(catchment_id: str, show_outlet: bool = False, ax: Axes = None, show: bool = True, **kwargs)[source]
plots catchment boundaries
- Parameters:
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_catchment('912101A')
>>> dataset.plot_catchment('912101A', marker='o', ms=0.3)
>>> ax = dataset.plot_catchment('912101A', marker='o', ms=0.3, show=False)
>>> ax.set_title("Catchment Boundary")
>>> plt.show()
>>> # show the outlet as well
>>> dataset.plot_catchment('912101A', show_outlet=True)
- plot_num_observations(stations: str | List[str] = 'all', dynamic_features: str | List[str] = 'all', start: str | Timestamp = None, end: str | Timestamp = None, show_constant: bool = False, figsize: Tuple[float, float] = None, ax=None, show: bool = True)[source]
Plots the number of observations available for different dynamic features as a cumulative distribution function (CDF). A feature is not plotted if all stations have the same number of observations for it, unless show_constant is True.
- Parameters:
stations (Union[str, List[str]]) – The stations to include in the plot. If ‘all’, all stations will be included.
dynamic_features (Union[str, List[str]]) – The dynamic features to include in the plot. If ‘all’, all dynamic features will be included.
start (Union[str, pd.Timestamp], optional) – The start date for the data to consider. If None, the start date of the dataset will be used.
end (Union[str, pd.Timestamp], optional) – The end date for the data to consider. If None, the end date of the dataset will be used.
show_constant (bool, optional) – Whether to show features with constant number of observations across stations. If True, these features will be included in the plot as well.
figsize (Tuple[float, float], optional) – The size of the figure to create. If None, a default size will be used.
ax (plt.Axes, optional) – The matplotlib axes to draw the plot. If not given, then new axes will be created.
show (bool, optional) – Whether to display the plot immediately.
- Returns:
The matplotlib axes containing the plot.
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_FI
>>> from aqua_fetch.rr import RainfallRunoff
>>> dataset = CAMELS_FI()
>>> dataset.plot_num_observations()
>>> # plotting for different time periods
>>> dataset = RainfallRunoff('CAMELS_COL')
>>> _, ax = plt.subplots()
>>> for idx, period in enumerate([("19810101", "19901231"), ("19910101", "20001231"), ("20010101", "20101231")]):
...     start, end = period
...     ax = dataset.plot_num_observations(
...         dynamic_features=['q_cms_obs'],
...         ax=ax,
...         start=start, end=end, show=False)
...     ax.lines[idx].set_label(f'{start} to {end}')
...     assert isinstance(ax, plt.Axes)
>>> ax.legend()
>>> plt.show()
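The quantity drawn by such a plot can be sketched in a few lines: for one dynamic feature, sort the per-station observation counts and pair each with its empirical cumulative probability. The helpers count_cdf and is_constant are hypothetical, and the library's own computation may differ in detail.

```python
from typing import Dict, List, Tuple


def count_cdf(num_obs: Dict[str, int]) -> Tuple[List[int], List[float]]:
    """Empirical CDF of the number of observations per station."""
    counts = sorted(num_obs.values())
    n = len(counts)
    # the i-th smallest count has cumulative probability (i + 1) / n
    cdf = [(i + 1) / n for i in range(n)]
    return counts, cdf


def is_constant(num_obs: Dict[str, int]) -> bool:
    """True if every station has the same number of observations."""
    return len(set(num_obs.values())) <= 1


obs = {"a": 300, "b": 100, "c": 200, "d": 400}
counts, cdf = count_cdf(obs)
print(counts, cdf)
```

A feature for which is_constant returns True would produce a single vertical step, which is presumably why such features are skipped by default.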
- plot_stations(stations: List[str] = 'all', marker='.', color: str = None, ax: Axes = None, show: bool = True, **kwargs) Axes[source]
plots coordinates of stations
- Parameters:
- Return type:
plt.Axes
Examples
>>> import matplotlib.pyplot as plt
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.plot_stations()
>>> dataset.plot_stations(['1', '2', '3'])
>>> dataset.plot_stations(marker='o', ms=0.3)
>>> ax = dataset.plot_stations(marker='o', ms=0.3, show=False)
>>> ax.set_title("Stations")
>>> plt.show()
using area as color
>>> dataset.plot_stations(color='area_km2')
- q_mm(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in the units of millimeters per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return q_mm of all stations
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
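Dividing discharge by area only gives millimeters per timestep once the units are reconciled. The sketch below spells out that arithmetic under the assumption that discharge is in cubic meters per second and area in square kilometers. The function cms_to_mm is hypothetical and only illustrates the conversion, not the library's implementation.

```python
def cms_to_mm(q_cms: float, area_km2: float, seconds_per_step: int = 86400) -> float:
    """Convert discharge (m3/s) to runoff depth (mm per timestep).

    The default of 86400 seconds corresponds to a daily timestep.
    """
    volume_m3 = q_cms * seconds_per_step   # water volume over one timestep
    area_m2 = area_km2 * 1e6               # 1 km2 = 1e6 m2
    return volume_m3 / area_m2 * 1000.0    # depth in m -> mm


daily_mm = cms_to_mm(10.0, 100.0)
hourly_mm = cms_to_mm(10.0, 100.0, seconds_per_step=3600)
print(daily_mm, hourly_mm)
```

For a daily timestep this reduces to q_cms * 86.4 / area_km2.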
- property static_factors: Dict[str, str]
A dictionary that maps static features to the factors by which they need to be multiplied to get the actual value.
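A minimal sketch of how such factors could be applied to one row of static data is given below. The feature names (slope_scaled, elev_dm), the factor values and the helper apply_factors are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical factors: stored value * factor = actual value.
static_factors: Dict[str, float] = {"slope_scaled": 0.01, "elev_dm": 0.1}


def apply_factors(row: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Multiply each feature by its factor; features without a factor are unchanged."""
    return {name: value * factors.get(name, 1.0) for name, value in row.items()}


stored = {"slope_scaled": 250, "elev_dm": 12340, "area_km2": 57.0}
actual = apply_factors(stored, static_factors)
print(actual)
```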
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of dataframe will be equal to number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
>>> dataset.stn_coords(['2004', '6004'])  # returns coordinates of two stations
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
>>> dataset.stn_coords(['G0050115', '912101A'])  # returns coordinates of two stations
- class aqua_fetch.rr._gsha._GSHA(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Parent class for those datasets which use static and dynamic features from the GSHA dataset. The following dataset classes are based on this class:
aqua_fetch.Spain
- __init__(gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', st=None, en=None) DataFrame[source]
returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
st
en
Examples
>>> from aqua_fetch import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
returns features of multiple stations
Examples
>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray Dataset or pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If dynamic features are xarray Dataset, then it consists of data_vars equal to the number of stations and time and dynamic_features as dimensions. If dynamic features are returned as pandas DataFrame, then the first index is time and the second index is dynamic_features.
- Return type:
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr._estreams._EStreams(path: str | PathLike = None, estreams_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Parent/Helper class for those datasets which use static and dynamic data from EStreams. It specifically handles the following classes:
aqua_fetch.Finland
aqua_fetch.Italy
aqua_fetch.Poland
aqua_fetch.Portugal
aqua_fetch.Slovenia
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import Japan
>>> dataset = Japan()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
returns features of multiple stations
- Returns:
A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of the static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or as a python dictionary whose keys are station names and values are pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations, with station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.
- Return type:
Examples
>>> from aqua_fetch import Arcticnet
>>> dataset = Arcticnet()
>>> stations = dataset.stations()
>>> features = dataset.fetch_stations_features(stations)
- gauge_id_basin_id_map() dict[source]
Returns a mapping from gauge id to basin id. For example, for Portugal the gauge id '03J/02H' maps to the basin id 'PT000001':
'03J/02H' -> 'PT000001'
For Slovenia, the gauge id 1060 maps to the basin id SI000001:
'1060' -> 'SI000001'
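The returned dictionary can be used like any Python mapping. The sketch below hard-codes the two pairs quoted above instead of calling the library, and shows a lookup plus the inverse mapping from basin id back to gauge id. The variable names are illustrative only.

```python
# Pairs quoted in the text above; each comes from a different dataset.
portugal_map = {"03J/02H": "PT000001"}
slovenia_map = {"1060": "SI000001"}

# look up the basin id of a gauge
basin = portugal_map["03J/02H"]

# invert the mapping to go from basin id back to gauge id
slovenia_inverse = {basin_id: gauge_id for gauge_id, basin_id in slovenia_map.items()}

print(basin, slovenia_inverse["SI000001"])
```

Inverting in this way is only safe when basin ids are unique, which is assumed here.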
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.Arcticnet(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 106 catchments of the Arctic region from the r-arcticnet project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, the number of static features is 35 and the number of dynamic features is 27. The data is available from 1979-01-01 to 2003-12-31, although the observed streamflow (q_cms_obs) for some stations is available from as early as 1913-01-01.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property end: Timestamp
end of data
- class aqua_fetch.Bull(path, overwrite=False, **kwargs)[source]
Bases: _RainfallRunoff
Following the works of Aparicio et al., 2024. The data is taken from the Zenodo repository. This dataset contains 484 stations with 55 dynamic (time series) features and 214 static features. The dynamic features span from 1951 to 2021.
Examples
>>> from aqua_fetch import Bull
>>> dataset = Bull()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='BULL_9007', as_dataframe=True)
>>> df = dynamic['BULL_9007']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(25932, 55)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
484
>>> # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (48 out of 484)
48
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(25932, 55), (25932, 55), (25932, 55),... (25932, 55), (25932, 55)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('BULL_9007', as_dataframe=True,
...     dynamic_features=['pet_mm_AEMET', 'airtemp_C_mean_AEMET', 'pcp_mm_ERA5Land', 'q_obs_cms'])
>>> dynamic['BULL_9007'].shape
(25932, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
>>> # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='BULL_9007', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['BULL_9007'].shape
((1, 214), 1, (25932, 55))
>>> # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 25932, 'dynamic_features': 55})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(484, 2)
>>> dataset.stn_coords('BULL_9007')  # returns coordinates of station whose id is BULL_9007
41.298 -1.967
>>> dataset.stn_coords(['BULL_9007', 'BULL_8083'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('BULL_9007')
>>> # get area of two stations
>>> dataset.area(['BULL_9007', 'BULL_8083'])
>>> # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('BULL_9007')
- __init__(path, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
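The rules for path and overwrite described above can be summarised in a few lines. This is a minimal sketch under stated assumptions, not the library's code: `resolve_data_dir` and `default_dir` are hypothetical names, and treating an existing default directory as "already downloaded" is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_data_dir(path: Optional[str], default_dir: str,
                     overwrite: bool = False) -> Tuple[Path, bool]:
    """Hypothetical helper: return (directory, needs_download)."""
    # no path given: fall back to the default directory
    target = Path(default_dir) if path is None else Path(path)
    # download when the directory is missing, or when a fresh copy is requested
    needs_download = overwrite or not target.is_dir()
    return target, needs_download


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "Bull"
    existing.mkdir()
    # existing directory: data is read from it, no download
    print(resolve_data_dir(str(existing), tmp))
    # missing directory: data would be downloaded into it
    print(resolve_data_dir(str(Path(tmp) / "new"), tmp))
    # overwrite=True forces a fresh download even if the directory exists
    print(resolve_data_dir(str(existing), tmp, overwrite=True))
```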
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
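The caching advice for this property can be illustrated with functools.cached_property from the standard library. `DemoDataset` and `_read_feature_names` below are hypothetical stand-ins, not aqua_fetch classes; the sketch only shows that the expensive read happens once.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_feature_names(self) -> List[str]:
        # stands in for parsing file headers on disk
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # computed on first access, then stored on the instance
        return self._read_feature_names()


ds = DemoDataset()
ds.dynamic_features
ds.dynamic_features
print(ds.reads)  # 1
```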
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CABra(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
Bases: _RainfallRunoff
Reads and fetches the CABra dataset, a catchment attribute dataset following the work of Almagro et al., 2021. This dataset consists of 87 static and 13 dynamic features of 735 Brazilian catchments. The temporal extent is from 1980 to 2020. The dynamic features consist of daily hydro-meteorological time series.
Examples
>>> from aqua_fetch import CABra
>>> dataset = CABra()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='92', as_dataframe=True)
>>> df = dynamic['92']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(10956, 13)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
735
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (73 out of 735)
73
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(10956, 13), (10956, 13), (10956, 13),... (10956, 13), (10956, 13)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('92', as_dataframe=True,
...     dynamic_features=['pcp_mm_ens', 'airtemp_C_ens_max', 'pet_mm_pm', 'rh_%_ens', 'q_cms_obs'])
>>> dynamic['92'].shape
(10956, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['92'].shape
((1, 87), 1, (10956, 13))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 10956, 'dynamic_features': 13})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(735, 2)
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
-2.509 -47.764
>>> dataset.stn_coords(['92', '5'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('92')
... # get area of two stations
>>> dataset.area(['92', '5'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('92')
- __init__(path=None, overwrite=False, to_netcdf: bool = True, met_src: str = 'ens', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.
met_src (str) – source of meteorological data, must be one of ens, era5 or ref.
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_fname: str | PathLike
name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed, and the file does not already exist. The creation of this file can take some time; however, it leads to faster I/O operations.
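The three conditions under which the .nc file is created can be written as a single check. The helper names below (`module_available`, `should_create_nc`) are hypothetical and only illustrate the rule; they are not taken from aqua_fetch.

```python
import importlib.util
from pathlib import Path
from typing import Optional, Union


def module_available(name: str) -> bool:
    """True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def should_create_nc(dyn_fname: Union[str, Path], to_netcdf: bool,
                     xarray_installed: Optional[bool] = None) -> bool:
    """Hypothetical check: create the file only if all three conditions hold."""
    if xarray_installed is None:
        # detect xarray unless the caller states its availability explicitly
        xarray_installed = module_available("xarray")
    return bool(to_netcdf and xarray_installed and not Path(dyn_fname).exists())


# the file is missing and netCDF output is requested, so it would be created
print(should_create_nc("no_such_dynamic_file.nc", True, xarray_installed=True))
```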
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.CAMELSH(path=None, overwrite=False, timestep='H', **kwargs)[source]
Bases: _RainfallRunoff
Hourly data of 5,767 catchments from the United States of America with 13 dynamic features and 779 static features for each catchment. For more details on the data see Tran et al. (2025). The dynamic features span from 19800101 to 20241231. The data is downloaded from Zenodo.
Please note that usage of this dataset requires xarray and netCDF4 libraries.
Examples
>>> from aqua_fetch import CAMELSH
>>> dataset = CAMELSH()
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
5767
... # get data by station id/name
>>> _, dynamic = dataset.fetch(stations='02342070', as_dataframe=True)
>>> df = dynamic['02342070']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(394488, 13)
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (576 out of 5767)
576
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(394488, 13), (394488, 8), (394488, 13),... (394488, 13), (394488, 13)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('02342070', as_dataframe=True,
...     dynamic_features=['SWdown', 'pcp_mm', 'pet_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['02342070'].shape
(394488, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='02342070', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['02342070'].shape
((1, 779), 1, (394488, 13))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 394488, 'dynamic_features': 8})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(5767, 2)
>>> dataset.stn_coords('02342070')  # returns coordinates of station whose id is 02342070
32.37431 -84.957993
>>> dataset.stn_coords(['02342070', '14316700'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('02342070')
... # get area of two stations
>>> dataset.area(['02342070', '14316700'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('02342070')
- __init__(path=None, overwrite=False, timestep='H', **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
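The verbosity levels listed above amount to a threshold on message importance. The sketch below is a hypothetical illustration of that rule; `emit` and its `level` argument are assumed names, not part of aqua_fetch.

```python
def emit(message: str, level: int, verbosity: int) -> bool:
    """Print `message` when `verbosity` is at least the message `level`.

    level 1 marks important messages; higher levels mark more detailed ones.
    Returns True if the message was printed.
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    if verbosity >= level:
        print(message)
        return True
    return False


emit("downloading data", level=1, verbosity=0)   # silent: verbosity 0 prints nothing
emit("downloading data", level=1, verbosity=1)   # printed: important message
emit("unzipping file 3", level=2, verbosity=1)   # silent: too detailed for verbosity 1
emit("unzipping file 3", level=2, verbosity=2)   # printed
```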
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, then the first attribute in the boundary file will be used.
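To make the role of this attribute concrete, the sketch below builds a station-id-to-geometry mapping from GeoJSON-like feature records, falling back to the first attribute when none is given. `build_boundary_map` and the attribute names `gauge_id` and `name` are hypothetical; the real boundary files and their attribute names are not shown in this section.

```python
from typing import Any, Dict, List, Optional

Feature = Dict[str, Any]


def build_boundary_map(features: List[Feature],
                       id_attr: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: map station id -> geometry."""
    if not features:
        return {}
    if id_attr is None:
        # fall back to the first attribute of the first record
        id_attr = next(iter(features[0]["properties"]))
    return {str(f["properties"][id_attr]): f["geometry"] for f in features}


features = [
    {"type": "Feature",
     "properties": {"gauge_id": "02342070", "name": "A"},
     "geometry": {"type": "Point", "coordinates": [-84.96, 32.37]}},
    {"type": "Feature",
     "properties": {"gauge_id": "14316700", "name": "B"},
     "geometry": {"type": "Point", "coordinates": [-122.9, 43.3]}},
]

boundary_map = build_boundary_map(features)   # uses 'gauge_id', the first attribute
print(sorted(boundary_map))                   # ['02342070', '14316700']
```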
- collate_forcing_data()[source]
Collate forcing data of all stations into a single NetCDF file using multiprocessing.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
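A mapping of this kind is typically applied when renaming raw columns to the standardized name_unit_specifier form. The sketch assumes the keys are raw names and the values are standardized names; the direction of the real dyn_map and the raw names used here (`prcp`, `tmean`, `streamflow`) are assumptions for illustration.

```python
from typing import Dict, List


def standardize_names(raw_names: List[str], dyn_map: Dict[str, str]) -> List[str]:
    """Rename known raw feature names; leave unknown names unchanged."""
    return [dyn_map.get(name, name) for name in raw_names]


# hypothetical raw names on the left, standardized names on the right
dyn_map = {
    "prcp": "pcp_mm",
    "tmean": "airtemp_C_mean",
    "streamflow": "q_cms_obs",
}

print(standardize_names(["prcp", "tmean", "swe"], dyn_map))
# ['pcp_mm', 'airtemp_C_mean', 'swe']
```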
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_q(stations: List[str] = 'all')[source]
Since fetching q from other methods can be slower because of merging with other dynamic (forcing) features, this method fetches only observed streamflow data for given stations using multiprocessing.
- Returns:
xarray Dataset whose data variables are station names and dimensions are time and dynamic features
- Return type:
xr.Dataset
- fetch_stations_features(stations: str | List[str], dynamic_features: str | List[str] = 'all', static_features: str | List[str] = None, st: str | Timestamp = None, en: str | Timestamp = None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
Reads features of one or more stations.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic features to be fetched. If all, then all dynamic features will be fetched.
static_features – list of static features to be fetched. If all, then all static features will be fetched. If None, then no static features will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the dynamic data as pandas dataframe. By default an xarray.Dataset object is returned.
kwargs (dict) – additional keyword arguments
- Returns:
tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features is the station/gauge ids while the columns are the static features. Dynamic features are returned as either xarray.Dataset or a dict with keys as station names and values as pandas.DataFrame, depending upon whether as_dataframe is True or False and whether the xarray module is installed or not. If the dynamic features are an xarray Dataset, then it consists of data_vars equal to the number of stations, with time and dynamic_features as dimensions.
Raises – ValueError, if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import CAMELSH
>>> dataset = CAMELSH()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations as xarray Dataset
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'])
... # get data of selected stations as dictionary of pandas DataFrame
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'],
...     as_dataframe=True)
... # get both dynamic and static features of selected stations
>>> dataset.fetch_stations_features(['01141800', '02349900', '11062000'],
...     dynamic_features=['q_mm_obs', 'air_temp_C', 'pcp_mm'], static_features=['elev_catch_m'])
- q_mm(stations: str | List[str] = 'all', as_dataframe: bool = True) DataFrame[source]
returns streamflow in the units of millimeters per timestep (mm/hour). This is obtained by dividing q by area.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return q_mm of all stations.
as_dataframe (bool) – whether to return the data as pandas DataFrame. Default is True. Setting it to False will return xarray Dataset and can be faster.
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame or xr.Dataset
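Dividing discharge by area also involves unit factors, which the short description leaves implicit. The sketch below spells out the arithmetic, assuming q is in cubic metres per second and area is in square kilometres; `cms_to_mm` is a hypothetical helper, not the method's actual implementation.

```python
def cms_to_mm(q_cms: float, area_km2: float, seconds_per_step: int = 3600) -> float:
    """Convert discharge in m3/s to runoff depth in mm per timestep."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    volume_m3 = q_cms * seconds_per_step           # water volume in one timestep
    depth_m = volume_m3 / (area_km2 * 1_000_000)   # spread over the area in m2
    return depth_m * 1000                          # metres to millimetres


# 10 m3/s from a 36 km2 catchment is about 1 mm of runoff per hour
print(cms_to_mm(10.0, 36.0))
# for daily data pass seconds_per_step=86400
print(cms_to_mm(1.0, 86.4, seconds_per_step=86400))
```

For an hourly step this reduces to q * 3.6 / area, and for a daily step to q * 86.4 / area.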
- class aqua_fetch.rr.CAMELS_AUS(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 561 Australian catchments with 187 static features and 28 dynamic features for each catchment. The dynamic features are time series from 1950-01-01 to 2022-03-31. By default this class reads version 2 of the CAMELS-AUS dataset following Fowler et al., 2024.
If version is 1 then this class reads data following Fowler et al., 2021, which is a dataset of 222 Australian catchments with 161 static features and 26 dynamic features for each catchment. The dynamic features are time series from 1957-01-01 to 2018-12-31.
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='912101A', as_dataframe=True)
>>> df = dynamic['912101A']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26388, 28)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
561
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (56 out of 561)
56
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(26388, 28), (26388, 28), (26388, 28),... (26388, 28), (26388, 28)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('912101A', as_dataframe=True,
...     dynamic_features=['airtemp_C_awap_max', 'pcp_mm_awap', 'et_morton_actual_SILO', 'q_cms_obs'])
>>> dynamic['912101A'].shape
(26388, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='912101A', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['912101A'].shape
((1, 187), 1, (26388, 28))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26388, 'dynamic_features': 28})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(561, 2)
>>> dataset.stn_coords('912101A')  # returns coordinates of station whose id is 912101A
-38.214199 -71.8283
>>> dataset.stn_coords(['912101A', '912105A'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('912101A')
... # get area of two stations
>>> dataset.area(['912101A', '912105A'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('912101A')
... # version 1 of CAMELS_AUS can be accessed as below
>>> dataset = CAMELS_AUS(version=1)
>>> len(dataset.stations())
222
>>> _, dynamic = dataset.fetch(stations='912101A', as_dataframe=True)
>>> dynamic['912101A'].shape
(23376, 26)
- __init__(path: str = None, version: int = 2, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
version – version of the dataset to download. Allowed values are 1 and 2.
to_netcdf
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: list
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_BR(path=None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 897 Brazilian catchments with 67 static features and 10 dynamic features for each catchment. The dynamic features are time series from 1920-01-01 to 2019-02-28. This class downloads and processes the CAMELS dataset of Brazil as provided by Chagas et al., 2020. The simulated streamflow of 593 stations and the raw streamflow of 3679 stations shipped with this data are not included in the dynamic features. Both can be fetched through the fetch_simulated_streamflow and fetch_raw_streamflow methods.
Examples
>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='46035000', as_dataframe=True)
>>> df = dynamic['46035000']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14245, 10)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (59 out of 593)
59
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(14245, 10), (14245, 10), (14245, 10),... (14245, 10), (14245, 10)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('46035000', as_dataframe=True,
...     dynamic_features=['pcp_mm_cpc', 'aet_mm_mgb', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['46035000'].shape
(14245, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='46035000', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['46035000'].shape
((1, 67), 1, (14245, 10))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14245, 'dynamic_features': 10})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(593, 2)
>>> dataset.stn_coords('46035000')  # returns coordinates of station whose id is 46035000
-12.8686 -43.3797
>>> dataset.stn_coords(['46035000', '57170000'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('46035000')
... # get area of two stations
>>> dataset.area(['46035000', '57170000'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('46035000')
- __init__(path=None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
- all_stations(feature: str) List[str][source]
Returns the ids of all stations for which data of a specific feature is available.
- area(stations: str | List[str] = 'all', source: str = 'gsim') Series[source]
Returns area (km2) of all catchments as pandas.Series
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return area of all stations.
source (str) – source of area calculation. It should be either gsim or ana.
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('65100000')  # returns area of station whose id is 65100000
>>> dataset.area(['65100000', '64075000'])  # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_raw_streamflow(stations: str = None) DataFrame[source]
returns raw streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_raw_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_raw_streamflow(dataset.all_stations())
- fetch_simulated_streamflow(stations: str = None) DataFrame[source]
returns simulated streamflow data for one or more stations.
Example
>>> dataset = CAMELS_BR()
>>> data = dataset.fetch_simulated_streamflow('10500000')
... # fetch all time series data associated with a station.
>>> x = dataset.fetch_simulated_streamflow(dataset.all_stations())
- q_mm(stations: str | List[str] = 'all') DataFrame[source]
returns streamflow in the units of millimeters per day. The name of the original time series is streamflow_mm.
- Parameters:
stations (str/list) – name/names of stations. Default is all, which will return streamflow of all stations.
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, the result should be cached and reused instead of reading the data again on every access; a child class may also override it with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Returns a list of station ids.
Example
>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as pandas.DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = CAMELS_BR()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('65100000')  # returns coordinates of station whose id is 65100000
>>> dataset.stn_coords(['65100000', '64075000'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_CH(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
Bases: _RainfallRunoff
Data of 331 Swiss catchments from Hoege et al., 2023. The dataset consists of 209 static catchment features and 9 dynamic features. The dynamic features span from 19810101 to 20201231 with daily timestep. For the hourly (H) timestep, only streamflow is available, for 170 Swiss catchments. The hourly streamflow data is obtained from Kauzlaric et al., 2023.
Examples
>>> from aqua_fetch import CAMELS_CH
>>> dataset = CAMELS_CH()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='2004', as_dataframe=True)
>>> df = dynamic['2004']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14610, 9)
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
331
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (33 out of 331)
33
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(14610, 9), (14610, 9), (14610, 9),... (14610, 9), (14610, 9)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('2004', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['2004'].shape
(14610, 3)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2004', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2004'].shape
((1, 209), 1, (14610, 9))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14610, 'dynamic_features': 9})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(331, 2)
>>> dataset.stn_coords('2004')  # returns coordinates of station whose id is 2004
47.925221 8.191595
>>> dataset.stn_coords(['2004', '2007'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('2004')
... # get area of two stations
>>> dataset.area(['2004', '2007'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('2004')
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, timestep: str = 'D', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded then you can set it to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but will require the netCDF4 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
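As a rough illustration of what such a mapping is for, the sketch below renames the fields of one station's record from raw, dataset-specific names to standardized ones. The raw names, the direction of the mapping (raw name as key, standardized name as value) and the helper `rename_features` are assumptions made for this example; they are not taken from the package.

```python
from typing import Dict, List

# Hypothetical mapping from raw column names to standardized names.
# The raw names below are invented for illustration.
DYN_MAP: Dict[str, str] = {
    "precipitation(mm/d)": "pcp_mm",
    "temperature_mean(degC)": "airtemp_C_mean",
    "discharge_vol(m3/s)": "q_cms_obs",
}


def rename_features(record: Dict[str, List[float]],
                    mapping: Dict[str, str]) -> Dict[str, List[float]]:
    """Return a copy of ``record`` whose keys are renamed using ``mapping``.

    Keys without an entry in ``mapping`` are kept unchanged.
    """
    return {mapping.get(name, name): values for name, values in record.items()}


raw = {
    "precipitation(mm/d)": [0.0, 1.2],
    "temperature_mean(degC)": [3.4, 4.1],
    "waterlevel(m)": [1.0, 1.1],  # no standardized name in DYN_MAP
}
standardized = rename_features(raw, DYN_MAP)
print(sorted(standardized))
```

Features without a mapping entry keep their raw name here; whether the package keeps or drops such features is not stated in this section.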
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
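The caching advice above can be sketched with `functools.cached_property` from the standard library. The class `ExampleDataset` and its `_read_feature_names` method are hypothetical stand-ins; this is one possible way to cache the list, not necessarily how the package does it.

```python
from functools import cached_property
from typing import List


class ExampleDataset:
    """Hypothetical dataset class used only to illustrate caching."""

    def __init__(self) -> None:
        self.n_reads = 0  # counts how often the (pretend) file is read

    def _read_feature_names(self) -> List[str]:
        # Stand-in for an expensive read of the raw data files.
        self.n_reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


ds = ExampleDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(ds.n_reads)
```

Both accesses return the same list object, and the underlying read runs only once per instance.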
- property end
end of data
- glacier_attrs() DataFrame[source]
- returns a dataframe with four columns
‘glac_area’
‘glac_vol’
‘glac_mass’
‘glac_area_neighbours’
- hourly_stations() List[str][source]
IDs of those stations which have hourly data and which are also part of CAMELS-CH dataset
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_CL(path: str = None, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 516 Chilean catchments with 104 static features and 12 dynamic features for each catchment. The dynamic features are time series from 1913-02-15 to 2018-03-09. This class downloads and processes the CAMELS dataset of Chile following the work of Alvarez-Garreton et al., 2018.
Examples
>>> from aqua_fetch import CAMELS_CL
>>> dataset = CAMELS_CL()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='8350001', as_dataframe=True)
>>> df = dynamic['8350001']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(38374, 12)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
516
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (51 out of 516)
51
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(38374, 12), (38374, 12), (38374, 12),... (38374, 12), (38374, 12)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('8350001', as_dataframe=True,
...     dynamic_features=['pet_mm_hargreaves', 'pcp_mm_mswep', 'airtemp_C_mean', 'q_cms_obs'])
>>> dynamic['8350001'].shape
(38374, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='8350001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['8350001'].shape
((1, 104), 1, (38374, 12))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 38374, 'dynamic_features': 12})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(516, 2)
>>> dataset.stn_coords('8350001')  # returns coordinates of the station whose id is 8350001
-38.214199 -71.8283
>>> dataset.stn_coords(['8350001', '3820003'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('8350001')
... # get area of two stations
>>> dataset.area(['8350001', '3820003'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('8350001')
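The examples pass the station selection to fetch in several forms: a station id, a fraction such as 0.1, an integer count, or a list of ids. The sketch below shows one way such an argument could be resolved into a list of ids. The function `select_stations`, its `seed` parameter and the truncation of the fraction (`int(0.1 * 516)` gives 51, which matches the example output) are assumptions made for illustration, not the package's actual implementation.

```python
import random
from typing import List, Union


def select_stations(all_stations: List[str],
                    stations: Union[str, int, float, List[str]] = "all",
                    seed: int = 0) -> List[str]:
    """Hypothetical resolver for a ``stations`` argument."""
    if isinstance(stations, str):
        # "all" means every station, any other string is a single id
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # a fraction of the stations, truncated towards zero (assumption)
        n = int(stations * len(all_stations))
        return random.Random(seed).sample(all_stations, n)
    if isinstance(stations, int):
        # a number of randomly chosen stations
        return random.Random(seed).sample(all_stations, stations)
    # otherwise assume an explicit list of ids
    return list(stations)


ids = [str(i) for i in range(516)]  # placeholder ids, not real station ids
print(len(select_stations(ids, 0.1)))
print(len(select_stations(ids, 10)))
print(select_stations(ids, "8350001"))
```

A fixed seed is used here only so that the sketch is reproducible; whether the package exposes a seed for random selection is not stated in this section.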
- __init__(path: str = None, **kwargs)[source]
- Parameters:
path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() list[source]
Returns the ids of all stations for which data of a specific attribute is available.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = CAMELS_CL()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('12872001')  # returns coordinates of the station whose id is 12872001
>>> dataset.stn_coords(['12872001', '12876004'])  # returns coordinates of two stations
- class aqua_fetch.rr.CAMELS_COL(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 347 catchments from Colombia following the work of Jimenez et al., 2025. The dataset consists of 255 static catchment features and 6 dynamic features. The dynamic features span from 1981-01-01 to 2022-12-31 with a daily timestep. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_COL
>>> dataset = CAMELS_COL()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='35067040', as_dataframe=True)
>>> df = dynamic['35067040']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(15340, 6)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
347
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (34 out of 347)
34
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(15340, 6), (15340, 6), (15340, 6),... (15340, 6), (15340, 6)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('35067040', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['35067040'].shape
(15340, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='35067040', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['35067040'].shape
((1, 255), 1, (15340, 6))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 15340, 'dynamic_features': 6})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(347, 2)
>>> dataset.stn_coords('35067040')  # returns coordinates of the station whose id is 35067040
4.746433 -73.587807
>>> dataset.stn_coords(['35067040', '21187030'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('35067040')
... # get area of two stations
>>> dataset.area(['35067040', '21187030'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('35067040')
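The number of rows in each dataframe above follows from the stated time span and the daily timestep. As a small check using only the standard library, the sketch below counts the daily steps between 19810101 and 20221231, both inclusive. The helper `n_daily_steps` is written for this example and is not part of the package.

```python
from datetime import date, datetime


def n_daily_steps(start: str, end: str, fmt: str = "%Y%m%d") -> int:
    """Number of daily time steps between start and end, both inclusive."""
    first: date = datetime.strptime(start, fmt).date()
    last: date = datetime.strptime(end, fmt).date()
    return (last - first).days + 1


# 42 years of 365 days plus 10 leap days
print(n_daily_steps("19810101", "20221231"))  # 15340
```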
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
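The verbosity levels listed above can be pictured as a simple threshold on each message. The sketch below is a minimal illustration under that assumption. The helpers `report` and `messages_for` and the message texts are hypothetical; the package's own logging may work differently.

```python
from typing import List


def report(message: str, level: int, verbosity: int, sink: List[str]) -> None:
    """Record ``message`` only if ``verbosity`` is at least ``level``."""
    if verbosity >= level:
        sink.append(message)


def messages_for(verbosity: int) -> List[str]:
    """Collect the messages that would be shown at a given verbosity."""
    out: List[str] = []
    report("downloading data", 1, verbosity, out)   # important message
    report("unzipping archive", 2, verbosity, out)  # extra detail
    return out


for v in (0, 1, 2):
    print(v, messages_for(v))
```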
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- class aqua_fetch.rr.CAMELS_DE(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is the data from 1582 German catchments following the work of Loritz et al., 2024. The data is downloaded from Zenodo. This data consists of 111 static and 21 dynamic features. The dynamic features span from 1951-01-01 to 2020-12-31 with a daily timestep.
Examples
>>> from aqua_fetch import CAMELS_DE
>>> dataset = CAMELS_DE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='DE110260', as_dataframe=True)
>>> df = dynamic['DE110260']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(25568, 21)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
1582
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (158 out of 1582)
158
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(25568, 21), (25568, 21), (25568, 21),... (25568, 21), (25568, 21)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('DE110260', as_dataframe=True,
...     dynamic_features=['airtemp_C_mean', 'rh_%', 'pcp_mm_mean', 'q_cms_obs'])
>>> dynamic['DE110260'].shape
(25568, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='DE110260', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['DE110260'].shape
((1, 111), 1, (25568, 21))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 25568, 'dynamic_features': 21})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(1582, 2)
>>> dataset.stn_coords('DE110260')  # returns coordinates of the station whose id is DE110260
47.925221 8.191595
>>> dataset.stn_coords(['DE110260', 'DE110250'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('DE110260')
... # get area of two stations
>>> dataset.area(['DE110260', 'DE110250'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('DE110260')
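The feature names used in the example, such as airtemp_C_mean or q_cms_obs, follow the name_unit_specifier convention described at the start of this section. The sketch below splits such a name into its parts. It assumes that the specifier is optional and that neither the name nor the unit contains an underscore. The `FeatureName` tuple and `parse_feature` function are hypothetical helpers, not part of the package.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    name: str
    unit: str
    specifier: Optional[str]


def parse_feature(feature: str) -> FeatureName:
    """Split a ``name_unit_specifier`` string; the specifier may be absent."""
    parts = feature.split("_", 2)
    if len(parts) < 2:
        raise ValueError(f"{feature!r} does not follow name_unit_specifier")
    specifier = parts[2] if len(parts) == 3 else None
    return FeatureName(parts[0], parts[1], specifier)


for feat in ["airtemp_C_mean", "rh_%", "pcp_mm_mean", "q_cms_obs"]:
    print(parse_feature(feat))
```

Names that do not follow the convention, for example a feature kept under its raw name, raise a ValueError in this sketch.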
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
This is an updated version of the aqua_fetch.rr.Caravan_DK dataset. This dataset was presented by Liu et al., 2024 and is available at dataverse. This dataset consists of 119 static and 13 dynamic features from 3330 Danish catchments. The dynamic (time series) features span from 1989-01-02 to 2023-12-31 with a daily timestep. However, streamflow observations are available for only 304 catchments.
Examples
>>> from aqua_fetch import CAMELS_DK
>>> dataset = CAMELS_DK()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='54130033', as_dataframe=True)
>>> df = dynamic['54130033']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(12782, 13)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
304
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (30 out of 304)
30
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(12782, 13), (12782, 13), (12782, 13),... (12782, 13), (12782, 13)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('54130033', as_dataframe=True,
...     dynamic_features=['Abstraction', 'pet_mm', 'airtemp_C_mean', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['54130033'].shape
(12782, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='54130033', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['54130033'].shape
((1, 119), 1, (12782, 13))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 13})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(304, 2)
>>> dataset.stn_coords('54130033')  # returns coordinates of the station whose id is 54130033
55.325242 9.93079
>>> dataset.stn_coords(['54130033', '13210113'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('54130033')
... # get area of two stations
>>> dataset.area(['54130033', '13210113'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('54130033')
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property end: Timestamp
end of data
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.CAMELS_FI(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 320 Finnish catchments with 16 dynamic features and 106 static features. The dynamic features span from 1961-01-01 to 2023-12-31 with a daily timestep. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_FI
>>> dataset = CAMELS_FI()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='1156', as_dataframe=True)
>>> df = dynamic['1156']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(23010, 16)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
320
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (32 out of 320)
32
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(23010, 16), (23010, 16), (23010, 16),... (23010, 16), (23010, 16)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('1156', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'snowdepth_m', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['1156'].shape
(23010, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='1156', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['1156'].shape
((1, 106), 1, (23010, 16))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 23010, 'dynamic_features': 16})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(320, 2)
>>> dataset.stn_coords('1156')  # returns coordinates of the station whose id is 1156
62.253101 24.444099
>>> dataset.stn_coords(['1156', '1116'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('1156')
... # get area of two stations
>>> dataset.area(['1156', '1116'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('1156')
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
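The three cases described for the path parameter can be summarized in a small decision function. The sketch below is only an illustration of those rules. The function `resolve_data_dir` and its `default_dir` argument are hypothetical, and the sketch ignores whether the default directory already holds a previously downloaded copy.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_data_dir(path: Optional[str], default_dir: str) -> Tuple[str, bool]:
    """Return (directory, needs_download) for a given ``path`` argument.

    ``default_dir`` stands in for the package's default location.
    """
    if path is None:
        # not provided: download into the default directory
        return default_dir, True
    if os.path.isdir(path):
        # provided and exists: read the data from it
        return path, False
    # provided but missing: download into this directory
    return path, True


with tempfile.TemporaryDirectory() as existing:
    missing = os.path.join(existing, "not_there")
    print(resolve_data_dir(existing, "default_data_dir")[1])
    print(resolve_data_dir(missing, "default_data_dir")[1])
    print(resolve_data_dir(None, "default_data_dir")[1])
```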
- property end: Timestamp
end of data
- property start: Timestamp
start of data
- class aqua_fetch.rr.CAMELS_FR(path=None, overwrite=False, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 654 catchments from France following the work of Delaigue et al., 2024. The dataset consists of 344 static catchment features and 22 dynamic features. The dynamic features span from 1970-01-01 to 2021-12-31 with a daily timestep.
Examples
>>> from aqua_fetch import CAMELS_FR
>>> dataset = CAMELS_FR()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='J421191001', as_dataframe=True)
>>> df = dynamic['J421191001']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(12782, 22)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
654
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (65 out of 654)
65
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(12782, 22), (12782, 22), (12782, 22),... (12782, 22), (12782, 22)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('J421191001', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'spechum_gkg', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> dynamic['J421191001'].shape
(12782, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='J421191001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['J421191001'].shape
((1, 344), 1, (12782, 22))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12782, 'dynamic_features': 22})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(654, 2)
>>> dataset.stn_coords('J421191001')  # returns coordinates of the station whose id is J421191001
48.006298 -4.063848
>>> dataset.stn_coords(['J421191001', '802'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('J421191001')
... # get area of two stations
>>> dataset.area(['J421191001', '802'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('J421191001')
- __init__(path=None, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property end: Timestamp
end of data
- static_attrs() DataFrame[source]
combination of topographic + soil + landuse + geology + climate + hydro + anthropogenic features
- Returns:
a pandas.DataFrame of static features of all catchments of shape (654, xxxx)
- Return type:
pd.DataFrame
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations (catchments or gauges) that are used to index each station in the dataset.
- ts_attrs() DataFrame[source]
daily_timeseries statistics of all catchments
- Returns:
a pandas.DataFrame of static features of all catchments of shape (654, xxxx)
- Return type:
pd.DataFrame
- class aqua_fetch.rr.CAMELS_GB(path=None, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 671 catchments with 145 static features and 10 dynamic features for each catchment following the work of Coxon et al., 2020. The dynamic features are time series from 1970-10-01 to 2015-09-30. The data is downloaded from the CEH website.
Examples
>>> from aqua_fetch import CAMELS_GB
>>> dataset = CAMELS_GB()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='38017', as_dataframe=True)
>>> df = dynamic['38017']  # dynamic is a dictionary with station ids as keys and DataFrames as values
>>> df.shape
(26388, 28)
... # get the names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
671
... # get data of 10% of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (67 out of 671)
67
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(26388, 28), (26388, 28), (26388, 28),... (26388, 28), (26388, 28)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('38017', as_dataframe=True,
...     dynamic_features=['windspeed_mps', 'airtemp_C_mean', 'pet_mm', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['38017'].shape
(26388, 5)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
... # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='38017', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['38017'].shape
((1, 145), 1, (26388, 28))
... # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26388, 'dynamic_features': 28})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(671, 2)
>>> dataset.stn_coords('38017')  # returns coordinates of the station whose id is 38017
51.880001 -0.28
>>> dataset.stn_coords(['38017', '42001'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('38017')
... # get area of two stations
>>> dataset.area(['38017', '42001'])
... # if the fiona library is installed, we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('38017')
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
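The caching advice above can be illustrated with the standard library's functools.cached_property. This is only a minimal sketch, not the package's implementation: DemoDataset and _read_feature_names are hypothetical names invented for illustration, and the feature list is a placeholder.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read runs

    def _read_feature_names(self) -> List[str]:
        # Placeholder for an expensive step such as parsing a file header.
        self.reads += 1
        return ["pcp_mm", "airtemp_C_mean", "q_cms_obs"]

    @cached_property
    def dynamic_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_feature_names()


ds = DemoDataset()
first = ds.dynamic_features
second = ds.dynamic_features
print(first is second, ds.reads)  # True 1
```

The second access returns the stored list without calling the reader again, which is the behaviour the description recommends for properties that are read repeatedly.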
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.CAMELS_IND(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 472 catchments from the Republic of India following the work of Mangukiya et al., 2024. The dataset consists of 210 static catchment features and 20 dynamic features. The dynamic features span from 19800101 to 20201231 with a daily timestep.
Examples
>>> from aqua_fetch import CAMELS_IND
>>> dataset = CAMELS_IND()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='3001', as_dataframe=True)
>>> df = dynamic['3001']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14976, 20)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
472
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (47 out of 472)
47
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(14976, 20), (14976, 20), (14976, 20),... (14976, 20), (14976, 20)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('3001', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['3001'].shape
(14976, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='3001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['3001'].shape
((1, 210), 1, (14976, 20))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14976, 'dynamic_features': 20})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(472, 2)
>>> dataset.stn_coords('3001')  # returns coordinates of station whose id is 3001
48.006298 -4.063848
>>> dataset.stn_coords(['3001', '17021'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('3001')
>>> # get area of two stations
>>> dataset.area(['3001', '17021'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('3001')
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
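The verbosity levels listed above can be read as a threshold on message importance. The following is a rough sketch of that idea under stated assumptions: the helper log and its level argument are hypothetical and are not taken from the package, which may route its messages differently.

```python
def log(message: str, level: int, verbosity: int) -> bool:
    """Print `message` if `verbosity` permits it; return whether it was printed.

    Hypothetical helper: level 1 marks an important message,
    level 2 or higher marks a more detailed one.
    """
    if verbosity >= level:
        print(message)
        return True
    return False


log("downloading data", level=1, verbosity=1)    # printed
log("unzipping file 3 of 5", level=2, verbosity=1)  # suppressed
```

With verbosity=0 neither message appears, and with verbosity=2 both do.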
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property end: Timestamp
end of data
- class aqua_fetch.rr.CAMELS_LUX(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 56 catchments from Luxembourg following the work of Nijzink et al., 2025. The dataset consists of 61 static catchment features and 25 dynamic features. The dynamic features span from 20040101 to 20211231 with daily, hourly, and 15-minute timesteps. The data is downloaded from Zenodo.
Examples
>>> from aqua_fetch import CAMELS_LUX
>>> dataset = CAMELS_LUX()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(6209, 25)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
56
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5)
5
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(6209, 25), (6209, 25), (6209, 25),... (6209, 25), (6209, 25)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('ID_02', as_dataframe=True,
...     dynamic_features=['pcp_mm_station', 'rh_%', 'airtemp_C_mean', 'pet_mm_pm', 'q_cms_obs'])
>>> dynamic['ID_02'].shape
(6209, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ID_02', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['ID_02'].shape
((1, 61), 1, (6209, 25))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 6209, 'dynamic_features': 25})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(56, 2)
>>> dataset.stn_coords('ID_02')  # returns coordinates of station whose id is ID_02
49.586288 6.14908
>>> dataset.stn_coords(['ID_02', 'ID_01'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('ID_02')
>>> # get area of two stations
>>> dataset.area(['ID_02', 'ID_01'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('ID_02')
>>> # if we want to get hourly data we can do as below
>>> dataset = CAMELS_LUX(timestep='H')
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02']
>>> df.shape
(149016, 25)
>>> # if we want to get 15Min data we can do as below
>>> dataset = CAMELS_LUX(timestep='15Min')
>>> _, dynamic = dataset.fetch(stations='ID_02', as_dataframe=True)
>>> df = dynamic['ID_02']
>>> df.shape
(596061, 25)
- __init__(path=None, timestep: str = 'D', overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
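The three cases described for the path parameter can be sketched as a small decision function. This is an illustration only: resolve_data_dir and default_dir are hypothetical names, and treating an already existing default directory as "no download needed" is an assumption rather than documented behaviour.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_data_dir(path: Optional[str], default_dir: str) -> Tuple[str, bool]:
    """Return (directory, needs_download) for the three documented cases."""
    if path is None:
        # Not provided: fall back to the default directory.
        # Assumption: skip the download if that directory already exists.
        return default_dir, not os.path.isdir(default_dir)
    if os.path.isdir(path):
        # Provided and present: read the existing data.
        return path, False
    # Provided but missing: download into the given directory.
    return path, True


with tempfile.TemporaryDirectory() as existing:
    missing = os.path.join(existing, "not_yet_created")
    print(resolve_data_dir(existing, existing)[1])  # False: read from disk
    print(resolve_data_dir(missing, existing)[1])   # True: download into `missing`
```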
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_NZ(path: str | PathLike = None, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 369 catchments from New Zealand following the work of Harrigan et al., 2025. The dataset consists of 40 static catchment features and 5 dynamic features. The dynamic features span from 19720101 to 20240802 with hourly timestep. The data is downloaded from figshare. This data comes with daily and hourly timesteps; each can be accessed by setting the timestep argument to D or H respectively during initialization.
Examples
>>> from aqua_fetch import CAMELS_NZ
>>> dataset = CAMELS_NZ()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='74321', as_dataframe=True)
>>> df = dynamic['74321']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(19208, 5)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
347
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (34 out of 347)
34
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(19208, 5), (19208, 5), (19208, 5),... (19208, 5), (19208, 5)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('74321', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> dynamic['74321'].shape
(19208, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='74321', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['74321'].shape
((1, 40), 1, (19208, 5))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 19208, 'dynamic_features': 5})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(347, 2)
>>> dataset.stn_coords('74321')  # returns coordinates of station whose id is 74321
-45.945599 170.101486
>>> dataset.stn_coords(['74321', '802'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('74321')
>>> # get area of two stations
>>> dataset.area(['74321', '802'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('74321')
>>> # the hourly data can be accessed by setting the timestep to 'H'
>>> dataset = CAMELS_NZ(timestep='H')
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='74321', as_dataframe=True)
>>> df = dynamic['74321']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(460928, 5)
- __init__(path: str | PathLike = None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
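To make the role of such a mapping concrete, the sketch below renames raw column names to standardized ones with a plain dictionary. The raw names on the left, the direction of the mapping (raw name to standardized name), and the helper standardize are all assumptions made for illustration; the actual contents of dyn_map differ per dataset.

```python
from typing import Dict, List

# Hypothetical mapping: raw column name in the source files -> standardized name.
DYN_MAP: Dict[str, str] = {
    "precipitation": "pcp_mm",
    "temperature": "airtemp_C_mean",
    "flow": "q_cms_obs",
}


def standardize(columns: List[str], dyn_map: Dict[str, str]) -> List[str]:
    """Rename known columns and leave unknown ones unchanged."""
    return [dyn_map.get(col, col) for col in columns]


print(standardize(["precipitation", "flow", "quality_flag"], DYN_MAP))
# ['pcp_mm', 'q_cms_obs', 'quality_flag']
```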
- property end: Timestamp
end of data
- class aqua_fetch.rr.CAMELS_SE(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 50 Swedish catchments following the work of Teutschbein et al., 2024. The data is downloaded from the Swedish National Data Service website. The dataset consists of 76 static catchment features and 4 dynamic features. The dynamic features span from 19610101 to 20201231 with daily timestep.
Examples
>>> from aqua_fetch import CAMELS_SE
>>> dataset = CAMELS_SE()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(21915, 4)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
50
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (5 out of 50)
5
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(21915, 4), (21915, 4), (21915, 4),... (21915, 4), (21915, 4)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...     dynamic_features=['q_cms_obs', 'q_mm_obs', 'pcp_mm', 'airtemp_C_mean'])
>>> dynamic['5'].shape
(21915, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 76), 1, (21915, 4))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 21915, 'dynamic_features': 4})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(50, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
68.0356 21.9758
>>> dataset.stn_coords(['5', '200'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('5')
>>> # get area of two stations
>>> dataset.area(['5', '200'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('5')
- __init__(path: str = None, to_netcdf: bool = True, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the CAMELS_SE dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_SK(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset of 178 catchments from South Korea following the work of Kim et al., 2025. The dataset consists of 215 static catchment features and 17 dynamic features. The dynamic features span from 20000101 to 20191231 with hourly timestep.
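The stated span and timestep determine how many rows a gap-free time series should have, which gives a quick sanity check on a fetched DataFrame. The sketch below does the arithmetic with the standard library only. The helper expected_steps is a hypothetical name, and the calculation assumes a regular series that starts at 00:00 on the first day and covers the whole of the last day.

```python
from datetime import datetime, timedelta


def expected_steps(start: str, end: str, step: timedelta) -> int:
    """Count timestamps of width `step` from 00:00 on `start` through the end of `end`.

    Both dates are given as YYYYMMDD strings and are inclusive.
    """
    first = datetime.strptime(start, "%Y%m%d")
    # The end date is inclusive, so the window runs to midnight of the following day.
    stop = datetime.strptime(end, "%Y%m%d") + timedelta(days=1)
    return (stop - first) // step


print(expected_steps("20000101", "20191231", timedelta(hours=1)))  # 175320
```

The result, 175320, equals 7305 days times 24 hours and agrees with the first dimension of the shapes shown in the examples for this class.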
Examples
>>> from aqua_fetch import CAMELS_SK
>>> dataset = CAMELS_SK()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='2013615', as_dataframe=True)
>>> df = dynamic['2013615']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(175320, 17)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
178
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (17 out of 178)
17
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(175320, 17), (175320, 17), (175320, 17),... (175320, 17), (175320, 17)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('2013615', as_dataframe=True,
...     dynamic_features=['total_precipitation', 'snow_depth', 'air_temp_obs', 'potential_evaporation', 'q_cms_obs'])
>>> dynamic['2013615'].shape
(175320, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2013615', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2013615'].shape
((1, 215), 1, (175320, 17))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 175320, 'dynamic_features': 17})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(178, 2)
>>> dataset.stn_coords('2013615')  # returns coordinates of station whose id is 2013615
35.880798 128.173096
>>> dataset.stn_coords(['2013615', '2017620'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('2013615')
>>> # get area of two stations
>>> dataset.area(['2013615', '2017620'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('2013615')
- __init__(path=None, timestep: str = 'H', to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- property start: Timestamp
start of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.CAMELS_US(path: str | PathLike = None, data_source: str = 'daymet', **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 671 US catchments with 59 static catchment features and 8 catchment-averaged dynamic features for each catchment. The dynamic features are daily timeseries from 1980-01-01 to 2014-12-31. The data is downloaded from its Zenodo repository. For more details on the data refer to Newman et al., 2015, Newman et al., 2022 and Addor et al., 2017.
Please note that this data is also known as “CAMELS”. However, we have named it CAMELS_US to differentiate it from other CAMELS-like datasets from other parts of the world.
Examples
>>> from aqua_fetch import CAMELS_US
>>> dataset = CAMELS_US()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='11478500', as_dataframe=True)
>>> df = dynamic['11478500']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12784, 8)
>>> # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
671
>>> # get data of 10 % of stations as dataframes
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (67 out of 671)
67
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(12784, 8), (12784, 8), (12784, 8),... (12784, 8), (12784, 8)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('11478500', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'solrad_wm2', 'airtemp_C_max', 'airtemp_C_min', 'q_cms_obs'])
>>> dynamic['11478500'].shape
(12784, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with dataframes as values
10
>>> # if we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='11478500', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['11478500'].shape
((1, 59), 1, (12784, 8))
>>> # if as_dataframe is not set to True and xarray is installed, the returned data is an xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12784, 'dynamic_features': 8})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(671, 2)
>>> dataset.stn_coords('11478500')  # returns coordinates of station whose id is 11478500
40.480419 -123.890877
>>> dataset.stn_coords(['11478500', '14020000'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('11478500')
>>> # get area of two stations
>>> dataset.area(['11478500', '14020000'])
>>> # if the fiona library is installed we can get the boundary as a fiona Geometry
>>> dataset.get_boundary('11478500')
- __init__(path: str | PathLike = None, data_source: str = 'daymet', **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
data_source (str) –
source of meteorological timeseries data. Allowed values are
daymet
maurer
nldas
v1p15_daymet
v1p15_nldas
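A caller may want to check a data_source value against the list above before constructing the class. The following is a minimal sketch of such a check: check_data_source and ALLOWED_SOURCES are hypothetical names that are not part of the package, and the package may validate the argument differently or raise a different error.

```python
# Values copied from the parameter description above.
ALLOWED_SOURCES = ("daymet", "maurer", "nldas", "v1p15_daymet", "v1p15_nldas")


def check_data_source(data_source: str = "daymet") -> str:
    """Return `data_source` unchanged if it is one of the documented values."""
    if data_source not in ALLOWED_SOURCES:
        raise ValueError(
            f"data_source must be one of {ALLOWED_SOURCES}, got {data_source!r}"
        )
    return data_source


print(check_data_source("maurer"))  # maurer
```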
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again. Alternatively, a child class can implement this property in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.Caravan_DK(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Reads Caravan extension Denmark, a Danish dataset for large-sample hydrology following the work of Koch and Schneider, 2022. The dataset is downloaded from Zenodo. It consists of static and dynamic features for 308 Danish catchments. There are 38 dynamic (time series) features from 1981-01-02 to 2020-12-31 at a daily time step and 211 static features for each of the 308 catchments.
Please note that there is an updated version of this dataset following the work of Liu et al., 2024. That dataset is associated with the aqua_fetch.CAMELS_DK class, which can be imported as follows:
>>> from aqua_fetch import CAMELS_DK
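As a quick consistency check, the stated period of a daily dataset implies the number of rows in each station's time series. The sketch below uses only the standard library; the helper name n_daily_steps is illustrative. For the Caravan_DK period given above it yields 14609, which is the first dimension of the shapes shown in the examples for this class.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps between two dates, both inclusive."""
    return (end - start).days + 1


n_steps = n_daily_steps(date(1981, 1, 2), date(2020, 12, 31))
print(n_steps)  # 14609
```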
Examples
>>> from aqua_fetch import Caravan_DK
>>> dataset = Caravan_DK()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='80001', as_dataframe=True)
>>> df = dynamic['80001']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14609, 39)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
308
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (31 out of 308)
31
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(14609, 39), (14609, 39), (14609, 39),... (14609, 39), (14609, 39)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('80001', as_dataframe=True,
...     dynamic_features=['snow_depth_water_equivalent_mean', 'temperature_2m_mean', 'q_cms_obs'])
>>> dynamic['80001'].shape
(14609, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='80001', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['80001'].shape
((1, 211), 1, (14609, 39))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14609, 'dynamic_features': 39})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(308, 2)
>>> dataset.stn_coords('80001')  # returns coordinates of station whose id is 80001
57.10371 10.3516
>>> dataset.stn_coords(['80001', '240001'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('80001')
... # get area of two stations
>>> dataset.area(['80001', '240001'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('80001')
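The examples pass the stations argument in several forms: a single id, a list of ids, an integer count, or a fraction. One plausible way to resolve these forms into a list of ids is sketched below. The helper resolve_stations is hypothetical and is not the package's own logic; in particular, truncating the fraction to a whole number of stations is an assumption.

```python
import random
from typing import List, Union


def resolve_stations(
    all_stations: List[str],
    stations: Union[str, int, float, List[str]] = "all",
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper mirroring the argument forms used in the examples."""
    rng = random.Random(seed)
    if isinstance(stations, list):
        return stations
    if isinstance(stations, str):
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # Assumption: the fraction is truncated to a whole number of stations.
        return rng.sample(all_stations, int(len(all_stations) * stations))
    if isinstance(stations, int):
        return rng.sample(all_stations, stations)
    raise TypeError(f"unsupported stations argument: {stations!r}")


ids = [str(i) for i in range(1, 101)]
picked_fraction = resolve_stations(ids, 0.1)  # 10 randomly chosen ids
picked_count = resolve_stations(ids, 3)       # 3 randomly chosen ids
picked_one = resolve_stations(ids, "42")      # exactly the requested id
```

Seeding the random generator, as done here, keeps the random selection reproducible between runs.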
- __init__(path=None, overwrite=False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, the first attribute in the boundary file will be used.
- property caravan_attr_fpath
returns path to attributes_caravan_camelsdk.csv file
- caravan_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 10)
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property end: Timestamp
end of data
- hyd_atlas_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 196)
- property other_attr_fpath
returns path to attributes_other_camelsdk.csv file
- other_static_attributes(stations='all') DataFrame[source]
- Return type:
a pandas.DataFrame of shape (308, 5)
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
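How such a name map might be applied can be sketched with plain dictionaries. The name pairs below are hypothetical and only illustrate the name_unit_specifier convention; the direction of the mapping (dataset name to standardized name) is also an assumption, and the actual dyn_map and static_map of a dataset may differ.

```python
from typing import Dict, List

# Hypothetical raw-to-standard name pairs; a real dyn_map/static_map may differ.
name_map: Dict[str, str] = {
    "total_precipitation_sum": "pcp_mm",
    "temperature_2m_mean": "airtemp_C_mean",
}


def standardize(names: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename known features; leave names without a mapping unchanged."""
    return [mapping.get(name, name) for name in names]


raw = ["total_precipitation_sum", "temperature_2m_mean", "snow_depth"]
renamed = standardize(raw, name_map)
print(renamed)  # ['pcp_mm', 'airtemp_C_mean', 'snow_depth']
```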
- stations() List[str][source]
Returns the names/ids of the stations (catchments/gauges) that are used to index each station in the dataset.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe will be equal to the number of stations whose coordinates are to be fetched.
- Return type:
pd.DataFrame
Examples
>>> dataset = Caravan_DK()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('100010')  # returns coordinates of station whose id is 100010
>>> dataset.stn_coords(['100010', '210062'])  # returns coordinates of two stations
- class aqua_fetch.rr.CCAM(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Dataset for Yellow River (China) catchments. The CCAM dataset was published by Hao et al., 2021 and has two sets. The first set consists of catchment attributes, meteorological data and catchment boundaries of over 4000 catchments; however, it does not include streamflow data. The second set consists of streamflow, catchment attributes, catchment boundaries and meteorological data for 102 catchments of the Yellow River. Since this second set conforms to the norms of CAMELS, this class uses the second set. Therefore, the fetch, stations and other methods/attributes of this class return data of only the Yellow River catchments and not of the whole of China. However, the first set can also be fetched using the fetch_meteo method of this class. The temporal extent of both sets is from 1999 to 2020. However, the streamflow time series in the first set has a very large number of missing values. The data of the Yellow River consists of 16 dynamic features (time series) and 124 static features (catchment attributes).
Examples
>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='0010', as_dataframe=True)
>>> df = dynamic['0010']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(8035, 16)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
102
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (10 out of 102)
10
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(8035, 16), (8035, 16), (8035, 16),... (8035, 16), (8035, 16)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('0010', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'airtemp_C_mean', 'evap_mm', 'rh_%', 'q_cms_obs'])
>>> dynamic['0010'].shape
(8035, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='0010', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['0010'].shape
((1, 124), 1, (8035, 16))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 8035, 'dynamic_features': 16})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(102, 2)
>>> dataset.stn_coords('0010')  # returns coordinates of station whose id is 0010
36.059803 112.3638
>>> dataset.stn_coords(['0010', '0104'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('0010')
... # get area of two stations
>>> dataset.area(['0010', '0104'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('0010')
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, provide the complete path to it. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
overwrite (bool) – If the data is already downloaded, set this to True to make a fresh download.
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This speeds up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
names of hydro-meteorological time series data for Yellow River catchments
- property end
end of data
- fetch_meteo(station: str | List[str] = 'all', features: str | List[str] = 'all', st='1990-01-01', en='2021-03-31', as_dataframe: bool = True)[source]
Fetches meteorological data of 4902 Chinese catchments.
Examples
>>> from aqua_fetch import CCAM
>>> dataset = CCAM()
>>> features = ['PRE', 'TEM', 'PRS', 'RHU', 'EVP', 'WIN', 'PET']
>>> st = '1999-01-01'
>>> en = '2020-03-31'
>>> xds = dataset.fetch_meteo(features=features, st=st, en=en)
- property meteo_path
path where daily meteorological data of stations is present
- class aqua_fetch.rr.Finland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 669 catchments of Finland. The observed streamflow data is downloaded from https://wwwi3.ymparisto.fi. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 2012-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import Finland
>>> dataset = Finland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 4199, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
66
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
669
... # get data by station id
>>> _, data = dataset.fetch(stations='FI000001')
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
... # If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='FI000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(669, 2)
>>> dataset.stn_coords('FI000001')  # returns coordinates of station whose id is FI000001
64.226288 27.736528
>>> dataset.stn_coords(['FI000001', 'FI000002'])  # returns coordinates of two stations
FI000001 64.226288 27.736528
FI000002 64.226288 27.736528
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
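The verbosity levels described above amount to a simple threshold rule. The sketch below shows one way such a rule could be written; the log helper and the idea of assigning each message a level are assumptions made for illustration, not the package's logging code.

```python
def log(message: str, level: int, verbosity: int = 1) -> bool:
    """Print message if verbosity is at least level; return whether it printed."""
    if verbosity >= level:
        print(message)
        return True
    return False


# Level 1 marks an important message, level 2 a more detailed one.
log("downloading streamflow data", level=1, verbosity=1)   # printed
log("parsing file 3 of 669", level=2, verbosity=1)         # suppressed
log("parsing file 3 of 669", level=2, verbosity=2)         # printed
log("downloading streamflow data", level=1, verbosity=0)   # suppressed
```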
- fetch_q(as_dataframe: bool = True, overwrite: bool = False)[source]
Downloads (if not already downloaded) and returns the daily streamflow data of Finland, either as a pandas.DataFrame or as an xarray Dataset.
- class aqua_fetch.rr.GRDCCaravan(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This is a dataset of 5357 catchments from around the globe following the work of Faerber et al., 2023. The dataset consists of 39 dynamic (time series) features and 211 static features. The dynamic (time series) data spans from 1950-01-02 to 2023-05-19.
If the xarray and netCDF4 packages are installed, then netCDF files will be downloaded; otherwise csv files will be downloaded and used.
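The fallback described above depends on whether two optional packages can be imported. A minimal sketch of such a check, using importlib.util.find_spec from the standard library, is shown below. The function name preferred_format is hypothetical and the class may perform this check differently.

```python
from importlib.util import find_spec
from typing import Sequence


def preferred_format(required: Sequence[str] = ("xarray", "netCDF4")) -> str:
    """Return 'netcdf' when every required package is importable, else 'csv'."""
    has_all = all(find_spec(name) is not None for name in required)
    return "netcdf" if has_all else "csv"


# The result depends on the packages installed in the current environment.
print(preferred_format())
```

Using find_spec avoids actually importing the packages, so the check stays cheap even when they are installed.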
Examples
>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='GRDC_3664802', as_dataframe=True)
>>> df = dynamic['GRDC_3664802']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26801, 39)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
5357
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (535 out of 5357)
535
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(26801, 39), (26801, 39), (26801, 39),... (26801, 39), (26801, 39)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('GRDC_3664802', as_dataframe=True,
...     dynamic_features=['total_precipitation_sum', 'potential_evaporation_sum', 'temperature_2m_mean', 'q_cms_obs'])
>>> dynamic['GRDC_3664802'].shape
(26801, 4)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='GRDC_3664802', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['GRDC_3664802'].shape
((1, 211), 1, (26801, 39))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26801, 'dynamic_features': 39})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(5357, 2)
>>> dataset.stn_coords('GRDC_3664802')  # returns coordinates of station whose id is GRDC_3664802
-26.2271 -51.0771
>>> dataset.stn_coords(['GRDC_3664802', 'GRDC_1159337'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('GRDC_3664802')
... # get area of two stations
>>> dataset.area(['GRDC_3664802', 'GRDC_1159337'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('GRDC_3664802')
- __init__(path=None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result than to read the data again and again; alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_station_features(station: str, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st: str | None = None, en: str | None = None, **kwargs) tuple[DataFrame, DataFrame][source]
Fetches features for one station.
- Parameters:
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.
en (str, optional) – end point of the data to be fetched. By default, the data will be fetched up to where it is available.
- Returns:
A tuple of static and dynamic features, both as pandas.DataFrame. The dataframe of static features will have a single row, while the dataframe of dynamic features will be of shape (time, dynamic features).
- Return type:
tuple[DataFrame, DataFrame]
Examples
>>> from aqua_fetch import GRDCCaravan
>>> dataset = GRDCCaravan()
>>> dataset.fetch_station_features('GRDC_3664802')
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result than to read the data again and again; alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- class aqua_fetch.rr.HYSETS(path: str, sources: Dict[str, str] = None, **kwargs)[source]
Bases: _RainfallRunoff
Database for hydrometeorological modeling of 14,425 North American watersheds from 1950 to 2023, following the work of Arsenault et al., 2020. This data has 20 dynamic features and 30 static features. Most of the dynamic features have more than one source. The data is available in netCDF format; therefore, this class requires xarray and netCDF4 to be installed.
The following data sources are available.

================  ==============================
sources           dynamic_features
================  ==============================
SNODAS_SWE        discharge, swe
SCDNA             discharge, pr, tasmin, tasmax
nonQC_stations    discharge, pr, tasmin, tasmax
Livneh            discharge, pr, tasmin, tasmax
ERA5              discharge, pr, tasmax, tasmin
ERA5Land_SWE      discharge, swe
ERA5Land          discharge, pr, tasmax, tasmin
================  ==============================

All sources contain one or more of the following dynamic_features, with the following shapes.

==========================  ==============
dynamic_features            shape
==========================  ==============
time                        (25202,)
watershedID                 (14425,)
drainage_area               (14425,)
drainage_area_GSIM          (14425,)
flag_GSIM_boundaries        (14425,)
flag_artificial_boundaries  (14425,)
centroid_lat                (14425,)
centroid_lon                (14425,)
elevation                   (14425,)
slope                       (14425,)
discharge                   (14425, 25202)
pr                          (14425, 25202)
tasmax                      (14425, 25202)
tasmin                      (14425, 25202)
==========================  ==============
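The two-dimensional shapes listed above give a rough idea of how much memory one variable needs when it is loaded in full. The arithmetic below is a sketch under an assumption: the width of the stored values (8 bytes for 64-bit floats, 4 bytes for 32-bit floats) is not stated in this documentation.

```python
n_watersheds = 14425
n_timesteps = 25202

# Number of elements in one (watershed, time) variable such as discharge.
n_values = n_watersheds * n_timesteps


def size_gb(n: int, bytes_per_value: int) -> float:
    """Size in gigabytes (10**9 bytes) of n values of the given width."""
    return n * bytes_per_value / 1e9


print(n_values)                         # 363538850
print(round(size_gb(n_values, 8), 2))   # 2.91, assuming 64-bit floats
print(round(size_gb(n_values, 4), 2))   # 1.45, assuming 32-bit floats
```

Fetching only the required stations and features keeps the loaded data far below these figures.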
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='5', as_dataframe=True)
>>> df = dynamic['5']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(27028, 20)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
14425
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (1442 out of 14425)
1442
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(27028, 20), (27028, 20), (27028, 20),... (27028, 20), (27028, 20)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('5', as_dataframe=True,
...     dynamic_features=['evap_mm', 'pcp_mm', 'snowmelt_mm', 'swe_mm', 'q_cms_obs'])
>>> dynamic['5'].shape
(27028, 5)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['5'].shape
((1, 30), 1, (27028, 20))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 20})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(14425, 2)
>>> dataset.stn_coords('5')  # returns coordinates of station whose id is 5
47.091389 -67.731392
>>> dataset.stn_coords(['5', '12'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('5')
... # get area of two stations
>>> dataset.area(['5', '12'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('5')
- __init__(path: str, sources: Dict[str, str] = None, **kwargs)[source]
- Parameters:
path (str) – The path under which the data is to be saved or is already saved. If the data is already downloaded, provide the path under which the HYSETS data is located. If None, the data will be downloaded. The data is downloaded only once, so subsequent calls to this class will not download it again unless overwrite is set to True.
sources (dict) – sources for each dynamic feature. The keys should be dynamic features and the values should be sources. The available sources for each dynamic feature are listed below.
10m_u_component_of_wind: [‘ERA5’, ‘ERA5Land’]
10m_v_component_of_wind: [‘ERA5’, ‘ERA5Land’]
2m_dewpoint: [‘ERA5’, ‘ERA5Land’]
2m_tasmax: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
2m_tasmin: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
discharge: [‘NRCAN’, ‘ERA5’, ‘ERA5Land’, ‘Livneh’, ‘nonQC_stations’, ‘SCDNA’, ‘SNODAS’, ‘QC_stations’]
evaporation: [‘ERA5’, ‘ERA5Land’]
snow_density: [‘ERA5’, ‘ERA5Land’]
snow_evaporation: [‘ERA5’, ‘ERA5Land’]
snow_water_equivalent: [‘ERA5’, ‘ERA5Land’, ‘SNODAS’]
snowfall: [‘ERA5’, ‘ERA5Land’]
snowmelt: [‘ERA5’, ‘ERA5Land’]
surface_downwards_solar_radiation: [‘ERA5’, ‘ERA5Land’]
surface_downwards_thermal_radiation: [‘ERA5’, ‘ERA5Land’]
surface_net_solar_radiation: [‘ERA5’, ‘ERA5Land’]
surface_net_thermal_radiation: [‘ERA5’, ‘ERA5Land’]
surface_pressure: [‘ERA5’, ‘ERA5Land’]
surface_runoff: [‘ERA5’, ‘ERA5Land’]
total_cloud_cover: [‘ERA5’]
total_precipitation: [‘NRCAN’, ‘Livneh’, ‘QC_stations’, ‘ERA5’, ‘nonQC_stations’, ‘ERA5Land’, ‘SCDNA’]
kwargs – arguments for the _RainfallRunoff base class
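Because each dynamic feature accepts only certain sources, it can help to check a sources dictionary before constructing the class. The sketch below copies three entries from the list above for illustration; the check_sources helper is hypothetical and not part of the package.

```python
from typing import Dict, List

# Three entries copied from the list of available sources, for illustration.
AVAILABLE: Dict[str, List[str]] = {
    "total_precipitation": ["NRCAN", "Livneh", "QC_stations", "ERA5",
                            "nonQC_stations", "ERA5Land", "SCDNA"],
    "snow_water_equivalent": ["ERA5", "ERA5Land", "SNODAS"],
    "total_cloud_cover": ["ERA5"],
}


def check_sources(sources: Dict[str, str]) -> None:
    """Raise ValueError if a feature or its requested source is not listed."""
    for feature, source in sources.items():
        if feature not in AVAILABLE:
            raise ValueError(f"unknown dynamic feature: {feature}")
        if source not in AVAILABLE[feature]:
            raise ValueError(
                f"{source} is not available for {feature}; "
                f"choose one of {AVAILABLE[feature]}"
            )


sources = {"total_precipitation": "Livneh", "snow_water_equivalent": "SNODAS"}
check_sources(sources)  # passes silently for a valid combination
```

A dictionary that passes this check has the keys-as-features, values-as-sources layout that the sources parameter expects.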
- property OfficialID_WatershedID_map
A dictionary mapping Official_ID to Watershed_ID. For example ‘01AD002’: ‘1’
- property WatershedID_OfficialID_map
A dictionary mapping Watershed_ID to Official_ID. For example ‘1’: ‘01AD002’
- area(stations: str | List[str] = 'all', source: str = 'other') Series[source]
Returns area_gov (km2) of all catchments as a pandas.Series.
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations.
source (str) – source of area calculation. It should be either gsim or other.
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('92')  # returns area of station whose id is 92
>>> dataset.area(['92', '142'])  # returns area of two stations
- property boundary_id_map: str
Name of the attribute in the boundary (.shp/.gpkg) file that will be used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result than to read the data again and again; alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- fetch_dynamic_features(station, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]
Fetches dynamic features of one station.
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, DataFrame | Dataset][source]
Returns features of multiple stations.
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result than to read the data again and again; alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Returns a list of station names. The Watershed_ID of the station is used as the station name instead of the Official_ID, because the .nc files identify stations by Watershed_ID. Watershed_ID starts with 1, 2, 3 and so on, while Official_ID is a code from the meteorological agency, such as 01AD002 for station 1.
- Returns:
a list of ids of stations
- Return type:
List[str]
Examples
>>> from aqua_fetch import HYSETS
>>> dataset = HYSETS()
... # get name of all stations as list
>>> dataset.stations()
- class aqua_fetch.rr.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]
Bases: _RainfallRunoff
Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2]. This is a rainfall-runoff dataset of 564 stations in Costa Rica from 1985 to 2019 at daily, monthly and yearly time steps.
Examples
>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='564', as_dataframe=True)
>>> df = dynamic['564']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(12783, 9)
...
... # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
564
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (56 out of 564)
56
...
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(12783, 9), (12783, 9), (12783, 9),... (12783, 9), (12783, 9)]
...
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('564', as_dataframe=True,
...     dynamic_features=['AET_mm', 'Prec_mm', 'Streamflow_mm'])
>>> dynamic['564'].shape
(12783, 3)
...
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='564', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['564'].shape
((1, 59), 1, (12783, 9))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
...
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 12783, 'dynamic_features': 9})
...
>>> len(dynamic.data_vars)
10
...
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(564, 2)
>>> dataset.stn_coords('564')  # returns coordinates of station whose id is 564
40.480419 -123.890877
>>> dataset.stn_coords(['564', '563'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('564')
... # get area of two stations
>>> dataset.area(['564', '563'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('564')
- __init__(time_step: str = 'daily', path=None, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
time_step (str) – one of daily, month or year
**kwargs – keyword arguments
- area(stations: str | List[str] = 'all') Series[source]
Returns area (km²) of all catchments as pandas.Series
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return area of all stations
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import HYPE
>>> dataset = HYPE()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('2')  # returns area of station whose id is 2
>>> dataset.area(['2', '605'])  # returns area of two stations
- property end
end of data
- fetch_static_features(station, static_features=None)[source]
static data for HYPE is not available.
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the user to implement it more efficiently in the child class.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
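The caching advice above can be sketched with functools.cached_property. This is only an illustration on a hypothetical DemoDataset class, not the implementation used by aqua_fetch; the feature names and the _read_static_names helper are placeholders invented for the example.

```python
from functools import cached_property
from typing import List


class DemoDataset:
    """Hypothetical stand-in for a dataset class; not part of aqua_fetch."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read runs

    def _read_static_names(self) -> List[str]:
        # Placeholder for an expensive file read (assumed, for illustration).
        self.reads += 1
        return ["area_km2", "elev_mean", "slope"]

    @cached_property
    def static_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_static_names()


ds = DemoDataset()
first = ds.static_features
second = ds.static_features
print(ds.reads)  # the expensive read ran only once
```

The second access returns the stored list without touching the file again, which is the behaviour the docstring recommends for child classes.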
- stations() list[source]
Names/ids of stations/catchment/gauges or whatever that would be used to index each station in the dataset.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
returns coordinates of stations as DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
Examples
>>> dataset = HYPE()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('2')  # returns coordinates of station whose id is 2
>>> dataset.stn_coords(['2', '605'])  # returns coordinates of two stations
- class aqua_fetch.Ireland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_EStreams
Data of 464 catchments of Ireland. Out of these 464 catchments, 280 are from OPW and 184 are from EPA. The observed streamflow data for EPA stations is downloaded from https://epawebapp.epa.ie/Hydronet/#Flow while the observed streamflow for OPW stations is downloaded from https://waterlevel.ie/hydro-data/#/overview/Waterlevel. It should be noted that out of 280 OPW stations, streamflow data is available for only 129 stations. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1992-01-01 to 2020-06-31.
Examples
>>> from aqua_fetch import Ireland
>>> dataset = Ireland()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
46
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
464
# get data by station id
>>> _, data = dataset.fetch(stations='IEEP0281')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='IEEP0281', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(464, 2)
>>> dataset.stn_coords('IEEP0281')  # returns coordinates of station whose id is IEEP0281
52.217434 -8.494649
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])  # returns coordinates of two stations
IEEP0281 52.217434 -8.494649
IEEP0282 54.284546 -6.921607
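The examples above pass the stations argument in several forms: a station id, an integer count, or a fraction. The sketch below shows one way such an argument could be resolved. The resolve_stations helper, its seed parameter and the generated ids are hypothetical and are not part of aqua_fetch; truncating the fraction is an assumption chosen to match the 46 out of 464 stations shown above.

```python
import random
from typing import List, Sequence, Union


def resolve_stations(
    all_stations: Sequence[str],
    stations: Union[str, int, float, List[str]] = "all",
    seed: int = 0,
) -> List[str]:
    """Hypothetical helper: turn a `stations` argument into a list of ids."""
    rng = random.Random(seed)
    if isinstance(stations, str):
        # 'all' means every station; any other string is a single id.
        return list(all_stations) if stations == "all" else [stations]
    if isinstance(stations, float):
        # A fraction of all stations, truncated (0.1 of 464 gives 46).
        n = int(stations * len(all_stations))
        return rng.sample(list(all_stations), n)
    if isinstance(stations, int):
        # That many randomly selected stations.
        return rng.sample(list(all_stations), stations)
    # Otherwise assume an explicit list of ids.
    return list(stations)


ids = [f"ST{i:04d}" for i in range(464)]  # made-up station ids
print(len(resolve_stations(ids, 0.1)))
print(len(resolve_stations(ids, 10)))
print(resolve_stations(ids, "ST0001"))
```

A fixed seed keeps the random selection reproducible in this sketch; whether the library seeds its selection is not stated in the documentation.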
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- class aqua_fetch.rr.Italy(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_EStreams
Data of 294 catchments of Italy. The observed streamflow data is downloaded from http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient. The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams following the work of Nascimento et al., 2024. Therefore, there are 214 static features and 10 dynamic features, and the data is available from 1992-01-01 to 2020-06-31.
Examples
>>> from aqua_fetch import Italy
>>> dataset = Italy()
>>> _, data = dataset.fetch(0.1)  # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
29
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
# get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
294
# get data by station id
>>> _, data = dataset.fetch(stations='ITIS0001')
# get names of available dynamic features
>>> dataset.dynamic_features
# get only selected dynamic features
>>> _, data = dataset.fetch(1,
... dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
# get names of available static features
>>> dataset.static_features
# get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
# If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='ITIS0001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(294, 2)
>>> dataset.stn_coords('ITIS0001')  # returns coordinates of station whose id is ITIS0001
42.835835 13.919167
>>> dataset.stn_coords(['ITIS0001', 'ITIS0002'])  # returns coordinates of two stations
ITIS0001 42.835835 13.919167
ITIS0002 42.783890 13.905833
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- class aqua_fetch.Japan(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases:
_GSHA
Data of 694 catchments of Japan from the river.go.jp website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. Therefore, there are 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2022-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- class aqua_fetch.rr.LamaHCE(path=None, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, overwrite=False, **kwargs)[source]
Bases:
_RainfallRunoff
Large-Sample Data for Hydrology and Environmental Sciences for Central Europe (mainly Austria). The dataset is downloaded from zenodo following the work of Klingler et al., 2021. For total_upstrm data, there are 859 stations with 61 static features and 17 dynamic features. The temporal extent of data is from 1981-01-01 to 2019-12-31.
- __init__(path=None, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, overwrite=False, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp
Examples
>>> from aqua_fetch import LamaHCE
# by default the timestep is daily and data_type is 'total_upstrm'
>>> dataset = LamaHCE()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='826', as_dataframe=True)
>>> df = dynamic['826']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(14244, 22)
... # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
859
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (85 out of 859)
85
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(14244, 22), (14244, 22), (14244, 22),... (14244, 22), (14244, 22)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('826', as_dataframe=True,
... dynamic_features=['airtemp_C_mean', 'total_et', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['826'].shape
(14244, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='826', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['826'].shape
((1, 84), 1, (14244, 22))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 14244, 'dynamic_features': 22})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(859, 2)
>>> dataset.stn_coords('826')  # returns coordinates of station whose id is 826
2995596.0 4811891.0
>>> dataset.stn_coords(['826', '819'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('826')
... # get area of two stations
>>> dataset.area(['826', '819'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('826')
... # the data_type can also be 'intermediate_all'
>>> dataset = LamaHCE(data_type='intermediate_all')
... # or 'intermediate_lowimp'
>>> dataset = LamaHCE(data_type='intermediate_lowimp')
>>> len(dataset.stations())
454
... # the timestep can also be hourly i.e. 'H'
>>> dataset = LamaHCE(timestep='H')
>>> _, dynamic = dataset.fetch(stations='79', as_dataframe=True)
>>> dynamic['79'].shape
(341856, 16)  # there are 16 dynamic features for hourly data
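The row counts in the examples above follow directly from the stated temporal extent (1981-01-01 to 2019-12-31). The short check below uses only the standard library; the n_daily_steps helper is introduced here for illustration and is not part of aqua_fetch.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps between two dates, both ends included."""
    return (end - start).days + 1


daily = n_daily_steps(date(1981, 1, 1), date(2019, 12, 31))
hourly = daily * 24  # each day contributes 24 hourly steps

print(daily)   # 14244, the number of rows in the daily examples
print(hourly)  # 341856, the number of rows in the hourly example
```

The same arithmetic is a quick sanity check after downloading a dataset: a mismatch between the expected and actual number of rows usually points to a different start or end date.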
- property dyn_fname: str | PathLike
name of the .nc file which contains dynamic features. This file is created during dataset initialization only if to_netcdf is True, xarray is installed and the file does not already exist. The creation of this file can take some time; however, it leads to faster I/O operations.
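The build-once, reuse-afterwards behaviour described for the .nc file can be sketched generically. The example below uses a JSON file as a stand-in for netCDF so that it runs with the standard library alone; load_or_build, slow_build and the file name are hypothetical and do not reflect how aqua_fetch writes its files.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_file: Path, build, overwrite: bool = False):
    """Return cached data if the file exists, otherwise build and save it."""
    if cache_file.exists() and not overwrite:
        return json.loads(cache_file.read_text()), "cache"
    data = build()  # slow path, taken only on the first call
    cache_file.write_text(json.dumps(data))
    return data, "built"


def slow_build():
    # Placeholder for parsing many raw files (assumed, for illustration).
    return {"q_cms_obs": [1.2, 0.9, 1.4]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "dynamic_features.json"
    _, first_source = load_or_build(cache, slow_build)
    data, second_source = load_or_build(cache, slow_build)

print(first_source, second_source)  # built cache
```

The first call pays the cost of building the file, and every later call reads it back directly, which is the trade-off the to_netcdf option describes.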
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the user to implement it more efficiently in the child class.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = None) DataFrame[source]
static features of LamaHCE
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import LamaHCE
>>> dataset = LamaHCE(timestep='D', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
... # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
... static_features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
- fetch_stations_features(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]
Reads attributes of more than one station.
This function checks whether the .nc files exist. If they do, the data is read directly from them; otherwise the .nc files are first prepared and saved, and the data is then read from them. Upon subsequent calls, the .nc files are used for reading the data.
- Parameters:
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.
static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as pandas dataframe. default is xarray.Dataset object
kwargs (dict) – additional keyword arguments
- Returns:
tuple – A tuple of static and dynamic features. Static features are always returned as pandas.DataFrame with shape (stations, static features). The index of static features' DataFrame is the station/gauge ids while the columns are names of the static features. Dynamic features are returned either as xarray.Dataset or a dictionary with keys as station names and values as pandas.DataFrame depending upon whether as_dataframe is True or False and whether the xarray library is installed or not. If dynamic features are xarray.Dataset, then this dataset consists of data_vars equal to the number of stations and station names as xarray.Dataset.variables and time and dynamic_features as dimensions and coordinates.
Raises – ValueError, if both dynamic_features and static_features are None
Examples
>>> from aqua_fetch import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_features(['912101A', '912105A', '915011A'],
... as_dataframe=True)
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the user to implement it more efficiently in the child class.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.LamaHIce(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, **kwargs)[source]
Bases:
LamaHCE
Daily and hourly hydro-meteorological time series data of river basins of Iceland following Helgason et al., 2024. The dataset covers 111 catchments, with a total period from 1950 to 2021 for the daily timestep and from 1976 to 2023 for the hourly timestep. The average length of the daily data is 33 years, while that of the hourly data is 11 years. The dataset is available on hydroshare.
Examples
>>> from aqua_fetch import LamaHIce
# by default the timestep is daily and data_type is 'total_upstrm'
>>> dataset = LamaHIce()
... # get data by station id
>>> _, dynamic = dataset.fetch(stations='92', as_dataframe=True)
>>> df = dynamic['92']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(26298, 36)
... # get names of all stations as a list
>>> stns = dataset.stations()
>>> len(stns)
111
... # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (11 out of 111)
11
... # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(26298, 36), (26298, 36), (26298, 36),... (26298, 36), (26298, 36)]
... # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> _, dynamic = dataset.fetch('92', as_dataframe=True,
... dynamic_features=['swe', 'pet_mm', 'pcp_mm', 'q_cms_obs'])
>>> dynamic['92'].shape
(26298, 4)
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
... # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='92', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['92'].shape
((1, 154), 1, (26298, 36))
... # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 26298, 'dynamic_features': 36})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(111, 2)
>>> dataset.stn_coords('92')  # returns coordinates of station whose id is 92
571777.0 309737.0
>>> dataset.stn_coords(['92', '5'])  # returns coordinates of two stations
... # get area of a single station
>>> dataset.area('92')
... # get area of two stations
>>> dataset.area(['92', '5'])
... # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('92')
... # the data_type can also be 'intermediate_all'
>>> dataset = LamaHIce(data_type='intermediate_all')
... # or 'intermediate_lowimp'
>>> dataset = LamaHIce(data_type='intermediate_lowimp')
>>> len(dataset.stations())
86
... # the timestep can also be 'H'
>>> dataset = LamaHIce(timestep='H')
>>> _, dynamic = dataset.fetch(stations='79', as_dataframe=True)
>>> dynamic['79'].shape
(412848, 28)  # there are 28 dynamic features for hourly data
- __init__(path=None, overwrite=False, *, timestep: str = 'D', data_type: str = 'total_upstrm', to_netcdf: bool = False, **kwargs)[source]
- Parameters:
path (str) – If the data is already downloaded, then provide the complete path to it. If None, then the data will be downloaded. The data is downloaded once and therefore subsequent calls to this class will not download the data unless overwrite is set to True.
timestep – possible values are D for daily or H for hourly timestep
data_type – possible values are total_upstrm, intermediate_all or intermediate_lowimp
- basin_attributes() DataFrame[source]
returns basin attributes which are catchment attributes, water balance all attributes and water balance filtered attributes
- Returns:
a dataframe of shape (111, 104) where 104 are the static catchment/basin attributes
- Return type:
pd.DataFrame
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property end
end of data
- fetch_clim_features(stations: str | List[str] = None)[source]
Returns climate time series data for one or more stations
- Return type:
pd.DataFrame
- fetch_q(stations: str | List[str] = None, qc_flag: int = None)[source]
returns streamflow for one or more stations
- Parameters:
- Returns:
a pandas.DataFrame whose index is the time and columns are names of stations. For daily timestep, the dataframe has a shape of 32630 rows and 111 columns.
- Return type:
pd.DataFrame
- fetch_static_features(stations: str | list = 'all', static_features: str | list = None) DataFrame[source]
fetches static features of one or more stations
- fetch_stn_meteo(stn: str, nrows: int = None) DataFrame[source]
returns climate/meteorological time series data for one station
- Returns:
a pandas.DataFrame with 23 columns
- Return type:
pd.DataFrame
- gauge_attributes() DataFrame[source]
returns gauge attributes from following two files
Gauge_attributes.csv
hydro_indices_1981_2018.csv
- Returns:
a dataframe of shape (111, 28)
- Return type:
pd.DataFrame
- property gauges_path
returns the path where gauge data files are located
- property q_dir
returns the path where q files are located
- q_mm(stations: str | List[str] = None) DataFrame[source]
returns streamflow in units of millimeters per timestep (e.g. mm/day or mm/hour). This is obtained by dividing q_cms by the catchment area.
- Parameters:
stations (str/list) – name/names of stations. Default is None, which will return streamflow of all stations
- Returns:
a pandas.DataFrame whose indices are time-steps and columns are catchment/station ids.
- Return type:
pd.DataFrame
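The unit conversion behind q_mm can be written out as a small worked example. This sketch assumes that streamflow is in cubic metres per second and that catchment area is in km²; the cms_to_mm function and its arguments are hypothetical and may differ from the library's own implementation.

```python
def cms_to_mm(q_cms: float, area_km2: float, seconds_per_step: int = 86_400) -> float:
    """Convert streamflow in m3/s to runoff depth in mm per time step."""
    volume_m3 = q_cms * seconds_per_step  # water volume passing in one step
    area_m2 = area_km2 * 1e6              # km2 -> m2
    depth_m = volume_m3 / area_m2         # depth spread over the catchment
    return depth_m * 1000.0               # m -> mm


# 10 m3/s from an 864 km2 catchment is about 1 mm/day.
daily_mm = cms_to_mm(10.0, 864.0)
# The same flow over one hour is 1/24 of that depth.
hourly_mm = cms_to_mm(10.0, 864.0, seconds_per_step=3_600)
print(daily_mm, hourly_mm)
```

Passing the number of seconds per step explicitly keeps the same function usable for both daily (mm/day) and hourly (mm/hour) data.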
- property q_path
path where all q files are located
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.NPCTRCatchments(path=None, timestep: str = 'Hourly', qflag=['AV', 'EV'], **kwargs)[source]
Bases:
_RainfallRunoff
High-resolution streamflow and weather data (2013–2019) for seven small coastal watersheds in the northeast Pacific coastal temperate rainforest, Canada, following Korver et al., 2022. The data include 8 dynamic features at hourly and 5 min timesteps and 14 static features. The dynamic features include streamflow, precipitation, temperature, relative humidity, wind speed, wind direction, and solar radiation.
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ds.stations()
['626', '693', '703', '708', '819', '844', '1015']
>>> len(ds.static_features)
12
>>> area = ds.area()
>>> area.shape
(7,)
>>> coords = ds.stn_coords()
>>> coords.shape
(7, 2)
- __init__(path=None, timestep: str = 'Hourly', qflag=['AV', 'EV'], **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time it is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any higher value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- all_stn_coords() DataFrame[source]
Uses coordinate information of Stream Sensor Nodes, assuming that stream sensors would be closer to the stream gauge. The values are taken from Table A1 of the paper.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again, or for the user to implement it more efficiently in the child class.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- fetch_static_features(stations: str | list = 'all', static_features: str | list = 'all') DataFrame[source]
Fetches all or selected static features of one or more stations.
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> dataset = NPCTRCatchments()
>>> dataset.fetch_static_features('626')
>>> dataset.static_features
>>> dataset.fetch_static_features('626',
... static_features=['area_km2', 'elev_catch_m', 'slope_%'])
- read_pcp()[source]
Examples
>>> ds = NPCTRCatchments()
>>> pcp = ds.read_pcp()
>>> pcp.shape
(849472, 5)
>>> pcp['Site'].nunique()
15
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-09 21:00:00'), Timestamp('2019-10-01 00:00:00'))
# A is accepted and E is estimated
>>> pcp['Qflags'].unique()
[nan, 'AV', 'EV', 'EV: Sensor malfunction due to wolf bite']
>>> ds = NPCTRCatchments(timestep='5min')
>>> pcp = ds.read_pcp()
>>> pcp.shape
(8712098, 5)
>>> pcp['Site'].nunique()
14
>>> pcp.index[0], pcp.index[-1]
(Timestamp('2013-09-05 00:00:00'), Timestamp('2019-10-01 00:00:00'))
- read_rel_hum()[source]
Examples
>>> ds = NPCTRCatchments()
>>> rh = ds.read_rel_hum()
>>> rh.shape
(849472, 4)
>>> rh['Site'].nunique()
15
>>> rh.index[0], rh.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
... # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> rh_5m = ds.read_rel_hum()
>>> rh_5m.shape
(8281767, 3)
>>> rh_5m['Site'].nunique()
13
>>> rh_5m.index[0], rh.index[-1]
(Timestamp('2013-09-10 00:00:00'), NaT)
>>> rh_5m['Qlevel'].unique()
['1', '2', '3', nan]
- read_snow_depth()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
... # get 5min timestep data
>>> ds = NPCTRCatchments(timestep='5min')
>>> snowdepth = ds.read_snow_depth()
>>> snowdepth.shape
(105016, 15)
- read_sol_rad()[source]
Solar radiation is common among all stations so no ‘Site’ column is present in the dataframe.
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(53072, 3)
>>> solrad['Qflags_SolarRad'].unique()
['AV', 'EV']
>>> ds = NPCTRCatchments(timestep='5min')
>>> solrad = ds.read_sol_rad()
>>> solrad.shape
(637108, 3)
>>> solrad['SolarRadQ_flags'].nunique()
4
- read_temp()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> temp = ds.read_temp()
>>> temp.shape
(745836, 4)
>>> temp['Site'].nunique()
14
>>> temp['Qflag'].unique()
[nan, 'AV', 'EV']
>>> temp['Qlevel'].unique()
[nan, 2., 3., 1.]
>>> ds = NPCTRCatchments(timestep='5min')
>>> temp_5m = ds.read_temp()
>>> temp_5m.shape
(8957388, 3)
>>> temp_5m['Site'].nunique()
14
>>> temp_5m['Qlevel'].unique()
[1, 2]
>>> temp_5m['Qflags'].nunique()
5344
- read_wind_dir()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(371651, 4)
>>> winddir['Site'].nunique()
7
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'RefStn']
>>> # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> winddir = ds.read_wind_dir()
>>> winddir.shape
(5096864, 4)
>>> winddir['Site'].nunique()
8
>>> winddir['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'Hecate', 'RefStn']
- read_wind_speed()[source]
Examples
>>> from aqua_fetch import NPCTRCatchments
>>> ds = NPCTRCatchments()
>>> ws = ds.read_wind_speed()
>>> ws.shape
(424744, 4)
>>> ws['Site'].nunique()
8
>>> ws['Site'].unique()
['WSN626', 'SSN693', 'WSN693703', 'WSN703708', 'WSN8191015', 'BuxtonEast', 'Hecate', 'RefStn']
>>> ws.index[0], ws.index[-1]
(Timestamp('2013-09-09 20:00:00'), Timestamp('2019-10-01 00:00:00'))
>>> # getting data for 5min timestep
>>> ds = NPCTRCatchments(timestep='5min')
>>> ws = ds.read_wind_speed()
>>> ws.shape
(5096864, 4)
>>> ws['Site'].nunique()
8
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
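The caching advice above can be followed with the standard library alone. The sketch below uses a hypothetical class that is not part of aqua_fetch; the feature names are borrowed from the examples in this section, and the counter exists only to show that the expensive read happens once.

```python
from functools import cached_property
from typing import List


class CachedFeaturesExample:
    """Hypothetical dataset class, used only to illustrate caching."""

    def __init__(self) -> None:
        self.n_reads = 0  # counts how often the (expensive) read is executed

    def _read_feature_names(self) -> List[str]:
        # Stand-in for parsing a large file on disk.
        self.n_reads += 1
        return ["area_km2", "Elevation_m"]

    @cached_property
    def static_features(self) -> List[str]:
        # Evaluated on first access; later accesses return the stored list.
        return self._read_feature_names()


ds = CachedFeaturesExample()
first = ds.static_features
second = ds.static_features  # served from the cache, no second read
```

functools.cached_property stores the value on the instance, so each dataset object keeps its own cache.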
- stations() List[str][source]
Names/ids of the stations/catchments/gauges that are used to index each station in the dataset.
- stn_coords(stations='all', sensor='SSN') DataFrame[source]
By default uses coordinate information of Stream Sensor Nodes, assuming that stream sensors would be closer to the stream gauge. The values are taken from Table A1 of paper
- class aqua_fetch.rr.Poland(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 1287 catchments of Poland. The observed streamflow data is downloaded from https://danepubliczne.imgw.pl . The meteorological data, static catchment features and catchment boundaries are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. There are therefore 214 static features and 10 dynamic features, and the data is available from 1951-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import Poland
>>> dataset = Poland()
>>> _, data = dataset.fetch(0.1)
>>> # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 26844, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
128
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
1287
>>> # get data by station id
>>> _, data = dataset.fetch(stations='PL000001')
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
>>> # If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PL000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(1287, 2)
>>> dataset.stn_coords('PL000001')  # returns coordinates of station whose id is PL000001
49.921848 18.327913
>>> dataset.stn_coords(['PL000001', 'PL000002'])  # returns coordinates of two stations
PL000001 49.921848 18.327913
PL000002 49.954769 18.326323
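In the example above, a fractional first argument to fetch selects a share of the stations (0.1 gives 128 of 1287). The sketch below reproduces that count under the assumption that the fraction is truncated to an integer number of stations and that stations are drawn at random; the helper sample_stations is hypothetical and is not how the library is known to implement the selection.

```python
import random
from typing import List


def sample_stations(stations: List[str], fraction: float, seed: int = 0) -> List[str]:
    """Hypothetical helper: pick a random share of station ids without replacement."""
    n = int(fraction * len(stations))  # truncate, e.g. 0.1 * 1287 -> 128
    return random.Random(seed).sample(stations, n)


# Synthetic ids in the style of the Poland dataset (PL000001 ... PL001287).
station_ids = [f"PL{i:06d}" for i in range(1, 1288)]
subset = sample_stations(station_ids, 0.1)
```

The same truncation matches the counts shown elsewhere in this section, for example 28 of 280 stations for Portugal.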
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property csv_files_dir: str
path where csv (obtained after extracting zip files) files will be stored
- class aqua_fetch.rr.Portugal(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 280 catchments of Portugal. The observed streamflow data is downloaded from https://snirh.apambiente.pt . The meteorological data, static catchment features and catchment boundaries for the 280 catchments are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. There are therefore 214 static features and 10 dynamic features, and the data is available from 1972-01-01 to 2022-12-31.
Examples
>>> from aqua_fetch import Portugal
>>> dataset = Portugal()
>>> _, data = dataset.fetch(0.1)
>>> # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 18628, 'dynamic_features': 10})
>>> len(data.data_vars)  # number of stations for which data has been fetched
28
>>> _, data = dataset.fetch(stations=1)  # get data of only one random station
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
280
>>> # get data by station id
>>> _, data = dataset.fetch(stations='PT000001')
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> len(data.data_vars)
10
>>> # If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='PT000001', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(280, 2)
>>> dataset.stn_coords('PT000001')  # returns coordinates of station whose id is PT000001
41.794998 -7.969
>>> dataset.stn_coords(['PT000001', 'PT000002'])  # returns coordinates of two stations
PT000001 41.794998 -7.969
PT000002 39.679001 -8.437
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property end: Timestamp
end of data
- fetch_q(as_dataframe: bool = True)[source]
returns the streamflow data of Portugal as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
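With as_dataframe=True the streamflow comes back as a wide table (time index, one column per station code), which makes per-station data availability easy to check. The following is a minimal sketch on synthetic values; the station codes follow the PT000001 pattern used above, and the numbers and the 0.8 threshold are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wide frame returned by fetch_q(as_dataframe=True).
idx = pd.date_range("1972-01-01", periods=5, freq="D")
q = pd.DataFrame(
    {
        "PT000001": [1.0, np.nan, 1.2, 1.1, np.nan],
        "PT000002": [0.4, 0.5, 0.6, 0.7, 0.8],
    },
    index=idx,
)

# Fraction of time steps with an observation, per station.
coverage = q.notna().mean()

# Keep stations with at least 80 % coverage (threshold chosen arbitrarily here).
well_observed = coverage[coverage >= 0.8].index.tolist()
```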
- class aqua_fetch.RRLuleaSweden(path=None, **kwargs)[source]
Bases: Datasets
Rainfall runoff data for an urban catchment from 2016-2019 following the work of Broekhuizen et al., 2020.
- __init__(path=None, **kwargs)[source]
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None)[source]
fetches rainfall runoff data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41
- fetch_flow(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches flow data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00
en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00
- Returns:
a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
- fetch_pcp(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]
fetches precipitation data
- Parameters:
st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00
en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00
- Returns:
a dataframe of shape (967_080, 1)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
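The st and en arguments restrict the returned period, and high-frequency precipitation is often aggregated before use. The sketch below shows the equivalent pandas operations on a synthetic one-minute series; the column name pcp_mm and the values are assumptions, not the actual contents of the dataset.

```python
import pandas as pd

# Synthetic one-minute precipitation series (hypothetical column name and values).
idx = pd.date_range("2016-06-16 19:48", periods=180, freq="1min")
pcp = pd.DataFrame({"pcp_mm": 0.01}, index=idx)

# Label-based slicing on a DatetimeIndex is inclusive on both ends,
# comparable to passing a start and an end to a fetch method.
window = pcp.loc["2016-06-16 20:00":"2016-06-16 21:59"]

# Aggregate to hourly totals; precipitation is summed rather than averaged.
hourly = window.resample("60min").sum()
```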
- class aqua_fetch.rr.ShyftNorway(*args, **kwargs)[source]
Bases: _RainfallRunoff
The dataset contains observed streamflow data from 111 Norwegian catchments, as well as catchment boundaries and some catchment-specific static data. For more information on this data see Silantyeva et al., 2025. Note that currently only streamflow data is included; other dynamic features may be added in future releases. Also note that the observed streamflow data may differ slightly from the data at seriekart.nve.no, since the data at seriekart is updated regularly based upon updated rating curves.
Examples
>>> from aqua_fetch import ShyftNorway
>>> dataset = ShyftNorway()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='2.11.0', as_dataframe=True)
>>> df = dynamic['2.11.0']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(23376, 1)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
111
>>> # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (11 out of 111)
11
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(23376, 1), (23376, 1), (23376, 1),... (23376, 1), (23376, 1)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
['observed_streamflow_cms']
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
>>> # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='2.11.0', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['2.11.0'].shape
((1, 10), 1, (23376, 1))
>>> # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 23376, 'dynamic_features': 1})
>>> len(dynamic.data_vars)
10
>>> # get area of a single station
>>> dataset.area('2.11.0')
>>> # get area of two stations
>>> dataset.area(['2.11.0', '2.28.0'])
>>> dataset.get_boundary('2.11.0')
- __init__(*args, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property boundary_id_map
Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, the first attribute in the boundary file is used.
- fetch_q(as_dataframe: bool = True)[source]
returns the streamflow data of Norway as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
- class aqua_fetch.rr.Simbi(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
Monthly rainfall from 1905 to 2005, daily rainfall from 1920 to 1940, 70 daily streamflow series, and 23 monthly temperature series for 24 catchments of Haiti.
Data is obtained from Bathelemy et al., 2023; the related publication is Bathelemy et al., 2024.
Examples
>>> from aqua_fetch import Simbi
>>> simbi = Simbi()
>>> len(simbi.stations())
24
- __init__(path: str = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path – path where the Simbi dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.
to_netcdf
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, the first attribute in the boundary file is used.
- property dyn_map: Dict[str, str]
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- property end
end of data
- property static_features
Returns a list of static features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- class aqua_fetch.rr.Slovenia(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _EStreams
Data of 117 catchments of Slovenia. The observed streamflow data is downloaded from https://vode.arso.gov.si . The meteorological data, static catchment features and catchment boundaries for the 117 catchments are taken from aqua_fetch.EStreams, following the work of Nascimento et al., 2024. There are therefore 214 static features and 10 dynamic features, and the data is available from 1950-01-01 to 2023-12-31.
Examples
>>> from aqua_fetch import Slovenia
>>> dataset = Slovenia()
>>> _, data = dataset.fetch(0.1)
>>> # the returned data will be a xarray Dataset
>>> type(data)
xarray.core.dataset.Dataset
>>> data.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 10})
>>> len(data.data_vars)
10
>>> _, df = dataset.fetch(stations=1)  # get data of only one random station
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
117
>>> # get data by station id
>>> _, data = dataset.fetch(stations='SI000090')
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, data = dataset.fetch(1,
...     dynamic_features=['pcp_mm', 'rh_%', 'airtemp_C_mean', 'pet_mm', 'q_cms_obs'])
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, data = dataset.fetch(10)
>>> # If we want to get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='SI000090', static_features="all")
>>> static.shape, len(dynamic.data_vars)
((1, 214), 1)
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(117, 2)
>>> dataset.stn_coords('SI000090')  # returns coordinates of station whose id is SI000090
45.865093 15.460184
>>> dataset.stn_coords(['SI000090', 'SI000002'])  # returns coordinates of two stations
SI000090 45.865093 15.460184
SI000002 46.648823 16.059244
- __init__(path: str | PathLike = None, estreams_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property end: Timestamp
end of data
- fetch_q(as_dataframe: bool = True)[source]
returns the streamflow data of Slovenia as xarray.Dataset or pandas.DataFrame
- Returns:
xarray.Dataset or pandas.DataFrame. If as_dataframe is True, returns pandas.DataFrame
with columns as station codes and index as time. If as_dataframe is False, returns
xarray.Dataset with station codes as variables and time as dimension.
- class aqua_fetch.rr.Spain(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 889 catchments of Spain from the ceh-es website. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. There are therefore 35 static features and 27 dynamic features, and the data is available from 1979-01-01 to 2020-09-30.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- daily_q_all_areas() DataFrame[source]
Daily data of gauging stations in river from all areas
- Returns:
a DataFrame of 16_806_305 rows x 3 columns
- daily_q_area(area: str) DataFrame[source]
Reads the daily data of river gauging stations from the afliq.csv file of the given area
- property end: Timestamp
end of data
- fetch_q(as_dataframe: bool = True)[source]
returns daily q of all stations
- Returns:
a pandas.DataFrame of shape (39721, 1447)
- Return type:
pd.DataFrame
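The long table described for daily_q_all_areas (three columns, one row per station and day) and the wide table returned by fetch_q (one column per station) are related by a pivot. The sketch below illustrates the reshaping on synthetic data; the column names station, date and q_cms and the station codes are hypothetical, since the actual column names of the raw file are not stated here.

```python
import pandas as pd

# Synthetic long-format table: one row per (station, date) observation.
long_q = pd.DataFrame(
    {
        "station": ["1080", "1080", "2005", "2005"],
        "date": pd.to_datetime(
            ["1979-01-01", "1979-01-02", "1979-01-01", "1979-01-03"]
        ),
        "q_cms": [3.2, 3.0, 11.5, 10.9],
    }
)

# Wide format: dates as index, one column per station.
# Days without an observation for a station become NaN.
wide_q = long_q.pivot(index="date", columns="station", values="q_cms")
```

DataFrame.pivot raises an error if a station has two rows for the same date, so duplicates would need to be resolved first.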
- class aqua_fetch.Thailand(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
Bases: _GSHA
Data of 73 catchments of Thailand from the RID project. The meteorological data, static catchment features and catchment boundaries are taken from the GSHA project. There are therefore 35 static features and 27 dynamic features, and the data is available from 1980-01-01 to 1999-12-31.
- __init__(path: str | PathLike = None, gsha_path: str | PathLike = None, overwrite: bool = False, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time the first time the file is created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: any value greater than 1 will result in more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- property end: Timestamp
end of data
- class aqua_fetch.USGS(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
Bases: _RainfallRunoff
This class handles the hydrometeorological data for the USA. The daily and hourly discharge data is downloaded from the usgs/nwis website. The data is optionally stored in a netCDF file if xarray is available. Currently the data is downloaded only for those sites/catchments that are in the HYSETS database, because the catchment boundaries are taken from the HYSETS database using aqua_fetch.HYSETS.
For the hourly timestep, the "iv" service is used to download the instantaneous data, which is then resampled to hourly data. Only data flagged A, [92]; A, [91]; A, [93]; A, e; or A is used. For daily streamflow, the "dv" service is used to download the data. In this case, only data flagged A or A, e is used.
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> # get data by station id
>>> _, dynamic = dataset.fetch(stations='01010000', as_dataframe=True)
>>> df = dynamic['01010000']  # dynamic is a dictionary with station names as keys and DataFrames as values
>>> df.shape
(27028, 20)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
12004
>>> # get data of 10 % of stations as dataframe
>>> _, dynamic = dataset.fetch(0.1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 10% of stations (1200 out of 12004)
1200
>>> # dynamic is a dictionary whose values are dataframes of dynamic features
>>> [df.shape for df in dynamic.values()]
[(27028, 20), (27028, 20), (27028, 20),... (27028, 20), (27028, 20)]
>>> # get the data of a single (randomly selected) station
>>> _, dynamic = dataset.fetch(stations=1, as_dataframe=True)
>>> len(dynamic)  # dynamic has data for 1 station
1
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> _, dynamic = dataset.fetch('01010000', as_dataframe=True,
...     dynamic_features=['pcp_mm', 'snowmelt_mm', 'airtemp_C_2m_min', 'swe_mm', 'q_cms_obs'])
>>> dynamic['01010000'].shape
(27028, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> _, dynamic = dataset.fetch(10, as_dataframe=True)
>>> len(dynamic)  # remember this is a dictionary with values as dataframe
10
>>> # If we get both static and dynamic data
>>> static, dynamic = dataset.fetch(stations='01010000', static_features="all", as_dataframe=True)
>>> static.shape, len(dynamic), dynamic['01010000'].shape
((1, 29), 1, (27028, 20))
>>> # If we don't set as_dataframe=True and have xarray installed then the returned data will be a xarray Dataset
>>> _, dynamic = dataset.fetch(10)
>>> type(dynamic)
xarray.core.dataset.Dataset
>>> dynamic.dims
FrozenMappingWarningOnValuesAccess({'time': 27028, 'dynamic_features': 20})
>>> len(dynamic.data_vars)
10
>>> coords = dataset.stn_coords()  # returns coordinates of all stations
>>> coords.shape
(671, 2)
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
-69.715556 46.700556
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
>>> # get area of a single station
>>> dataset.area('01010000')
>>> # get area of two stations
>>> dataset.area(['01010000', '01010070'])
>>> # if fiona library is installed we can get the boundary as fiona Geometry
>>> dataset.get_boundary('01010000')
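The hourly processing described above (keep only accepted qualifier flags, then resample the instantaneous values to hourly data) can be sketched with pandas. This is an illustration on synthetic data under stated assumptions, not the library's implementation: the column names q_cms and flag are hypothetical, the flag "P" stands in for any non-accepted flag, and averaging within the hour is an assumed aggregation.

```python
import pandas as pd

# Qualifier flags kept for the hourly ("iv") data, as listed in the text above.
KEEP_FLAGS = {"A", "A, e", "A, [91]", "A, [92]", "A, [93]"}

# Synthetic instantaneous record at irregular sub-hourly times.
iv = pd.DataFrame(
    {
        "q_cms": [1.0, 2.0, 3.0, 4.0, 5.0],
        "flag": ["A", "P", "A, e", "A", "A, [92]"],
    },
    index=pd.to_datetime(
        [
            "2020-01-01 00:00",
            "2020-01-01 00:15",
            "2020-01-01 00:30",
            "2020-01-01 01:00",
            "2020-01-01 01:45",
        ]
    ),
)

# Step 1: drop rows whose flag is not in the accepted set.
approved = iv[iv["flag"].isin(KEEP_FLAGS)]

# Step 2: aggregate to hourly values (mean within each hour is an assumption).
hourly = approved["q_cms"].resample("60min").mean()
```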
- __init__(path: str | PathLike = None, hysets_path: str | PathLike = None, verbosity: int = 1, **kwargs)[source]
- Parameters:
path (str) – Path to store the data
- area(stations: str | List[str] = 'all') Series[source]
Returns area_gov (km2) of the catchments as a pandas.Series
- Parameters:
stations (str/list) – name/names of stations. Default is 'all', which will return the area of all stations
- Returns:
a pandas.Series whose indices are catchment ids and values are areas of corresponding catchments.
- Return type:
pd.Series
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.area()  # returns area of all stations
>>> dataset.area('912101A')  # returns area of station whose id is 912101A
>>> dataset.area(['912101A', '12388200'])  # returns area of two stations
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Since this property is accessed multiple times, it is better to cache the result and return the cached value instead of reading the data again and again; alternatively, a child class can implement it in a more efficient way.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using fetch_dynamic_features().
- Return type:
List[str]
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
returns static attributes of one or multiple stations
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
12004
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(12004, 27)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('01010070')
>>> static_data.shape
(1, 27)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area_km2', 'Elevation_m'])
>>> static_data.shape
(12004, 2)
- fetch_stations_features(stations: list, dynamic_features: str | list | None = 'all', static_features: str | list | None = None, st=None, en=None, as_dataframe: bool = False, **kwargs) Tuple[DataFrame, Dict[str, DataFrame] | Dataset][source]
returns features of multiple stations
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> stations = dataset.stations()[0:3]
>>> features = dataset.fetch_stations_features(stations)
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Since this is a method is called multiple times, it is better to cache the result and return the cached result instead of reading the data again and again or the user implementing this method in the child class in a more efficient way.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_static_features().
- Return type:
List[str]
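The caching advice above could be followed with functools.cached_property, for instance. The sketch below is only an illustration: ExampleDataset and _read_static_feature_names are hypothetical names and are not part of aqua_fetch, and the package may cache its feature lists differently.

```python
from functools import cached_property
from typing import List


class ExampleDataset:
    """Hypothetical dataset class used only to illustrate caching."""

    def __init__(self) -> None:
        self.reads = 0  # counts how often the expensive read happens

    def _read_static_feature_names(self) -> List[str]:
        # Stand-in for an expensive read of the raw data files.
        self.reads += 1
        return ["area_km2", "slope"]

    @cached_property
    def static_features(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        return self._read_static_feature_names()


ds = ExampleDataset()
first = ds.static_features
second = ds.static_features  # served from the cache, no second read
print(ds.reads)
```

With this pattern the underlying read runs once per instance, however often the property is accessed.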
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Names/ids of the stations, catchments or gauges, i.e. the identifiers used to index each station in the dataset.
- stn_coords(stations: str | List[str] = 'all') DataFrame[source]
Returns coordinates of stations as a DataFrame with long and lat as columns.
- Parameters:
stations – name/names of stations. If not given, coordinates of all stations will be returned.
- Returns:
pandas.DataFrame with long and lat columns. The length of the dataframe equals the number of stations whose coordinates are fetched.
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import USGS
>>> dataset = USGS()
>>> dataset.stn_coords()  # returns coordinates of all stations
>>> dataset.stn_coords('01010000')  # returns coordinates of station whose id is 01010000
>>> dataset.stn_coords(['01010000', '01010070'])  # returns coordinates of two stations
- class aqua_fetch.rr.WaterBenchIowa(path=None, **kwargs)[source]
Bases: _RainfallRunoff
Rainfall run-off dataset for Iowa (US) following the work of Demir et al., 2022. This is an hourly dataset of 125 catchments with 7 static features and 3 dynamic features (pcp, et, discharge) for each catchment. The dynamic features are timeseries from 2011-10-01 12:00 to 2018-09-30 11:00.
Note: Currently the coordinates and catchment boundary files are not available for this dataset.
Examples
>>> from aqua_fetch import WaterBenchIowa
>>> ds = WaterBenchIowa()
... # fetch static and dynamic features of 5 stations
>>> static, dynamic = ds.fetch(5, static_features='all', as_dataframe=True)
>>> len(dynamic)  # it is a dictionary of DataFrames
5
... # keys of dynamic are station names and values are DataFrames
>>> data = dynamic.popitem()[1]
>>> data.shape
(61344, 3)
>>> static.shape
(5, 7)
...
... # using another method
>>> dynamic = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> dynamic['644'].shape
(61344, 3)
...
>>> static, dynamic = ds.fetch(stations='644', static_features="all", as_dataframe=True)
>>> static.shape, dynamic['644'].shape
((1, 7), (61344, 3))
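The row count of 61344 in the example above agrees with the stated hourly period. The following sketch checks this with the standard library only. It assumes a gap-free hourly index with both end points included, which is an assumption about the data and not something the dataset description states.

```python
from datetime import datetime, timedelta

# Period stated in the class description.
start = datetime(2011, 10, 1, 12)
end = datetime(2018, 9, 30, 11)

# Number of hourly steps, counting both the first and the last timestamp.
n_hours = (end - start) // timedelta(hours=1) + 1
print(n_hours)  # 61344
```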
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 produce progressively more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
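The three cases described for the path parameter can be sketched as below. This only illustrates the documented behaviour and is not the package's implementation: resolve_data_dir and DEFAULT_DIR are hypothetical names, and the real default directory is not stated in this section.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical default location, chosen only for this sketch.
DEFAULT_DIR = Path.home() / "aqua_fetch_data"


def resolve_data_dir(path: Optional[str] = None) -> Tuple[Path, bool]:
    """Return (directory, needs_download) for the three documented cases."""
    if path is None:
        # Not provided: use the default directory (assumed to be
        # downloaded only if it does not exist yet).
        return DEFAULT_DIR, not DEFAULT_DIR.is_dir()
    target = Path(path)
    if target.is_dir():
        # Provided and exists: read the data from this directory.
        return target, False
    # Provided but missing: download the data into this directory.
    return target, True


with tempfile.TemporaryDirectory() as tmp:
    existing = resolve_data_dir(tmp)
    missing = resolve_data_dir(str(Path(tmp) / "not_there"))

print(existing[1], missing[1])
```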
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property end
end of data
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all') DataFrame[source]
- Parameters:
stations (str) – name/id of station of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
Examples
>>> from aqua_fetch import WaterBenchIowa
>>> dataset = WaterBenchIowa()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
125
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(125, 7)
get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
(1, 7)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area_km2'])
>>> static_data.shape
(125, 2)
>>> data = dataset.fetch_static_features('592', static_features=['slope', 'area_km2'])
>>> data.shape
(1, 2)
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. Alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_static_features().
- Return type:
List[str]
The following datasets are very similar to the RainfallRunoff datasets, but they do not contain observed streamflow data. They are used to provide static and dynamic features to other datasets.
- class aqua_fetch.GSHA(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
Bases: _RainfallRunoff
Global streamflow characteristics, hydrometeorology and catchment attributes following Peirong et al., 2023. The data is downloaded from its Zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 21568 stations, 26 dynamic (meteorological + storage) features with daily timestep, 21 dynamic features (landcover + streamflow indices + reservoir) with yearly timestep and 35 static features.
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> len(dataset.stations())
21568
>>> dataset.agencies
['arcticnet', 'AFD', 'GRDC', 'IWRIS', 'MLIT', 'HYDAT', 'ANA', 'BOM', 'CCRR', 'China', 'CHP', 'RID', 'USGS']
>>> dataset.start
Timestamp('1979-01-01 00:00:00')
>>> dataset.end
Timestamp('2022-12-31 00:00:00')
>>> dataset.static_features
['ele_mt_uav', 'slp_dg_uav', 'lat', 'long', 'area_km2', 'agency', ...]
>>> len(dataset.dynamic_features)
26
>>> len(dataset.daily_dynamic_features)
26
>>> len(dataset.yearly_dynamic_features)
21
>>> dataset.fetch_static_features('1001_arcticnet')
fetch static features for all stations of arcticnet agency
>>> dataset.fetch_static_features(agency='arcticnet')
fetch dynamic features for all stations of arcticnet agency
>>> dataset.fetch_dynamic_features(agency='arcticnet')
- __init__(path=None, overwrite: bool = False, to_netcdf: bool = True, **kwargs)[source]
- Parameters:
to_netcdf (bool) – whether to convert all the data into one netCDF file or not. This will speed up repeated calls to fetch etc. but requires the netCDF4 package as well as xarray.
- property agencies: List[str]
returns the names of agencies as list
arcticnet: Arctic
AFD: Spain
GRDC: Global
IWRIS: India
MLIT: Japan
HYDAT: Canada
ANA: Brazil
BOM: Australia
CCRR: Chile
China
CHP: China
RID: Thailand
USGS
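The station ids used in the examples for this class (for instance '1001_arcticnet') appear to end with the agency name. Under that assumption, which is inferred from the examples and not a documented guarantee, stations could be grouped by agency as sketched below. group_by_agency is a hypothetical helper, and the last id in the sample list is made up for illustration.

```python
from typing import Dict, List


def group_by_agency(station_ids: List[str]) -> Dict[str, List[str]]:
    """Group ids of the assumed form '<number>_<agency>' by agency."""
    groups: Dict[str, List[str]] = {}
    for stn in station_ids:
        # Split on the last underscore only; an id without an underscore
        # ends up in a group named after the whole id.
        _, _, agency = stn.rpartition("_")
        groups.setdefault(agency, []).append(stn)
    return groups


# The first two ids come from the examples; the third is invented.
ids = ["1001_arcticnet", "10062_arcticnet", "0001_USGS"]
groups = group_by_agency(ids)
print(groups["arcticnet"])
```

In practice the agency keyword of the fetch methods makes such manual grouping unnecessary. The sketch only shows how the ids relate to the agency names.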
- atlas(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
The link table between GSHA watershed IDs and RiverATLAS river reach IDs, as well as the selected static attributes
- Returns:
a pandas.DataFrame of shape (n, 24) where n is the number of stations
- Return type:
pd.DataFrame
- property boundary_id_map: str
Name of the attribute in the boundary (shapefile/.gpkg) file that is used to map the catchment/station id to the geometry of the catchment/station. This is used to create the boundary id map. If not given, the first attribute in the boundary file is used.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. Alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, agency: List[str] = 'all') Dataset[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str/list) – name/id of the station(s) for which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from which to fetch the data.
en (Optional (default=None)) – end time until which to fetch the data.
as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_dynamic_features('1001_arcticnet', as_dataframe=True)
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> stns = ['1001_arcticnet', '10062_arcticnet']
>>> data = dataset.fetch_dynamic_features(stns,
...     dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
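Feature names such as 'airtemp_C_mean_era5' and 'pcp_mm_mswep' follow the name_unit_specifier convention introduced at the start of this section. A minimal sketch of splitting such a name into its parts is given below. parse_feature and FeatureName are hypothetical helpers, not part of aqua_fetch, and the sketch assumes that the name part itself contains no underscore.

```python
from typing import List, NamedTuple, Optional


class FeatureName(NamedTuple):
    name: str
    unit: Optional[str]
    specifier: Optional[str]


def parse_feature(feature: str) -> FeatureName:
    """Split 'name_unit_specifier' on the first two underscores."""
    parts: List[Optional[str]] = list(feature.split("_", 2))
    # Pad with None when the unit or the specifier is absent.
    parts += [None] * (3 - len(parts))
    return FeatureName(*parts)


temp = parse_feature("airtemp_C_mean_era5")
pcp = parse_feature("pcp_mm_mswep")
print(temp)
print(pcp)
```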
- fetch_lai(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Leaf Area Index timeseries for one or more stations, either as xarray.Dataset or pandas.DataFrame. The data has daily timestep.
- fetch_meteo_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for one or more stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- fetch_static_features(stations: str | List[str] = 'all', static_features: str | List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Returns static features of one or more stations.
- Parameters:
stations (str) – name/id of station/stations of which to extract the data
static_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.
- Returns:
a pandas.DataFrame of shape (stations, features)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
21568
get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(21568, 35)
get static data of one station only
>>> static_data = dataset.fetch_static_features('1001_arcticnet')
>>> static_data.shape
(1, 35)
get the names of static features
>>> dataset.static_features
get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['ele_mt_uav', 'slp_dg_uav'])
>>> static_data.shape
(21568, 2)
>>> data = dataset.fetch_static_features('1001_arcticnet', static_features=['ele_mt_uav', 'slp_dg_uav'])
>>> data.shape
(1, 2)
>>> out = dataset.fetch_static_features(agency='arcticnet')
>>> out.shape
(106, 35)
- fetch_stn_dynamic_features(station: str, dynamic_features='all', st: str | Timestamp = None, en: str | Timestamp = None) DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas.DataFrame of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet')
>>> data.shape
(16071, 26)
>>> dataset.dynamic_features
>>> data = dataset.fetch_stn_dynamic_features('1001_arcticnet',
...     dynamic_features=['airtemp_C_mean_era5', 'pcp_mm_mswep'])
>>> data.shape
(16071, 2)
- fetch_storage_vars(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for one or more stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- lai_stn(stn: str) Series[source]
Daily leaf area index. As per documentation, due to satellite data quality, some watersheds might have relatively serious data missing issue. The data is from 1981-01-01 to 2020-12-31.
- Returns:
a pandas.Series of shape (14571,) where 14571 is the number of days
- Return type:
pd.Series
- lc_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Landcover variables for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- lc_variables_stn(stn: str) DataFrame[source]
Landcover variables for a given station, with yearly timestep. The following three landcover variables are provided:
urban_fraction(%): Ratio of urban extent to the entire watershed area (percentage).
forest_fraction(%): Ratio of forest extent to the entire watershed area (percentage).
cropland_fraction(%): Ratio of cropland extent to the entire watershed area (percentage).
- Returns:
a pandas.DataFrame of shape (n, 3) where n is the number of years
- Return type:
pd.DataFrame
- meteo_vars_all_stns()[source]
Meteorological variables from 1979-01-01 to 2022-12-31 for all stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- meteo_vars_stn(stn: str) DataFrame[source]
Daily meteorological variables from 1979-01-01 to 2022-12-31 for a given station.
- Returns:
a pandas.DataFrame of shape (16071, 19) where 16071 is the number of days
- Return type:
pd.DataFrame
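The daily row counts quoted for this class follow from the stated date ranges. The sketch below reproduces 16071 days for the meteorological variables and 15706 days for the water storage variables (1979-01-01 to 2021-12-31). It assumes a gap-free daily index that includes both end dates. n_days is a hypothetical helper.

```python
from datetime import date


def n_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, both included."""
    return (end - start).days + 1


meteo_days = n_days(date(1979, 1, 1), date(2022, 12, 31))
storage_days = n_days(date(1979, 1, 1), date(2021, 12, 31))
print(meteo_days)    # 16071
print(storage_days)  # 15706
```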
- reservoir_variables(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Reservoir variables for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- reservoir_variables_stn(stn: str) DataFrame[source]
Reservoir variables for a given station from 1979 to 2020 with yearly timestep. The following two reservoir variables are provided:
capacity: Reservoir capacity of the year in the watershed (m3). To avoid including too many missing values, we use the ICOLD capacity in the linked table of the GeoDAR dataset.
dor: Degree of regulation of the watershed (yearly reservoir capacity/yearly mean flow). If yearly mean flow is missing, the value is substituted with the average of all mean flow values.
- Returns:
a pandas.DataFrame of shape (42, 2) where 42 is the number of years
- Return type:
pd.DataFrame
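The definition of dor given above can be illustrated with a short sketch. It is not the GSHA processing code: degree_of_regulation is a hypothetical function, the numbers are invented, and capacity and mean flow are assumed to be given in consistent units, since the unit handling is not described here.

```python
from typing import List, Optional


def degree_of_regulation(
    capacity: List[float], mean_flow: List[Optional[float]]
) -> List[float]:
    """Yearly capacity divided by yearly mean flow.

    A missing mean flow (None) is replaced by the average of all
    available mean flow values, as described in the text.
    """
    known = [q for q in mean_flow if q is not None]
    if not known:
        raise ValueError("at least one mean flow value is required")
    fallback = sum(known) / len(known)
    return [
        c / (q if q is not None else fallback)
        for c, q in zip(capacity, mean_flow)
    ]


# Invented yearly values; the second year has no observed mean flow.
capacity = [100.0, 100.0, 150.0]
mean_flow = [50.0, None, 150.0]
dor = degree_of_regulation(capacity, mean_flow)
print(dor)  # [2.0, 1.0, 1.0]
```

A mean flow of zero would raise a ZeroDivisionError in this sketch. How such cases are treated in the dataset is not stated in this section.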
- property static_features: List[str]
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. Alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stn_coords(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
returns the latitude and longitude of stations
- Returns:
a pandas.DataFrame of shape (n, 2) where n is the number of stations
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import GSHA
>>> dataset = GSHA()
>>> dataset.stn_coords('1001_arcticnet')
>>> dataset.stn_coords(['1001_arcticnet', '1002_arcticnet'])
get coordinates for all stations of arcticnet agency
>>> dataset.stn_coords(agency='arcticnet')
- storage_vars_all_stns()[source]
Water storage term variables from 1979-01-01 to 2021-12-31 for all stations, either as xarray.Dataset or dictionary. The data has daily timestep.
- storage_vars_stn(stn: str) DataFrame[source]
Daily Water storage term variables from 1979-01-01 to 2021-12-31 for a given station.
SM_layer1: 0-7 cm soil moisture from ERA5 land soil water layer 1 (m3/m3) for 1979-2021.
SM_layer2: 7-28 cm soil moisture from ERA5 land soil water layer 2 (m3/m3) for 1979-2021.
SM_layer3: 28-100 cm soil moisture from ERA5 land soil water layer 3 (m3/m3) for 1979-2021.
SM_layer4: 100-289 cm soil moisture from ERA5 land soil water layer 4 (m3/m3) for 1979-2021.
SWDE: Snow water equivalent from ERA5 snow depth water equivalent (m of water equivalent) for 1979-2021.
groundwater(%): Groundwater percentage from GRACE-FO data assimilation (%) for 2003-2021 (weekly).
- Returns:
a pandas.DataFrame of shape (15706, 6) where 15706 is the number of days
- Return type:
pd.DataFrame
- streamflow_indices(stations: List[str] = 'all', agency: List[str] = 'all')[source]
Streamflow indices for one or more stations, either as xarray.Dataset or dictionary. The data has yearly timestep.
- streamflow_indices_stn(stn: str) DataFrame[source]
Streamflow indices for a given station which have yearly timestep.
- Returns:
a pandas.DataFrame of shape (n, 16) where n is the number of years
- Return type:
pd.DataFrame
- uncertainty(stations: List[str] = 'all', agency: List[str] = 'all') DataFrame[source]
Uncertainty estimates of all meteorological variables over all watersheds:
P_uncertainty (%): Precipitation uncertainty estimates (in percentage). Uncertainties are calculated from EM-Earth deterministic and MSWEP datasets.
T_uncertainty (%): Temperature uncertainty estimates (in percentage). Uncertainties are calculated from EUSTACE, MERRA-2, and ERA5 datasets.
EVP_uncertainty (%): Actual evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
LRAD_uncertainty (%): Downward longwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
SRAD_uncertainty (%): Downward shortwave radiation uncertainty estimates (in percentage). Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
wind_uncertainty (%): Wind speed uncertainty estimates (in percentage). The u- and v- components are aggregated on each grid to obtain wind speed. Uncertainties are calculated from MERRA-2 and ERA5-land datasets.
pet_uncertainty (%): Potential evapotranspiration uncertainty estimates (in percentage). Uncertainties are calculated from GLEAM and REA datasets.
- Returns:
a pandas.DataFrame of shape (n, 7) where n is the number of stations
- Return type:
pd.DataFrame
- class aqua_fetch.EStreams(path=None, **kwargs)[source]
Bases: _RainfallRunoff
Handles EStreams data following the work of Nascimento et al., 2024. The data is available at its Zenodo repository. It should be noted that this dataset does not contain observed streamflow data. It has 17130 stations, 9 dynamic (meteorological) features with daily timestep, 27 dynamic features with yearly timestep and 214 static features. The dynamic features are from 1950-01-01 to 2023-06-30.
Examples
>>> from aqua_fetch import EStreams >>> dataset = EStreams()
- __init__(path=None, **kwargs)[source]
- Parameters:
path (str) – if provided and the directory exists, then the data will be read from this directory. If provided and the directory does not exist, then the data will be downloaded in this directory. If not provided, then the data will be downloaded in the default directory.
timestep (str) – This can only be set for datasets which are available at multiple timesteps such as LamaHCE or LamaHIce etc.
to_netcdf (bool) – whether the data should be saved in netCDF format or not. If set to True, the data will be saved in netCDF format, which can take time when the file is first created. However, it leads to faster I/O operations in subsequent accesses.
overwrite (bool) – whether to overwrite existing files or not. If set to True, the data will be redownloaded.
verbosity (int) –
This parameter determines the level of verbosity for logging messages.
0: no message will be printed
1: only important messages will be printed
>1: values greater than 1 produce progressively more verbose output
kwargs – Any other keyword arguments for the parent Datasets class
- area(stations: List[str] = 'all', countries: List[str] = 'all') Series[source]
Area of catchments in km2.
- property dyn_map
A dictionary that maps dynamic features to their names in the dataset.
- property dynamic_features: List[str]
Returns a list of dynamic features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. Alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of dynamic features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_dynamic_features().
- Return type:
List[str]
- property end: Timestamp
end of data
- fetch_dynamic_features(stations: List[str] | str = 'all', dynamic_features='all', st=None, en=None, as_dataframe=False, countries: str | List[str] = 'all')[source]
Fetches all or selected dynamic features of one or more stations.
- Parameters:
stations (str/list) – name/id of the station(s) for which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
st (Optional (default=None)) – start time from which to fetch the data.
en (Optional (default=None)) – end time until which to fetch the data.
as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas.DataFrame, otherwise it is an xarray.Dataset
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_dynamic_features('IEEP0281', as_dataframe=True)
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('IEEP0281',
...     dynamic_features=['p_mean', 't_mean', 'pet_mean'],
...     as_dataframe=True)
- fetch_stn_dynamic_features(station: str, dynamic_features='all', st: str | Timestamp = None, en: str | Timestamp = None) DataFrame[source]
Fetches all or selected dynamic features of one station.
- Parameters:
station (str) – name/id of station of which to extract the data
dynamic_features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.
- Returns:
a pandas.DataFrame of shape (n, features) where n is the number of days
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> camels = EStreams()
>>> camels.fetch_stn_dynamic_features('IEEP0281')
>>> camels.dynamic_features
>>> camels.fetch_stn_dynamic_features('IEEP0281',
...     dynamic_features=['p_mean', 't_mean', 'pet_mean'])
- hydro_clim_sigs(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the hydro-climatic signatures of one or more stations
- Returns:
a pandas.DataFrame of hydro-climatic signatures of shape (stations, 31)
- Return type:
pd.DataFrame
- meteo_data(stations: str | List[str] = 'all', countries: List[str] | str = 'all')[source]
Returns the meteorological data of one or more stations either as dictionary of dataframes or xarray Dataset
- meteo_data_station(station: str) DataFrame[source]
Returns the meteorological data of a single station.
- Parameters:
station (str) – name/id of station of which to extract the data
- Returns:
a pandas.DataFrame of meteorological data of shape (time, 9)
- Return type:
pd.DataFrame
- property static_features
Returns a list of static features that are available in the dataset. Because this property is accessed many times, it is better to cache the result and return the cached value than to read the data again on every access. Alternatively, a child class can override this property with a more efficient implementation.
- Returns:
a list of static features that are available in the dataset. The names of the features are the same as the names used in the dataset. The names can be used to fetch the data using
fetch_static_features().
- Return type:
List[str]
- property static_map: Dict[str, str]
A dictionary that maps static features to their names in the dataset.
- stations() List[str][source]
Returns a list of all station names. Note that the basin_id column is used as the station name.
- stn_coords(stations: List[str] = 'all', countries: List[str] = 'all') DataFrame[source]
Returns the coordinates of one or more stations
- Returns:
a pandas.DataFrame of shape (stations, 2)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import EStreams
>>> dataset = EStreams()
>>> dataset.stn_coords('IEEP0281')
>>> dataset.stn_coords(['IEEP0281', 'IEEP0282'])
>>> dataset.stn_coords(countries='IE')