Water Quality

The wq submodule contains datasets that represent surface water chemistry at various locations worldwide. It currently includes 18 water quality datasets, and we anticipate this number will increase in the future. The spatial and temporal coverage of these datasets is detailed in the following table.

List of datasets

Summary of datasets
Dataset	Class / Function Name	Variables Covered	Temporal Coverage	Spatial Coverage	Reference
Busan Beach	`aqua_fetch.busan_beach`	14	2018 - 2019	Busan (South Korea)	Jang et al.
Buzzards Bay	`aqua_fetch.BuzzardsBay`	16	1992 - 2018	Buzzards Bay (USA)	Jakuba et al.
CamelsChem	`aqua_fetch.CamelsChem`	28	1980 - 2018	Continental USA	Sterle et al., 2024
CamelsCHChem	`aqua_fetch.CamelsCHChem`	40	1980 - 2020	Switzerland	Nascimento et al., 2025
Surface Water Chemistry	`aqua_fetch.SWatCh`	24	1960 - 2022	Global	Lobke et al., 2022
Global River Water Quality Archive	`aqua_fetch.GRQA`	42	1898 - 2020	Global	Virro et al., 2021
water QUAlity, DIscharge and Catchment Attributes	`aqua_fetch.Quadica`	10	1950 - 2018	Germany	Ebeling et al., 2022
river chemistry for US coasts	`aqua_fetch.RC4USCoast`	21	1850 - 2020	USA	Gomez et al., 2022
Ecoli Mekong River	`aqua_fetch.ecoli_mekong`	10	2011 - 2021	Mekong river (Houay Pano)	Boithias et al., 2022
Ecoli Mekong River (Laos)	`aqua_fetch.ecoli_mekong_laos`	10	2011 - 2021	Mekong River (Laos)	Boithias et al., 2022
Ecoli Houay Pano (Laos)	`aqua_fetch.ecoli_houay_pano`	10	2011 - 2021	Houay Pano (Laos)	Boithias et al., 2022
Global River Methane	`aqua_fetch.GRiMeDB`	1	1973 - 2021	Global	Stanley et al., 2024
Oligotrend	`aqua_fetch.Oligotrend`	17	1986 - 2022	Global	Minaudo et al., 2025
Sylt Roads	`aqua_fetch.SyltRoads`	15	1973 - 2019	Wadden Sea (Germany)	Rick et al., 2023
San Francisco Bay	`aqua_fetch.SanFranciscoBay`	18	1969 - 2015	San Francisco (USA)	Schraga et al., 2017
Selune River, France	`aqua_fetch.SeluneRiver`	5	2021 - 2022	Selune River (France)	Moustapha Ba et al., 2023
Siberian Rivers Chemistry	`aqua_fetch.RiverChemSiberia`	30	1991 - 2012	Siberian Rivers (Russia)	Moustapha Ba et al., 2023
White Clay Creek	`aqua_fetch.WhiteClayCreek`	2	1973 - 2019	White Clay Creek (USA)	Newbold and Damiano 2013

Functions and Classes

class aqua_fetch.BuzzardsBay(path=None, **kwargs)[source]

Bases: Datasets

Water quality measurements in Buzzards Bay from 1992 to 2018. For more details on the data, see Jakuba et al. The data is downloaded from the MBLWHOI Library.

Examples

>>> from aqua_fetch import BuzzardsBay
>>> ds = BuzzardsBay()
>>> doc = ds.doc()
>>> doc.shape
(11092, 4)
>>> chla = ds.chla()
>>> chla.shape
(1028, 10)

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

fetch(parameters: str | List[str] = 'all') → DataFrame[source]: Fetch data for the specified parameters.

class aqua_fetch.CamelsChem(path=None, **kwargs)[source]

Bases: Datasets

Water quality data from the USA following the work of Sterle et al., 2024. This dataset has 18 water chemistry parameters from 1980-01-01 to 2018-12-31. The data is downloaded from HydroShare. Out of 671 stations, 155 have no water quality data. The wet deposition data consist of 12 parameters from 1985 to 2018.

Examples

>>> from aqua_fetch import CamelsChem
>>> dataset = CamelsChem(path='/path/to/dataset')
>>> stns = dataset.stations()
>>> len(stns)
671
>>> stns[0:10]
['1591400', '6350000', ... '11274500', '7295000']
>>> len(dataset.parameters)
28
>>> dataset.parameters
['cl_mg/l', 'na_mg/l', ... 'doc_mg/l']
... get longitude and latitude of stations
>>> coords = dataset.stn_coords()
>>> coords.shape
(115, 2)
...
>>> data = dataset.fetch_atm_dep()  # get atmospheric deposition data for all catchments
>>> type(data)  # the returned data is a dictionary with catchments names as keys
dict
...
>>> len(data)
671
...
>>> data = dataset.fetch_atm_dep(stations='1591400', parameters='cl')
>>> data['1591400'].shape
(34, 8)
...
>>> data = dataset.fetch_atm_dep(stations=['1591400', '6350000'], parameters=['cl', 'na'])
>>> data['1591400'].shape
(34, 16)
>>> data['6350000'].shape
(34, 16)

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

atm_dep_data() → DataFrame[source]: reads the atmospheric deposition data

atm_dep_metadata() → DataFrame[source]: reads the atmospheric deposition metadata

property atm_dep_parameters: List[str]: returns the names of parameters in the atm_dep dataset

fetch(stations: str | List[str] = 'all', parameters: str | List[str] = 'all') → Dict[str, DataFrame][source]

fetches the data for the given stations and parameters

Parameters:

stations (Union[str, List[str]]) – list of stations to fetch data for
parameters (Union[str, List[str]]) – list of parameters to fetch data for

Returns:

dictionary of dataframes for each station

Return type:

Dict[str, pd.DataFrame]

Examples

>>> ds = CamelsChem(path='/path/to/data')
>>> data = ds.fetch(stations=['1591400', '6350000'], parameters=['cl_mg/l', 'na_mg/l'])
>>> data = ds.fetch('1591400', 'cl_mg/l')['1591400']
>>> data.shape # (55, 1)
... get all parameters for a station
>>> data = ds.fetch('1591400')['1591400']
>>> data.shape # (55, 28)
>>> all_data = ds.fetch()  # get all parameters of all stations
>>> len(all_data) # 516

fetch_atm_dep(stations: str | List[str] = 'all', parameters: str | List[str] = 'all') → Dict[str, DataFrame][source]

fetches the atmospheric deposition data for the given stations and parameters

Parameters:

stations (Union[str, List[str]]) – list of stations to fetch data for
parameters (Union[str, List[str]]) – list of parameters to fetch data for

Returns:

dictionary of dataframes for each station

Return type:

Dict[str, pd.DataFrame]

Examples

>>> ds = CamelsChem(path='/path/to/data')
... get data for a single station and a single parameter
>>> data = ds.fetch_atm_dep(stations='1591400', parameters='cl')
>>> print(data['1591400'].shape)  # (34, 8)
... get data for multiple stations and multiple parameters
>>> data = ds.fetch_atm_dep(stations=['1591400', '6350000'], parameters=['cl', 'na'])
>>> print(data['1591400'].shape)  # (34, 16)
>>> print(data['6350000'].shape)  # (34, 16)
... get data for all stations and all parameters
>>> data = ds.fetch_atm_dep()
>>> print(len(data))  # 671
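The shapes in these examples follow a simple pattern: the 34 rows match one record per year from 1985 to 2018, and each requested parameter appears to contribute 8 columns ((34, 8) for 'cl', (34, 16) for 'cl' and 'na'). The sketch below encodes that pattern so that a returned frame can be sanity-checked. The 8-columns-per-parameter rule is inferred from the example outputs above, not taken from the library, and `expected_atm_dep_shape` is a hypothetical helper.

```python
from typing import List, Tuple, Union

# Assumption: 8 columns per requested parameter, inferred from the example
# shapes above ((34, 8) for one parameter, (34, 16) for two).
COLS_PER_PARAMETER = 8


def expected_atm_dep_shape(
    parameters: Union[str, List[str]],
    first_year: int = 1985,
    last_year: int = 2018,
) -> Tuple[int, int]:
    """Hypothetical helper: expected (rows, columns) of one station's frame."""
    if isinstance(parameters, str):
        parameters = [parameters]
    n_years = last_year - first_year + 1  # one row per year, both ends inclusive
    return n_years, COLS_PER_PARAMETER * len(parameters)


print(expected_atm_dep_shape("cl"))          # (34, 8)
print(expected_atm_dep_shape(["cl", "na"]))  # (34, 16)
```

Comparing `data['1591400'].shape` with `expected_atm_dep_shape(['cl', 'na'])` would then flag output that departs from the pattern seen in the examples.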

gauge_and_region_names() → DataFrame[source]: reads the gauge and region names

metrics()[source]: reads metrics.xlsx which contains metadata

property parameters: List[str]: returns the names of parameters in the dataset

stations() → List[str][source]: returns the list of stations in the dataset

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in wgs84 projection.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame
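Because `stn_coords()` returns latitude and longitude in WGS84, a common follow-up is finding the station nearest to a point of interest. The sketch below applies the haversine great-circle formula to plain `(station, lat, long)` tuples. Building those tuples from the returned DataFrame, for example with `zip(coords.index, coords['lat'], coords['long'])`, assumes the station IDs form the index, which is not stated above; `nearest_station` is a hypothetical helper, not part of `aqua_fetch`.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(
    lat: float, lon: float, stations: Iterable[Tuple[str, float, float]]
) -> Tuple[str, float]:
    """Hypothetical helper: return (station_id, distance_km) of the closest station."""
    best_id, best_dist = "", math.inf
    for stn_id, s_lat, s_lon in stations:
        dist = haversine_km(lat, lon, s_lat, s_lon)
        if dist < best_dist:
            best_id, best_dist = stn_id, dist
    return best_id, best_dist


# Made-up coordinates, for illustration only.
stations = [("A", 40.0, -75.0), ("B", 45.0, -120.0), ("C", 35.0, -90.0)]
print(nearest_station(41.0, -76.0, stations))
```

A linear scan is enough for a few hundred stations; for datasets with tens of thousands of sites (such as GRQA) a spatial index would be the more scalable choice.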

topography() → DataFrame[source]: reads the topography data

class aqua_fetch.CamelsCHChem(path=None, **kwargs)[source]

Bases: Datasets

Data of over 40 water quality parameters from 115 Swiss catchments, following the work of Nascimento et al., 2025. The dataset is downloaded from Zenodo. The water quality parameters are available as (discontinuous) time series from 1980-01-01 to 2020-12-31.

Examples

>>> from aqua_fetch import CamelsCHChem
>>> dataset = CamelsCHChem(path='/path/to/data')
>>> stns = dataset.stations()
>>> len(stns)
115
... find out names of stations
>>> stns[0:10]
['2009', '2011', '2016', '2018', ... '2044']
... get longitude and latitude of stations
>>> coords = dataset.stn_coords()
>>> coords.shape
(115, 2)
... get catchment-averaged parameters for catchment with the name/id 2009
>>> data = dataset.fetch_catch_avg('2009')
>>> type(data)    # the return data is a dictionary with catchment name as key
dict
>>> len(data)
1
>>> data.keys()
dict_keys(['2009'])
>>> data['2009'].shape
(209, 32)
... get data for three catchments
>>> data = dataset.fetch_catch_avg(['2009', '2011', '2018'])
>>> data.keys()
dict_keys(['2009', '2011', '2018'])
>>> [val.shape for val in data.values()]
[(209, 32), (209, 32), (209, 32)]
>>> data['2009'].columns.tolist()
['cereal', 'maize', 'sugarbeet', ... 'gve_ha', 'delta_2h']
... find out start and end dates
>>> data['2009'].index[0], data['2009'].index[-1]
(Timestamp('1970-01-01'), Timestamp('2020-12-15'))
...
... get water quality time series
>>> data = dataset.fetch_wq_ts(stations=['2009', '2011'])
>>> data['2009'].shape
(14610, 4)
>>> data['2011'].shape
(14610, 4)
>>> data['2011'].columns
Index(['temp_sensor', 'pH_sensor', 'ec_sensor', 'O2C_sensor'], dtype='object')
>>> data = dataset.fetch_wq_ts()
>>> len(data)
115
>>> data['2009'].index[0], data['2009'].index[-1]
(Timestamp('1981-01-01 00:00:00'), Timestamp('2020-12-31 00:00:00'))
...
... get isotope data
>>> data = dataset.fetch_isotope(stations=['2009', '2016'])
>>> data['2009'].shape
(452, 4)
>>> data['2016'].shape
(450, 4)
>>> data['2009'].columns
Index(['date_start', 'date_end', 'delta_2h', 'delta_18o'], dtype='object')

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

property catch_avg_data_path: PathLike: returns the path to the catchment average data

dyn_map() → Dict[str, str][source]: returns a dictionary mapping parameter names to their units

fetch(stations: str | List[str] = 'all', parameters: str | List[str] = 'all') → Dict[str, DataFrame][source]

fetches the data for the given stations and parameters

Parameters:

stations (Union[str, List[str]]) – list of stations to fetch data for
parameters (Union[str, List[str]]) – list of parameters to fetch data for

Returns:

dictionary of dataframes for each station

Return type:

Dict[str, pd.DataFrame]

Examples

>>> ds = CamelsCHChem(path='/path/to/data')
>>> data = ds.fetch(stations=['2009', '2011'], parameters='swisscrops')
>>> print(data['2009'].shape)  # (209, 32)
>>> print(data['2011'].shape)  # (209, 32)

fetch_catch_avg(stations: str | List[str] = 'all') → Dict[str, DataFrame][source]

fetches the catchment-averaged data for the given stations. This covers agricultural, atmospheric deposition, land cover, livestock and rainwater isotope data for each catchment. The agricultural and atmospheric deposition (1990-2020), land cover and livestock data are yearly, whereas the rainwater isotope data have discontinuous time steps.

Parameters:: stations (Union[str, List[str]]) – list of stations to fetch data for
Returns:: dictionary of dataframes for each station
Return type:: Dict[str, pd.DataFrame]

Examples

>>> ds = CamelsCHChem(path='/path/to/data')
>>> data = ds.fetch_catch_avg(stations=['2009', '2011'])
>>> print(data['2009'].shape)  # (209, 32)
>>> print(data['2011'].shape)  # (209, 32)

fetch_wq_ts(stations: str | List[str] = 'all', timestep: str = 'D') → Dict[str, DataFrame][source]

fetches the water quality time series data for the given station(s) at a daily (D) or hourly (H) time step. This data consists of water temperature, pH, electrical conductivity and O2C parameters for the given station(s).

Parameters:

stations (Union[str, List[str]]) – station or list of stations to fetch data for
timestep (str) – the timestep of the data, default is ‘D’ for daily data. The other option is ‘H’ for hourly.

Returns:

dictionary of dataframes for each station

Return type:

Dict[str, pd.DataFrame]

Examples

>>> ds = CamelsCHChem(path='/path/to/data')
>>> data = ds.fetch_wq_ts('2009')['2009']
>>> print(data.shape)  # (14610, 4)

property gauge_md_file: PathLike: returns the gauge metadata file

metadata() → DataFrame[source]

reads the metadata file

Return type:: pd.DataFrame

stations() → List[str][source]: returns the list of stations in the dataset

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in wgs84 projection.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

class aqua_fetch.GRiMeDB(path=None, **kwargs)[source]

Bases: Datasets

Global river database of methane concentrations and fluxes from 5029 stations on 305 rivers, following Stanley et al., 2023

Examples

>>> from aqua_fetch import GRiMeDB
>>> ds = GRiMeDB(path='/path/to/dataset')
>>> ds.stations()
>>> ds.streams
>>> coords = ds.stn_coords()
>>> coords.shape
(5029, 2)
>>> conc = ds.concentrations(streams=['Indus River'])
>>> conc.shape
(2, 59)
>>> conc = ds.concentrations(parameters=['Q', 'NO3', 'NH4', 'TN', 'SRP', 'TP', 'DOC'])
>>> conc.shape
(25052, 7)
>>> fluxes = ds.fluxes()
>>> fluxes.shape
(7298, 52)
>>> fluxes['Site_ID'].nunique()
1903
>>> sites = ds.sites()
>>> sites['Site_ID'].nunique()
5029
>>> sites['Stream_Name'].nunique()
2722

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

concentrations(stations: str | List[str] = 'all', streams: str | List[str] = 'all', parameters: str | List[str] = 'all')[source]

Returns concentration data.

Parameters:

stations (Union[str, List[str]], optional) – station ID or list of station IDs, by default “all”. If given, then streams must not be given. Check .stations() method for available stations.
streams (Union[str, List[str]], optional) – stream name or list of stream names, by default “all”. If given, then stations must not be given. Check .streams attribute for available streams.
parameters (Union[str, List[str]], optional) – parameters to return, by default “all”. Check .parameters attribute for available parameters.
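Because stations and streams are mutually exclusive, a caller-side check can catch the mistake before any data is read. The sketch below is a hypothetical helper that mirrors the rule described above on plain Python values; it is not the validation that `aqua_fetch` performs internally.

```python
from typing import List, Tuple, Union

Selector = Union[str, List[str]]


def resolve_selection(stations: Selector = "all", streams: Selector = "all") -> Tuple[str, List[str]]:
    """Return ('stations' | 'streams' | 'all', ids) and reject giving both selectors."""
    if stations != "all" and streams != "all":
        raise ValueError("give either `stations` or `streams`, not both")
    if stations != "all":
        return "stations", ([stations] if isinstance(stations, str) else list(stations))
    if streams != "all":
        return "streams", ([streams] if isinstance(streams, str) else list(streams))
    return "all", []


print(resolve_selection(streams="Indus River"))  # ('streams', ['Indus River'])
```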

fluxes(stations: str | List[str] = 'all') → DataFrame[source]: returns fluxes data as a pandas.DataFrame

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in wgs84 projection.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

property streams: List[str]: returns names of streams

class aqua_fetch.GRQA(download_source: bool = False, path=None, **kwargs)[source]

Bases: Datasets

Global River Water Quality Archive following the work of Virro et al., 2021. This dataset comprises 42 parameters for 94955 sites across 116 countries.

Examples

>>> from aqua_fetch import GRQA
>>> ds = GRQA(path="/path/to/data")
>>> ds.parameters
['TPP', 'PON', 'TEMP', 'TSS', ...]
>>> print(len(ds.parameters))
42
>>> len(ds.countries)
116
>>> len(ds.stations())
94955
>>> coords = ds.stn_coords()
>>> coords.shape
(94955, 2)
>>> country = "Pakistan"
>>> len(ds.fetch_parameter('TEMP', country=country))
1324
>>> df = ds.fetch_parameter("TEMP", country=country)
>>> print(df.shape)
(1324, 38)
>>> df = ds.fetch_parameter("NH4N", country=country)
>>> print(df.shape)
(28, 36)

__init__(download_source: bool = False, path=None, **kwargs)[source]

Parameters:: download_source (bool) – whether to download source data or not

Parameters:

parameter (str, optional) – name of parameter
site_name (str/list, optional) – location for which data is to be fetched.
country (str/list, optional (default=None)) – country or countries for which data is to be fetched
st (str) – starting date or index
en (str) – end date or index

Returns:

a pandas.DataFrame

Return type:

pd.DataFrame

Example

>>> from aqua_fetch import GRQA
>>> dataset = GRQA()
>>> df = dataset.fetch_parameter()
fetch data for only one country
>>> cod_pak = dataset.fetch_parameter("COD", country="Pakistan")
fetch data for only one site
>>> cod_kotri = dataset.fetch_parameter("COD", site_name="Indus River - at Kotri")
We can find out the number of data points and sites available for a specific country as follows:
>>> for para in dataset.parameters:
...     data = dataset.fetch_parameter(para, country="Germany")
...     if len(data) > 0:
...         print(f"{para}, {data.shape}, {len(data['site_name'].unique())}")

sites_data() → DataFrame[source]: Returns the metadata for the dataset

stations() → List[str][source]: Returns names of stations/site_id

stn_coords()[source]

Returns the coordinates of all the stations in the dataset

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

class aqua_fetch.Oligotrend(path=None, **kwargs)[source]

Bases: Datasets

A global database of multi-decadal (1986-2023) time series of chlorophyll-a and 16 other parameters, including N and P, from 1846 unique monitoring locations across estuaries (n=238), lakes (n=687) and rivers (n=969). The dataset consists of 4.3 million observations; most time series cover the period 1986-2022 and comprise at least 15 years of Chl-a observations. For more details, see Minaudo et al., 2025 (https://doi.org/10.5194/essd-17-3411-2025). The data is fetched from the EDI data portal.

Examples

>>> from aqua_fetch import Oligotrend
>>> ds = Oligotrend(path='/path/to/data')
get names of parameters in the dataset
>>> ds.parameters()
>>> len(ds.parameters())
17
get list of stations in the dataset
>>> ds.stations()
>>> len(ds.stations())
1846
>>> len(ds.lakes())
685
>>> len(ds.rivers())
924
>>> len(ds.estuaries())
237
get parameters of a single station
>>> data = ds.fetch_stn_parameters('lake_atlanticoceanseaboard_usa12721')
>>> data.shape
(303, 3)
get all parameters for specific stations
>>> data = ds.fetch_stns_parameters(['river_ebro_9027', 'river_elbe_elbe_10'])
>>> data['river_ebro_9027'].shape
(287, 8)
>>> data['river_elbe_elbe_10'].shape
(8154, 12)
Get only 'chla' parameter for the stations
>>> data1 = ds.fetch_stns_parameters(['river_ebro_9027', 'river_elbe_elbe_10'],
...                                 parameters=['chla'])
>>> data1['river_ebro_9027'].shape
(177, 1)
>>> data1['river_elbe_elbe_10'].shape
(413, 1)

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

estuaries() → List[str][source]: Returns the list of stations which are estuaries in the dataset.

fetch_stn_parameters(stn: str, parameters: str | List[str] = 'all')[source]

Examples

>>> stn_df = ds.fetch_stn_parameters('lake_atlanticoceanseaboard_usa12721')
>>> stn_df.shape
(303, 3)

fetch_stns_parameters(stns: str | List[str], parameters: str | List[str] = 'all') → Dict[str, DataFrame][source]

Fetches the parameters for the given stations.

Parameters:

stns (str or list of str) – The station(s) to fetch the parameters for.
parameters (str or list of str, optional) – The parameter(s) to fetch. If ‘all’, all parameters are fetched.

Returns:

A dictionary with the station id as key and a dataframe of parameters as value.

Return type:

Dict[str, pd.DataFrame]

Examples

>>> data = ds.fetch_stns_parameters(['river_ebro_9027', 'river_elbe_elbe_10'])
>>> data['river_ebro_9027'].shape
(287, 8)
>>> data['river_elbe_elbe_10'].shape
(8154, 12)
>>> data = ds.fetch_stns_parameters(['river_ebro_9027', 'river_elbe_elbe_10'], 'chla')
>>> data['river_ebro_9027'].shape
(177, 1)
>>> data['river_elbe_elbe_10'].shape
(413, 1)

get_stations(parameter: str, ecosystm: str = 'river') → Series[source]

Returns a list of stations that have the specified parameter.

Examples

>>> chla_stns = ds.get_stations('chla')
>>> len(chla_stns)
969

gis_data() → DataFrame[source]: Returns the GIS data of the dataset.

l1_data() → DataFrame[source]: Reads the oligotrend_L1.csv file and returns it as a dataframe of shape (5056630, 7).

lakes() → List[str][source]: Returns the list of stations which are lakes in the dataset.

parameters() → List[str][source]: Returns the list of names of parameters in the dataset.

rivers() → List[str][source]: Returns the list of stations which are rivers in the dataset.
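The station IDs in the examples ('lake_atlanticoceanseaboard_usa12721', 'river_ebro_9027') start with the ecosystem type. Assuming that convention holds for every ID, which the documentation does not state, the sketch below counts stations per prefix as a quick cross-check of lakes(), rivers() and estuaries(). `count_by_ecosystem` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_ecosystem(station_ids: Iterable[str]) -> Dict[str, int]:
    """Count IDs by the text before the first underscore (assumed ecosystem prefix)."""
    return dict(Counter(stn.split("_", 1)[0] for stn in station_ids))


ids = ["lake_atlanticoceanseaboard_usa12721", "river_ebro_9027", "river_elbe_elbe_10"]
print(count_by_ecosystem(ids))  # {'lake': 1, 'river': 2}
```

With the library, the input would be `ds.stations()`; any prefix other than the ecosystem names would indicate that the assumed naming convention does not hold.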

sources()[source]: Returns the sources of the dataset.

stations() → List[str][source]: returns the list of stations in the dataset

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in wgs84 projection.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

class aqua_fetch.Quadica(path=None, **kwargs)[source]

Bases: Datasets

This dataset contains 10 parameters (discharge and water quality) from 1386 stations in Germany, covering 1950 to 2018, following the work of Ebeling et al., 2022. The data are provided at monthly and annual time steps, but the monthly time series are not continuous. The following parameters are available in this dataset:

Q : Discharge

NO3 : Nitrate

NO3N : Nitrate-N

NMin : Nitrogen mineralization

TN : Total Nitrogen

PO4 : Phosphate

PO4P : Phosphate-P

TP : Total Phosphorus

DOC : Dissolved Organic Carbon

TOC : Total Organic Carbon

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> len(dataset.stations())
1386
>>> coords = dataset.stn_coords()
>>> coords.shape
(1386, 2)
>>> df = dataset.wrtds_monthly()
>>> df.shape
(50186, 47)
>>> df = dataset.wrtds_annual()
>>> df.shape
(4213, 46)
>>> df = dataset.pet()
>>> df.shape
(828, 1386)
>>> df = dataset.avg_temp()
>>> df.shape
(828, 1388)
>>> df = dataset.precipitation()
>>> df.shape
(828, 1388)
>>> df = dataset.catchment_attributes()
>>> df.shape
(1386, 112)
>>> df = dataset.metadata()
>>> df.shape
(1386, 60)
>>> df = dataset.monthly_medians()
>>> df.shape
(16629, 18)
>>> df = dataset.annual_medians()
>>> df.shape
(24393, 18)
>>> df = dataset.fetch_monthly()
>>> df[0].shape
(50186, 47)

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

annual_medians() → DataFrame[source]

Annual medians over the whole time series of water quality variables and discharge

Returns:: a dataframe of shape (24393, 18)
Return type:: pd.DataFrame

monthly median average temperatures starting from 1950-01 to 2018-09

Parameters:

stations – name of stations for which data is to be retrieved. By default, data for all stations is retrieved.
st (optional) – starting point of data. By default, the data starts from 1950-01
en (optional) – end point of data. By default, the data ends at 2018-09

Returns:

a pandas.DataFrame of shape (time_steps, stations). With default input arguments, the shape is (828, 1386)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> df = dataset.avg_temp() # -> (828, 1388)

catchment_attributes(parameters: List[str] | str = None, stations: List[int] | int = None) → DataFrame[source]

Returns static physical catchment attributes in the form of dataframe.

Parameters:

parameters (list/str, optional, (default=None)) – name/names of static attributes to fetch
stations (list/int, optional (default=None)) – name/names of stations whose static/physical parameters are to be read

Returns:

a pandas.DataFrame of shape (stations, parameters). With default input arguments, shape is (1386, 113)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> cat_features = dataset.catchment_attributes()
... # get attributes of only selected stations
>>> dataset.catchment_attributes(stations=[1,2,3])

fetch_monthly(parameters: List[str] | str = None, stations: List[int] | int = 'all', median: bool = True, fnc: bool = True, fluxes: bool = True, precipitation: bool = True, avg_temp: bool = True, pet: bool = True, only_continuous: bool = True, cat_features: bool = True, max_nan_tol: int | None = 0) → Tuple[DataFrame, DataFrame][source]

Fetches monthly concentrations of water quality parameters.

Parameters:

parameters (str/list, optional (default=None)) –
name or names of water quality parameters to fetch. By default, the following parameters are considered:
- NO3
- NO3N
- TN
- Nmin
- PO4
- PO4P
- TP
- DOC
- TOC
stations (int/list, optional (default=None)) – name or names of stations whose data is to be fetched
median (bool, optional (default=True)) – whether to fetch median concentration values or not
fnc (bool, optional (default=True)) – whether to fetch flow normalized concentrations or not
fluxes (bool, optional (default=True)) – If True, two parameters are added, i.e. mean_Flux_FEATURE and mean_FNFlux_FEATURE
precipitation (bool, optional (default=True)) – whether to fetch average monthly precipitation or not
avg_temp (bool, optional (default=True)) – whether to fetch average monthly temperature or not
pet (bool, optional (default=True)) – whether to fetch potential evapotranspiration data or not
only_continuous (bool, optional (default=True)) – If True, only returns data for stations that have a continuous monthly time series from 1993-01-01 to 2013-01-01.
cat_features (bool, optional (default=True)) – whether to fetch catchment parameters or not.
max_nan_tol (int, optional (default=0)) – setting this value to 0 removes every time series that has any missing value. If None, no time series is removed because of NaN values.

Returns:

two dataframes whose length is same but the columns are different

a pandas.DataFrame of timeseries of parameters (stations*timesteps, dynamic_features)
a pandas.DataFrame of static parameters (stations*timesteps, catchment_features)

Return type:

tuple

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> mon_dyn, mon_cat = dataset.fetch_monthly(max_nan_tol=None)
... # However, mon_dyn contains data for all parameters and many of which have
... # large number of nans. If we want to fetch data only related to TN without any
... # missing value, we can do as below
>>> mon_dyn_tn, mon_cat_tn = dataset.fetch_monthly(parameters="TN", max_nan_tol=0)
... # if we want to find out how many catchments are included in mon_dyn_tn
>>> len(mon_dyn_tn['OBJECTID'].unique())
... # 25
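The max_nan_tol rule can be illustrated without the library: with a tolerance of 0, any station series containing a missing value is dropped, and with None nothing is dropped. The sketch below applies that rule to a dict of plain lists, with float('nan') marking missing values. Treating a positive integer as the maximum number of NaNs allowed per series is an assumption, since the description above only covers 0 and None; `filter_by_nan_tol` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional


def filter_by_nan_tol(
    series: Dict[str, List[float]], max_nan_tol: Optional[int] = 0
) -> Dict[str, List[float]]:
    """Keep station series whose NaN count does not exceed max_nan_tol (None keeps all)."""
    if max_nan_tol is None:
        return dict(series)
    return {
        stn: values
        for stn, values in series.items()
        if sum(math.isnan(v) for v in values) <= max_nan_tol
    }


nan = float("nan")
tn = {"1": [1.2, 1.4, 1.1], "2": [0.9, nan, 1.0], "3": [nan, nan, 2.0]}
print(sorted(filter_by_nan_tol(tn, 0)))     # ['1']
print(sorted(filter_by_nan_tol(tn, None)))  # ['1', '2', '3']
```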

metadata() → DataFrame[source]

fetches the metadata about the stations as a pandas DataFrame. Each row represents one station and each column represents one feature. The R2 and pbias columns are the coefficient of determination and the percent bias of the WRTDS models for each parameter.

Returns:: a dataframe of shape (1386, 60)
Return type:: pd.DataFrame

monthly_medians(parameters: List[str] | str = None, stations: List[int] | int = None) → DataFrame[source]

This function reads the c_months.csv file which contains the monthly medians over the whole time series of water quality variables and discharge

Parameters:

parameters (list/str, optional, (default=None)) – name/names of parameters
stations (list/int, optional (default=None)) – stations for which data is to be retrieved

Returns:

a dataframe of shape (16629, 18). 15 of the 18 columns represent a water chemistry parameter. Each row corresponds to one station-month combination (at most 1386 stations × 12 months = 16632 rows).

Return type:

pd.DataFrame

property parameters: list: names of water quality parameters available in this dataset

average monthly potential evapotranspiration starting from 1950-01 to 2018-09

Returns:: a dataframe of shape (828, 1386), where 828 is the number of monthly time steps and 1386 is the number of stations
Return type:: pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> df = dataset.pet() # -> (828, 1386)
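The first dimension of these monthly frames is the number of monthly time steps in the covered period. The stdlib sketch below counts calendar months between two 'YYYY-MM' strings, both ends inclusive, which can be used to check the row count of a returned frame against its first and last index values; `n_months` is a hypothetical helper. For instance, 828 rows correspond to 69 full years of months, such as 1950-01 through 2018-12.

```python
def n_months(start: str, end: str) -> int:
    """Number of calendar months from start to end, both 'YYYY-MM' and inclusive."""
    y0, m0 = (int(part) for part in start.split("-"))
    y1, m1 = (int(part) for part in end.split("-"))
    return (y1 - y0) * 12 + (m1 - m0) + 1


print(n_months("1950-01", "2018-12"))  # 828
```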

sums of precipitation starting from 1950-01 to 2018-09

Parameters:

stations – name of stations for which data is to be retrieved. By default, data for all stations is retrieved.
st (optional) – starting point of data. By default, the data starts from 1950-01
en (optional) – end point of data. By default, the data ends at 2018-09

Returns:

a dataframe of shape (828, 1388)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> df = dataset.precipitation() # -> (828, 1388)

property station_names: List[str]: names of stations

stations() → list[source]: IDs of stations for which data is available

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in wgs84 projection.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

to_DataSet(target: str = 'TP', input_features: list = None, split: str = 'temporal', lookback: int = 24, **ds_args)[source]

This function prepares data for machine learning prediction problem. It returns an instance of ai4water.preprocessing.DataSetPipeline which can be given to model.fit or model.predict

Parameters:

target (str, optional (default="TP")) – parameter to consider as target
input_features (list, optional) – names of input parameters
split (str, optional (default="temporal")) – if temporal, the validation and test sets are taken from the data of each station and then concatenated. If spatial, the training, validation and test sets are split by station.
lookback (int) – number of past time steps used as input
**ds_args – keyword arguments

Returns:

an instance of DataSetPipeline

Return type:

ai4water.preprocessing.DataSet

Example

>>> from aqua_fetch import Quadica
... # initialize the Quadica class
>>> dataset = Quadica()
... # define the input parameters
>>> inputs = ['median_Q', 'OBJECTID', 'avg_temp', 'precip', 'pet']
... # prepare data for TN as target
>>> dsp = dataset.to_DataSet("TN", inputs, lookback=24)
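To make lookback and the temporal split concrete, the sketch below builds sliding windows from each station's monthly series: every sample uses the previous `lookback` values as inputs and the next value as the target, and a temporal split keeps the last fraction of each station's samples for testing before concatenating across stations. This is a plain-Python illustration of the general idea, not how ai4water implements DataSet; whether the target is aligned with the last input step or the one after it depends on the implementation. `make_windows` and `temporal_split` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], float]


def make_windows(values: Sequence[float], lookback: int) -> List[Sample]:
    """Turn one series into (inputs, target) pairs; inputs are the `lookback` previous values."""
    return [(list(values[i - lookback:i]), values[i]) for i in range(lookback, len(values))]


def temporal_split(
    series_by_station: Dict[str, Sequence[float]], lookback: int, test_fraction: float = 0.3
) -> Tuple[List[Sample], List[Sample]]:
    """Split each station's samples in time order, then concatenate across stations."""
    train: List[Sample] = []
    test: List[Sample] = []
    for values in series_by_station.values():
        samples = make_windows(values, lookback)
        cut = len(samples) - int(len(samples) * test_fraction)  # latest samples go to test
        train.extend(samples[:cut])
        test.extend(samples[cut:])
    return train, test


tn = {"1": [float(v) for v in range(10)], "2": [float(v) for v in range(20, 30)]}
train, test = temporal_split(tn, lookback=3)
print(len(train), len(test))  # 10 4
```

A spatial split would instead assign whole stations to either the training or the test set, so no station contributes samples to both.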

Annual median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with enough data availability.
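For orientation, the standard WRTDS formulation fits, at each estimation point, a weighted regression of log concentration on time, log discharge and season, where $c$ is concentration, $Q$ is discharge and $t$ is decimal time in years. The coefficients are re-estimated for every point, with observations weighted by their distance in time, discharge and season. Flow-normalized concentration then averages the fitted concentration over the discharges observed on the same calendar day in each of the $N$ years of record. This describes the general method only, not the exact configuration used by Ebeling et al.

```latex
\ln c = \beta_0 + \beta_1 t + \beta_2 \ln Q + \beta_3 \sin(2\pi t) + \beta_4 \cos(2\pi t) + \varepsilon

\hat{c}_{\mathrm{FN}}(t) = \frac{1}{N} \sum_{j=1}^{N} \hat{c}(t, Q_j)
```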

Parameters:

parameters (optional)
st (optional) – starting point of data. By default, the data starts from 1992
en (optional) – end point of data. By default, the data ends at 2013

Returns:

a dataframe of shape (4213, 46)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> df = dataset.wrtds_annual()

Monthly median concentrations, flow-normalized concentrations and mean fluxes of water chemistry parameters. These are estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with sufficient data. These data are available for 140 stations in total. The time series do not all start and end on the same dates, so some stations have more data points than others: the longest record has 576 data points and the shortest has 244.

Parameters:

parameters (str/list, optional)
stations (int/list, optional (default=None)) – name/names of stations whose data is to be retrieved.
st (optional) – starting point of data. By default, the data starts from 1992-09
en (optional) – end point of data. By default, the data ends at 2013-12

Returns:

a dataframe of shape (50186, 47)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import Quadica
>>> dataset = Quadica()
>>> df = dataset.wrtds_monthly()
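Because station records start and end at different times, it can be useful to count the data points per station and to restrict the table to a common period. The sketch below does this with plain pandas on a made-up long-format table; the column names `station`, `date` and `TN` are assumptions and need not match the columns returned by `wrtds_monthly`.

```python
import pandas as pd

# Made-up long-format table: one row per station and month.
df = pd.DataFrame({
    "station": [1, 1, 1, 1, 2, 2],
    "date": list(pd.date_range("1992-09-01", periods=4, freq="MS"))
    + list(pd.date_range("1992-11-01", periods=2, freq="MS")),
    "TN": [1.2, 1.1, 1.4, 1.3, 0.9, 1.0],
})

# Number of monthly data points per station (longest and shortest record).
counts = df.groupby("station").size()
print(counts.max(), counts.min())

# Keep only rows inside a chosen period (both ends inclusive), similar in spirit to st/en.
st, en = pd.Timestamp("1992-10-01"), pd.Timestamp("1992-12-01")
in_period = df[df["date"].between(st, en)]
print(in_period.groupby("station").size().to_dict())
```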

class aqua_fetch.RC4USCoast(path=None, *args, **kwargs)[source]

Bases: Datasets

Monthly river water chemistry (N, P, SiO2, DO, etc.), discharge and temperature at 140 monitoring sites along the US coasts from 1950 to 2020, following the work of Gomez et al., 2022.

Examples

>>> from aqua_fetch import RC4USCoast
>>> dataset = RC4USCoast()
>>> len(dataset.stations)
140
>>> len(dataset.parameters)
27
>>> stn_coords = dataset.stn_coords()
>>> stn_coords.shape
(140, 2)

__init__(path=None, *args, **kwargs)[source]

Parameters:: path – path where the data is already downloaded. If None, the data will be downloaded to disk.

fetch_chem(parameter, stations: List[int] | int | str = 'all', as_dataframe: bool = False, st: int | str | DatetimeIndex = None, en: int | str | DatetimeIndex = None)[source]

Returns water chemistry parameters from one or more stations.

Parameters:

parameter (list, str) – name/names of parameters to fetch
stations (list, str) – name/names of stations from which the parameters are to be fetched
as_dataframe (bool (default=False)) – whether to return data as pandas.DataFrame or xarray.Dataset
st – start time of data to be fetched. The default starting date is 19500101
en – end time of data to be fetched. The default end date is 20201201

Return type:

pd.DataFrame or xarray Dataset

Examples

>>> from aqua_fetch import RC4USCoast
>>> ds = RC4USCoast()
>>> data = ds.fetch_chem(['temp', 'do'])
>>> data
>>> data = ds.fetch_chem(['temp', 'do'], as_dataframe=True)
>>> data.shape  # this is a multi-indexed dataframe
(119280, 4)
>>> data = ds.fetch_chem(['temp', 'do'], st="19800101", en="20181230")
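The shape (119280, 4) equals 852 × 140 rows, i.e. the number of monthly time steps times the number of stations reported in the `fetch_q` example. This suggests, though it is not stated here, that the multi-index combines a time level and a station level. The sketch below assumes such a layout and shows how a multi-indexed frame of this kind can be sliced with standard pandas; the level names `time` and `station` and the column layout are assumptions, and the real output may differ.

```python
import pandas as pd

# Assumed layout: rows indexed by (time, station); level names are hypothetical.
times = pd.date_range("1950-01-01", periods=3, freq="MS")
stations = [1, 10]
index = pd.MultiIndex.from_product([times, stations], names=["time", "station"])
data = pd.DataFrame({"temp": [10.0, 12.0, 11.0, 13.0, 9.0, 14.0],
                     "do": [8.1, 7.5, 8.0, 7.4, 8.3, 7.2]}, index=index)

# All rows of a single station; the "station" level is dropped from the index.
stn10 = data.xs(10, level="station")

# Wide table: one row per time step, one column per (parameter, station) pair.
wide = data.unstack("station")
print(stn10)
print(wide)
```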

Returns discharge data.

Parameters:

stations – stations for which discharge (q) is to be fetched
as_dataframe (bool (default=True)) – whether to return the data as pd.DataFrame or as xarray.Dataset
nv (int (default=0))
st – start time of data to be fetched. The default starting date is 19500101
en – end time of data to be fetched. The default end date is 20201201

Examples

>>> from aqua_fetch import RC4USCoast
>>> ds = RC4USCoast()
>>> # get data of all stations as DataFrame
>>> q = ds.fetch_q("all")
>>> q.shape  # 140 is the number of stations
(852, 140)
>>> # get data of only two stations
>>> q = ds.fetch_q([1,10])
>>> q.shape
(852, 2)
>>> # get data as xarray Dataset
>>> q = ds.fetch_q("all", as_dataframe=False)
>>> type(q)
<class 'xarray.core.dataset.Dataset'>
>>> # getting data between specific periods
>>> data = ds.fetch_q("all", st="20000101", en="20181230")
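The 852 rows correspond to monthly values from January 1950 to December 2020 (71 years × 12 months), matching the default `st` and `en` dates listed above. The sketch below reproduces that count with pandas and shows how `YYYYMMDD` strings such as those passed to `st` and `en` can be parsed and used to slice a monthly table. The table is made up; only the date handling is illustrated, not the internals of `fetch_q`.

```python
import pandas as pd

# Month-start dates between the default st (19500101) and en (20201201).
months = pd.date_range(pd.Timestamp("19500101"), pd.Timestamp("20201201"), freq="MS")
print(len(months))  # 852 = 71 years x 12 months

# Made-up monthly table with two station columns.
q = pd.DataFrame({1: range(len(months)), 10: range(len(months))}, index=months)

# Slice with YYYYMMDD strings, as in the fetch_q example.
st, en = "20000101", "20181230"
q_period = q.loc[pd.Timestamp(st):pd.Timestamp(en)]
print(q_period.shape)
```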

property parameters: List[str]

returns names of parameters

Examples

>>> from aqua_fetch import RC4USCoast
>>> ds = RC4USCoast()
>>> len(ds.parameters)
27

property stations: List[str]

Examples

>>> from aqua_fetch import RC4USCoast
>>> ds = RC4USCoast(path=r'F:\data\RC4USCoast')
>>> len(ds.stations)
140

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in the WGS84 coordinate system.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame
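Since `stn_coords` returns latitude and longitude in degrees (WGS84), great-circle distances between stations can be approximated with the haversine formula on a sphere. The sketch below finds the station closest to a point of interest; the station ids and coordinates are made up, and the spherical Earth radius of 6371 km is an approximation of the WGS84 ellipsoid.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean spherical radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Made-up frame with the same 'lat'/'long' columns as stn_coords().
coords = pd.DataFrame({"lat": [40.0, 30.0, 47.6], "long": [-74.0, -90.0, -122.3]},
                      index=["A", "B", "C"])

# Distance from every station to a point of interest, then the closest station.
dist = haversine_km(coords["lat"], coords["long"], 40.5, -73.9)
nearest = dist.idxmin()
print(nearest, round(float(dist[nearest]), 1))
```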

class aqua_fetch.RiverChemSiberia(path=None, **kwargs)[source]

Bases: Datasets

A database of water chemistry in eastern Siberian rivers following Liu et al., 2022. The dataset consists of meteorological data, water chemistry data, and shapefiles of 7 basins in eastern Siberia. The data was collected from 1991 to 2012. The dataset is available at figshare. The following parameters are available in the dataset:

La

Lo

Ca2+

Mg2+

K+

Na+

Cl-

SO42-

HCO3-

TDS

pH

River

Basin

Subbasin

Tannual

Tmonthly

Pannual

Pmonthly

Lithology

Permafrost type

IB

Discharge

Ori_ID

Li

Sr

As

Ba

Si

87Sr/86Sr

δ18O-H2O

δ2H-H2O

Examples

>>> from aqua_fetch import RiverChemSiberia
>>> ds = RiverChemSiberia()
>>> ds.stations()
['Selenga-Baikal', 'Angara', 'Lena', 'Eastern-Siberia', 'Kolyma', 'Yana', 'Indigirka']
>>> len(ds.parameters)
34

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

boundary() → DataFrame[source]: Returns the boundary data of the water chemistry in eastern Siberian rivers.

database() → DataFrame[source]: Returns the database of the water chemistry in eastern Siberian rivers.

property parameters: List[str]: Returns the parameters available in the dataset.

stations() → List[str][source]: Returns the names of (7) stations available in the dataset.

stn_coords() → DataFrame[source]: Returns the coordinates of the stations.

class aqua_fetch.SyltRoads(path=None, **kwargs)[source]

Bases: Datasets

Dataset of physical and hydro-chemical time series at Sylt Roads from 1973 to 2019, following Rick et al., 2023. The following parameters are available:

location

Depth water [m]

Sal

Temp [°C]

[PO4]3- [µmol/l]

[NH4]+ [µmol/l]

[NO2]- [µmol/l]

[NO3]- [µmol/l]

Si(OH)4 [µmol/l]

SPM [mg/l]

pH

O2 [µmol/l]

Chl a [µg/l]

DON [µmol/l]

DOP [µmol/l]

DIN [µmol/l]

Examples

>>> from aqua_fetch import SyltRoads
>>> ds = SyltRoads()

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

fetch(parameters: str | List[str] = 'all') → DataFrame[source]

Fetch the data from the dataset

Parameters:: parameters (str or List[str], optional) – Parameters to fetch. The default is 'all', which fetches all parameters.
Returns:: DataFrame containing the data
Return type:: pd.DataFrame

Examples

>>> from aqua_fetch import SyltRoads
>>> ds = SyltRoads()
>>> df = ds.fetch()
>>> df.shape
(5710, 16)
>>> len(ds.parameters)
16
>>> ds.fetch(['Sal', 'Temp [°C]', 'pH']).shape
(5710, 3)
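Several `fetch`-style methods in this module accept the string 'all', a single parameter name, or a list of names. A minimal sketch of how such an argument can be normalized to a list of column names is shown below. `resolve_parameters` is a hypothetical helper, not part of aqua_fetch, and the library may handle these cases differently (for example, in how it reports unknown names).

```python
from typing import List, Union


def resolve_parameters(parameters: Union[str, List[str]],
                       available: List[str]) -> List[str]:
    """Hypothetical helper: turn 'all', one name or a list into column names."""
    if parameters == "all":
        return list(available)
    if isinstance(parameters, str):
        parameters = [parameters]          # a single name becomes a one-item list
    missing = [p for p in parameters if p not in available]
    if missing:
        raise ValueError(f"unknown parameters: {missing}")
    return list(parameters)


available = ["Sal", "Temp [°C]", "pH"]
print(resolve_parameters("all", available))
print(resolve_parameters("pH", available))
print(resolve_parameters(["Sal", "pH"], available))
```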

property parameters: List[str]: returns names of parameters in the dataset

stn_coords() → DataFrame[source]

Returns the coordinates of all the stations in the dataset in the WGS84 coordinate system.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

class aqua_fetch.SanFranciscoBay(path=None, **kwargs)[source]

Bases: Datasets

Time series of water quality parameters from 59 stations in San Francisco Bay from 1969 to 2015. For details on the data, see Cloern et al., 2017 and Schraga et al., 2017. The following parameters are available:

Depth

Discrete_Chlorophyll

Ratio_DiscreteChlorophyll_Pheopigment

Calculated_Chlorophyll

Discrete_Oxygen

Calculated_Oxygen

Oxygen_Percent_Saturation

Discrete_SPM

Calculated_SPM

Extinction_Coefficient

Salinity

Temperature

Sigma_t

Nitrite

Nitrate_Nitrite

Ammonium

Phosphate

Silicate

Examples

>>> from aqua_fetch import SanFranciscoBay
>>> ds = SanFranciscoBay()
>>> data = ds.data()
>>> data.shape
(212472, 19)
>>> stations = ds.stations()
>>> len(stations)
59
>>> parameters = ds.parameters()
>>> len(parameters)
18
>>> # fetch data for station 18
>>> stn18 = ds.fetch(stations='18')
>>> stn18.shape
(13944, 18)

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

fetch(stations: str | List[str] = 'all', parameters: str | List[str] = 'all') → DataFrame[source]

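Parameters:

stations (Union[str, List[str]], optional) – The station or stations whose data is to be returned. The default is ‘all’.
parameters (Union[str, List[str]], optional) – The parameters to return. The default is ‘all’.

Returns:: A DataFrame with the requested parameters for the requested stations.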
Return type:: pd.DataFrame

stn_data(stations: str | List[str] = 'all') → DataFrame[source]: Get station metadata.

class aqua_fetch.SeluneRiver(path=None, **kwargs)[source]

Bases: Datasets

Dataset of physico-chemical variables measured at different levels in 2021 and 2022 to characterize the hyporheic zone of the Selune River (Manche, Normandie, France), following Moustapha Ba et al., 2023. The data is available at data.gouv.fr. The following variables are available:

water level

temperature

conductivity

oxygen

pressure

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

data() → DataFrame[source]: Return a DataFrame of the data

class aqua_fetch.SWatCh(remove_csv_after_download: bool = False, path: str | PathLike = None, **kwargs)[source]

Bases: Datasets

The Surface Water Chemistry (SWatCh) database of 27 variables from 26322 locations, as introduced in Lobke et al., 2022. Note that not all variables are available at all locations. The following variables are available in the dataset:

Total Phosphorus, mixed forms

Sulfate

pH

Temperature, water

Chloride

Magnesium

Calcium

Sodium

Potassium

Aluminum

Nitrate

Nitrite

Fluoride

Hardness, carbonate

Iron

Ammonium

Organic carbon

Bicarbonate

Orthophosphate

Gran acid neutralizing capacity

Alkalinity, total

Inorganic carbon

Carbonate

Alkalinity, carbonate

Hardness, non-carbonate

Carbon Dioxide, free CO2

Alkalinity, Phenolphthalein (total hydroxide+1/2 carbonate)

Examples

>>> from aqua_fetch import SWatCh
>>> ds = SWatCh()
>>> df = ds.fetch()
>>> df.shape
(3901296, 6)
>>> len(ds.parameters)
22
>>> len(ds.sites)
26322
>>> coords = ds.stn_coords()
>>> coords.shape
(26322, 2)

__init__(remove_csv_after_download: bool = False, path: str | PathLike = None, **kwargs)[source]

Parameters:: remove_csv_after_download (bool (default=False)) – if True, the csv will be removed after downloading and processing.

fetch(parameters: list | str = None, station_id: list | str = None, station_names: list | str = None) → DataFrame[source]

Parameters:

parameters (str/list (default=None)) – Names of parameters to fetch. By default, name, value, val_unit, location, lat, and long are read.
station_id (str/list (default=None)) – name/names of station id for which the data is to be fetched. By default, the data for all stations is fetched. If given, then station_names should not be given.
station_names (str/list (default=None)) – name/names of stations for which the data is to be fetched. By default, the data for all stations is fetched. If given, then station_id should not be given.

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import SWatCh
>>> ds = SWatCh()
>>> df = ds.fetch()
>>> df.shape
(3901296, 6)
>>> st_name = "Jordan Lake"
>>> df = df[df['location'] == st_name]
>>> df.shape
(4, 6)
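`fetch` returns the data in long format, with one row per measurement and the parameter name stored in the `name` column (by default the columns are name, value, val_unit, location, lat and long). If a wide table with one column per parameter is more convenient, it can be built with pandas `pivot_table`. The rows below are made up, and averaging repeated samples is only one possible choice of aggregation.

```python
import pandas as pd

# Made-up rows using the default columns of SWatCh.fetch().
df = pd.DataFrame({
    "name": ["pH", "Chloride", "pH", "Chloride", "pH"],
    "value": [7.0, 12.0, 6.8, 9.5, 7.5],
    "val_unit": ["pH units", "mg/L", "pH units", "mg/L", "pH units"],
    "location": ["Jordan Lake", "Jordan Lake", "Lake B", "Lake B", "Jordan Lake"],
    "lat": [35.7, 35.7, 45.0, 45.0, 35.7],
    "long": [-79.0, -79.0, -75.0, -75.0, -79.0],
})

# One row per location, one column per parameter; repeated samples are averaged.
wide = df.pivot_table(index="location", columns="name", values="value", aggfunc="mean")
print(wide)
```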

property names: dict: tells the names of parameters in this class and their original names in SWatCh dataset in the form of a python dictionary

num_samples(parameter, station_id=None) → int[source]

Parameters:

parameter (str) – name of the water quality parameter whose samples are to be quantified.
station_id – if given, the number of samples is counted only for this site or these sites; otherwise, for all sites
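Assuming the long-format output of `fetch` (one row per sample, parameter name in `name`), a comparable count can be computed directly with pandas. The sketch below restricts the count by the `location` column; whether `num_samples` identifies sites by location name or by a separate station id is not stated here, so the choice of column is an assumption, and `count_samples` is a hypothetical helper, not part of aqua_fetch.

```python
from typing import Iterable, Optional

import pandas as pd


def count_samples(df: pd.DataFrame, parameter: str,
                  locations: Optional[Iterable[str]] = None) -> int:
    """Hypothetical helper: count non-missing samples of one parameter."""
    rows = df[df["name"] == parameter]
    if locations is not None:
        rows = rows[rows["location"].isin(list(locations))]  # restrict to some sites
    return int(rows["value"].notna().sum())


# Made-up long-format rows; the missing value is not counted.
df = pd.DataFrame({
    "name": ["pH", "pH", "Chloride", "pH"],
    "value": [7.0, None, 12.0, 6.8],
    "location": ["Jordan Lake", "Jordan Lake", "Jordan Lake", "Lake B"],
})
print(count_samples(df, "pH"))
print(count_samples(df, "pH", ["Lake B"]))
```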

property parameters: list: list of water quality parameters available

property site_names: list: list of site names

property sites: list: list of site names

stn_coords()[source]

Returns the coordinates of all the stations in the dataset, with ‘loc_id’ as the index.

Returns:: A dataframe with columns ‘lat’, ‘long’
Return type:: pd.DataFrame

class aqua_fetch.WhiteClayCreek(path=None, **kwargs)[source]

Bases: Datasets

Time series of water quality parameters from White Clay Creek.

chl-a : 2001 - 2012

Dissolved Organic Carbon : 1977 - 2017

__init__(path=None, **kwargs)[source]

Parameters:

name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping

chla() → DataFrame[source]: Chlorophyll-a data

doc() → DataFrame[source]: Dissolved Organic Carbon data

aqua_fetch.busan_beach(inputs: list = None, target: list | str = 'tetx_coppml') → DataFrame[source]

Loads the antibiotic resistance genes (ARG) data from a recreational beach in Busan, South Korea, along with environmental variables.

The data is in the form of multivariate time series and was collected over a period of 2 years during several precipitation events. The frequency of the environmental data is 30 minutes, while the ARG data is discontinuous. The data and its pre-processing are described in detail in Jang et al., 2021.

Parameters:

inputs –
features to use as input. By default, all environmental data is used, which consists of the following parameters
- tide_cm
- wat_temp_c
- sal_psu
- air_temp_c
- pcp_mm
- pcp3_mm
- pcp6_mm
- pcp12_mm
- wind_dir_deg
- wind_speed_mps
- air_p_hpa
- mslp_hpa
- rel_hum
target –
feature/features to use as target/output. By default, tetx_coppml is used as target. One or more of the following can be used as target
- ecoli
- 16s
- inti1
- Total_args
- tetx_coppml
- sul1_coppml
- blaTEM_coppml
- aac_coppml
- Total_otus
- otu_5575
- otu_273
- otu_94

Returns:

a pandas.DataFrame with inputs and target, indexed with a pandas.DatetimeIndex

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import busan_beach
>>> dataframe = busan_beach()
>>> dataframe.shape
(1446, 14)
>>> dataframe = busan_beach(target=['tetx_coppml', 'sul1_coppml'])
>>> dataframe.shape
(1446, 15)

See usage here for more details.
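Because the environmental inputs are recorded every 30 minutes while the ARG targets are measured only occasionally, many rows are likely to have a missing target. A common preparation step is to keep only the time steps at which the target was observed. The sketch below shows this with a small made-up frame that reuses some of the column names listed above; it does not call `busan_beach`, and the timestamps and values are invented.

```python
import numpy as np
import pandas as pd

# Made-up 30-minute frame: continuous inputs and a sparsely measured target.
index = pd.date_range("2018-01-01 00:00", periods=6, freq="30min")
df = pd.DataFrame({
    "tide_cm": [120.0, 135.0, 150.0, 140.0, 125.0, 110.0],
    "wat_temp_c": [21.0, 21.2, 21.4, 21.5, 21.3, 21.1],
    "tetx_coppml": [np.nan, 3.2e4, np.nan, np.nan, 1.8e4, np.nan],
}, index=index)

# Rows where the target was actually measured.
labelled = df.dropna(subset=["tetx_coppml"])
print(labelled.index.tolist())
```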

E. coli data from the Mekong river (Houay Pano) area from 2011 to 2021, following Boithias et al., 2022.

Parameters:

st (optional) – starting time. The default starting point is 2011-05-25 10:00:00
en (optional) – end time. The default end point is 2021-05-25 15:41:00
parameters (str, optional) –
names of features to use. Use 'all' to get all features. By default, the following input features are selected
- station_name name of station/catchment where the observation was made
- T temperature
- EC electrical conductance
- DOpercent dissolved oxygen saturation (percent)
- DO dissolved oxygen concentration
- pH pH
- ORP oxidation-reduction potential
- Turbidity turbidity
- TSS total suspended sediment concentration
- E-coli_4dilutions Escherichia coli concentration
overwrite (bool) – whether to overwrite the downloaded file or not

Returns:

with default parameters, the shape is (1602, 10)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import ecoli_mekong
>>> ecoli_data = ecoli_mekong()
>>> ecoli_data.shape
(1602, 10)
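The returned frame includes a `station_name` column, so observations can be summarized per station with a `groupby`. The sketch below uses a made-up frame with three of the default columns; the station names and values are invented, and missing E. coli values are simply excluded from the count and median.

```python
import pandas as pd

# Made-up frame with three of the default columns.
df = pd.DataFrame({
    "station_name": ["S1", "S1", "S2", "S2", "S2"],
    "T": [24.1, 25.0, 23.8, 24.6, 25.2],
    "E-coli_4dilutions": [1200.0, 3400.0, 560.0, None, 980.0],
})

# Number of E. coli measurements and their median for each station.
summary = df.groupby("station_name")["E-coli_4dilutions"].agg(["count", "median"])
print(summary)
```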

E. coli data from the Mekong river (Northern Laos).

Parameters:

st – starting time
en – end time
station_name (str)
parameters (str, optional)
overwrite (bool) – whether to overwrite or not

Returns:

with default parameters, the shape is (1131, 10)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import ecoli_mekong_laos
>>> ecoli = ecoli_mekong_laos()
>>> ecoli.shape
(1131, 10)

E. coli data from the Mekong river (Houay Pano) area.

Parameters:

st (optional) – starting time. The default starting point is 2011-05-25 10:00:00
en (optional) – end time. The default end point is 2021-05-25 15:41:00
parameters (str, optional) –
names of features to use. Use 'all' to get all features. By default, the following input features are selected
- station_name name of station/catchment where the observation was made
- T temperature
- EC electrical conductance
- DOpercent dissolved oxygen saturation (percent)
- DO dissolved oxygen concentration
- pH pH
- ORP oxidation-reduction potential
- Turbidity turbidity
- TSS total suspended sediment concentration
- E-coli_4dilutions Escherichia coli concentration
overwrite (bool) – whether to overwrite the downloaded file or not

Returns:

with default parameters, the shape is (413, 10)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import ecoli_houay_pano
>>> ecoli = ecoli_houay_pano()
>>> ecoli.shape
(413, 10)

E. coli data from the Mekong river in 2016 from 29 catchments.

Parameters:

st – starting time
en – end time
parameters (str, optional) – names of parameters to use. Use 'all' to get all features.
overwrite (bool) – whether to overwrite the downloaded file or not

Returns:

with default parameters, the shape is (58, 10)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import ecoli_mekong_2016
>>> ecoli = ecoli_mekong_2016()
>>> ecoli.shape
(58, 10)