Miscellaneous
This section contains the documentation for the miscellaneous datasets available in the package.
- aqua_fetch.gw_punjab(data_type: str = 'full', country: str = None) -> DataFrame
Groundwater level (meters below ground level) dataset from the Punjab region (Pakistan and north-west India), following the study of MacAllister et al., 2022.
- Parameters:
data_type (str (default="full")) – either full or LTS. The full option contains the full dataset: 68783 rows of observed groundwater level data from 4028 individual sites. The LTS option contains 7547 rows of groundwater level observations from 130 individual sites that have water level data available for a period of more than 40 years and for which at least two thirds of the annual observations are available.
country (str (default=None)) – the country for which to retrieve data. Either PAK or IND.
- Returns:
a pandas.DataFrame with datetime index
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import gw_punjab
>>> full_data = gw_punjab()
... # find out the earliest observation
>>> print(full_data.sort_index().head(1))
>>> lts_data = gw_punjab()
>>> lts_data.shape
(68782, 4)
>>> df_pak = gw_punjab(country="PAK")
>>> df_pak.sort_index().dropna().head(1)
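The LTS selection rule described under data_type (a record longer than 40 years with at least two thirds of the annual observations present) can be illustrated in plain Python. This is only a sketch of one reading of that rule: the helper `qualifies_as_lts`, its parameters, and the sample sites are hypothetical and are not part of aqua_fetch.

```python
def qualifies_as_lts(years_observed, min_span_years=40, min_coverage=2 / 3):
    """Hypothetical check mirroring the LTS description (not aqua_fetch code)."""
    years = sorted(set(years_observed))
    if not years:
        return False
    # Number of calendar years between first and last observation, inclusive.
    span = years[-1] - years[0] + 1
    # Share of those calendar years that have at least one observation.
    coverage = len(years) / span
    return span > min_span_years and coverage >= min_coverage


long_site = list(range(1970, 2015))       # an observation every year for 45 years
sparse_site = list(range(1970, 2015, 3))  # one year in three over a similar period
short_site = list(range(2000, 2015))      # complete, but only 15 years long

for name, site in [("long", long_site), ("sparse", sparse_site), ("short", short_site)]:
    print(name, qualifies_as_lts(site))
```

Only the first site passes both conditions; the second fails on coverage and the third on record length.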
- class aqua_fetch.Weisssee(path=None, overwrite=False, **kwargs)
Bases: Datasets
- __init__(path=None, overwrite=False, **kwargs)
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- class aqua_fetch.WeatherJena(path=None, obs_loc='roof')
Bases: Datasets
10-minute weather dataset of Jena, Germany, hosted at https://www.bgc-jena.mpg.de/wetter/index.html, from 2002 onwards.
>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.sum()
- __init__(path=None, obs_loc='roof')
The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).
- Parameters:
obs_loc (str, optional (default=roof)) –
- location of observation. It can be one of the following:
roof
soil
saale
- fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) -> DataFrame
Fetches the time series data for the given period as a pandas.DataFrame.
- Parameters:
st (Optional) – start of data to be fetched. If None, the data from the start (2003-01-01) will be returned.
en (Optional) – end of data to be fetched. If None, the data until the end (2021-12-31) will be returned.
- Returns:
a pandas.DataFrame of shape (972111, 21)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.shape
(972111, 21)
... # get data between specific period
>>> data = dataset.fetch("20110101", "20201231")
>>> data.shape
(525622, 21)
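When requesting a sub-period, it can help to know how many rows a gap-free 10-minute series would contain. The sketch below uses only the standard library. The helper `nominal_step_count` is hypothetical, and it assumes that both st and en are interpreted as midnight and are included in the range. The number of rows actually returned by fetch can differ from this nominal figure (the example above reports 525622 rows).

```python
from datetime import datetime, timedelta


def nominal_step_count(st, en, step=timedelta(minutes=10)):
    """Number of regularly spaced timestamps from st to en, both inclusive."""
    start = datetime.strptime(st, "%Y%m%d")
    end = datetime.strptime(en, "%Y%m%d")
    # Integer division of two timedeltas gives the number of whole steps.
    return (end - start) // step + 1


n_steps = nominal_step_count("20110101", "20201231")
print(n_steps)
```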
- class aqua_fetch.SWECanada(path=None, **kwargs)
Bases: Datasets
Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020 from Brown et al., 2019.
Examples
>>> from aqua_fetch import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
>>> len(stns)
2607
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
>>> df1['SCD-NS010'].shape
(33816, 3)
... # get data of 5 stations
>>> df5 = swe.fetch(5, st='20110101')
>>> df5.keys()
['YT-10AA-SC01', 'ALE-05CA805', 'SCD-NF078', 'SCD-NF086', 'INA-07RA01B']
>>> [v.shape for v in df5.values()]
[(3500, 3), (3500, 3), (3500, 3), (3500, 3), (3500, 3)]
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
>>> df3.keys()
['ALE-05AE810']
... # get data of the first 10 stations in the list
>>> df4 = swe.fetch(stns[0:10], st='20110101')
- __init__(path=None, **kwargs)
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- fetch(stations: None | str | float | int | list = None, features: None | str | list = None, q_flags: None | str | list = None, st=None, en=None) -> dict
Fetches time series data from selected stations.
- Parameters:
stations – station/stations to be retrieved. If None, then data from all stations will be returned.
features –
Names of features to be retrieved. Following features are allowed:
snw: snow water equivalent kg/m3
snd: snow depth m
den: snowpack bulk density kg/m3
If None, then all three features will be retrieved.
q_flags –
If None, then no q_flags will be returned. Following q_flag values are available:
data_flag_snw
data_flag_snd
qc_flag_snw
qc_flag_snd
st – start of data to be retrieved
en – end of data to be retrieved
- Returns:
a dictionary of dataframes of shape (st:en, features + q_flags) whose length is equal to the number of stations being considered.
- Return type:
dict
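The stations argument accepts several types (None, a name, an integer count, a fraction, or a list), as the examples for this class show. The sketch below illustrates how such an argument could be resolved to a list of station names. It is not the library's implementation: `resolve_stations`, the random selection, the rounding of fractions, and the `STN-` names are all assumptions made for illustration.

```python
import random


def resolve_stations(stations, all_stations, seed=0):
    """Hypothetical mapping from the ``stations`` argument to station names."""
    if stations is None:
        return list(all_stations)            # all stations
    if isinstance(stations, str):
        return [stations]                    # a single named station
    if isinstance(stations, float):
        # Assumed: a float is a fraction of all stations, rounded down, minimum 1.
        n = max(1, int(len(all_stations) * stations))
        return random.Random(seed).sample(list(all_stations), n)
    if isinstance(stations, int):
        # Assumed: an integer asks for that many randomly chosen stations.
        return random.Random(seed).sample(list(all_stations), stations)
    return list(stations)                    # an explicit list of names


# Made-up station names; 2607 matches the station count shown in the examples.
all_stations = [f"STN-{i:04d}" for i in range(2607)]

print(len(resolve_stations(None, all_stations)))
print(len(resolve_stations(5, all_stations)))
print(len(resolve_stations(0.001, all_stations)))
```

Whatever the selection rule, the returned dictionary has one entry per resolved station, which is why its length follows the stations argument.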
- class aqua_fetch.rr.mtropics.MtropicsLaos(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)
Bases: Datasets
Downloads and prepares hydrological, climate and land use data for Laos from the Mtropics website and IRD data servers.
- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
- __init__(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
processes – int number of processes to use for parallel processing
verbosity – int determines the amount of information to be printed
remove_zip – bool whether to remove the zip files after unzipping
- fetch_ecoli(features: list | str = 'Ecoli_mpn100', st: str | Timestamp = '20110525 10:00:00', en: str | Timestamp = '20210406 15:05:00', remove_duplicates: bool = True) -> DataFrame
Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data is randomly sampled between 2011 and 2021 during rainfall events. A total of 368 E. coli observations are currently available.
- Parameters:
st – start of data. By default, the data is fetched from the first point at which it is available.
en – end of data. By default, the data is fetched until the last point at which it is available.
features –
E. coli concentration data. Following data are available:
Ecoli_LL_mpn100: Lower limit of the confidence interval
Ecoli_mpn100: Stream water Escherichia coli concentration
Ecoli_UL_mpn100: Upper limit of the confidence interval
remove_duplicates – whether to remove duplicates or not. This is because some values were recorded within a minute.
- Return type:
a pandas.DataFrame consisting of features as columns.
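The remove_duplicates option exists because some values were recorded within the same minute. One way to picture such de-duplication is to keep only the first record per calendar minute, as in the sketch below. This is an assumption about the behaviour, not the library's code; `drop_within_minute_duplicates` is a hypothetical helper and the values are made up.

```python
from datetime import datetime


def drop_within_minute_duplicates(records):
    """Keep the first (timestamp, value) pair seen for each calendar minute."""
    seen = set()
    kept = []
    for ts, value in sorted(records, key=lambda r: r[0]):
        minute = ts.replace(second=0, microsecond=0)
        if minute not in seen:
            seen.add(minute)
            kept.append((ts, value))
    return kept


# Made-up observations: the first two fall within the same minute.
records = [
    (datetime(2011, 5, 25, 10, 0, 5), 1000.0),
    (datetime(2011, 5, 25, 10, 0, 40), 1100.0),
    (datetime(2011, 5, 25, 10, 6, 0), 900.0),
]
deduped = drop_within_minute_duplicates(records)
print(len(records), "->", len(deduped))
```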
- fetch_hydro(st: str | Timestamp = '20010101 00:06:00', en: str | Timestamp = '20200101 00:06:00') -> Tuple[DataFrame, DataFrame]
Fetches water level (cm) and suspended particulate matter (g L-1). Both datasets span 2001 to 2019 but are randomly sampled.
- Parameters:
st (optional) – starting point of data to be fetched.
en (optional) – end point of data to be fetched.
- Returns:
a tuple of pandas dataframes of water level and suspended particulate matter.
- fetch_pcp(st: str | Timestamp = '20010101 00:06:00', en: str | Timestamp = '20200101 00:06:00', freq: str = '6min') -> DataFrame
Fetches the precipitation data, which is collected at a 6-minute time step from 2001 to 2020.
- Parameters:
st – starting point of data to be fetched.
en – end point of data to be fetched.
freq – frequency at which the data is to be returned.
- Return type:
pandas.DataFrame of precipitation data
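The freq argument controls the time step of the returned precipitation. For a coarser step, precipitation depths are normally accumulated rather than averaged. The sketch below shows that idea for 6-minute values summed into hourly totals. Whether fetch_pcp aggregates by summation is an assumption here; `to_hourly_totals` is a hypothetical helper and the data are made up.

```python
from datetime import datetime, timedelta


def to_hourly_totals(series):
    """Sum (timestamp, depth) pairs into buckets keyed by the start of the hour."""
    totals = {}
    for ts, depth in series:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour] = totals.get(hour, 0.0) + depth
    return totals


# Two hours of made-up 6-minute precipitation, 0.5 units per step.
start = datetime(2001, 1, 1, 0, 0)
six_min = [(start + timedelta(minutes=6 * i), 0.5) for i in range(20)]

hourly = to_hourly_totals(six_min)
for hour, total in hourly.items():
    print(hour, total)
```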
- fetch_physiochem(features: list | str = 'all', st: str | Timestamp = '20110525 10:00:00', en: str | Timestamp = '20210406 15:05:00') -> DataFrame
Fetches physio-chemical features of the Houay Pano catchment, Laos.
- Parameters:
st – start of data.
en – end of data.
features –
The physio-chemical features to fetch. Following features are available:
T
EC
DOpercent
DO
pH
ORP
Turbidity
TSS
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> phy_chem = laos.fetch_physiochem('T_deg')
>>> phy_chem.shape
(411, 1)
>>> phy_chem_all = laos.fetch_physiochem(features='all')
>>> phy_chem_all.shape
(411, 8)
- fetch_rain_gauges(st: str | Timestamp = '20010101', en: str | Timestamp = '20191231') -> DataFrame
Fetches data from 7 rain gauges, collected at a daily time step from 2001 to 2019.
- Parameters:
st – start of data. By default, the data is fetched from the first point at which it is available.
en – end of data. By default, the data is fetched until the last point at which it is available.
- Returns:
a dataframe of 7 columns, where each column represents the observations of one rain gauge. The length of the dataframe depends on the range defined by the st and en arguments.
Examples
>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> rg = laos.fetch_rain_gauges()
- fetch_source() -> DataFrame
Returns monthly source data for E. coli from 2001 to 2021.
- Return type:
pd.DataFrame of shape (252, 19)
- fetch_suro() -> DataFrame
Returns surface runoff and soil detachment data from Houay Pano, Lao PDR.
- Returns:
a dataframe of shape (293, 13)
- Return type:
pd.DataFrame
Examples
>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> suro = laos.fetch_suro()
- fetch_weather_station_data(st: str | Timestamp = '20010101 01:00:00', en: str | Timestamp = '20200101 00:00:00', freq: str = 'H') -> DataFrame
Fetches hourly weather station data, which consists of air temperature, humidity, wind speed and solar radiation.
- Parameters:
st – start of data to be fetched.
en – end of data to be fetched.
freq – frequency at which the data is to be fetched.
- Return type:
a pandas.DataFrame consisting of 4 columns
- make_classification(input_features: None | list = None, output_features: str | list = None, st: None | str = '20110525 14:00:00', en: None | str = '20181027 00:00:00', freq: str = '6min', threshold: int | dict = 400, lookback_steps: int = None) -> DataFrame
Returns data for a classification problem.
- Parameters:
input_features – names of inputs to use.
output_features – feature/features to consider as target/output/label
st – starting date of data. The default starting date is 20110525
en – end date of data
freq – frequency of data
threshold – threshold used to determine classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The value of 400 is chosen for E. coli to balance the number of 0s and 1s. It should be noted that US-EPA recommends a threshold value of 400 cfu/ml.
lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.
- Returns:
a dataframe of shape (inputs+target, st:en)
- Return type:
pd.DataFrame
Example
>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
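The threshold rule (values greater than or equal to the threshold become 1, smaller values become 0) can be sketched without the library. The helpers `binarize` and `binarize_columns` are hypothetical, the handling of a dict threshold as one value per output column is an assumption, and the concentrations are made up.

```python
def binarize(values, threshold=400):
    """Map each value to 1 if it is >= threshold, else 0; missing values stay None."""
    return [None if v is None else int(v >= threshold) for v in values]


def binarize_columns(columns, thresholds):
    """Apply one threshold per column (assumed meaning of a dict threshold)."""
    return {name: binarize(vals, thresholds[name]) for name, vals in columns.items()}


ecoli = [120, 400, 399, 2500, None]  # made-up concentrations
labels = binarize(ecoli)
print(labels)

per_column = binarize_columns({"Ecoli_mpn100": [10, 500]}, {"Ecoli_mpn100": 100})
print(per_column)
```

Note that a value exactly equal to the threshold falls into class 1.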
- make_regression(input_features: None | list = None, output_features: str | list = 'Ecoli_mpn100', st: None | str = '20110525 14:00:00', en: None | str = '20181027 00:00:00', freq: str = '6min', lookback_steps: int = None, replace_zeros_in_target: bool = True) -> DataFrame
Returns data for a regression problem using hydrological, environmental, and water quality data of Houay Pano.
- Parameters:
input_features –
names of inputs to use. By default, the following features are used as input:
air_temp
rel_hum
wind_speed
sol_rad
water_level
pcp
susp_pm
Ecoli_source
output_features – feature/features to consider as target/output/label
st – starting date of data
en – end date of data
freq – frequency of data
lookback_steps (int, default=None) – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.
replace_zeros_in_target (bool, default=True) – Replace the zeroes in target column with 1s.
- Returns:
a dataframe of shape (inputs+target, st:en)
- Return type:
pd.DataFrame
Example
>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')
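The effect of lookback_steps on the number of rows and on index continuity can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: `lookback_rows` is a hypothetical helper, the window is assumed to include the observed step itself, and the target values are made up.

```python
def lookback_rows(target, lookback_steps):
    """Return the row positions kept when each observation brings its history.

    ``target`` is aligned with the regular time grid; None marks a step
    without an E. coli observation.
    """
    kept = []
    for i, value in enumerate(target):
        if value is not None and i + 1 >= lookback_steps:
            # The observed step plus the (lookback_steps - 1) steps before it.
            kept.extend(range(i - lookback_steps + 1, i + 1))
    return kept


# Ten time steps with two made-up observations, at positions 3 and 8.
target = [None, None, None, 350, None, None, None, None, 1200, None]
rows = lookback_rows(target, lookback_steps=3)
print(rows)
```

With two observations and three lookback steps, six rows are kept, and the positions jump from 3 to 6, which is why the resulting index is not continuous.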