Miscellaneous

This section contains the documentation for the miscellaneous datasets available in the package.

aqua_fetch.gw_punjab(data_type: str = 'full', country: str = None) DataFrame[source]

Groundwater level (meters below ground level) dataset from the Punjab region (Pakistan and north-west India), following the study of MacAllister et al., 2022.

Parameters:
  • data_type (str (default="full")) – either full or LTS. full returns the complete dataset: 68783 rows of observed groundwater level data from 4028 individual sites. LTS returns 7547 rows of groundwater level observations from 130 individual sites that have water level data for a period of more than 40 years and for which at least two thirds of the annual observations are available.

  • country (str (default=None)) – the country for which to retrieve data. Either PAK or IND.

Returns:

a pandas.DataFrame with datetime index

Return type:

pd.DataFrame
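
The LTS selection rule described under data_type can be illustrated with a minimal sketch. This is one possible reading of the criterion, not the package's implementation: the function name qualifies_as_lts and its arguments are hypothetical, and the record length is assumed to be measured from the first to the last observed year.

```python
def qualifies_as_lts(observed_years, min_span_years=40, min_fraction=2 / 3):
    """Return True if a site meets the long-time-series rule (assumed reading)."""
    years = sorted(set(observed_years))
    if not years:
        return False
    span = years[-1] - years[0] + 1   # record length in years (assumption)
    coverage = len(years) / span      # share of years with an observation
    return span > min_span_years and coverage >= min_fraction


# A site observed every year from 1970 to 2015 (46 years, full coverage)
print(qualifies_as_lts(range(1970, 2016)))     # True
# Same period, but observed only every third year (coverage about one third)
print(qualifies_as_lts(range(1970, 2016, 3)))  # False
```

Under this reading, a site with a long record but many missing years is excluded just like a site with a short record.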

Examples

>>> from aqua_fetch import gw_punjab
>>> full_data = gw_punjab()
... # find out the earliest observation
>>> print(full_data.sort_index().head(1))
>>> full_data.shape
    (68782, 4)
>>> lts_data = gw_punjab(data_type="LTS")
>>> df_pak = gw_punjab(country="PAK")
>>> df_pak.sort_index().dropna().head(1)
class aqua_fetch.Weisssee(path=None, overwrite=False, **kwargs)[source]

Bases: Datasets

__init__(path=None, overwrite=False, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

  • processes – int number of processes to use for parallel processing

  • verbosity – int determines the amount of information to be printed

  • remove_zip – bool whether to remove the zip files after unzipping

fetch(**kwargs)[source]

Examples

>>> from aqua_fetch import Weisssee
>>> dataset = Weisssee()
>>> data = dataset.fetch()
class aqua_fetch.WeatherJena(path=None, obs_loc='roof')[source]

Bases: Datasets

10-minute weather dataset of Jena, Germany, hosted at https://www.bgc-jena.mpg.de/wetter/index.html from 2002 onwards.

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.sum()
__init__(path=None, obs_loc='roof')[source]

The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).

Parameters:

obs_loc (str, optional (default=roof)) –

location of observation. It can be one of the following:
  • roof

  • soil

  • saale

property dynamic_features: List[str]

returns names of features available

fetch(st: str | int | DatetimeIndex = None, en: str | int | DatetimeIndex = None) DataFrame[source]

Fetches the time series data between given period as pandas.DataFrame.

Parameters:
  • st (Optional) – start of data to be fetched. If None, the data from the start (2003-01-01) will be returned.

  • en (Optional) – end of data to be fetched. If None, the data until the end (2021-12-31) will be returned.

Returns:

a pandas.DataFrame of shape (972111, 21)

Return type:

pd.DataFrame
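
The effect of the st and en bounds can be sketched without the dataset itself. The snippet below is only an illustration under stated assumptions: it works on a plain list of (timestamp, value) pairs instead of a DataFrame, the helpers parse_bound and clip_period are hypothetical, and the defaults are the dates quoted in the parameter descriptions. The package may treat the end bound differently.

```python
from datetime import datetime, timedelta


def parse_bound(value, default):
    """Parse a 'YYYYMMDD' string into a datetime, falling back to a default."""
    return default if value is None else datetime.strptime(value, "%Y%m%d")


def clip_period(records, st=None, en=None):
    """Keep (timestamp, value) pairs with st <= timestamp <= en."""
    start = parse_bound(st, datetime(2003, 1, 1))
    end = parse_bound(en, datetime(2021, 12, 31))
    return [(t, v) for t, v in records if start <= t <= end]


# Six hypothetical 10-minute readings starting at midnight on 2011-01-01
t0 = datetime(2011, 1, 1)
records = [(t0 + timedelta(minutes=10 * i), 20.0 + i) for i in range(6)]

print(len(clip_period(records)))                 # 6, all readings lie within the defaults
subset = clip_period(records, st="20110101", en="20110101")
print(len(subset))                               # 1, only the reading at 00:00 is kept
```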

Examples

>>> from aqua_fetch import WeatherJena
>>> dataset = WeatherJena()
>>> data = dataset.fetch()
>>> data.shape
(972111, 21)
... # get data between specific period
>>> data = dataset.fetch("20110101", "20201231")
>>> data.shape
(525622, 21)
class aqua_fetch.SWECanada(path=None, **kwargs)[source]

Bases: Datasets

Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020 from Brown et al., 2019.

Examples

>>> from aqua_fetch import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
>>> len(stns)
2607
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
>>> df1['SCD-NS010'].shape
(33816, 3)
... # get data of 5 stations
>>> df5 = swe.fetch(5, st='20110101')
>>> df5.keys()
['YT-10AA-SC01', 'ALE-05CA805', 'SCD-NF078', 'SCD-NF086', 'INA-07RA01B']
>>> [v.shape for v in df5.values()]
[(3500, 3), (3500, 3), (3500, 3), (3500, 3), (3500, 3)]
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
>>> df3.keys()
['ALE-05AE810']
... # get data of the first 10 stations starting from 2011
>>> df4 = swe.fetch(stns[0:10], st='20110101')
__init__(path=None, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

  • processes – int number of processes to use for parallel processing

  • verbosity – int determines the amount of information to be printed

  • remove_zip – bool whether to remove the zip files after unzipping

fetch(stations: None | str | float | int | list = None, features: None | str | list = None, q_flags: None | str | list = None, st=None, en=None) dict[source]

Fetches time series data from selected stations.

Parameters:
  • stations – station/stations to be retrieved. If None, then data from all stations will be returned.

  • features

    Names of features to be retrieved. Following features are allowed:

    • snw snow water equivalent kg/m2

    • snd snow depth m

    • den snowpack bulk density kg/m3

    If None, then all three features will be retrieved.

  • q_flags

    If None, then no qflags will be returned. Following q_flag values are available.

    • data_flag_snw

    • data_flag_snd

    • qc_flag_snw

    • qc_flag_snd

  • st – start of data to be retrieved

  • en – end of data to be retrieved.

Returns:

a dictionary of dataframes of shape (st:en, features + q_flags) whose length is equal to length of stations being considered.

Return type:

dict
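
The stations argument accepts several types (None, str, int, float, list), as the examples for this class show. A minimal sketch of how such an argument could be turned into an explicit list of station names follows. The helper resolve_stations and the station names are hypothetical, and taking the first n stations is an assumption: the package may select stations differently, for example at random.

```python
def resolve_stations(stations, all_stations):
    """Turn the `stations` argument into an explicit list of station names."""
    if stations is None:                      # all stations
        return list(all_stations)
    if isinstance(stations, str):             # a single station name
        return [stations]
    if isinstance(stations, int):             # a number of stations
        return list(all_stations[:stations])
    if isinstance(stations, float):           # a fraction of all stations
        count = int(len(all_stations) * stations)
        return list(all_stations[:count])
    return list(stations)                     # an explicit list of names


names = [f"STN-{i}" for i in range(8)]        # hypothetical station names
print(resolve_stations(3, names))             # ['STN-0', 'STN-1', 'STN-2']
print(resolve_stations(0.25, names))          # ['STN-0', 'STN-1']
print(resolve_stations("STN-5", names))       # ['STN-5']
```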

fetch_station_attributes(stn, features_to_fetch, st=None, en=None) DataFrame[source]

fetches attributes of one station

class aqua_fetch.rr.mtropics.MtropicsLaos(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)[source]

Bases: Datasets

Downloads and prepares hydrological, climate and land use data for Laos from the Mtropics website and IRD data servers.

- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
__init__(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

  • processes – int number of processes to use for parallel processing

  • verbosity – int determines the amount of information to be printed

  • remove_zip – bool whether to remove the zip files after unzipping

fetch_ecoli(features: list | str = 'Ecoli_mpn100', st: str | Timestamp = '20110525 10:00:00', en: str | Timestamp = '20210406 15:05:00', remove_duplicates: bool = True) DataFrame[source]

Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data was randomly sampled between 2011 and 2021 during rainfall events. A total of 368 E. coli observation points are currently available.

Parameters:
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched until the point it is available.

  • features

    E. coli concentration data. The following data are available:

    • Ecoli_LL_mpn100: Lower limit of the confidence interval

    • Ecoli_mpn100: Stream water Escherichia coli concentration

    • Ecoli_UL_mpn100: Upper limit of the confidence interval

  • remove_duplicates – whether to remove duplicates or not. This option exists because some values were recorded within a minute of each other.

Return type:

a pandas.DataFrame consisting of features as columns.
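
What removing duplicates can mean for observations recorded within the same minute is sketched below. This is an assumption about the behaviour, not the package's code: the helper drop_same_minute is hypothetical, it works on (timestamp, value) pairs, and it keeps the first value seen in each calendar minute.

```python
from datetime import datetime


def drop_same_minute(records):
    """Keep the first (timestamp, value) pair seen in each calendar minute."""
    seen = set()
    kept = []
    for t, v in records:
        minute = t.replace(second=0, microsecond=0)
        if minute not in seen:
            seen.add(minute)
            kept.append((t, v))
    return kept


# Hypothetical E. coli readings; the first two fall within the same minute
records = [
    (datetime(2011, 5, 25, 10, 0, 5), 1100),
    (datetime(2011, 5, 25, 10, 0, 40), 1150),
    (datetime(2011, 5, 25, 10, 6, 0), 900),
]
print(len(drop_same_minute(records)))  # 2
```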

fetch_hydro(st: str | Timestamp = '20010101 00:06:00', en: str | Timestamp = '20200101 00:06:00') Tuple[DataFrame, DataFrame][source]

Fetches water level (cm) and suspended particulate matter (g L-1). Both datasets span 2001 to 2019 but are randomly sampled.

Parameters:
  • st (optional) – starting point of data to be fetched.

  • en (optional) – end point of data to be fetched.

Returns:

a tuple of pandas dataframes of water level and suspended particulate matter.

fetch_lu(processed=False)[source]

returns landuse data as list of shapefiles.

fetch_pcp(st: str | Timestamp = '20010101 00:06:00', en: str | Timestamp = '20200101 00:06:00', freq: str = '6min') DataFrame[source]

Fetches the precipitation data, which is collected at a 6-minute time step from 2001 to 2020.

Parameters:
  • st – starting point of data to be fetched.

  • en – end point of data to be fetched.

  • freq – frequency at which the data is to be returned.

Return type:

pandas.DataFrame of precipitation data
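
Returning the 6-minute precipitation at a coarser freq implies some aggregation. The sketch below sums 6-minute values into hourly totals using only the standard library. Summation and the helper name hourly_totals are assumptions made for illustration; the package may aggregate or label the intervals differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_totals(records):
    """Sum (timestamp, precipitation) pairs into totals per clock hour."""
    totals = defaultdict(float)
    for t, pcp in records:
        hour = t.replace(minute=0, second=0, microsecond=0)  # start of the hour
        totals[hour] += pcp
    return dict(sorted(totals.items()))


# Twenty hypothetical 6-minute readings of 0.5 each, starting at 00:06
t0 = datetime(2001, 1, 1, 0, 6)
records = [(t0 + timedelta(minutes=6 * i), 0.5) for i in range(20)]

for hour, total in hourly_totals(records).items():
    print(hour, total)
# 2001-01-01 00:00:00 4.5
# 2001-01-01 01:00:00 5.0
# 2001-01-01 02:00:00 0.5
```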

fetch_physiochem(features: list | str = 'all', st: str | Timestamp = '20110525 10:00:00', en: str | Timestamp = '20210406 15:05:00') DataFrame[source]

Fetches physio-chemical features of the Houay Pano catchment, Laos.

Parameters:
  • st – start of data.

  • en – end of data.

  • features

    The physio-chemical features to fetch. The following features are available:

    • T

    • EC

    • DOpercent

    • DO

    • pH

    • ORP

    • Turbidity

    • TSS

Return type:

a pandas.DataFrame

Examples

>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> phy_chem = laos.fetch_physiochem('T_deg')
>>> phy_chem.shape
 (411, 1)
>>> phy_chem_all = laos.fetch_physiochem(features='all')
>>> phy_chem_all.shape
 (411, 8)
fetch_rain_gauges(st: str | Timestamp = '20010101', en: str | Timestamp = '20191231') DataFrame[source]

Fetches data from 7 rain gauges, collected at a daily time step from 2001 to 2019.

Parameters:
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched until the point it is available.

Returns:

a dataframe of 7 columns, where each column represents the observations of one rain gauge. The length of the dataframe depends upon the range defined by the st and en arguments.

Examples

>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> rg = laos.fetch_rain_gauges()
fetch_source() DataFrame[source]

Returns monthly source data for E. coli from 2001 to 2021.

Return type:

pd.DataFrame of shape (252, 19)

fetch_suro() DataFrame[source]
Returns surface runoff and soil detachment data from Houay Pano, Lao PDR.

Returns:

a dataframe of shape (293, 13)

Return type:

pd.DataFrame

Examples

>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> suro = laos.fetch_suro()
fetch_weather_station_data(st: str | Timestamp = '20010101 01:00:00', en: str | Timestamp = '20200101 00:00:00', freq: str = 'H') DataFrame[source]

Fetches hourly weather station data, which consists of air temperature, humidity, wind speed and solar radiation.

Parameters:
  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • freq – frequency at which the data is to be fetched.

Return type:

a pandas.DataFrame consisting of 4 columns

make_classification(input_features: None | list = None, output_features: str | list = None, st: None | str = '20110525 14:00:00', en: None | str = '20181027 00:00:00', freq: str = '6min', threshold: int | dict = 400, lookback_steps: int = None) DataFrame[source]

Returns data for a classification problem.

Parameters:
  • input_features – names of inputs to use.

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data. The default starting date is 20110525

  • en – end date of data

  • freq – frequency of data

  • threshold – threshold to use to determine classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The value of 400 is chosen for E. coli to balance the number of 0s and 1s. It should be noted that US-EPA recommends a threshold value of 400 cfu/ml.

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

Returns:

a dataframe of shape (inputs+target, st:en)

Return type:

pd.DataFrame
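
The thresholding rule described for the threshold parameter can be sketched as follows. The helpers binarize and binarize_columns are hypothetical, and reading a dict threshold as one limit per output feature is an assumption; the sample values are made up.

```python
def binarize(values, threshold=400):
    """Label values >= threshold as 1 and smaller values as 0."""
    return [1 if v >= threshold else 0 for v in values]


def binarize_columns(columns, threshold):
    """Apply one threshold to all columns (int) or one per column (dict)."""
    labelled = {}
    for name, values in columns.items():
        limit = threshold[name] if isinstance(threshold, dict) else threshold
        labelled[name] = binarize(values, limit)
    return labelled


ecoli = [120, 400, 950, 399]                      # made-up concentrations
print(binarize(ecoli))                            # [0, 1, 1, 0]

columns = {"Ecoli_mpn100": ecoli, "Ecoli_UL_mpn100": [300, 800, 2000, 700]}
print(binarize_columns(columns, {"Ecoli_mpn100": 400, "Ecoli_UL_mpn100": 750}))
# {'Ecoli_mpn100': [0, 1, 1, 0], 'Ecoli_UL_mpn100': [0, 1, 1, 0]}
```

Note that a value exactly equal to the threshold falls into class 1.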

Example

>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
make_regression(input_features: None | list = None, output_features: str | list = 'Ecoli_mpn100', st: None | str = '20110525 14:00:00', en: None | str = '20181027 00:00:00', freq: str = '6min', lookback_steps: int = None, replace_zeros_in_target: bool = True) DataFrame[source]

Returns data for a regression problem using hydrological, environmental, and water quality data of Houay Pano.

Parameters:
  • input_features

    names of inputs to use. By default the following features are used as input:

    • air_temp

    • rel_hum

    • wind_speed

    • sol_rad

    • water_level

    • pcp

    • susp_pm

    • Ecoli_source

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data

  • en – end date of data

  • freq – frequency of data

  • lookback_steps (int, default=None) – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

  • replace_zeros_in_target (bool, default=True) – Replace the zeroes in target column with 1s.

Returns:

a dataframe of shape (inputs+target, st:en)

Return type:

pd.DataFrame
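
The row count and the non-continuous index described for lookback_steps can be illustrated with a small sketch. It assumes that each E. coli observation contributes a window of lookback_steps timestamps ending at the observation time; the helper lookback_index and the observation times are hypothetical.

```python
from datetime import datetime, timedelta


def lookback_index(observation_times, lookback_steps, step=timedelta(minutes=6)):
    """List the timestamps covered by a window ending at each observation."""
    index = []
    for t in observation_times:
        # oldest step first, the observation time itself last
        for k in range(lookback_steps - 1, -1, -1):
            index.append(t - k * step)
    return index


obs = [datetime(2011, 5, 25, 14, 0), datetime(2011, 6, 2, 9, 30)]
idx = lookback_index(obs, lookback_steps=3)
print(len(idx))  # 6, i.e. 2 observations * 3 lookback steps
for t in idx:
    print(t)
```

The printed timestamps jump from 2011-05-25 14:00 to 2011-06-02 09:18, which is why the resulting index is not continuous.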

Example

>>> from aqua_fetch import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')

surface_features(st: str | int | Timestamp = '2000-10-14', en: str | int | Timestamp = '2016-11-12') DataFrame[source]

soil surface features data