Data sets
All datasets used in the skforecast library and related tutorials are accessible using the skforecast.datasets.fetch_dataset() function.
Each dataset in this collection comes with a description of its time series and a reference to its original source.
Available data sets are stored at skforecast-datasets.
In [1]:
# Libraries
# ==============================================================================
from skforecast.datasets import fetch_dataset
By default, the data is returned as a pandas DataFrame with a datetime index and a set frequency. A concise description of the dataset is also printed for quick reference.
In [2]:
# Download data
# ==============================================================================
data = fetch_dataset(name="bike_sharing")
data.head()
bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 11)
c:\Users\jaesc2\Miniconda3\envs\skforecast_py11\Lib\site-packages\skforecast\datasets\datasets.py:324: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  df = df.asfreq(freq)
Out[2]:
date_time | holiday | workingday | weather | temp | atemp | hum | windspeed | users | month | hour | weekday |
---|---|---|---|---|---|---|---|---|---|---|---|
2011-01-01 00:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 81.0 | 0.0 | 16.0 | 1 | 0 | 5 |
2011-01-01 01:00:00 | 0.0 | 0.0 | clear | 9.02 | 13.635 | 80.0 | 0.0 | 40.0 | 1 | 1 | 5 |
2011-01-01 02:00:00 | 0.0 | 0.0 | clear | 9.02 | 13.635 | 80.0 | 0.0 | 32.0 | 1 | 2 | 5 |
2011-01-01 03:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 75.0 | 0.0 | 13.0 | 1 | 3 | 5 |
2011-01-01 04:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 75.0 | 0.0 | 1.0 | 1 | 4 | 5 |
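The structure described above (a datetime index with a set frequency) can be checked directly on any DataFrame. The following is a minimal sketch: `sample` is a hand-built stand-in that reuses the first five `users` values from the output above, so that the example runs without downloading anything, and `has_datetime_index_with_freq` is a hypothetical helper, not part of skforecast.

```python
import pandas as pd

# Stand-in for the frame returned by fetch_dataset(): the first five
# `users` values shown above, indexed by hour (assumed for illustration).
index = pd.date_range(
    "2011-01-01 00:00:00", periods=5, freq="h", name="date_time"
)
sample = pd.DataFrame({"users": [16.0, 40.0, 32.0, 13.0, 1.0]}, index=index)


def has_datetime_index_with_freq(df: pd.DataFrame) -> bool:
    """Return True if df has a DatetimeIndex whose frequency is set."""
    return isinstance(df.index, pd.DatetimeIndex) and df.index.freq is not None


print(has_datetime_index_with_freq(sample))                # datetime index
print(has_datetime_index_with_freq(sample.reset_index()))  # integer index
```

The same check should apply to the object returned by `fetch_dataset(name="bike_sharing")`, assuming it has the structure described above.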
To download the raw data without any preprocessing, pass the raw=True argument. As the output below shows, date_time is then kept as a regular column and the rows have a default integer index.
In [3]:
# Download raw data
# ==============================================================================
data = fetch_dataset(name="bike_sharing", raw=True)
data.head()
bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 12)
Out[3]:
 | date_time | holiday | workingday | weather | temp | atemp | hum | windspeed | users | month | hour | weekday |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011-01-01 00:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 81.0 | 0.0 | 16.0 | 1 | 0 | 5 |
1 | 2011-01-01 01:00:00 | 0.0 | 0.0 | clear | 9.02 | 13.635 | 80.0 | 0.0 | 40.0 | 1 | 1 | 5 |
2 | 2011-01-01 02:00:00 | 0.0 | 0.0 | clear | 9.02 | 13.635 | 80.0 | 0.0 | 32.0 | 1 | 2 | 5 |
3 | 2011-01-01 03:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 75.0 | 0.0 | 13.0 | 1 | 3 | 5 |
4 | 2011-01-01 04:00:00 | 0.0 | 0.0 | clear | 9.84 | 14.395 | 75.0 | 0.0 | 1.0 | 1 | 4 | 5 |
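The two outputs differ in where date_time lives: as the index in the default output, as a regular column in the raw one. The sketch below shows one way to go from the raw layout to the indexed layout using pandas alone. It is an approximation under stated assumptions, not skforecast's actual preprocessing: `raw` is a small hand-built frame with a few of the values shown above, `to_time_indexed` is a hypothetical helper, and the hourly frequency `"h"` is taken from the dataset description.

```python
import pandas as pd

# Hand-built stand-in for the raw layout: integer index, date_time as a column.
raw = pd.DataFrame(
    {
        "date_time": [
            "2011-01-01 00:00:00",
            "2011-01-01 01:00:00",
            "2011-01-01 02:00:00",
            "2011-01-01 03:00:00",
            "2011-01-01 04:00:00",
        ],
        "temp": [9.84, 9.02, 9.02, 9.84, 9.84],
        "users": [16.0, 40.0, 32.0, 13.0, 1.0],
    }
)


def to_time_indexed(
    df: pd.DataFrame, time_col: str = "date_time", freq: str = "h"
) -> pd.DataFrame:
    """Move time_col into a sorted DatetimeIndex and set its frequency."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])  # parse strings if needed
    out = out.set_index(time_col).sort_index()
    # asfreq sets the frequency; missing timestamps would appear as NaN rows.
    return out.asfreq(freq)


indexed = to_time_indexed(raw)
print(raw.shape, indexed.shape)  # one column fewer once date_time is the index
print(indexed.index.freq)
```

Moving date_time into the index is also why the two shapes printed above differ by exactly one column, (17544, 12) versus (17544, 11).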