Data sets

All datasets used in the skforecast library and related tutorials are accessible using the skforecast.datasets.fetch_dataset() function. Each dataset in this collection comes with a description of its time series and a reference to its original source.

The available data sets are stored in the skforecast-datasets repository.

In [1]:

# Libraries
# ==============================================================================
from skforecast.datasets import fetch_dataset

By default, the data is returned as a pandas DataFrame with a datetime index that has its frequency set. A short description of the dataset is also printed for quick reference.

In [2]:

# Download data 
# ==============================================================================
data = fetch_dataset(name="bike_sharing")
data.head()

bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, information
about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 11)

c:\Users\jaesc2\Miniconda3\envs\skforecast_py11\Lib\site-packages\skforecast\datasets\datasets.py:324: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  df = df.asfreq(freq)

Out[2]:

	holiday	workingday	weather	temp	atemp	hum	windspeed	users	month	hour	weekday
date_time
2011-01-01 00:00:00	0.0	0.0	clear	9.84	14.395	81.0	0.0	16.0	1	0	5
2011-01-01 01:00:00	0.0	0.0	clear	9.02	13.635	80.0	0.0	40.0	1	1	5
2011-01-01 02:00:00	0.0	0.0	clear	9.02	13.635	80.0	0.0	32.0	1	2	5
2011-01-01 03:00:00	0.0	0.0	clear	9.84	14.395	75.0	0.0	13.0	1	3	5
2011-01-01 04:00:00	0.0	0.0	clear	9.84	14.395	75.0	0.0	1.0	1	4	5
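
To confirm that a frame has the structure described above (a datetime index with a frequency), a small check along these lines may help. This is only a sketch: `has_regular_datetime_index` is a hypothetical helper, not part of skforecast, and `sample` is a stand-in built from the first rows shown above rather than a real download.

```python
import pandas as pd


def has_regular_datetime_index(df: pd.DataFrame) -> bool:
    """Return True if df is indexed by a DatetimeIndex with a frequency set.

    Hypothetical helper for illustration; not part of skforecast.
    """
    return isinstance(df.index, pd.DatetimeIndex) and df.index.freq is not None


# Stand-in frame: three hourly rows copied from the output above.
index = pd.date_range("2011-01-01 00:00:00", periods=3, freq="h", name="date_time")
sample = pd.DataFrame({"users": [16.0, 40.0, 32.0]}, index=index)

print(has_regular_datetime_index(sample))                # True
print(has_regular_datetime_index(sample.reset_index()))  # False: plain integer index
```

The same function could be applied to the frame returned by fetch_dataset() before passing it to a forecaster.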

To download the raw data without any preprocessing, pass the raw=True argument. As the output below shows, date_time is then an ordinary column and the frame has a default integer index.

In [3]:

# Download raw data 
# ==============================================================================
data = fetch_dataset(name="bike_sharing", raw=True)
data.head()

bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, information
about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 12)

Out[3]:

	date_time	weather	temp	atemp	hum	users	month	hour	weekday
0	2011-01-01 00:00:00	clear	9.84	14.395	81.0	16.0	1	0	5
1	2011-01-01 01:00:00	clear	9.02	13.635	80.0	40.0	1	1	5
2	2011-01-01 02:00:00	clear	9.02	13.635	80.0	32.0	1	2	5
3	2011-01-01 03:00:00	clear	9.84	14.395	75.0	13.0	1	3	5
4	2011-01-01 04:00:00	clear	9.84	14.395	75.0	1.0	1	4	5
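
The difference between the raw and the default output can be approximated with plain pandas. The warning shown earlier indicates that the library calls `df.asfreq(freq)`; the remaining steps here (parsing date_time and using it as the index) are assumptions about the preprocessing, not the library's actual code. The `raw` frame is a stand-in with values copied from the rows above.

```python
import pandas as pd

# Stand-in for a raw download: date_time is a text column, index is 0..n-1.
raw = pd.DataFrame(
    {
        "date_time": [
            "2011-01-01 00:00:00",
            "2011-01-01 01:00:00",
            "2011-01-01 02:00:00",
        ],
        "users": [16.0, 40.0, 32.0],
    }
)

# Assumed preprocessing: parse timestamps, index by them, set an hourly frequency.
processed = raw.copy()
processed["date_time"] = pd.to_datetime(processed["date_time"])
processed = processed.set_index("date_time").asfreq("h")  # 'h' replaces deprecated 'H'

print(type(processed.index).__name__)  # DatetimeIndex
print(processed.shape)                 # (3, 1)
```

Note that `asfreq` inserts rows filled with NaN for any timestamp missing from the raw data, so the row count can grow if the raw series has gaps.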