Calendar features
Calendar features are key elements in time series forecasting. They decompose date and time into basic units such as year, month, day, and weekday, allowing models to capture recurring patterns, seasonal variations, and trends. Because they are known in advance for the period being predicted (the forecast horizon), calendar features can be used as exogenous variables.
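As a minimal sketch of why these features are available ahead of time, the snippet below builds a hypothetical 24-hour forecast horizon and derives calendar features from the dates alone, without any observed values. The start date, the horizon length, and the names `horizon_index` and `future_features` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical forecast horizon: 24 hourly steps (illustrative values only)
horizon_index = pd.date_range(
    start="2013-01-01 00:00:00", periods=24, freq=pd.offsets.Hour()
)

# Calendar features depend only on the dates, so they can be computed
# before any value of the target series is observed.
future_features = pd.DataFrame(index=horizon_index)
future_features["year"] = future_features.index.year
future_features["month"] = future_features.index.month
future_features["day_of_week"] = future_features.index.dayofweek
future_features["hour"] = future_features.index.hour

print(future_features.head())
```

A frame like this could then be passed to a forecaster as the exogenous variables for the prediction period.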
Dates and time in Pandas
Pandas provides a comprehensive set of capabilities for handling time series data in various domains. Building on the NumPy datetime64 and timedelta64 data types, Pandas consolidates functionality from several Python libraries and adds many new tools for manipulating time series data. This includes:
Easily parse date and time data from multiple sources and formats.
Generate sequences of fixed-frequency dates and time spans.
Streamline the manipulation and conversion of date-time information, including time zones.
Facilitate the resampling or conversion of time series data to specific frequencies.
For an in-depth exploration of these capabilities, see the time series and date functionality section of the Pandas user guide.
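The capabilities listed above can be sketched in a few lines. This is only an illustration on small synthetic values, not part of the bike sharing workflow. The variable names, the dates, and the choice of time zone are assumptions.

```python
import pandas as pd

# 1. Parse date strings, here with an explicit day-first format
parsed_day_first = pd.to_datetime("01/02/2011", format="%d/%m/%Y")

# 2. Generate a fixed-frequency sequence of dates
daily_index = pd.date_range(start="2011-01-01", periods=4, freq="D")

# 3. Handle time zones: attach UTC, then convert to another zone
utc_index = daily_index.tz_localize("UTC")
eastern_index = utc_index.tz_convert("America/New_York")

# 4. Resample to a different frequency: 48 hourly values to daily totals
hourly = pd.Series(
    1.0,
    index=pd.date_range("2011-01-01", periods=48, freq=pd.offsets.Hour()),
)
daily_totals = hourly.resample("D").sum()

print(parsed_day_first)
print(eastern_index[0])
print(daily_totals)
```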
Libraries and data
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.datasets import fetch_dataset
# Downloading data
# ==============================================================================
data = fetch_dataset(name="bike_sharing", raw=True)
data = data[['date_time', 'users']]
data.head()
bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 12)
| | date_time | users |
|---|---|---|
| 0 | 2011-01-01 00:00:00 | 16.0 |
| 1 | 2011-01-01 01:00:00 | 40.0 |
| 2 | 2011-01-01 02:00:00 | 32.0 |
| 3 | 2011-01-01 03:00:00 | 13.0 |
| 4 | 2011-01-01 04:00:00 | 1.0 |
Extract calendar features
To take advantage of the date-time functionality offered by Pandas, the column of interest must be stored as datetime. Although not required, it is recommended to set it as the index for further integration with skforecast.
# Preprocess data
# ==============================================================================
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.asfreq('H')
data = data.sort_index()
data.head()
| date_time | users |
|---|---|
| 2011-01-01 00:00:00 | 16.0 |
| 2011-01-01 01:00:00 | 40.0 |
| 2011-01-01 02:00:00 | 32.0 |
| 2011-01-01 03:00:00 | 13.0 |
| 2011-01-01 04:00:00 | 1.0 |
Next, several features are created from the date and time information: year, month, day of the week, and hour.
# Create calendar features
# ==============================================================================
data['year'] = data.index.year
data['month'] = data.index.month
data['day_of_week'] = data.index.dayofweek
data['hour'] = data.index.hour
data.head()
| date_time | users | year | month | day_of_week | hour |
|---|---|---|---|---|---|
| 2011-01-01 00:00:00 | 16.0 | 2011 | 1 | 5 | 0 |
| 2011-01-01 01:00:00 | 40.0 | 2011 | 1 | 5 | 1 |
| 2011-01-01 02:00:00 | 32.0 | 2011 | 1 | 5 | 2 |
| 2011-01-01 03:00:00 | 13.0 | 2011 | 1 | 5 | 3 |
| 2011-01-01 04:00:00 | 1.0 | 2011 | 1 | 5 | 4 |
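Note that `dayofweek` numbers the days from Monday (0) to Sunday (6), so the value 5 in the table corresponds to a Saturday. A short check of this convention, using a small stand-alone daily index rather than the bike sharing data (the name `check` is an assumption for illustration), might look like this:

```python
import pandas as pd

# Three consecutive days starting on 2011-01-01 (a Saturday)
index = pd.date_range(start="2011-01-01", periods=3, freq="D")

check = pd.DataFrame(index=index)
check["day_of_week"] = check.index.dayofweek   # Monday=0, ..., Sunday=6
check["day_name"] = check.index.day_name()     # English day names by default

print(check)
```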