Dataset input requirements
In this example we will go through the dataset input requirements of the
core.NeuralForecast class.
The core.NeuralForecast methods operate as global models: they receive
a set of time series rather than a single series. The class uses
cross-learning to fit flexible shared models such as neural networks,
which improves their generalization capabilities, as shown in the M4
international forecasting competition (Smyl 2019, Semenoglou 2021).
You can run these experiments on a GPU with Google Colab.
Multiple time series
Store your time series in a pandas dataframe in long format, that is,
each row represents an observation for a specific series and timestamp.
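Schematically, the dataframe is built by stacking the individual series:
Y_df = pd.concat([series1, series2, ...])  # schematic: one dataframe per series
Let’s see an example using the datasetsforecast library.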
%%capture
!pip install datasetsforecast
import pandas as pd
from datasetsforecast.m3 import M3
Y_df, *_ = M3.load('./data', group='Yearly')
Y_df.groupby('unique_id').head(2)
| | unique_id | ds | y |
|---|---|---|---|
| 0 | Y1 | 1975-12-31 | 940.66 |
| 1 | Y1 | 1976-12-31 | 1084.86 |
| 20 | Y10 | 1975-12-31 | 2160.04 |
| 21 | Y10 | 1976-12-31 | 2553.48 |
| 40 | Y100 | 1975-12-31 | 1424.70 |
| … | … | … | … |
| 18260 | Y97 | 1976-12-31 | 1618.91 |
| 18279 | Y98 | 1975-12-31 | 1164.97 |
| 18280 | Y98 | 1976-12-31 | 1277.87 |
| 18299 | Y99 | 1975-12-31 | 1870.00 |
| 18300 | Y99 | 1976-12-31 | 1307.20 |
Y_df is a dataframe with three columns: unique_id, which holds a unique
identifier for each time series; ds, which holds the datestamp; and y,
which holds the values of the series.
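The long format described above can be sketched with two small, made-up series. The series names, ids, and values here are illustrative assumptions, and check_long_format is a hypothetical helper, not part of NeuralForecast. It only tests the three-column layout and that each (unique_id, ds) pair appears once.

```python
import pandas as pd

# Two hypothetical yearly series; ids and values are illustrative only.
series1 = pd.DataFrame({
    "unique_id": "A",
    "ds": pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"]),
    "y": [1.0, 2.0, 3.0],
})
series2 = pd.DataFrame({
    "unique_id": "B",
    "ds": pd.to_datetime(["2001-12-31", "2002-12-31"]),
    "y": [10.0, 20.0],
})

# Long format: stack the series row-wise, one row per (series, timestamp).
long_df = pd.concat([series1, series2], ignore_index=True)


def check_long_format(df):
    """Hypothetical sanity check for the unique_id / ds / y layout."""
    missing = {"unique_id", "ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if df.duplicated(["unique_id", "ds"]).any():
        raise ValueError("duplicate (unique_id, ds) rows found")
    return df


check_long_format(long_df)
```

Series of different lengths can share one dataframe this way, since every row carries its own identifier and timestamp.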
Single time series
If you have only one time series, you still have to include the unique_id
column. Consider, for example, the AirPassengers dataset.
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
Y_df
| | timestamp | value |
|---|---|---|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
| … | … | … |
| 139 | 1960-08-01 | 606 |
| 140 | 1960-09-01 | 508 |
| 141 | 1960-10-01 | 461 |
| 142 | 1960-11-01 | 390 |
| 143 | 1960-12-01 | 432 |
In this example Y_df contains only two columns: timestamp and value.
To use NeuralForecast we have to add the unique_id column and rename
the existing columns to ds and y.
Y_df['unique_id'] = 1.  # Constant identifier for the single series (1. is a float, so it displays as 1.0)
Y_df = Y_df.rename(columns={'timestamp': 'ds', 'value': 'y'})
Y_df = Y_df[['unique_id', 'ds', 'y']]
Y_df
| | unique_id | ds | y |
|---|---|---|---|
| 0 | 1.0 | 1949-01-01 | 112 |
| 1 | 1.0 | 1949-02-01 | 118 |
| 2 | 1.0 | 1949-03-01 | 132 |
| 3 | 1.0 | 1949-04-01 | 129 |
| 4 | 1.0 | 1949-05-01 | 121 |
| … | … | … | … |
| 139 | 1.0 | 1960-08-01 | 606 |
| 140 | 1.0 | 1960-09-01 | 508 |
| 141 | 1.0 | 1960-10-01 | 461 |
| 142 | 1.0 | 1960-11-01 | 390 |
| 143 | 1.0 | 1960-12-01 | 432 |
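As a variation on the steps above, the following sketch does the same renaming in a single chain. It uses a string identifier (as the M3 example does) and parses the timestamps into datetimes. The first three AirPassengers rows are typed inline so the snippet runs offline. The names raw and single_df and the identifier "AirPassengers" are assumptions made for illustration.

```python
import pandas as pd

# First three rows of the AirPassengers table shown earlier, typed inline.
raw = pd.DataFrame({
    "timestamp": ["1949-01-01", "1949-02-01", "1949-03-01"],
    "value": [112, 118, 132],
})

single_df = (
    raw.rename(columns={"timestamp": "ds", "value": "y"})
    .assign(
        unique_id="AirPassengers",             # assumed label for the single series
        ds=lambda d: pd.to_datetime(d["ds"]),  # parse strings into datetimes
    )
    [["unique_id", "ds", "y"]]                 # keep the expected column order
)

print(single_df.dtypes)
```

Parsing ds up front makes the column dtype explicit, so it does not stay as plain strings read from the CSV.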
References