Feature engineering
Create exogenous regressors for your models
fourier
fourier (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame], freq:str, season_length:int, k:int, h:int=0, id_col:str='unique_id', time_col:str='ds')
Compute Fourier seasonal terms for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | Union | | Dataframe with ids, times and values for the exogenous regressors. |
freq | str | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
season_length | int | | Number of observations per unit of time, e.g. 24 for hourly data. |
k | int | | Maximum order of the Fourier terms. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Original DataFrame with the computed features, and a DataFrame with the features for the next `h` timesteps. |
import pandas as pd
from utilsforecast.data import generate_series
from utilsforecast.feature_engineering import fourier
series = generate_series(5, equal_ends=True)
transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=1)
transformed_df
| | unique_id | ds | y | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
---|---|---|---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | -0.974927 | 0.433894 | -0.222526 | -0.900964 |
1 | 0 | 2000-10-06 | 1.423626 | -0.781835 | -0.974926 | 0.623486 | -0.222531 |
2 | 0 | 2000-10-07 | 2.311782 | -0.000005 | -0.000009 | 1.000000 | 1.000000 |
3 | 0 | 2000-10-08 | 3.192191 | 0.781829 | 0.974930 | 0.623493 | -0.222512 |
4 | 0 | 2000-10-09 | 4.148767 | 0.974929 | -0.433877 | -0.222517 | -0.900972 |
… | … | … | … | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | -0.974927 | 0.433888 | -0.222523 | -0.900967 |
1097 | 4 | 2001-05-11 | 5.178157 | -0.781823 | -0.974934 | 0.623500 | -0.222495 |
1098 | 4 | 2001-05-12 | 6.133142 | -0.000002 | -0.000003 | 1.000000 | 1.000000 |
1099 | 4 | 2001-05-13 | 0.403709 | 0.781840 | 0.974922 | 0.623479 | -0.222548 |
1100 | 4 | 2001-05-14 | 1.081779 | 0.974928 | -0.433882 | -0.222520 | -0.900970 |
future_df
| | unique_id | ds | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
---|---|---|---|---|---|---|
0 | 0 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
1 | 1 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
2 | 2 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
3 | 3 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
4 | 4 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
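The values in the tables above are consistent with the standard Fourier terms sin(2πjt/season_length) and cos(2πjt/season_length) for j = 1, ..., k, where t is an integer time index. The sketch below illustrates that formula in plain Python; it is not the library's implementation. `fourier_terms` is a hypothetical helper, and using t = 152 for 2000-10-05 is an assumption taken from the value that the `trend` example on this page shows for that date.

```python
import math


def fourier_terms(t, season_length, k):
    """Hypothetical helper: Fourier terms of orders 1..k at integer time index t."""
    terms = {}
    # Sine columns first, then cosine columns, mirroring the column order above.
    for order in range(1, k + 1):
        angle = 2 * math.pi * order * t / season_length
        terms[f"sin{order}_{season_length}"] = math.sin(angle)
    for order in range(1, k + 1):
        angle = 2 * math.pi * order * t / season_length
        terms[f"cos{order}_{season_length}"] = math.cos(angle)
    return terms


# Assumption: t=152 corresponds to 2000-10-05 (see the trend example).
row = fourier_terms(t=152, season_length=7, k=2)
for name, value in row.items():
    print(f"{name}: {value:.6f}")
```

The printed values should agree with the first row of `transformed_df` to about four decimal places; the small differences in the tables (for example -0.000005 instead of 0) suggest the library works in lower precision than Python floats.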
trend
trend (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame], freq:str, h:int=0, id_col:str='unique_id', time_col:str='ds')
Add a trend column with consecutive integers for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | Union | | Dataframe with ids, times and values for the exogenous regressors. |
freq | str | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Original DataFrame with the computed features, and a DataFrame with the features for the next `h` timesteps. |
from utilsforecast.feature_engineering import trend
series = generate_series(5, equal_ends=True)
transformed_df, future_df = trend(series, freq='D', h=1)
transformed_df
| | unique_id | ds | y | trend |
---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | 152.0 |
1 | 0 | 2000-10-06 | 1.423626 | 153.0 |
2 | 0 | 2000-10-07 | 2.311782 | 154.0 |
3 | 0 | 2000-10-08 | 3.192191 | 155.0 |
4 | 0 | 2000-10-09 | 4.148767 | 156.0 |
… | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | 369.0 |
1097 | 4 | 2001-05-11 | 5.178157 | 370.0 |
1098 | 4 | 2001-05-12 | 6.133142 | 371.0 |
1099 | 4 | 2001-05-13 | 0.403709 | 372.0 |
1100 | 4 | 2001-05-14 | 1.081779 | 373.0 |
future_df
| | unique_id | ds | trend |
---|---|---|---|
0 | 0 | 2001-05-15 | 374.0 |
1 | 1 | 2001-05-15 | 374.0 |
2 | 2 | 2001-05-15 | 374.0 |
3 | 3 | 2001-05-15 | 374.0 |
4 | 4 | 2001-05-15 | 374.0 |
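In the example above the trend rises by 1 per timestep and the future frame continues the count: 373 on the last training date, 374 on the first forecast date. A minimal sketch of that continuation for a longer horizon follows; `extend_trend` is a hypothetical helper written for illustration and is not part of the library.

```python
def extend_trend(last_value, h):
    """Hypothetical helper: trend values for the h steps after last_value."""
    return [last_value + step for step in range(1, h + 1)]


# Last training trend values of series 4, copied from the table above.
train_trend = [369.0, 370.0, 371.0, 372.0, 373.0]

future_trend = extend_trend(train_trend[-1], h=3)
print(future_trend)  # [374.0, 375.0, 376.0]
```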
pipeline
pipeline (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame], features:List[Callable], freq:str, h:int=0, id_col:str='unique_id', time_col:str='ds')
Compute several features for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | Union | | Dataframe with ids, times and values for the exogenous regressors. |
features | List | | List of features to compute. Each must take only df, freq, h, id_col and time_col (any other arguments must be fixed beforehand, e.g. with `functools.partial`). |
freq | str | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Original DataFrame with the computed features, and a DataFrame with the features for the next `h` timesteps. |
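from functools import partial
from utilsforecast.feature_engineering import fourier, pipeline, trend

# Extra arguments (season_length, k) are fixed with partial so that every
# feature accepts only df, freq, h, id_col and time_col.
features = [
    trend,
    partial(fourier, season_length=7, k=1),
    partial(fourier, season_length=28, k=1),
]
transformed_df, future_df = pipeline(
    series,
    features=features,
    freq='D',
    h=1,
)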
transformed_df
| | unique_id | ds | y | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 |
---|---|---|---|---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | 152.0 | -0.974927 | -0.222526 | 0.433885 | -9.009683e-01 |
1 | 0 | 2000-10-06 | 1.423626 | 153.0 | -0.781835 | 0.623486 | 0.222522 | -9.749276e-01 |
2 | 0 | 2000-10-07 | 2.311782 | 154.0 | -0.000005 | 1.000000 | 0.000001 | -1.000000e+00 |
3 | 0 | 2000-10-08 | 3.192191 | 155.0 | 0.781829 | 0.623493 | -0.222520 | -9.749281e-01 |
4 | 0 | 2000-10-09 | 4.148767 | 156.0 | 0.974929 | -0.222517 | -0.433883 | -9.009693e-01 |
… | … | … | … | … | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | 369.0 | -0.974927 | -0.222523 | 0.900969 | 4.338843e-01 |
1097 | 4 | 2001-05-11 | 5.178157 | 370.0 | -0.781823 | 0.623500 | 0.974929 | 2.225177e-01 |
1098 | 4 | 2001-05-12 | 6.133142 | 371.0 | -0.000002 | 1.000000 | 1.000000 | 4.251100e-07 |
1099 | 4 | 2001-05-13 | 0.403709 | 372.0 | 0.781840 | 0.623479 | 0.974927 | -2.225243e-01 |
1100 | 4 | 2001-05-14 | 1.081779 | 373.0 | 0.974928 | -0.222520 | 0.900969 | -4.338835e-01 |
future_df
| | unique_id | ds | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 |
---|---|---|---|---|---|---|---|
0 | 0 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 |
1 | 1 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 |
2 | 2 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 |
3 | 3 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 |
4 | 4 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 |
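The requirement that each feature take only df, freq, h, id_col and time_col is why `functools.partial` appears in the example: it fixes the remaining arguments ahead of time. The sketch below shows the idea with plain Python lists of dicts and integer timestamps. `toy_feature` and `toy_pipeline` are hypothetical stand-ins written for illustration; they do not reproduce how the library's `pipeline` is implemented.

```python
from functools import partial


def toy_feature(df, freq, h=0, id_col="unique_id", time_col="ds", *, name, value):
    """Hypothetical feature: adds a constant column `name`.

    `name` and `value` are extra arguments, so they must be fixed with
    partial before the function is passed to the pipeline.
    """
    transformed = [{**row, name: value} for row in df]
    future = []
    for uid in sorted({row[id_col] for row in df}):
        last = max(row[time_col] for row in df if row[id_col] == uid)
        for step in range(1, h + 1):
            future.append({id_col: uid, time_col: last + step * freq, name: value})
    return transformed, future


def toy_pipeline(df, features, freq, h=0, id_col="unique_id", time_col="ds"):
    """Toy pipeline: calls every feature with the same five arguments and
    merges the columns each one returns."""
    transformed = [dict(row) for row in df]
    future = None
    for feature in features:
        feat_df, feat_future = feature(df, freq, h=h, id_col=id_col, time_col=time_col)
        for target, source in zip(transformed, feat_df):
            target.update(source)
        if future is None:
            future = [dict(row) for row in feat_future]
        else:
            for target, source in zip(future, feat_future):
                target.update(source)
    return transformed, future if future is not None else []


df = [
    {"unique_id": "a", "ds": 1, "y": 10.0},
    {"unique_id": "a", "ds": 2, "y": 11.0},
    {"unique_id": "b", "ds": 2, "y": 5.0},
]
features = [
    partial(toy_feature, name="f1", value=1.0),
    partial(toy_feature, name="f2", value=2.0),
]
transformed, future = toy_pipeline(df, features, freq=1, h=1)
print(transformed[0])
print(future)
```

Because every feature is called with the same five arguments, the pipeline never needs to know about `name`, `value`, or, in the real example, `season_length` and `k`.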