The core methods of StatsForecast provide a comprehensive interface for fitting, predicting, forecasting, and evaluating statistical forecasting models on large sets of time series.
Overview
The main methods include:
StatsForecast.fit - Fit statistical models
StatsForecast.predict - Predict using fitted models
StatsForecast.forecast - Memory-efficient predictions without storing models
StatsForecast.cross_validation - Temporal cross-validation
StatsForecast.plot - Visualization of forecasts and historical data
StatsForecast Class
StatsForecast
Bases: _StatsForecast
The StatsForecast class lets you efficiently fit multiple StatsForecast models
to large sets of time series. It operates on a DataFrame df with at least three columns:
ids, times, and targets (by default unique_id, ds, and y).
The class has a memory-efficient StatsForecast.forecast method that avoids storing partial
model outputs, while the StatsForecast.fit and StatsForecast.predict methods follow the
scikit-learn interface and store the fitted models.
The StatsForecast class offers parallelization utilities with Dask, Spark, and Ray backends.
See the distributed computing examples for details.
StatsForecast.fit
fit(df, prediction_intervals=None, id_col='unique_id', time_col='ds', target_col='y')
Fit statistical models to time series data.
Fits all models specified in the constructor to each time series in the input
DataFrame. The fitted models are stored internally and can be used later with
the predict method. This follows the scikit-learn fit/predict interface.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features. | required |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. If provided, the models will be prepared to generate prediction intervals. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |
Returns:
| Name | Type | Description |
|---|---|---|
| StatsForecast | StatsForecast | Returns self with fitted models stored in the fitted_ attribute. This allows for method chaining. |
StatsForecast.predict
predict(h, X_df=None, level=None)
Generate forecasts using previously fitted models.
Uses the models fitted via the fit method to generate predictions for the
specified forecast horizon. This follows the scikit-learn fit/predict interface.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must have the same structure as training data and include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95] for 80% and 95% intervals). If provided with models configured for prediction intervals, the output will include lower and upper bounds. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts for each model. Contains the series identifiers, future timestamps, and one column per model with point predictions. If level is specified, includes additional columns for prediction interval bounds (e.g., 'model-lo-95', 'model-hi-95'). |
StatsForecast.fit_predict
fit_predict(h, df, X_df=None, level=None, prediction_intervals=None, id_col='unique_id', time_col='ds', target_col='y')
Fit models and generate predictions in a single step.
Combines the fit and predict methods in a single operation. The fitted models
are stored internally in the fitted_ attribute for later use, making this method
suitable when you need both training and immediate predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). Required if prediction_intervals is specified. | None |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts containing series identifiers, future timestamps, and predictions from each model. Includes prediction intervals if level is specified. |
StatsForecast.forecast
forecast(h, df, X_df=None, level=None, fitted=False, prediction_intervals=None, id_col='unique_id', time_col='ds', target_col='y')
Generate forecasts with memory-efficient model training.
This is the primary forecasting method that trains models and generates predictions
without storing fitted model objects. It is more memory-efficient than fit_predict
when you don’t need to inspect or reuse the fitted models. Models are trained and
used for forecasting within each time series, then discarded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features for training. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). | None |
| fitted | bool | If True, stores in-sample (fitted) predictions which can be retrieved using forecast_fitted_values(). | False |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts containing series identifiers, future timestamps, and predictions from each model. Includes prediction intervals if level is specified. |
StatsForecast.cross_validation
cross_validation(h, df, n_windows=1, step_size=1, test_size=None, input_size=None, level=None, fitted=False, refit=True, prediction_intervals=None, id_col='unique_id', time_col='ds', target_col='y')
Perform temporal cross-validation for model evaluation.
Evaluates model performance across multiple time windows using a time series
cross-validation approach. This method trains models on expanding or rolling
windows and generates forecasts for each validation period, providing robust
assessment of forecast accuracy and generalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon for each validation window. | required |
| df | DataFrame | Input DataFrame containing time series data with columns for series identifiers, timestamps, and target values. | required |
| n_windows | int | Number of validation windows to create. Cannot be specified together with test_size. | 1 |
| step_size | int | Number of time steps between consecutive validation windows. Values smaller than h create overlapping windows. | 1 |
| test_size | int | Total size of the test period. If provided, n_windows is computed automatically. Cannot be specified together with n_windows. | None |
| input_size | int | Maximum number of training observations to use for each window. If None, uses expanding windows with all available history. If specified, uses rolling windows of fixed size. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). | None |
| fitted | bool | If True, stores in-sample predictions for each window, accessible via cross_validation_fitted_values(). | False |
| refit | bool or int | Controls model refitting frequency. If True, refits models for every window. If False, fits once and uses the forward method. If an integer n, refits every n windows. Models must implement the forward method when refit is not True. | True |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. Requires level to be specified. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. | 'ds' |
| target_col | str | Name of the column containing the target variable. | 'y' |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with cross-validation results including series identifiers, cutoff dates (last training observation), forecast dates, actual values, and predictions from each model for all windows. |
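How h, n_windows, step_size, and input_size interact can be sketched in plain Python. The helper cv_windows below is hypothetical and not part of StatsForecast; it assumes the windows are aligned to the end of the series and that together they span h + step_size * (n_windows - 1) observations.

```python
def cv_windows(n_obs, h, n_windows=1, step_size=1, input_size=None):
    """Sketch of cross-validation window boundaries for one series.

    Returns one (train_start, cutoff, test_end) tuple per window, using
    half-open positional indices: train = [train_start, cutoff) and
    test = [cutoff, test_end).
    """
    # Assumed total span covered by all test windows, aligned to the series end
    test_size = h + step_size * (n_windows - 1)
    if test_size > n_obs:
        raise ValueError('series is too short for the requested windows')
    windows = []
    for i in range(n_windows):
        cutoff = n_obs - test_size + i * step_size
        # Expanding window by default; rolling window when input_size is set
        train_start = 0 if input_size is None else max(0, cutoff - input_size)
        windows.append((train_start, cutoff, cutoff + h))
    return windows


# Non-overlapping windows: step_size equals h
print(cv_windows(n_obs=100, h=14, n_windows=2, step_size=14))
# Overlapping windows: step_size is smaller than h
print(cv_windows(n_obs=100, h=14, n_windows=2, step_size=1))
# Rolling training window capped at 30 observations
print(cv_windows(n_obs=100, h=14, n_windows=2, step_size=14, input_size=30))
```

In this sketch the cutoff is a position (the number of training observations), whereas the cutoff column in the returned DataFrame holds the timestamp of the last training observation.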
StatsForecast.plot
plot(df, forecasts_df=None, unique_ids=None, plot_random=True, models=None, level=None, max_insample_length=None, plot_anomalies=False, engine='matplotlib', id_col='unique_id', time_col='ds', target_col='y', resampler_kwargs=None)
Visualize time series data with forecasts and prediction intervals.
Creates plots showing historical data, forecasts, and optional prediction intervals
for time series. Supports multiple plotting engines and interactive visualization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing historical time series data with columns for series identifiers, timestamps, and target values. | required |
| forecasts_df | DataFrame | DataFrame with forecast results from forecast() or cross_validation(). Should contain series identifiers, timestamps, and model predictions. | None |
| unique_ids | List[str] or ndarray | Specific series identifiers to plot. If None and plot_random is True, series are selected randomly. | None |
| plot_random | bool | Whether to randomly select series to plot when unique_ids is not specified. | True |
| models | List[str] | Names of specific models to include in the plot. If None, plots all models present in forecasts_df. | None |
| level | List[float] | Confidence levels to plot as shaded regions around forecasts (e.g., [80, 95]). Only applicable if prediction intervals are present in forecasts_df. | None |
| max_insample_length | int | Maximum number of historical observations to display. Useful for focusing on recent history when series are long. | None |
| plot_anomalies | bool | If True, highlights observations that fall outside prediction intervals as anomalies. | False |
| engine | str | Plotting library to use. Options are 'matplotlib' (static plots), 'plotly' (interactive plots), or 'plotly-resampler' (interactive with downsampling for large datasets). | 'matplotlib' |
| id_col | str | Name of the column containing series identifiers. | 'unique_id' |
| time_col | str | Name of the column containing timestamps. | 'ds' |
| target_col | str | Name of the column containing the target variable. | 'y' |
| resampler_kwargs | Dict | Additional keyword arguments passed to the plotly-resampler constructor when engine='plotly-resampler'. For further customization (e.g., 'show_dash'), call this method, store the returned object, and add arguments to its show_dash method. | None |
Returns:
| Type | Description |
|---|---|
| Figure or FigureResampler | Plotting object from the selected engine (matplotlib Figure, plotly Figure, or FigureResampler object), which can be further customized or displayed. |
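The idea behind plot_anomalies, flagging observations that fall outside a prediction interval, can be sketched directly on a DataFrame. The helper flag_anomalies is hypothetical and is not the library's plotting code; the hand-made values and the strict inequalities at the interval bounds are assumptions.

```python
import pandas as pd


def flag_anomalies(df, model, level, target_col='y'):
    """Return a boolean Series that is True where the target lies outside the interval."""
    lo = df[f'{model}-lo-{level}']
    hi = df[f'{model}-hi-{level}']
    return (df[target_col] < lo) | (df[target_col] > hi)


# Hand-made values standing in for in-sample predictions with a 95% interval
insample = pd.DataFrame({
    'unique_id': ['a'] * 4,
    'ds': [1, 2, 3, 4],
    'y': [10.0, 12.0, 30.0, 11.0],
    'Naive': [10.0, 10.0, 12.0, 30.0],
    'Naive-lo-95': [5.0, 5.0, 7.0, 25.0],
    'Naive-hi-95': [15.0, 15.0, 17.0, 35.0],
})
insample['anomaly'] = flag_anomalies(insample, model='Naive', level=95)
print(insample[['ds', 'y', 'anomaly']])
```

The column names follow the 'model-lo-95' / 'model-hi-95' pattern described for the forecast output, so the same check could be applied to any DataFrame that carries both the target and the interval bounds.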
StatsForecast.save
save(path=None, max_size=None, trim=False)
Save the StatsForecast instance to disk using pickle.
Serializes the StatsForecast object including all fitted models and configuration
to a file for later use. The saved object can be loaded with the load() method
to restore the exact state for making predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | File path where the object will be saved. If None, creates a filename in the current directory using the format 'StatsForecast_YYYY-MM-DD_HH-MM-SS.pkl' with the current UTC timestamp. | None |
| max_size | str | Maximum allowed size for the serialized object. Should be specified as a number followed by a unit: 'B', 'KB', 'MB', or 'GB' (e.g., '100MB', '1.5GB'). If the object exceeds this size, an OSError is raised. | None |
| trim | bool | If True, removes fitted values from forecast() and cross_validation() before saving to reduce file size. These values are not needed for generating new predictions. | False |
StatsForecast.load
load(path)
Load a previously saved StatsForecast instance from disk.
Deserializes a StatsForecast object that was saved using the save() method,
restoring all fitted models and configuration. The loaded object is ready to
generate predictions immediately.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | File path to the saved StatsForecast pickle file. Must point to a file created by the save() method. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| StatsForecast | StatsForecast | The deserialized StatsForecast instance with all fitted models and configuration restored, ready for prediction. |
Usage Examples
Basic Forecasting
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, Naive
from statsforecast.utils import generate_series
# Generate example data
panel_df = generate_series(n_series=9, equal_ends=False, engine='pandas')
# Instantiate StatsForecast class
fcst = StatsForecast(
    models=[AutoARIMA(), Naive()],
    freq='D',
    n_jobs=1,
    verbose=True
)
# Efficiently predict
fcsts_df = fcst.forecast(df=panel_df, h=4, fitted=True)
Cross-Validation
from statsforecast import StatsForecast
from statsforecast.models import Naive
from statsforecast.utils import AirPassengersDF as panel_df
# Instantiate StatsForecast class
fcst = StatsForecast(
    models=[Naive()],
    freq='ME',  # AirPassengersDF is monthly (month-end); use 'M' on pandas < 2.2
    n_jobs=1,
    verbose=True
)
# Perform cross-validation
cv_df = fcst.cross_validation(df=panel_df, h=14, n_windows=2)
Prediction Intervals
import pandas as pd
import numpy as np
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive, AutoARIMA
from statsforecast.utils import AirPassengers as ap
# Prepare data
ap_df = pd.DataFrame({'ds': np.arange(ap.size), 'y': ap})
ap_df['unique_id'] = 0
# Forecast with prediction intervals
sf = StatsForecast(
    models=[
        SeasonalNaive(season_length=12),
        AutoARIMA(season_length=12)
    ],
    freq=1,
    n_jobs=1
)
ap_ci = sf.forecast(df=ap_df, h=12, level=(80, 95))
# Plot with prediction intervals
sf.plot(ap_df, ap_ci, level=[80], engine="matplotlib")
# Conformal prediction intervals, configured per model (reuses ap_df from above)
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import ConformalIntervals
sf = StatsForecast(
    models=[
        AutoARIMA(season_length=12),
        AutoARIMA(
            season_length=12,
            prediction_intervals=ConformalIntervals(n_windows=2, h=12),
            alias='ConformalAutoARIMA'
        ),
    ],
    freq=1,
    n_jobs=1
)
ap_ci = sf.forecast(df=ap_df, h=12, level=(80, 95))
Advanced Features
Integer Datestamps
The StatsForecast class can work with integer datestamps instead of datetime objects:
from statsforecast import StatsForecast
from statsforecast.models import HistoricAverage
from statsforecast.utils import AirPassengers as ap
import pandas as pd
import numpy as np
# Create dataframe with integer datestamps
int_ds_df = pd.DataFrame({'ds': np.arange(1, len(ap) + 1), 'y': ap})
int_ds_df.insert(0, 'unique_id', 'AirPassengers')
# Use freq=1 for integer datestamps
fcst = StatsForecast(models=[HistoricAverage()], freq=1)
forecast = fcst.forecast(df=int_ds_df, h=7)
External Regressors
Every column after y is considered an external regressor and will be passed to models that support them:
from statsforecast import StatsForecast
from statsforecast.utils import generate_series
import pandas as pd
# Create data with external regressors
series_xreg = generate_series(10_000, equal_ends=True)
series_xreg['intercept'] = 1
series_xreg['dayofweek'] = series_xreg['ds'].dt.dayofweek
series_xreg = pd.get_dummies(series_xreg, columns=['dayofweek'], drop_first=True)
# Split train/validation
dates = sorted(series_xreg['ds'].unique())
valid_start = dates[-14]
train_mask = series_xreg['ds'] < valid_start
series_train = series_xreg[train_mask]
series_valid = series_xreg[~train_mask]
X_valid = series_valid.drop(columns=['y'])
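# Forecast with external regressors.
# your_model is a placeholder: substitute an instance of any model that supports exogenous regressors.
fcst = StatsForecast(models=[your_model], freq='D')
xreg_res = fcst.forecast(df=series_train, h=14, X_df=X_valid)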
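When no held-out validation period is available, X_df has to be built for the dates that follow each series. The sketch below uses only pandas; make_future_frame is a hypothetical helper, not a StatsForecast function, and it assumes a regular frequency and a calendar feature that can be computed from the timestamp alone.

```python
import pandas as pd


def make_future_frame(df, h, freq='D', id_col='unique_id', time_col='ds'):
    """Build the h timestamps that follow the last observation of each series."""
    last = df.groupby(id_col)[time_col].max()
    frames = []
    for uid, last_ts in last.items():
        # periods=h + 1 includes last_ts itself, which is dropped with [1:]
        future = pd.date_range(start=last_ts, periods=h + 1, freq=freq)[1:]
        frames.append(pd.DataFrame({id_col: uid, time_col: future}))
    return pd.concat(frames, ignore_index=True)


# Tiny training panel with one exogenous column after y
train = pd.DataFrame({
    'unique_id': ['a'] * 3 + ['b'] * 3,
    'ds': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'] * 2),
    'y': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
train['dayofweek'] = train['ds'].dt.dayofweek

# Future frame: same columns as train except y, h rows per series
X_future = make_future_frame(train, h=2)
X_future['dayofweek'] = X_future['ds'].dt.dayofweek
print(X_future)
```

The resulting frame has one row per series and future step, which matches the requirement that X_df cover all time series and the full forecast horizon.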
Distributed Computing
The StatsForecast class offers parallelization utilities with Dask, Spark and Ray backends for distributed computing. See the distributed computing examples for more information.