Use case · Forecasting
Autonomous AI forecasting — from CSV to deployed prediction API.
Drop your time-series data, describe the horizon, get a forecast. OctOpus picks the right model family from NeuralForecast, Chronos, TiRex, TimesFM, and tree-based models with lag features, runs the experiments, validates on a genuine out-of-sample holdout, and deploys a prediction endpoint.
TL;DR. Forecasting is OctOpus's strongest wedge. The agent rotates from a strong tree-based baseline (LightGBM with lags + calendar features) through NeuralForecast (NBEATS, PatchTST, xLSTM, TFT) to foundation models (Chronos, TiRex, TimesFM, Moirai), keeping whichever wins on out-of-sample holdout error.
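The keep-the-winner rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not OctOpus's code: the three toy forecasters (`naive`, `seasonal_naive`, `historical_mean`) are hypothetical stand-ins for the real model families, MAE is assumed as the holdout metric, and a single fixed cutoff is used.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A forecaster maps (history, horizon) to `horizon` future values.
Forecaster = Callable[[List[float], int], List[float]]


def naive(history: List[float], horizon: int) -> List[float]:
    # Repeat the last observed value.
    return [history[-1]] * horizon


def seasonal_naive(history: List[float], horizon: int, season: int = 7) -> List[float]:
    # Repeat the last full season (assumes len(history) >= season).
    return [history[-season + (h % season)] for h in range(horizon)]


def historical_mean(history: List[float], horizon: int) -> List[float]:
    # Flat forecast at the mean of the training window.
    return [mean(history)] * horizon


def mae(actual: List[float], forecast: List[float]) -> float:
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def select_by_holdout(series: List[float], horizon: int,
                      candidates: Dict[str, Forecaster]) -> Tuple[str, Dict[str, float]]:
    # Train on everything except the last `horizon` points, score each
    # candidate on those held-out points, and keep the lowest-error one.
    train, holdout = series[:-horizon], series[-horizon:]
    scores = {name: mae(holdout, fc(train, horizon)) for name, fc in candidates.items()}
    winner = min(scores, key=scores.get)
    return winner, scores


if __name__ == "__main__":
    weekly = [10, 12, 14, 13, 11, 20, 22] * 4  # four weeks of a repeating daily pattern
    best, scores = select_by_holdout(
        weekly, horizon=7,
        candidates={"naive": naive, "seasonal_naive": seasonal_naive, "mean": historical_mean},
    )
    print(best, scores)
```

On a series with a repeating weekly pattern, `seasonal_naive` reproduces the held-out week exactly and wins. Swapping in real model families would only change the entries in `candidates`.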
What OctOpus forecasts well
- Demand and inventory — per-SKU, per-store, per-warehouse. Daily, weekly, monthly horizons.
- Revenue and KPI — per-customer, per-segment, per-product. Multi-step ahead with confidence bounds.
- Energy and load — power consumption, renewable generation, grid balancing.
- Sensor and IoT — environmental, industrial, vehicle telemetry.
- Hydrology and earth science — rainfall-runoff, river stage, flood horizons. Includes physics-informed feature engineering.
- Hierarchical panels — thousands of related series trained as a single global model.
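A global model is fit on one table that pools rows from every series. A minimal sketch of that layout, assuming a simple lags-plus-series-code design; `pool_panel` is a hypothetical helper, and a real pipeline would add more features than this.

```python
from typing import Dict, List, Tuple


def pool_panel(panel: Dict[str, List[float]],
               n_lags: int) -> Tuple[List[List[float]], List[float], List[str]]:
    """Stack many related series into one training table for a single global model.

    Each row holds the previous `n_lags` values of its own series (lag_1 first)
    plus an integer series code, so one model is fit on every series at once.
    """
    codes = {sid: i for i, sid in enumerate(sorted(panel))}
    X: List[List[float]] = []
    y: List[float] = []
    ids: List[str] = []
    for sid in sorted(panel):
        values = panel[sid]
        for t in range(n_lags, len(values)):
            lags = values[t - n_lags:t][::-1]  # most recent value first
            X.append(lags + [float(codes[sid])])
            y.append(values[t])
            ids.append(sid)
    return X, y, ids
```

The series code gives a tree model a way to split on series identity when that helps, while the lag columns carry dynamics shared across the whole panel.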
Models the agent rotates through
| Tier | Family | When the agent picks it |
|---|---|---|
| 1 · Baseline | LightGBM with lag, rolling, calendar features | Almost always — fast, strong, interpretable. |
| 2 · Tuned GBM | XGBoost / CatBoost with Optuna | When tier 1 has headroom and the data justifies a search. |
| 3 · Deep / modern | NeuralForecast — NBEATS, PatchTST, xLSTM, TFT | Long horizons, multi-series panels, exogenous signals. |
| 4 · Foundation | Chronos, TiRex, TimesFM, Moirai | As a zero-shot first pass; for cold-start series with little history; as a benchmark anchor. |
| 5 · Stacking | Linear / GBM stacker over diverse base learners | When the residuals of different families are uncorrelated. |
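For tier 1, the model inputs are ordinary tabular features derived from the series itself. The sketch below builds lag, rolling-mean, and calendar columns for one daily series in plain Python; the function name, the default lags (1 and 7), and the 7-day window are illustrative assumptions, not the agent's actual choices. The resulting rows could be loaded into a DataFrame and handed to a gradient-boosting regressor such as LightGBM.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Sequence


def make_features(dates: List[date], values: List[float],
                  lags: Sequence[int] = (1, 7), window: int = 7) -> List[Dict[str, float]]:
    """Lag, rolling-mean, and calendar features for one daily series.

    Every feature for row t uses only values strictly before t, so the target
    never leaks into its own inputs. Rows without enough history are skipped.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(values)):
        row = {f"lag_{k}": values[t - k] for k in lags}
        row[f"roll_mean_{window}"] = mean(values[t - window:t])
        row["dayofweek"] = dates[t].weekday()  # Monday = 0
        row["month"] = dates[t].month
        row["target"] = values[t]
        rows.append(row)
    return rows


if __name__ == "__main__":
    days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
    for r in make_features(days, [float(i) for i in range(10)]):
        print(r)
```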
How a forecasting run looks
- Profile. The agent reads the schema, detects the time column, infers granularity (daily / hourly / weekly), spots multiple series, and flags exogenous candidates.
- Plan. Writes a research spec: target, horizon, validation strategy (rolling-origin or fixed cutoff), and which families to try in which order.
- Run. Generates a fresh `train.py` per experiment, executes it in a sandbox, and reads the metrics.
- Diagnose. When something fails, the agent reads the traceback, writes a targeted fix (dtype, missing dependency, cardinality blow-up), and retries.
- Validate. The winner is scored on a holdout slice the LLM never sees.
- Deploy. A prediction endpoint plus a downloadable bundle (`train.py`, `model.pkl`, `deploy.zip`).
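The Profile and Plan steps rest on two small mechanics: inferring granularity from timestamp gaps, and laying out rolling-origin validation folds. A sketch of both under simple assumptions (exact hourly, daily, or weekly spacing, a loose 28 to 31 day band for monthly, expanding training windows); `infer_granularity` and `rolling_origin_splits` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple


def infer_granularity(timestamps: List[datetime]) -> str:
    """Guess the sampling frequency from the median gap between consecutive timestamps."""
    ts = sorted(timestamps)
    gap = median([b - a for a, b in zip(ts, ts[1:])])
    if gap == timedelta(hours=1):
        return "hourly"
    if gap == timedelta(days=1):
        return "daily"
    if gap == timedelta(weeks=1):
        return "weekly"
    if timedelta(days=28) <= gap <= timedelta(days=31):
        return "monthly"
    return f"irregular (median gap {gap})"


def rolling_origin_splits(n: int, horizon: int, n_folds: int) -> List[Tuple[range, range]]:
    """Expanding-window rolling-origin evaluation.

    Each fold trains on every point before its cutoff and is scored on the next
    `horizon` points; the last fold's test window ends at the final observation.
    """
    splits = []
    for k in range(n_folds, 0, -1):
        cutoff = n - k * horizon
        if cutoff <= 0:
            raise ValueError("series too short for the requested folds")
        splits.append((range(0, cutoff), range(cutoff, cutoff + horizon)))
    return splits
```

Because the final holdout is never shown to the agent, `n` here would be the length of the development portion only, with the holdout cut off beforehand. A fixed-cutoff strategy is the special case `n_folds=1`.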
What you get back
- Forecast with confidence intervals at the requested horizon.
- Holdout error per series and aggregate (sMAPE, MAPE, MASE, MAE — the agent picks per data shape).
- Calibration plot and residual diagnostics.
- The exact `train.py` the agent wrote, fully inspectable and fully reproducible.
- A deployable prediction API.
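The error metrics listed above behave differently, which is why the choice depends on the data: MAE stays in the target's units, MAPE breaks down when actuals are zero or near zero, and MASE divides by the in-sample error of a naive forecast so scores can be compared across series of different scale. A plain-Python sketch of the four metrics plus an interval-coverage check of the kind a calibration plot summarizes; the definitions follow common textbook forms and are not necessarily the exact variants OctOpus reports.

```python
from statistics import mean
from typing import List


def mae(y: List[float], f: List[float]) -> float:
    return mean(abs(a - b) for a, b in zip(y, f))


def mape(y: List[float], f: List[float]) -> float:
    # Percent scale; undefined when any actual value is zero.
    return 100 * mean(abs((a - b) / a) for a, b in zip(y, f))


def smape(y: List[float], f: List[float]) -> float:
    # One common sMAPE variant (0 to 200 scale); fails if an actual and its
    # forecast are both zero.
    return 100 * mean(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, f))


def mase(y: List[float], f: List[float], train: List[float], season: int = 1) -> float:
    # MAE scaled by the in-sample MAE of a (seasonal) naive forecast on the training data.
    scale = mean(abs(train[t] - train[t - season]) for t in range(season, len(train)))
    return mae(y, f) / scale


def coverage(y: List[float], lower: List[float], upper: List[float]) -> float:
    # Fraction of actuals inside the forecast interval; a well-calibrated
    # 90% interval should land near 0.9 on the holdout.
    return mean(1.0 if lo <= a <= hi else 0.0 for a, lo, hi in zip(y, lower, upper))
```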