Use case · Classification
Autonomous AI classification — binary, multi-class, multi-label.
Drop a labeled dataset, get a deployed classifier. OctOpus detects the task shape, picks the model family — tabular GBM, deep tabular, or NLP transformer — handles imbalance, calibrates probabilities, and exposes a prediction API.
TL;DR. Classification is the workhorse shape. OctOpus's strength here is automatic task-shape detection (binary / multi-class / multi-label / NLP), automatic imbalance handling, automatic threshold tuning on the holdout, and calibrated probabilities — all without you specifying any of it.
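As a rough illustration of what task-shape detection can look like, the sketch below guesses the shape from the label column alone. The function name `detect_task_shape` and the heuristic are assumptions made for illustration, not OctOpus's actual logic. Detecting an NLP task would also mean inspecting the feature columns, which is left out here.

```python
def detect_task_shape(labels):
    """Guess the classification task shape from a label column.

    Hypothetical heuristic for illustration only.
    """
    # A list-, tuple-, or set-valued label means a row can carry several labels.
    if any(isinstance(y, (list, tuple, set, frozenset)) for y in labels):
        return "multi-label"

    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two distinct classes to classify")
    return "binary" if n_classes == 2 else "multi-class"


print(detect_task_shape([0, 1, 1, 0]))                     # binary
print(detect_task_shape(["refund", "bug", "billing"]))     # multi-class
print(detect_task_shape([["spam"], ["spam", "toxic"]]))    # multi-label
```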
What OctOpus classifies well
- Risk and fraud — transaction fraud, application risk, claim review.
- Lead and customer scoring — propensity to convert, propensity to expand.
- Ticket and intent routing — multi-class support routing, NLU intent.
- Sentiment and content moderation — review sentiment, toxicity, topic tagging.
- Quality control — pass/fail, severity classification.
- Medical / scientific labeling — diagnostic codes, taxonomy assignment, sample classification.
Models the agent rotates through
| Data shape | Tier 1 baseline | Tier 2-3 escalation | Tier 4 foundation |
|---|---|---|---|
| Tabular (small, n < 500) | Ridge / ElasticNet | CatBoost | TabPFN zero-shot |
| Tabular (mid, 500-10k) | CatBoost / LightGBM | XGBoost + Optuna · TabPFN | TabPFN |
| Tabular (large, 10k+) | LightGBM | XGBoost + Optuna · TabNet · FT-Transformer | — |
| Text / NLP | TF-IDF + LogReg | Linear SVM · Gradient boost over embeddings | HuggingFace transformer fine-tune |
| Image (Enterprise) | ResNet18 fine-tune | EfficientNet · ViT | — |
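The table can be read as a lookup from data shape to an escalation ladder. The sketch below encodes it that way. The names `LADDERS` and `model_ladder` are hypothetical, and treating exactly 10,000 rows as "large" is an assumption, since the table leaves that boundary open.

```python
# Escalation ladders copied from the table: Tier 1, Tier 2-3, then Tier 4 if any.
LADDERS = {
    "tabular_small": ["Ridge / ElasticNet", "CatBoost", "TabPFN zero-shot"],
    "tabular_mid": ["CatBoost / LightGBM", "XGBoost + Optuna · TabPFN", "TabPFN"],
    "tabular_large": ["LightGBM", "XGBoost + Optuna · TabNet · FT-Transformer"],
    "text": [
        "TF-IDF + LogReg",
        "Linear SVM · Gradient boost over embeddings",
        "HuggingFace transformer fine-tune",
    ],
    "image": ["ResNet18 fine-tune", "EfficientNet · ViT"],
}


def model_ladder(modality, n_rows=0):
    """Return the ordered list of model tiers for a data shape (illustrative)."""
    if modality in ("text", "image"):
        return LADDERS[modality]
    if modality != "tabular":
        raise ValueError(f"unknown modality: {modality!r}")
    if n_rows < 500:
        return LADDERS["tabular_small"]
    # Assumption: exactly 10,000 rows is treated as "large".
    if n_rows < 10_000:
        return LADDERS["tabular_mid"]
    return LADDERS["tabular_large"]


print(model_ladder("tabular", 3_200)[0])  # CatBoost / LightGBM
```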
How OctOpus handles the things classifiers usually break on
- Imbalanced labels. Class weights, focal loss, and threshold tuning on the holdout, applied when severe class imbalance is detected.
- Calibration. Isotonic / sigmoid calibration when raw scores aren't trustworthy. Calibration curve reported.
- High-cardinality categoricals. CatBoost native handling, target encoding with leak protection, hashing for very wide categoricals.
- Leaky features. The agent flags suspicious features that look like a time-shifted copy of the target.
- Multi-label. Per-label threshold tuning and per-label F1 reporting.
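To make threshold tuning on the holdout concrete, here is a minimal sketch that picks the score cutoff maximizing F1. The helper names `f1_at` and `tune_threshold`, the choice of F1 as the objective, and the toy data are assumptions for illustration. The text does not say which metric OctOpus optimizes.

```python
def f1_at(y_true, scores, threshold):
    """F1 of the positive class when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, scores):
    """Try every distinct holdout score as a cutoff and keep the best F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at(y_true, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy imbalanced holdout: 2 positives out of 8, all scores below 0.5.
y_holdout = [0, 0, 0, 0, 0, 0, 1, 1]
p_holdout = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.28, 0.45]

print(f1_at(y_holdout, p_holdout, 0.5))      # 0.0 at the default cutoff
print(tune_threshold(y_holdout, p_holdout))  # (0.28, 0.8)
```

For a multi-label problem the same search would run once per label column, giving each label its own cutoff and its own F1.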
What you get back
- Class probabilities and predicted label per row.
- ROC, PR curve, confusion matrix, per-class metrics.
- Calibration curve and reliability diagram.
- SHAP feature importance for explainability.
- The exact `train.py` and `model.pkl`.
- A deployed prediction API.
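For readers who want to check the reported confusion matrix and per-class metrics against their own predictions, the sketch below shows the arithmetic in plain Python. The function names and the sample ticket labels are made up for illustration and say nothing about how OctOpus computes or formats its report.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1 for each class, derived from the matrix."""
    cm = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # rest of the true-class row
        fp = sum(row[i] for row in cm) - tp   # rest of the predicted-class column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


labels = ["billing", "bug", "other"]
y_true = ["billing", "billing", "bug", "bug", "bug", "other"]
y_pred = ["billing", "bug", "bug", "bug", "other", "other"]

print(confusion_matrix(y_true, y_pred, labels))  # [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
print(per_class_metrics(y_true, y_pred, labels)["billing"])
```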