Use case · Lead scoring
Autonomous AI lead scoring — calibrated propensity-to-convert, in minutes.
Drop your CRM export — leads, opportunities, firmographics, engagement events. OctOpus profiles the data, picks the right model, calibrates conversion probabilities, validates on a real holdout, and ships a scoring endpoint your Salesforce or HubSpot can call.
TL;DR. Most lead-scoring projects die in a six-week consulting engagement. OctOpus runs the same project end-to-end — autonomously — in under an hour. CatBoost or LightGBM with calibrated probabilities, a real out-of-time holdout, SHAP attributions for sales-team trust, and a deployable scoring API your CRM can call.
Lead-scoring problems OctOpus handles well
- B2B MQL → SQL scoring: predict which marketing-qualified leads will become sales-qualified, with a calibrated probability.
- B2B opportunity-to-close: predict close probability per opportunity to make pipeline forecasts more accurate.
- B2C propensity-to-purchase: score in-funnel visitors on likelihood to buy within a window.
- Upsell / cross-sell propensity: which existing customers are likely to expand, and on which product.
- Reactivation propensity: which churned or dormant customers will respond to a win-back offer.
- ABM account-fit scoring: firmographic and intent-signal scoring for target account lists.
Models the agent rotates through
| Tier | Family | When the agent picks it |
|---|---|---|
| 1 · Baseline | CatBoost with isotonic calibration | Almost always — strong on heterogeneous categorical CRM data. |
| 2 · Tuned GBM | LightGBM / XGBoost with Optuna, time-based CV | When tier 1 has headroom on lift at the top decile. |
| 3 · Small-data | TabPFN, FT-Transformer | n < 10k labeled conversions; rich categorical interactions. |
| 4 · Stacking | Linear stacker over GBM + neural base learners | When the base learners' errors are largely uncorrelated, so combining them adds signal. |
| Always | Calibrated logistic regression as benchmark | For sales-team interpretability and "why" explanations. |
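Isotonic calibration appears in several tiers above, so a small illustration may help. The sketch below is a plain-Python version of the pool-adjacent-violators idea: it turns raw model scores into a non-decreasing step function of observed conversion rates. It is an illustrative assumption, not the code OctOpus generates; the function names `fit_isotonic` and `calibrate` are hypothetical, and a production run would more likely rely on a library implementation fitted on held-out data.

```python
from bisect import bisect_left
from itertools import groupby


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw score to conversion rate."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    # Each block holds [sum_of_labels, count, max_score_in_block].
    blocks = []
    for score, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]  # leads with identical scores share one block
        blocks.append([float(sum(ys)), len(ys), score])
        # Pool adjacent blocks while the earlier block's rate is not lower.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    thresholds = [b[2] for b in blocks]     # upper score bound of each block
    values = [b[0] / b[1] for b in blocks]  # calibrated probability per block
    return thresholds, values


def calibrate(score, thresholds, values):
    """Look up the calibrated probability; scores above the fitted range are clipped."""
    i = bisect_left(thresholds, score)
    return values[min(i, len(values) - 1)]
```

Fitting on one slice of data and applying `calibrate` to a later slice keeps the calibration check honest. This sketch returns a step function; library implementations typically interpolate between blocks instead.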
How a lead-scoring run looks
- Profile. Detects the lead/opportunity ID column, the outcome label, the time of creation vs time of conversion (essential for avoiding label leakage), and engagement-event aggregation candidates.
- Plan. Writes a research spec: AUC and lift-at-top-decile as primary metrics. Isotonic calibration. Time-based holdout to mimic deployment conditions.
- Run. Generates a fresh train.py per experiment, executes in sandbox, emits calibration plot and per-feature importance.
- Diagnose. When something fails (label leakage, NaN explosions on engagement counts, target collapse), the agent writes a targeted fix and retries.
- Validate. Out-of-time holdout the LLM never sees — scoring on tomorrow's leads, not yesterday's.
- Deploy. Scoring endpoint plus a deploy bundle. Enterprise customers can push scores back to Salesforce / HubSpot via a connector.
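Two of the steps above, leakage-safe engagement features and the out-of-time holdout, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the agent's generated code: the field names `lead_id`, `ts`, and `created_at`, the 30-day window, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def engagement_count(events, lead_id, as_of, window_days=30):
    """Count a lead's events in the window ending at `as_of` (exclusive).

    Events at or after `as_of` are ignored, so activity that happened after
    the scoring moment (for example, post-conversion emails) cannot leak in.
    """
    start = as_of - timedelta(days=window_days)
    return sum(
        1 for e in events
        if e["lead_id"] == lead_id and start <= e["ts"] < as_of
    )


def out_of_time_split(leads, cutoff):
    """Train on leads created before `cutoff`; hold out those created on or after it."""
    train = [lead for lead in leads if lead["created_at"] < cutoff]
    holdout = [lead for lead in leads if lead["created_at"] >= cutoff]
    return train, holdout


# Hypothetical usage: score each lead as of its creation time plus 7 days.
t0 = datetime(2024, 1, 1)
events = [
    {"lead_id": "a", "ts": t0 + timedelta(days=1)},
    {"lead_id": "a", "ts": t0 + timedelta(days=20)},
]
recent_activity = engagement_count(events, "a", as_of=t0 + timedelta(days=7))
```

Splitting on creation time rather than at random is what makes the holdout resemble deployment: the model is evaluated only on leads that arrived after everything it was trained on.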
What sales and revenue ops get back
- Calibrated conversion probability per lead, opportunity, or account.
- Lift curves and decile reports — pick the cutoff that fits your SDR capacity.
- Per-record SHAP attributions so SDRs see "why" a lead is hot.
- Out-of-time holdout report you can hand to revenue ops for sign-off.
- The exact train.py the agent wrote, inspectable and reproducible.
- Scoring endpoint, or Salesforce / HubSpot push integration on Enterprise.
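To make the decile report and the capacity-based cutoff concrete, here is a rough sketch of how both could be computed from holdout scores. It assumes the probabilities are already calibrated; the function names and the report layout are hypothetical and do not describe the actual format of OctOpus output.

```python
def decile_report(probs, labels, n_bins=10):
    """Rank leads by score (highest first); report conversion rate and lift per bin."""
    ranked = sorted(zip(probs, labels), key=lambda p: p[0], reverse=True)
    base_rate = sum(labels) / len(labels)
    report = []
    for b in range(n_bins):
        lo = b * len(ranked) // n_bins
        hi = (b + 1) * len(ranked) // n_bins
        chunk = ranked[lo:hi]
        if not chunk:
            continue  # fewer leads than bins
        rate = sum(y for _, y in chunk) / len(chunk)
        report.append({
            "bin": b + 1,
            "n": len(chunk),
            "conv_rate": rate,
            # Lift compares the bin's conversion rate with the overall rate.
            "lift": rate / base_rate if base_rate else float("nan"),
        })
    return report


def capacity_cutoff(probs, capacity):
    """Return the score threshold admitting the top `capacity` leads and the
    expected conversions among them (sum of probabilities, which is only
    meaningful when the scores are calibrated)."""
    top = sorted(probs, reverse=True)[:capacity]
    if not top:
        raise ValueError("capacity must be at least 1 and probs non-empty")
    # Ties at the threshold may admit slightly more than `capacity` leads.
    return top[-1], sum(top)
```

If the SDR team can work 200 leads a week, `capacity_cutoff(probs, 200)` gives the score to filter on and a rough count of conversions to expect. That arithmetic is the practical payoff of calibration over a bare ranking.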