Data Analysis

Ask & explore

summarise, plot, segments, BI — answered in plain English the moment you load the file.

"how many companies?" · "plot ARR" →
Data Science

Train a model

forecast, predict, classify, optimize — OctOpus runs experiments and ships a model.pkl.

"predict who hits $10M ARR" →
Data Engineering

Clean & transform

clean, dedupe, normalise, ETL — every message becomes a transform step on your data.

"drop dupes · normalise arr" →
Connect your data
21 connectors · or drop a CSV / Parquet / ZIP
CSV / Parquet · file upload
BigQuery · warehouse
Snowflake · warehouse
Databricks · warehouse
Redshift · warehouse
Postgres · database
MySQL · database
MongoDB · database
Amazon S3 · object store
Google Sheets · SaaS
Notion · SaaS
ClickHouse · warehouse
yc_recent_batches.csv
100 companies · 17 cols · batches S22 → S24
company_name | batch | industry | arr_usd | post_money_valuation_usd | mom_growth_pct | months_since_demo_day
YukonSec | S23 | Security | 541,153 | 42M | 14.2 | 12
NovaOS | W23 | Dev Tools | 532,890 | 38M | 9.8 | 18
SpireAPI | S22 | API Infra | 463,724 | 55M | 11.4 | 24
EchoLabs | W24 | AI / ML | 421,224 | 120M | 22.5 | 6
IrisPay | S22 | Fintech | 403,950 | 68M | 8.1 | 24
HaloLabs | S23 | AI / ML | 253,504 | 72M | 17.8 | 12
ZeroBio | W23 | Biotech | 235,961 | 28M | 6.2 | 18
FluxSec | S24 | Security | 204,238 | 15M | 28.4 | 3
ArcEngine | S24 | Dev Tools | 193,454 | 12M | 31.0 | 3
KaiTech | W24 | AI / ML | 191,923 | 85M | 19.2 | 6
Data Engineering · live transforms
DE mode on
DROP · post_money_valuation_usd (remove column)
FORMAT · founded: text → date (YYYY-MM)
NEW COL · arr_per_employee = arr_usd / employees
MERGE · + funding_rounds.csv (left join on company_id)
Result · 97 rows · 6 cols · recipe saved
company | founded | arr_musd | arr_per_employee (new) | last_round (merged)
YukonSec | 2023-04 | 0.54 | 31.8k | Seed
NovaOS | 2023-01 | 0.53 | 29.6k | Series A
SpireAPI | 2022-08 | 0.46 | 24.4k | Series A
EchoLabs | 2024-02 | 0.42 | 42.1k | Seed
IrisPay | 2022-06 | 0.40 | 18.4k | Series B
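The four recipe steps above map onto ordinary row operations. Here is a minimal sketch over plain Python dicts. It is not OctOpus's own code: the company_id values, employee counts, full `founded` dates and the funding_rounds rows are hypothetical, and the FORMAT step assumes `founded` arrives as ISO `YYYY-MM-DD` text.

```python
from datetime import datetime

# Hypothetical sample rows (company_id, employees and full founded dates are illustrative).
companies = [
    {"company_id": 1, "company": "YukonSec", "founded": "2023-04-17",
     "arr_usd": 541_153, "employees": 17, "post_money_valuation_usd": 42_000_000},
    {"company_id": 2, "company": "NovaOS", "founded": "2023-01-09",
     "arr_usd": 532_890, "employees": 18, "post_money_valuation_usd": 38_000_000},
]
# Hypothetical funding_rounds.csv content; NovaOS has no row, to show the left join keeping it.
funding_rounds = [{"company_id": 1, "last_round": "Seed"}]


def drop_column(rows, col):
    """DROP: return copies of the rows without `col`."""
    return [{k: v for k, v in r.items() if k != col} for r in rows]


def format_founded(rows, col="founded"):
    """FORMAT: parse an ISO text date (assumed input format) and keep only YYYY-MM."""
    return [{**r, col: datetime.strptime(r[col], "%Y-%m-%d").strftime("%Y-%m")} for r in rows]


def add_arr_per_employee(rows):
    """NEW COL: arr_per_employee = arr_usd / employees (None if employees is 0 or missing)."""
    out = []
    for r in rows:
        emp = r.get("employees")
        out.append({**r, "arr_per_employee": r["arr_usd"] / emp if emp else None})
    return out


def left_join(left, right, key):
    """MERGE: left join on `key`; unmatched left rows get None for the right-hand columns."""
    right_cols = {k for r in right for k in r if k != key}
    index = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    joined = []
    for r in left:
        match = index.get(r[key], {})
        joined.append({**r, **{c: match.get(c) for c in right_cols}})
    return joined


# Apply the four steps in the same order as the recipe.
result = left_join(
    add_arr_per_employee(format_founded(drop_column(companies, "post_money_valuation_usd"))),
    funding_rounds,
    key="company_id",
)
for row in result:
    print(row["company"], row["founded"], round(row["arr_per_employee"]), row["last_round"])
```

A left join keeps every company even when there is no funding match. That is why NovaOS comes back with `last_round` set to `None` instead of being dropped.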
ARR distribution across industries
100 companies · median $23.6k · right-skewed
Median ARR by industry: AI / ML $284k · Fintech $231k · Dev Tools $190k · Security $148k · API Infra $112k · Biotech $64k
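The headline numbers on this chart (overall median, right skew, per-industry bars) come from simple group-by statistics. Here is a minimal sketch with Python's `statistics` module. It assumes the bars show per-industry median ARR, as the chat summary states for AI / ML, and the sample rows are hypothetical. "Mean above median" is used only as a rough rule of thumb for right skew, not as a formal skewness statistic.

```python
import statistics
from collections import defaultdict

# Hypothetical (industry, arr_usd) pairs standing in for the 100-company file.
rows = [
    ("AI / ML", 421_224), ("AI / ML", 253_504), ("AI / ML", 191_923),
    ("Security", 541_153), ("Security", 204_238), ("Security", 3_100),
    ("Biotech", 235_961), ("Biotech", 0), ("Biotech", 12_400),
]


def arr_summary(rows):
    """Overall median and mean, plus a simple skew flag (mean above median => right-skewed)."""
    values = [arr for _, arr in rows]
    median, mean = statistics.median(values), statistics.mean(values)
    return {"median": median, "mean": mean, "right_skewed": mean > median}


def median_by_industry(rows):
    """Median ARR per industry, sorted from highest to lowest, as in the bar chart."""
    groups = defaultdict(list)
    for industry, arr in rows:
        groups[industry].append(arr)
    medians = {ind: statistics.median(vals) for ind, vals in groups.items()}
    return sorted(medians.items(), key=lambda kv: kv[1], reverse=True)


print(arr_summary(rows))
print(median_by_industry(rows))
```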
Dashboard · YC cohort overview (auto-generated)
Total ARR: $7.06M
Companies: 97
Median MoM growth: 14.8%
Top industry: AI / ML
Charts: ARR by industry · ARR trend (S22 → S24) · Industry mix · MoM growth vs ARR
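Here is a sketch of how the four KPI tiles could be computed from the cleaned rows. The dashboard does not say how "Top industry" is ranked, so this sketch assumes it means highest total ARR. The four sample rows are a hypothetical subset, not the full 97.

```python
import statistics
from collections import Counter

# Hypothetical deduplicated rows; field names follow the dataset columns.
rows = [
    {"industry": "AI / ML", "arr_usd": 421_224, "mom_growth_pct": 22.5},
    {"industry": "AI / ML", "arr_usd": 253_504, "mom_growth_pct": 17.8},
    {"industry": "Security", "arr_usd": 541_153, "mom_growth_pct": 14.2},
    {"industry": "Fintech", "arr_usd": 403_950, "mom_growth_pct": 8.1},
]


def kpi_tiles(rows):
    """Compute the four tiles; 'top industry' is assumed to mean highest total ARR."""
    arr_by_industry = Counter()
    for r in rows:
        arr_by_industry[r["industry"]] += r["arr_usd"]
    return {
        "total_arr_usd": sum(r["arr_usd"] for r in rows),
        "companies": len(rows),
        "median_mom_growth_pct": statistics.median(r["mom_growth_pct"] for r in rows),
        "top_industry": arr_by_industry.most_common(1)[0][0],
    }


print(kpi_tiles(rows))
```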
Research plan · program.md
target: reached_10m_arr (binary, derived from arr_usd)
task: classification
metric: ROC-AUC, holdout-validated
split: group k-fold by batch (each batch stays whole within one fold, so no cohort appears in both training and validation)
features: industry, founders, technical_founders, employees, last_round_usd, runway_months, mom_growth_pct, has_paying_customers, months_since_demo_day
experiments: 1) CatBoost · 2) LightGBM (Optuna, 30 trials) · 3) TabPFN · 4) stacking ensemble
guardrails: leakage detection, holdout kept outside the agent workspace, scrubbed env vars
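Group k-fold treats each batch as an indivisible unit: a whole batch lands in exactly one validation fold and is absent from training for that fold. Here is a minimal pure-Python sketch of the idea. It uses a greedy, largest-batch-first assignment to balance fold sizes. The function name and the sample batch labels are hypothetical. For real use, scikit-learn's `GroupKFold` provides the same guarantee.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs where every group lands wholly in one validation fold.

    Groups are assigned largest first to the currently smallest fold,
    which keeps fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: len(members[g]), reverse=True):
        min(folds, key=len).extend(members[g])  # whole group goes into one fold
    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, val


# Hypothetical batch label per company row.
batches = ["S22", "S22", "W23", "W23", "W23", "S23", "W24", "W24", "S24", "S24"]
for train_idx, val_idx in group_k_fold(batches, n_splits=3):
    print(sorted({batches[i] for i in val_idx}), len(train_idx), len(val_idx))
```

Group k-fold still trains on later batches when it validates earlier ones. If the goal is to keep later cohorts out of training entirely, a stricter choice is a forward-chaining split that trains on earlier batches and validates on the next one.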
Research loop · running 4 experiments
1. CatBoost · baseline · AUC 0.82
2. LightGBM · Optuna ×30 · AUC 0.85
3. TabPFN · zero-shot · AUC 0.87
4. Stacking ensemble · 3 base learners · AUC 0.89 · WINNER
Stacking ensemble · validated winner · 0.89 AUC (holdout)
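Every experiment in the loop is ranked by holdout ROC-AUC. Here is a minimal pure-Python sketch of the metric using its pairwise (Mann-Whitney) definition: the share of positive/negative pairs in which the positive example gets the higher score, with ties counted as half. The labels and scores are hypothetical. For real use, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). O(P*N), fine for small holdouts.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical holdout labels (reached_10m_arr) and model scores.
y_holdout = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.35, 0.1, 0.8]
print(roc_auc(y_holdout, scores))
```

Here 7.5 of the 9 positive/negative pairs are ordered correctly (one pair is a tie), giving an AUC of about 0.833.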
Octopus
Hi, I'm OctOpus, your AI Data Scientist. Drop a dataset and tell me your goal — I can train a model or answer questions about your data.
21:29
Octopus
Got it. I can see your yc_recent_batches.csv dataset. Profiling it now…
21:30
drop duplicate rows, normalise arr to $M, and flag companies past $10M ARR
21:30
Octopus
DE mode — applied 3 transform steps: removed 3 duplicates, normalised arr_usd → arr_musd, and derived reached_10m_arr. Saved as a reusable recipe — your data is clean and model-ready.
21:30
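The three steps in that reply, plus the idea of a replayable recipe, can be sketched as an ordered list of functions. This sketch assumes "duplicate" means an exact row match, and it keeps arr_usd next to the new arr_musd column. The function names and the ExampleCo row are hypothetical.

```python
# Hypothetical recipe: an ordered list of step functions that can be replayed on a fresh upload.
REACHED_THRESHOLD_MUSD = 10.0


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # assumes all cell values are hashable
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def normalise_arr(rows):
    """Add arr_musd (ARR in millions of USD); arr_usd is kept alongside it."""
    return [{**r, "arr_musd": r["arr_usd"] / 1_000_000} for r in rows]


def flag_10m(rows):
    """Derive the binary target reached_10m_arr (1 if ARR >= $10M, else 0)."""
    return [{**r, "reached_10m_arr": int(r["arr_musd"] >= REACHED_THRESHOLD_MUSD)} for r in rows]


RECIPE = [drop_duplicates, normalise_arr, flag_10m]


def apply_recipe(rows, recipe=RECIPE):
    """Run each step in order; no step mutates the input rows."""
    for step in recipe:
        rows = step(rows)
    return rows


raw = [
    {"company_name": "YukonSec", "arr_usd": 541_153},
    {"company_name": "YukonSec", "arr_usd": 541_153},     # exact duplicate
    {"company_name": "ExampleCo", "arr_usd": 12_500_000},  # hypothetical row past $10M
]
clean = apply_recipe(raw)
for r in clean:
    print(r)
```

Keeping the recipe as data (a list of steps) rather than as one-off edits lets the same cleaning run again on the next export of the file.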
Octopus
Goal · What's your goal?
21:30
Data insight (Ask OctOpus)
21:30
Octopus
Insights · Focus · What should I inspect?
21:30
Summary
21:30
Octopus

Here's what I see in this dataset:

It covers Y Combinator companies from recent batches — 17 columns capturing identity, funding, growth, and team composition.

Key financial signals available: arr_usd, last_round_usd, post_money_valuation_usd, runway_months. mom_growth_pct is the velocity signal.

Team data: founders, technical_founders, employees capture team composition.

Stage spread: has_paying_customers is a binary flag separating pre-revenue from revenue-generating companies.

Targets worth modeling: post_money_valuation_usd, runway_months, or a derived reached_10m_arr binary.

21:31
what's the total arr and the top companies?
21:31
Octopus
Across 100 companies, total ARR is $7,056,039. Top by ARR is YukonSec at $541,153 — there's a notable spread between leaders and the rest.
company name | arr_usd
YukonSec | 541,153
NovaOS | 532,890
SpireAPI | 463,724
EchoLabs | 421,224
IrisPay | 403,950
+ 95 more rows
21:31
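That answer is a sum plus a top-N sort. Here is a minimal sketch using `heapq.nlargest`. It runs on six rows taken from the preview table, not the full 100-company file, so its total differs from the $7,056,039 quoted above.

```python
import heapq

# Six (company_name, arr_usd) rows copied from the preview table.
rows = [
    ("IrisPay", 403_950), ("YukonSec", 541_153), ("ZeroBio", 235_961),
    ("NovaOS", 532_890), ("SpireAPI", 463_724), ("EchoLabs", 421_224),
]

total_arr = sum(arr for _, arr in rows)
top5 = heapq.nlargest(5, rows, key=lambda r: r[1])  # highest arr_usd first

print(f"Total ARR: ${total_arr:,}")
for name, arr in top5:
    print(f"{name:<10}{arr:>10,}")
print(f"+ {len(rows) - len(top5)} more row(s)")
```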
show me arr distribution across industries
21:31
Octopus

[Chart: ARR distribution across industries · AI/ML, Fintech, Dev Tools, Security, API Infra, Biotech, SaaS, Other]
arr_usd ranges $0 → $541k, median $23.6k, mean $70.6k (right-skewed). AI/ML leads with a $284k median.
21:31
build a dashboard of the key metrics
21:31
Octopus
Dashboard ready. Built 4 KPI tiles and 4 charts — ARR by industry, ARR trend, industry mix, and growth vs ARR. Opened it in the workspace; export as PNG or embed anytime.
21:32
train a model that predicts $10M arr
21:32
Octopus
Plan ready. Group k-fold by batch, so no cohort is split across training and validation. Open the Plan tab to inspect it, or type "go" to start.
21:32
looks good, run it
21:32
Octopus
Running 4 experiments — CatBoost baseline, LightGBM with Optuna, TabPFN zero-shot, and a stacking ensemble of the strongest base learners.
21:33
Octopus
Done. Best model: stacking ensemble at AUC 0.89 (holdout). model.pkl ready in the Results tab.
21:35
Benchmarks under the demo

See the public leaderboard right after the product walkthrough.

Same story, same page: first watch how OctOpus works, then inspect the reference-backed benchmark results. The leaderboard shows the raw score deltas against AutoML baselines, including the datasets where OctOpus falls short.

Leaderboard
Reference-backed datasets only · latest run per dataset
Columns: Dataset · Task · Winner · Score · vs Baselines · Runtime · Status

Reproduce any benchmark in the product

To sanity-check the numbers, run the same dataset in OctOpus directly and compare the final holdout metric against the published baseline.

# Reproduce a public benchmark
1. Open OctOpus
2. Upload the same dataset used in the benchmark
3. Use the same target column and prediction goal
4. Run the full research loop
5. Compare the final metric against the published baseline
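Step 5 comes down to a signed difference, but the sign depends on the metric. Higher is better for ROC-AUC, and lower is better for error metrics such as RMSE. Here is a small sketch of that comparison. The function name, the tolerance band and the example numbers are hypothetical, not values from the published leaderboard.

```python
def compare_to_baseline(reproduced, baseline, higher_is_better=True, tolerance=0.0):
    """Return (delta, verdict) for a reproduced metric vs. a published baseline.

    delta is signed so that a positive value always means "better than baseline".
    tolerance is an assumed noise band; differences within it count as a tie.
    """
    delta = reproduced - baseline if higher_is_better else baseline - reproduced
    if abs(delta) <= tolerance:
        return delta, "within tolerance"
    return delta, "beats baseline" if delta > 0 else "misses baseline"


# Hypothetical numbers: ROC-AUC is higher-is-better, RMSE is lower-is-better.
print(compare_to_baseline(0.91, 0.88, tolerance=0.005))
print(compare_to_baseline(12.4, 11.9, higher_is_better=False))
```

Normalising the sign this way lets a leaderboard column show wins and misses side by side without the reader having to remember each metric's direction.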