Production benchmark
What we run in production
Everyone in construction tech claims coverage. Here is the actual workload on real bid packages.
All-time counts below. Per-family accuracy publishes in Q3 2026.
Design-spec extractions are one slice of total prompt volume, counted separately because spec pages dominate MEP bid packages.
Six stages, thirty-eight prompts, fifteen families. Each execution maps to one prompt on a real project document.
Family · prompts
You would not quote a $50M mechanical package without reading the spec. Software should show the same receipts: which prompts run, how often, on real documents.
The usual pitch: huge efficiency gains and vague AI claims. The useful question: what production workload can you verify?
Production workload as of Q2 2026: 3.3M prompts all-time, 1.6M design-spec extractions (a subset of that volume), and thirty-eight production prompts across six stages and fifteen families, run on real project documents.
Six stages (Classify, Tables, Extract, Project & Contacts, Email & Bids, Validate) covering fifteen prompt families and thirty-eight prompts total. The grid above lists each family; the number beside a family is how many prompts it contains.
Q2 2026. All-time prompt counts use the same methodology we have always used for measurement.
Q3 2026. Accuracy by prompt family will be added to the grid when it ships.
Forward a bid package. We run it through the same production pipeline this benchmark reports on: equipment, specs, and schedules extracted with source page references you can verify.