Production benchmark

What we run in production

Everyone in construction tech claims coverage. Here is the actual workload on real bid packages.

All-time counts below. Per-family accuracy publishes Q3 2026.

3.3M prompts all-time
1.6M design specs

Design-spec extractions are one slice of total prompt volume, counted separately because spec pages dominate MEP bid packages.

Six stages, thirty-eight prompts, fifteen families. Each execution maps to one prompt on a real project document.

Family · prompts

Classify

  • Document & drawing classification · 3
  • Duplicate project check · 1

Tables

  • Bounding-box detection · 2
  • Linking & alternates · 2

Extract — Equipment & Specs

  • Equipment · 6
  • Specs & notes · 3
  • Design specs · 2

Project & Contacts

  • Project identity · 2
  • Scope-sheet Q&A · 2
  • Organization for contact · 1

Email & Bids

  • Intake & parsing · 3
  • Email → project · 4
  • Email → project update · 3

Validate

  • Extraction validation · 2
  • Metadata & quantity validation · 2
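
The stage, family, and prompt totals can be checked directly against the grid. Below is a minimal sketch in Python, with the per-family counts transcribed by hand from the grid above. The names `GRID` and `summarize` are illustrative only and are not part of any published interface.

```python
# Prompt counts per family, transcribed from the grid above.
# GRID and summarize are hypothetical names used for illustration.
GRID = {
    "Classify": {
        "Document & drawing classification": 3,
        "Duplicate project check": 1,
    },
    "Tables": {
        "Bounding-box detection": 2,
        "Linking & alternates": 2,
    },
    "Extract": {
        "Equipment": 6,
        "Specs & notes": 3,
        "Design specs": 2,
    },
    "Project & Contacts": {
        "Project identity": 2,
        "Scope-sheet Q&A": 2,
        "Organization for contact": 1,
    },
    "Email & Bids": {
        "Intake & parsing": 3,
        "Email → project": 4,
        "Email → project update": 3,
    },
    "Validate": {
        "Extraction validation": 2,
        "Metadata & quantity validation": 2,
    },
}


def summarize(grid):
    """Return (stages, families, prompts) for a stage -> family -> count mapping."""
    stages = len(grid)
    families = sum(len(fams) for fams in grid.values())
    prompts = sum(sum(fams.values()) for fams in grid.values())
    return stages, families, prompts


print(summarize(GRID))  # (6, 15, 38)
```

The result matches the six stages, fifteen families, and thirty-eight prompts stated on this page.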

You would not quote a $50M mechanical package without reading the spec. Software should show the same receipts: which prompts run, how often, on real documents.

The usual pitch: huge efficiency gains and vague AI claims. The useful question: what production workload can you actually verify?

This is the same pipeline we run for customers, published here so you can verify the workload.

Updated quarterly · Q2 2026

All-time prompt counts as of Q2 2026.

Editorial context and operator interviews live at From the field. That is a different format from this benchmark.

FAQ

Production workload as of Q2 2026: 3.3M prompts all-time, 1.6M design-spec extractions (a subset of that volume), and thirty-eight production prompts in six stages and fifteen families on real project documents.

Six stages (Classify, Tables, Extract, Project & Contacts, Email & Bids, Validate) covering fifteen prompt families and thirty-eight prompts total. The grid above lists each family; the number beside a family is how many prompts it contains.

Q2 2026. All-time prompt counts have been measured with the same methodology throughout.

Q3 2026. Accuracy by prompt family will appear in the grid when it ships.

See it on your documents

Forward a bid package. We run the same production pipeline we publish in our benchmark — equipment, specs, and schedules extracted with source page references you can verify.

  1. Forward a bid package to bids@buildvision.io (or link Outlook when you sign up).
  2. We process schedules, specs, and equipment lists with the production pipeline.
  3. You receive structured extractions with source page references to verify.
