AI estimating tools that work on a 5-sheet bath remodel break in predictable ways on an 80-sheet pharmaceutical compounding TI. The break shows up as missing scope categories, hallucinated quantities, generic line items that do not match the actual plan, or simply a request body cap that prevents the upload from completing. The BuildCrux multi-pass pipeline is built specifically to not break in those ways on commercial scope. This page documents the architecture.
This is the pipeline as it actually runs in production today. Everything below is observable in the BuildCrux dashboard during a commercial estimating run.
Why commercial TI breaks single-pass AI
- Request body cap: most LLM APIs cap upload bodies at 25-32 MB; commercial plan sets routinely exceed that. Single-pass tools either reject the upload or strip pages silently.
- Context window dilution: 80 sheets of architectural detail compete for the model's attention. Important detail on the structural sheet gets missed because the cover sheet, finish schedule, and equipment cut sheets occupy the same context.
- Reasoning time cap: serverless function timeouts (typically 60-300 seconds) cannot accommodate the 4-12 minute reasoning passes required for genuine commercial complexity.
- No tool use: most single-pass AI estimating tools do not give the model a compute_area or lookup_unit_cost tool. Without tools, the model estimates instead of measures, and applies generic unit costs instead of calibrated ones.
- No scope hierarchy: the model outputs whatever line items emerge from its attention pattern. Customer-facing commercial proposals need scope grouped by trade, with code-driven categories made explicit.
Pipeline overview
| Component | Job | Model / tool | Typical commercial runtime |
|---|---|---|---|
| Anthropic Files API upload | Move plan sets of up to 500 MB to AI-accessible storage | Files API | 15-90 seconds depending on file size |
| Pass 1 | Identify every sheet, tag by type | Sonnet 4 | 30-90 seconds |
| Mode detection + confirmation | Surface standard vs detailed pipeline modal | Deterministic logic + UI | ~10 seconds (user input) |
| Pass 2 | Multi-discipline quantity takeoff | Sonnet 4 + compute_area tool | 2-6 minutes |
| Pass 3 (standard) | Priced estimate, residential or simple commercial | Sonnet 4 + lookup_unit_cost | 1-3 minutes |
| Pass 3 (detailed) | Priced estimate, complex commercial | Opus 4.7 + lookup_unit_cost + streaming | 4-12 minutes |
| Post-process | Scope-driven categories, baselines, display name | Deterministic code | <2 seconds |
| Total commercial 30-sheet | — | — | 5-10 minutes |
| Total commercial 80-sheet | — | — | 10-15 minutes |
Pass 1: Multi-discipline sheet identification
The first pass reads every page of the plan set and tags it. For an 80-sheet pharma compounding plan set, Pass 1 completes in 38 to 75 seconds. The output is a structured inventory that drives the rest of the pipeline.
- Architectural sheets: cover, sheet index, existing conditions, demolition, proposed floor plan, elevation, section, finish schedule, door/hardware schedule, wall types.
- Engineering sheets: structural (foundations, slab cuts, equipment supports), electrical (load calcs, panel schedules, lighting, controls), plumbing (USP-compliant water, lab waste, gas), mechanical (HVAC, classified-space pressure, exhaust).
- Fire protection sheets: sprinkler plans, head schedule, modifications, special-hazard suppression.
- Specialty equipment sheets: cut sheets, equipment schedules, install detail.
- Non-drawing pages: energy reports, geotech excerpts, addenda, specifications. Flagged so Pass 2 and Pass 3 skip them — a meaningful efficiency win on commercial plan sets where 5 to 15 percent of pages are non-drawing.
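The inventory described above can be pictured as a list of tagged pages. The following is a minimal sketch, not the BuildCrux schema: the `SheetType`, `SheetTag`, `drawing_pages`, and `non_drawing_share` names are hypothetical, and the sample pages are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class SheetType(Enum):
    # Categories mirror the sheet groups listed above; the enum itself is an assumption.
    ARCHITECTURAL = "architectural"
    ENGINEERING = "engineering"
    FIRE_PROTECTION = "fire_protection"
    SPECIALTY_EQUIPMENT = "specialty_equipment"
    NON_DRAWING = "non_drawing"


@dataclass(frozen=True)
class SheetTag:
    page: int              # 1-based page number in the uploaded PDF
    sheet_type: SheetType
    label: str             # free-text description of the sheet


def drawing_pages(inventory: list[SheetTag]) -> list[int]:
    """Pages that later passes should read; non-drawing pages are skipped."""
    return [s.page for s in inventory if s.sheet_type is not SheetType.NON_DRAWING]


def non_drawing_share(inventory: list[SheetTag]) -> float:
    """Fraction of pages flagged as non-drawing (0.0 for an empty inventory)."""
    if not inventory:
        return 0.0
    skipped = sum(1 for s in inventory if s.sheet_type is SheetType.NON_DRAWING)
    return skipped / len(inventory)


# Illustrative four-page inventory, not real Pass 1 output.
inventory = [
    SheetTag(1, SheetType.ARCHITECTURAL, "cover"),
    SheetTag(2, SheetType.ENGINEERING, "electrical panel schedules"),
    SheetTag(3, SheetType.NON_DRAWING, "energy report"),
    SheetTag(4, SheetType.FIRE_PROTECTION, "sprinkler plan"),
]
print(drawing_pages(inventory))      # [1, 2, 4]
print(non_drawing_share(inventory))  # 0.25
```

Keeping the non-drawing flag on the page record, rather than dropping those pages, leaves the inventory complete for inspection while still letting Pass 2 and Pass 3 skip them.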
Auto commercial detection + mode confirmation
Between Pass 1 and Pass 2, the pipeline detects whether the plan set is commercial or residential based on Pass 1 output: sheet count, discipline mix (presence of structural / fire / specialty equipment sheets), and scope keywords in the cover and general notes. A confirmation modal then offers two modes:
- Standard pipeline (Sonnet 4 for Pass 3): 1 credit, ~3-5 minute total runtime, appropriate for residential and simple commercial.
- Detailed pipeline (Opus 4.7 for Pass 3): 15 credits, ~10-15 minute total runtime, appropriate for multi-discipline commercial TI with scope-driven categories.
The contractor confirms before any credits are deducted. The recommended mode is pre-selected based on detection (commercial defaults to the detailed pipeline; residential defaults to standard). The contractor can override in either direction.
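As a rough illustration of the detection and confirmation step, the sketch below scores the three signals named above and returns a recommended mode. The credit costs come from this page; the threshold, keyword list, two-of-three rule, and all function names are assumptions made for the example, not the production logic.

```python
from dataclasses import dataclass
from typing import Optional

# Credit costs are from the text above. Everything else here is an assumption.
CREDITS = {"standard": 1, "detailed": 15}
COMMERCIAL_DISCIPLINES = {"structural", "fire_protection", "specialty_equipment"}
COMMERCIAL_KEYWORDS = ("tenant improvement", "cleanroom", "compounding")
SHEET_COUNT_THRESHOLD = 30  # hypothetical cutoff


@dataclass(frozen=True)
class Pass1Summary:
    sheet_count: int
    disciplines: frozenset
    cover_text: str


def recommend_mode(summary: Pass1Summary) -> str:
    """Count commercial signals; two or more recommends the detailed pipeline."""
    signals = 0
    if summary.sheet_count >= SHEET_COUNT_THRESHOLD:
        signals += 1
    if summary.disciplines & COMMERCIAL_DISCIPLINES:
        signals += 1
    text = summary.cover_text.lower()
    if any(keyword in text for keyword in COMMERCIAL_KEYWORDS):
        signals += 1
    return "detailed" if signals >= 2 else "standard"


def confirm_mode(recommended: str, override: Optional[str] = None) -> tuple:
    """Return (mode, credits). Credits are only known once the user confirms."""
    mode = override or recommended
    return mode, CREDITS[mode]


pharma = Pass1Summary(
    80,
    frozenset({"architectural", "structural", "fire_protection"}),
    "Pharmaceutical compounding tenant improvement",
)
bath = Pass1Summary(5, frozenset({"architectural"}), "Bath remodel")

print(recommend_mode(pharma))                        # detailed
print(recommend_mode(bath))                          # standard
print(confirm_mode("detailed", override="standard")) # ('standard', 1)
```

Separating `recommend_mode` from `confirm_mode` mirrors the flow described above: detection only pre-selects a mode, and the credit charge follows whatever the contractor confirms.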
Pass 2: Multi-discipline quantity takeoff
Pass 2 uses the sheet inventory from Pass 1 and runs takeoff. The model has access to compute_area, a deterministic measurement tool that runs on PDF coordinates the model hands it. When the model needs to know the square footage of a classified-space partition or the linear feet of sealed ductwork, it invokes compute_area and gets a measured answer instead of estimating from the rendered image.
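To make "measures instead of estimates" concrete, here is one way a deterministic area tool could work on PDF coordinates. This is a sketch under stated assumptions: the signature of `compute_area`, the `compute_length` helper, and the scale parameter are hypothetical, and the shoelace formula is a standard technique swapped in for illustration rather than the confirmed BuildCrux implementation. It assumes coordinates in default PDF user space (72 points per inch).

```python
import math

POINTS_PER_INCH = 72.0  # default PDF user-space unit


def polygon_area_points(vertices):
    """Shoelace formula: area of a simple polygon in square PDF points."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def compute_area(vertices, feet_per_paper_inch):
    """Square feet for a polygon traced on a sheet drawn at the given scale."""
    feet_per_point = feet_per_paper_inch / POINTS_PER_INCH
    return polygon_area_points(vertices) * feet_per_point ** 2


def compute_length(path, feet_per_paper_inch):
    """Linear feet along an open polyline (hypothetical companion helper)."""
    feet_per_point = feet_per_paper_inch / POINTS_PER_INCH
    points = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    return points * feet_per_point


# A 5 in x 2.5 in rectangle on paper at 1/4" = 1'-0" (4 ft per paper inch)
# represents a 20 ft x 10 ft room.
room = [(0, 0), (360, 0), (360, 180), (0, 180)]
print(round(compute_area(room, 4.0), 2))          # 200.0
print(round(compute_length(room[:3], 4.0), 2))    # 30.0
```

The model supplies only the vertices and the sheet scale; the arithmetic is deterministic, so the same coordinates always yield the same quantity.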
Pass 2 takeoff output for the $686K pharma compounding TI: 48 quantity items across 12 trade groups.
| Trade group | Items | Quantity examples |
|---|---|---|
| Demo | 3 items | 2,950 sf TI to studs + dumpster + protection |
| Hazmat abatement | 2 items | 1,850 sf asbestos + 420 sf lead paint |
| Structural | 2 items | 85 lf slab cuts + 125 sf slab patch |
| Framing/partitions | 6 items | 385 lf classified partition + 95 lf non-classified + 1,840 sf sealed ceiling |
| Plumbing | 3 items | 1 USP water loop + 12 fixtures + 85 lf lab waste |
| HVAC | 5 items | 1 dedicated AHU + 24 HEPA boxes + 485 lf sealed duct |
| Electrical | 5 items | 400A-to-800A panel + 48 classified receptacles + 64 cleanroom LEDs |
| Fire protection | 3 items | 85 sprinkler heads + 32 smoke detectors + special hazard |
| Specialty equipment | 7 items | 2 LAF + 2 BSC + 4 pass-throughs + 1 sterilizer + 3 refrigeration + 6 workstations + commissioning |
| Roof/exterior | 2 items | 4 curb cuts + 8 wall penetrations |
| Finishes/detail | 7 items | epoxy paint + standard paint + casework + doors + signage + glazing |
| Closeout | 3 items | final clean + validation support + punch |
Pass 3: Opus 4.7 priced estimate with streaming
Pass 3 is the heaviest reasoning pass. On commercial multi-discipline plan sets the model needs to reason about which scope categories apply, which unit costs are appropriate for classified vs non-classified scope, how commercial uplift varies by trade (specialty equipment uplift is different from finishes uplift), and how to group output into a customer-facing line-item structure.
BuildCrux runs commercial Pass 3 on Opus 4.7 with two infrastructure pieces that single-pass AI tools typically lack:
- Streaming API: Opus 4.7 reasoning on 80-sheet plan sets can take 8 to 12 minutes. Standard serverless function timeouts (60-300 seconds) cannot accommodate this. Streaming keeps the request open during the long reasoning run and surfaces partial output as it completes.
- 1M context beta: the full plan set, the Pass 1 inventory, the Pass 2 takeoff, and the system prompt all need to coexist in context for Pass 3 to reason accurately. The 1M context beta accommodates this on the largest commercial plan sets.
Pass 3 also enforces the 3-tier line-item structure described in the estimating-guide page: universal categories (always present), scope-driven categories (present when triggered by scope), trade-detail categories (present when scope is granular enough). The structure prevents output from being a flat list of 80 line items; it produces customer-readable 40 to 60 line-item estimates grouped by trade.
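The difference between a flat list and a tiered, trade-grouped structure can be sketched as follows. The tier names come from the paragraph above; the `LineItem` shape, the ordering rule, and the dollar amounts are illustrative placeholders, not figures from any real estimate.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    UNIVERSAL = 1      # always present
    SCOPE_DRIVEN = 2   # present when triggered by scope
    TRADE_DETAIL = 3   # present when scope is granular enough


@dataclass(frozen=True)
class LineItem:
    trade: str
    description: str
    tier: Tier
    amount: float  # placeholder dollars for the example only


def group_by_trade(items):
    """Group a flat list by trade; within a trade, order by tier then name."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.trade].append(item)
    return {
        trade: sorted(rows, key=lambda r: (r.tier.value, r.description))
        for trade, rows in sorted(grouped.items())
    }


def trade_subtotals(grouped):
    """Sum the amounts for each trade group."""
    return {trade: sum(r.amount for r in rows) for trade, rows in grouped.items()}


flat = [
    LineItem("HVAC", "HEPA boxes", Tier.TRADE_DETAIL, 300.0),
    LineItem("Fire protection", "Smoke detectors", Tier.TRADE_DETAIL, 50.0),
    LineItem("Demo", "TI to studs", Tier.UNIVERSAL, 100.0),
    LineItem("Fire protection", "Special-hazard suppression", Tier.SCOPE_DRIVEN, 150.0),
]
grouped = group_by_trade(flat)
print(list(grouped))             # ['Demo', 'Fire protection', 'HVAC']
print(trade_subtotals(grouped))  # {'Demo': 100.0, 'Fire protection': 200.0, 'HVAC': 300.0}
```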
Post-processing: scope-driven categories + validation
Pass 3 output is not the final estimate. Deterministic post-processing applies:
- Scope-driven category validation: confirm that each of the 5 commercial categories (fire, hazmat, structural, specialty equipment, roof) appears wherever it is triggered. If a category is triggered by the Pass 1 sheet inventory but missing from Pass 3 output, flag it for contractor review.
- Universal baselines: confirm every estimate includes the universal categories (demolition, general conditions, final clean, dumpster). Add baselines if Pass 3 omitted them.
- Scope filter application: if the user requested a sub-bid scope filter (e.g. millwork-only), strip line items outside the filter.
- Display name auto-generation: pull address from cover sheet, version number from history, generate display name like "1234 Main St, Suite 100 - V1".
- Cost telemetry: log actual model usage and per-line-item costs to ai_usage_log for margin tracking.
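Three of the checks above are simple enough to sketch directly. The category lists and the display-name format are taken from this page; the function names and the set-based representation of an estimate are assumptions for illustration.

```python
# Category names are from the list above; the data shapes are assumptions.
SCOPE_DRIVEN = ("fire", "hazmat", "structural", "specialty equipment", "roof")
UNIVERSAL = ("demolition", "general conditions", "final clean", "dumpster")


def missing_scope_categories(triggered, estimate_categories):
    """Scope-driven categories triggered by Pass 1 but absent from Pass 3 output."""
    return [c for c in SCOPE_DRIVEN if c in triggered and c not in estimate_categories]


def missing_baselines(estimate_categories):
    """Universal categories that post-processing would need to add."""
    return [c for c in UNIVERSAL if c not in estimate_categories]


def display_name(address, version):
    """Build a display name in the 'address - Vn' form shown above."""
    return f"{address} - V{version}"


triggered = {"fire", "hazmat", "structural"}
estimate = {"demolition", "general conditions", "fire", "structural", "final clean"}

print(missing_scope_categories(triggered, estimate))  # ['hazmat']
print(missing_baselines(estimate))                    # ['dumpster']
print(display_name("1234 Main St, Suite 100", 1))     # 1234 Main St, Suite 100 - V1
```

Note the asymmetry the list describes: a missing scope-driven category is flagged for contractor review, while a missing universal baseline is added automatically.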
Scope filter for sub-trade contractors
Commercial TI sub-trade contractors (electrical, mechanical, fire protection, specialty equipment) often bid off the same plan set as the GC but need only their scope in the output. The scope filter lets a sub generate a single-discipline estimate from the same plan set:
- Pass 1 still tags every sheet (no behavior change).
- Pass 2 still computes takeoff across all disciplines (no behavior change).
- Pass 3 system prompt receives the scope filter as a constraint: produce only line items inside the filter scope. The model can still reference cross-discipline detail (e.g. classified-space requirements that affect electrical), but output is filtered.
- Post-processing applies a second filter pass to catch any leakage.
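The second filter pass in the last step can be sketched as a plain partition of line items. This is a hypothetical helper, assuming each line item carries a `trade` field; returning the leaked items alongside the kept ones is a design choice made here so leakage can be logged rather than silently dropped.

```python
def apply_scope_filter(items, allowed_trades):
    """Split line items into (kept, leaked) by trade, case-insensitively."""
    allowed = {trade.lower() for trade in allowed_trades}
    kept, leaked = [], []
    for item in items:
        target = kept if item["trade"].lower() in allowed else leaked
        target.append(item)
    return kept, leaked


# Illustrative Pass 3 output for an electrical-only run with one stray item.
pass3_items = [
    {"trade": "Electrical", "description": "Classified receptacles"},
    {"trade": "Electrical", "description": "Cleanroom LEDs"},
    {"trade": "HVAC", "description": "HEPA boxes"},
]
kept, leaked = apply_scope_filter(pass3_items, ["electrical"])
print(len(kept), len(leaked))     # 2 1
print(leaked[0]["description"])   # HEPA boxes
```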
In testing on the $686K pharma plan set: full-scope output $686,646; millwork-only filter output $86,400 with clean line items; electrical-only filter output $138,820 with clean line items; HVAC-only filter output $225,975 with clean line items. No contamination between filter scopes.
Try the pipeline on your next commercial bid
14-day free trial. Upload a 30 to 80 page plan set. Watch all three passes run.
Get Started
Frequently asked questions
Why use Opus 4.7 instead of Sonnet 4 for commercial?
Opus 4.7 is the most capable reasoning model in the Anthropic family. On commercial multi-discipline scope (30+ sheets, scope-driven categories, classified vs non-classified space differentiation), Opus produces measurably better line-item structure and scope-category capture than Sonnet. For residential or simple commercial (5-15 sheets, single-discipline dominant), Sonnet is faster and equally accurate.
What happens if the Anthropic API is down or Opus 4.7 is rate-limited?
The pipeline falls back in a fixed order: Pass 3 retries on Opus first, then falls back to Sonnet 4 with a flag in the output noting the fallback. The contractor sees a clear notice and can re-run later on Opus if they want the higher-quality output. The pipeline does not silently degrade.
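A retry-then-fallback policy of this kind could look roughly like the sketch below. The model call is injected as a plain function so the example runs without any API; `ModelUnavailable`, `Pass3Result`, `run_pass3`, the model labels, and the retry count are all hypothetical, and a real version would likely add backoff between attempts.

```python
from dataclasses import dataclass
from typing import Callable


class ModelUnavailable(Exception):
    """Raised by the injected caller on an outage or rate limit (assumed)."""


@dataclass(frozen=True)
class Pass3Result:
    estimate: str
    model: str
    fell_back: bool  # surfaced to the contractor so the fallback is never silent


def run_pass3(call_model: Callable[[str], str],
              primary: str = "opus",
              fallback: str = "sonnet",
              retries: int = 2) -> Pass3Result:
    """Try the primary model up to `retries` times, then use the fallback."""
    for _ in range(retries):
        try:
            return Pass3Result(call_model(primary), primary, fell_back=False)
        except ModelUnavailable:
            continue
    # An error from the fallback call propagates to the caller.
    return Pass3Result(call_model(fallback), fallback, fell_back=True)


def flaky(model: str) -> str:
    """Stand-in caller: the primary model is always rate-limited."""
    if model == "opus":
        raise ModelUnavailable("rate limited")
    return f"estimate from {model}"


result = run_pass3(flaky)
print(result.model, result.fell_back)  # sonnet True
```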
How much does a commercial pipeline run cost in credits?
Commercial AI estimates use 15 credits (versus 1 credit for residential). Credit pricing varies by tier: the Crew tier ($149/mo) includes 200 standard credits. The worst-case Opus run on an 80-sheet pharma TI costs approximately $4-5 in actual AI inference cost, well inside the credit margin. Overage pricing is calibrated to maintain 70%+ gross margin even at worst-case AI cost.
Does the pipeline work on hand-marked-up plan sets (markups over original PDF)?
Yes, with the caveat that AI accuracy on handwritten markups is lower than on clean digital plans. The pipeline reads markups but interprets them as approximate rather than authoritative. Best practice on commercial TI is to ask the architect for a clean revision before bidding; second-best is to flag the marked-up areas during the contractor review step.
Can I see the intermediate output from Pass 1 and Pass 2?
Yes. The BuildCrux dashboard exposes Pass 1 sheet inventory and Pass 2 quantity list as inspectable intermediate outputs. This is useful for sanity-checking the pipeline before committing to a Pass 3 Opus run that uses 15 credits.
How does the pipeline handle plan sets larger than 500 MB?
The Anthropic Files API caps at 500 MB. Plan sets larger than that are split per-page using pdf-lib in the upload step, then re-assembled in the model context. The split is rare in practice — even 100-sheet plan sets typically come in under 250 MB. The 500 MB ceiling is a long-tail safety net.
The bottom line
AI estimating works on commercial scope because of architecture, not magic. Multi-pass pipelines with tool use, streaming infrastructure, 1M context, and scope-driven category enforcement beat single-pass prompting on 80-sheet commercial plan sets. BuildCrux is the only AI estimating tool that publishes both the architecture and the validation results. If you bid commercial TI work, this is the pipeline you want behind the takeoff.
See the $686K pharma TI this pipeline produced