AI estimating in 2026 looks the same from the outside: upload a plan set, wait a few minutes, see a priced estimate. The architecture behind that user experience varies wildly. Single-pass tools ask a large language model one question and hope. Multi-pass pipelines like the one inside BuildCrux break the work into discrete steps, each with its own context and tools, each verifiable. The difference shows up in accuracy on real plan sets, particularly on larger plan sets and commercial scope, where single-pass fails predictably.
This page documents the BuildCrux pipeline as it actually runs in production today. We do not call out competitors by name, but the architecture differences below are observable to anyone who runs the same 30-sheet plan set through multiple AI estimating tools.
Why single-pass AI estimating fails on real plans
The naive way to build AI estimating is one prompt: "Here is a plan set. Give me an estimate." Current-generation LLMs can do this for a five-sheet bath remodel and the output is plausible. Run the same approach on a 30-sheet kitchen plus structural removal and the failure modes are predictable.
- Attention dilution: the model spreads attention across all 30 sheets and misses scope on the structural sheet because the cover sheet and finish schedule are also competing for context.
- No verification step: there is no intermediate output to check. If the model hallucinates a 200 sqft kitchen as 350 sqft, the priced estimate inherits the error silently.
- No tool use: the model "estimates" areas instead of measuring them. Area estimation from a rendered floor plan without a compute_area tool is reliably 8 to 15 percent off.
- No scope hierarchy: the model produces line items in whatever order they came out of the plan. Customer-facing proposals need scope grouped by trade, with allowances and exclusions called out separately.
- Hard token cap: 30-sheet plan sets push against context window limits even on frontier models. Important detail gets truncated.
The BuildCrux multi-pass pipeline
BuildCrux runs three discrete AI passes plus a deterministic post-processing step. Each pass has a focused job, its own system prompt, its own tool access, and a verifiable output. Failures in a single pass do not cascade because the next pass receives a structured intermediate output, not raw model judgment.
| Pass | Model | Job | Tools | Output |
|---|---|---|---|---|
| Pass 1 | Sonnet 4 | Identify every sheet in the plan set | None | Structured sheet inventory |
| Pass 2 | Sonnet 4 | Run quantity takeoff | compute_area | Structured quantity list |
| Pass 3 | Opus 4.7 (commercial) or Sonnet 4 (residential) | Generate priced estimate | lookup_unit_cost | 20 to 40 line-item estimate JSON |
| Post-process | Deterministic code | Apply universal baselines, scope filter, validation | None | Final estimate |
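The hand-off between stages can be sketched as follows. This is a minimal illustration rather than BuildCrux's production code: every function name, dataclass field, and stub value here is hypothetical, and the model calls are replaced by stubs so that the structure runs on its own. The point it shows is that each stage consumes a typed intermediate result instead of free-form model text.

```python
from dataclasses import dataclass


@dataclass
class Sheet:
    number: str  # e.g. "A1.0"
    kind: str    # e.g. "floor_plan", "structural", "non_drawing"


@dataclass
class Quantity:
    trade: str
    item: str
    amount: float
    unit: str


@dataclass
class LineItem:
    trade: str
    description: str
    total: float


def pass1_identify(pages: list[str]) -> list[Sheet]:
    # Stub for the sheet-identification model call (hypothetical output).
    return [Sheet(number=p, kind="floor_plan") for p in pages]


def pass2_takeoff(sheets: list[Sheet]) -> list[Quantity]:
    # Stub for the takeoff model call; a real pass would call compute_area.
    return [Quantity("Demo", f"Finishes on {s.number}", 100.0, "sf") for s in sheets]


def pass3_price(quantities: list[Quantity]) -> list[LineItem]:
    # Stub for the pricing model call; a real pass would call lookup_unit_cost.
    unit_cost = 5.0  # placeholder value, not a real calibrated cost
    return [LineItem(q.trade, q.item, q.amount * unit_cost) for q in quantities]


def post_process(items: list[LineItem]) -> list[LineItem]:
    # Deterministic validation: reject non-positive totals instead of passing them on.
    for item in items:
        if item.total <= 0:
            raise ValueError(f"invalid total for {item.description}")
    return items


def run_pipeline(pages: list[str]) -> list[LineItem]:
    sheets = pass1_identify(pages)
    quantities = pass2_takeoff(sheets)
    return post_process(pass3_price(quantities))
```

Because each intermediate value is a plain data structure, it can be logged, diffed, or validated before the next stage runs.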
Pass 1: Sheet identification
The first pass reads every page of the plan set and tags it. For a 30-sheet residential remodel, Pass 1 typically takes 25 to 45 seconds. The output is a structured inventory that the next two passes use to focus attention.
- Drawing sheets: cover, sheet index, existing conditions, demolition, proposed floor plan, elevation, section, finish schedule.
- Engineering sheets: structural, electrical, plumbing, mechanical, fire protection.
- Reference sheets: cut sheets, manufacturer specs, fixture schedules.
- Non-drawing pages: energy reports, compliance documents, geotechnical reports, specifications, addenda. These are flagged so Pass 2 and Pass 3 skip them — a key efficiency win on commercial plan sets where 30 to 40 percent of pages are not drawings.
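One way to picture the sheet inventory is as a list of tagged pages that later passes filter before doing any work. The sketch below assumes a simple record per page; the `SheetTag` type, the category constants, and the helper name are illustrative assumptions, not the actual BuildCrux schema.

```python
from dataclasses import dataclass

# Hypothetical category names that mirror the four groups listed above.
DRAWING, ENGINEERING, REFERENCE, NON_DRAWING = (
    "drawing", "engineering", "reference", "non_drawing",
)


@dataclass(frozen=True)
class SheetTag:
    page: int
    label: str     # e.g. "demolition", "structural", "energy report"
    category: str


def pages_for_later_passes(inventory: list[SheetTag]) -> list[SheetTag]:
    """Keep everything except pages flagged as non-drawing."""
    return [tag for tag in inventory if tag.category != NON_DRAWING]


inventory = [
    SheetTag(1, "cover", DRAWING),
    SheetTag(2, "structural", ENGINEERING),
    SheetTag(3, "energy report", NON_DRAWING),
    SheetTag(4, "fixture schedule", REFERENCE),
]
kept = pages_for_later_passes(inventory)
```

In this toy inventory the energy report on page 3 is dropped, so the later passes only see pages 1, 2, and 4.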
Pass 2: Quantity takeoff with compute_area
Pass 2 uses the sheet inventory from Pass 1 and computes the takeoff. The model has access to one tool: compute_area. When the model needs to know the square footage of a room, the wall area of an elevation, or the linear feet of a cabinet run, it invokes compute_area with the relevant coordinates and gets a measured answer instead of estimating from the rendered image.
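The page does not describe how compute_area works internally. A common deterministic way to measure a region from coordinates is the shoelace formula, sketched below. The signature, the `feet_per_unit` scale parameter, and the example room are assumptions made for illustration only.

```python
def compute_area(vertices: list[tuple[float, float]], feet_per_unit: float) -> float:
    """Return polygon area in square feet using the shoelace formula.

    vertices: polygon corners in drawing units, in order around the boundary.
    feet_per_unit: real-world feet represented by one drawing unit (assumed input).
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]  # wrap around to the first corner
        twice_area += x1 * y2 - x2 * y1
    # Area scales with the square of the linear scale factor.
    return abs(twice_area) / 2.0 * feet_per_unit ** 2


# A hypothetical 19 ft x 15 ft rectangular room drawn at 0.25 ft per drawing unit.
room = [(0, 0), (76, 0), (76, 60), (0, 60)]
area_sf = compute_area(room, feet_per_unit=0.25)  # 19 * 15 = 285 sf
```

The same inputs always give the same measurement, which is the property that separates a tool call from a visual estimate.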
Pass 2 produces a structured quantity list. For the $185K Dallas kitchen estimate, Pass 2 output looked like:
Pass 2 quantity takeoff output (excerpt) for the $185K Dallas kitchen estimate.
| Trade | Item | Quantity | Unit | Source sheet |
|---|---|---|---|---|
| Demo | Kitchen finishes to studs | 285 | sf | A1.0 + D1.0 |
| Demo | Primary bath finishes to studs | 142 | sf | A1.0 + D1.0 |
| Framing | Load-bearing wall reinforcement | 1 | ls | S2.1 |
| Plumbing | Fixture relocations | 8 | ea | P1.0 |
| Electrical | New circuits (panel upgrade required) | 4 | ea | E1.0 |
| Electrical | New outlets and switches | 38 | ea | E1.0 |
| Cabinets | Kitchen base + uppers, linear feet | 38 | lf | A2.0 + cut sheets |
| Tile | Primary bath floor + walls + shower | 285 | sf | A2.1 + finish schedule |
| ... (additional rows) | | | | |
Pass 3: Priced estimate with lookup_unit_cost
Pass 3 takes the quantity list from Pass 2 and produces a priced line-item estimate. The model has access to one tool: lookup_unit_cost. For every quantity, the model calls lookup_unit_cost with the scope tag and region, and gets back a calibrated unit cost. The lookup table contains 100+ entries covering every common residential remodel line item, calibrated quarterly against national material indices and regional labor data.
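A lookup of this kind can be sketched as a table keyed by scope tag plus a regional adjustment. Everything in the block below is hypothetical: the scope tags, the dollar figures, the regional factors, and the return shape are placeholders, not BuildCrux data or its real tool signature.

```python
# Hypothetical base costs (USD per unit) and regional factors; not real BuildCrux data.
BASE_UNIT_COSTS = {
    "demo.finishes_to_studs": ("sf", 6.00),
    "cabinets.base_and_uppers": ("lf", 450.00),
}
REGION_FACTORS = {"national": 1.00, "dallas_tx": 0.95}


def lookup_unit_cost(scope_tag: str, region: str) -> tuple[str, float]:
    """Return (unit, regional unit cost); raise on unknown tags instead of guessing."""
    if scope_tag not in BASE_UNIT_COSTS:
        raise KeyError(f"no unit cost for scope tag {scope_tag!r}")
    unit, base = BASE_UNIT_COSTS[scope_tag]
    # Unknown regions fall back to the national factor in this sketch.
    factor = REGION_FACTORS.get(region, REGION_FACTORS["national"])
    return unit, round(base * factor, 2)


unit, cost = lookup_unit_cost("cabinets.base_and_uppers", "dallas_tx")
line_total = round(38 * cost, 2)  # 38 lf of cabinets, as in the takeoff excerpt
```

Raising on an unknown scope tag, rather than returning a default, keeps a missing table entry from turning into a silently wrong price.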
Pass 3 also enforces a 3-tier line-item category structure: universal categories (always present), scope-driven categories (present when triggered by scope), and trade-detail categories (present when scope is granular enough). The structure prevents the output from being a flat list of 80 line items; it produces a customer-readable 20 to 40 line-item estimate grouped by trade.
| Tier | When present | Examples |
|---|---|---|
| Universal | Always | Demolition, general conditions, final clean, dumpster |
| Scope-driven | When scope triggers | Structural reinforcement (when load-bearing removal), HVAC (when ductwork modification), fire protection (commercial) |
| Trade-detail | When granularity warrants | Individual fixture types, finish schedule specifics, equipment-specific items |
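The tier rules in the table can be read as a small selection function. The sketch below is an assumption about how such rules might be encoded; the `Candidate` type, the trigger flag names, and the `granular` switch are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    trade: str
    name: str
    tier: str                      # "universal", "scope_driven", or "trade_detail"
    trigger: Optional[str] = None  # scope flag that activates a scope-driven item


def select_items(candidates: list[Candidate], scope_flags: set[str],
                 granular: bool) -> dict[str, list[str]]:
    """Apply the three tier rules, then group the surviving items by trade."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for c in candidates:
        if c.tier == "universal":
            keep = True                      # always present
        elif c.tier == "scope_driven":
            keep = c.trigger in scope_flags  # present only when scope triggers it
        else:
            keep = granular                  # trade detail only when warranted
        if keep:
            grouped[c.trade].append(c.name)
    return dict(grouped)


candidates = [
    Candidate("Demo", "Demolition", "universal"),
    Candidate("Structural", "Structural reinforcement", "scope_driven", "load_bearing_removal"),
    Candidate("HVAC", "Ductwork modification", "scope_driven", "ductwork_modification"),
    Candidate("Plumbing", "Individual fixture types", "trade_detail"),
]
selected = select_items(candidates, {"load_bearing_removal"}, granular=False)
```

With only the load-bearing flag set and granularity off, the result keeps the demolition and structural items and drops the HVAC and fixture-level items.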
How the pipeline handles 80+ page commercial plan sets
Residential plan sets fit comfortably in the pipeline. Commercial plan sets stretch it. BuildCrux validated against an 80-page pharmaceutical compounding facility tenant improvement (TI) plan set in April 2026. The ChatGPT-cross-validated reference range for that project was $700K to $850K. The BuildCrux pipeline produced $686,646, about 2 percent below the low end of that range, with 48 line items including the five scope-driven commercial categories (Fire Protection, Roof Repair, Hazmat Abatement, Structural Reinforcement, Specialty Equipment).
The commercial pipeline differs from residential in three ways:
- Larger context: commercial plan sets use the Anthropic Files API to handle PDFs up to 500 MB; residential typically fits in a single request.
- Streaming: commercial Pass 3 with Opus 4.7 can run 4 to 10 minutes; streaming keeps the request open and surfaces progress.
- Commercial uplift multiplier: post-processing applies a commercial complexity multiplier to direct costs to reflect the additional general conditions, supervision, and quality control overhead that commercial scope demands.
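The uplift step in the last bullet is plain arithmetic on direct costs. The sketch below shows one way to apply it with exact decimal rounding. The multiplier value of 1.15 and the dollar amounts are made up for the example, since the page does not state the actual multiplier.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def apply_commercial_uplift(direct_costs: dict[str, Decimal],
                            multiplier: Decimal) -> dict[str, Decimal]:
    """Scale each direct-cost line by the multiplier, rounded to the cent."""
    if multiplier < 1:
        raise ValueError("an uplift multiplier should not reduce costs")
    return {
        name: (cost * multiplier).quantize(CENT, rounding=ROUND_HALF_UP)
        for name, cost in direct_costs.items()
    }


# Hypothetical direct costs; 1.15 is a made-up multiplier, not BuildCrux's value.
direct = {"Fire Protection": Decimal("40000.00"), "Roof Repair": Decimal("12500.00")}
uplifted = apply_commercial_uplift(direct, Decimal("1.15"))
total = sum(uplifted.values())
```

Doing this in deterministic code rather than in the model prompt means the same direct costs always produce the same uplifted total.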
How the pipeline handles scope filtering for sub-bids
Subcontractors bidding off the same plan set as the GC need a different output: only the trades in their scope, with none of the other line items contaminating the estimate. BuildCrux supports a scope filter at the time of estimate generation. Pass 1 still tags every sheet, Pass 2 still computes the takeoff, but Pass 3 outputs only line items inside the filter scope. A millwork sub bidding the Dallas kitchen sees only cabinet line items, not the full 22-line estimate.
In testing, the scope filter cleanly produced a $299K millwork-only sub-bid from the same plan set that produced the $721K full-trade estimate, with no contamination from non-millwork line items.
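The effect of the scope filter can be illustrated with a simple trade-based filter over line items. This is a sketch of the outcome only: in the pipeline described above the filtering happens inside Pass 3, and the function name, item shape, and dollar amounts below are hypothetical.

```python
def filter_by_scope(line_items: list[dict], allowed_trades: set[str]) -> list[dict]:
    """Return only the line items whose trade is inside the sub's scope."""
    allowed = {trade.lower() for trade in allowed_trades}
    return [item for item in line_items if item["trade"].lower() in allowed]


# Made-up line items and dollar amounts, for illustration only.
full_estimate = [
    {"trade": "Demo", "item": "Kitchen finishes to studs", "total": 4200.0},
    {"trade": "Cabinets", "item": "Kitchen base + uppers", "total": 31000.0},
    {"trade": "Electrical", "item": "New circuits", "total": 5600.0},
    {"trade": "Cabinets", "item": "Pantry built-ins", "total": 7400.0},
]
millwork_bid = filter_by_scope(full_estimate, {"cabinets"})
millwork_total = sum(item["total"] for item in millwork_bid)
```

The sub-bid total is computed only from the filtered items, so no out-of-scope dollars can leak into it.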
See the pipeline run on your plan set
14-day free trial. Upload a real PDF. Watch all three passes run.
Frequently asked questions
Why does multi-pass work better than single-pass for AI estimating?
Each pass has a focused job and a verifiable output. Sheet identification, quantity takeoff, and priced estimate are three different cognitive tasks; asking a single AI prompt to do all three forces attention dilution that drops accuracy 10 to 20 percent on real plan sets. Multi-pass lets each step succeed or fail independently, and intermediate outputs are checkable.
How is the compute_area tool different from just asking the AI to estimate?
compute_area is a deterministic measurement function that runs on the PDF coordinates the AI hands it. It is not the AI guessing from the rendered image. The AI invokes the tool with input like "measure the floor area bounded by these coordinates on sheet A1.0" and gets back a precise square footage. This single architecture change accounts for most of the accuracy difference between BuildCrux and single-pass AI estimating tools.
How is the lookup_unit_cost tool calibrated?
The lookup table contains 100+ entries covering common residential and commercial remodel line items. Each entry is updated quarterly against three data sources: BLS Producer Price Index for materials, BLS wage indices for skilled trades by region, and aggregated bid data from BuildCrux customers. Contractors can override the default unit costs with their own calibrated values, and the AI inherits those overrides on every future estimate.
Why does Pass 3 sometimes use Opus and sometimes Sonnet?
Pass 3 is the heaviest reasoning pass. For commercial multi-discipline plan sets (30+ sheets, multiple trades, scope-driven categories like fire protection or hazmat), Opus 4.7 produces measurably better output. For residential remodels (5 to 15 sheets, standard trade scope), Sonnet 4 is faster and equally accurate. BuildCrux auto-detects which mode applies from Pass 1 output and asks the contractor to confirm before charging credits.
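The exact detection rule is not published here, so the following is only a guess at its shape: pick the commercial mode when Pass 1 tags sheets that signal commercial scope, and always ask for confirmation. The signal set and the returned dictionary are assumptions, and the model names are the labels used on this page rather than API identifiers.

```python
# Assumed signals of commercial scope; the real detection rule is not documented.
COMMERCIAL_SIGNALS = {"fire protection", "hazmat abatement"}


def choose_pass3_model(sheet_labels: list[str]) -> dict:
    """Pick a Pass 3 mode from Pass 1 sheet labels (heuristic sketch)."""
    labels = {label.lower() for label in sheet_labels}
    commercial = bool(labels & COMMERCIAL_SIGNALS)
    return {
        "mode": "commercial" if commercial else "residential",
        "model": "Opus 4.7" if commercial else "Sonnet 4",
        # The contractor confirms the mode before credits are charged.
        "needs_confirmation": True,
    }


residential = choose_pass3_model(["cover", "demolition", "structural"])
commercial = choose_pass3_model(["cover", "Fire Protection", "mechanical"])
```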
What happens when a pass fails?
Each pass has its own retry logic. Pass 1 failures (rare) re-run with a relaxed prompt. Pass 2 failures retry with chunked input. Pass 3 failures fall back to Sonnet if Opus times out. If all retries fail, BuildCrux refunds the credit and surfaces the failure mode to the contractor. The pipeline does not silently produce a bad estimate.
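The Pass 3 fallback described above can be sketched as a loop over models that ends in an explicit failure. The exception classes, the `call_model` callback, and the test double are all hypothetical; the refund and reporting step is left to the caller.

```python
class PassTimeout(Exception):
    """Raised by a (hypothetical) model call that exceeded its time budget."""


class PipelineFailure(Exception):
    """Raised when every attempt failed; the caller refunds and reports."""


def run_pass3(call_model, quantities, models=("Opus 4.7", "Sonnet 4")):
    """Try each model in order; return the first success or raise PipelineFailure."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, quantities)
        except PassTimeout as exc:
            errors.append(f"{model}: {exc}")
    # Never return a partial or guessed estimate: fail loudly with the reasons.
    raise PipelineFailure("; ".join(errors))


def flaky_call(model, quantities):
    # Test double: the first model always times out, the second succeeds.
    if model == "Opus 4.7":
        raise PassTimeout("timed out")
    return {"line_items": len(quantities)}


used_model, estimate = run_pass3(flaky_call, ["q1", "q2"])
```

Raising a dedicated failure type gives the calling code one clear place to trigger the refund and surface the failure mode.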
How long does the full pipeline take?
Residential 5-to-15-sheet kitchen or bath remodel: 60 seconds to 3 minutes end-to-end. Residential 30-sheet kitchen plus bath plus structural: 3 to 5 minutes. Commercial 50-page TI: 6 to 10 minutes. Commercial 80-page complex TI: 10 to 15 minutes. The variability is in Pass 2 and Pass 3 reasoning time; Pass 1 is consistently fast.
The bottom line
AI estimating works because of architecture, not magic. Multi-pass pipelines with tool use beat single-pass prompting on real plan sets because they break the work into verifiable steps and give the AI access to measurement tools instead of asking it to estimate from rendered images. BuildCrux is built on this pipeline; it is the reason the same input that produces a noisy single-pass estimate produces a clean 22-line BuildCrux estimate.
See a real $185K kitchen estimate this pipeline produced