
Key takeaways
• Software estimation is a decision tool, not a deadline. Its job is to surface the cheapest path to a product you can sell — not to predict the future to two decimal places.
• Accuracy is bought with scope locks, not with hours. The Cone of Uncertainty shows early estimates can legitimately land anywhere from 0.25× to 4× — a 16× spread that only narrows as decisions get signed.
• Stack three methods, not one. Analogous for the pitch, bottom-up plus 3-point PERT for discovery output, Planning Poker every sprint. Any single number from any single method is wrong.
• The best contract is a paid discovery, then a fixed build. Time-and-materials for the first 2–4 weeks to compress the cone, fixed-bid with change-order discipline for the rest. Pure fixed-bid on a blank page is how projects end up in the 52.7% that blow budget.
• Agent Engineering changes the math in 2026. Well-used AI coding agents cut raw build time 25–40% on greenfield features — but only if the estimate also grows the QA, code-review and senior-oversight lines. Cut those and defect rates rise 1.7×.
Why Fora Soft wrote this playbook
Since 2005 Fora Soft has shipped 625+ software products — most of them real-time video, AI and audio platforms where one misjudged SDK choice can double a quote. Over two decades of scoping calls we have written, signed and delivered every shape of estimate there is: napkin pitches for founders, 40-page Master Service Agreements for Fortune-listed enterprises, sprint-by-sprint burndown plans, outcome-based engagements and everything between. This playbook compresses what actually holds up under a signed purchase order.
We run discovery before every fixed quote, we track our own estimate-vs-actual variance per project, and we know which line items vendors routinely forget — because we used to forget them too. If you want a second opinion on a quote you already have, or a clean discovery to replace the number you don’t trust, see our planning & analytics service or how Fora Soft runs a build end-to-end.
Got an estimate from another vendor you don’t quite trust?
Send us the PDF — we will mark the missing line items, challenge the assumptions and send a second opinion back inside 48 hours. No commitment, no pitch deck.
What software estimation actually is (and isn’t)
Software estimation is the discipline of forecasting how much time, money and risk a software project carries — so a buyer can make decisions. That is the full definition. It is not a promise, not a deadline and not a commitment. Steve McConnell, who wrote the canonical book on the subject, draws the line cleanly: an estimate is a probabilistic range, a target is a business goal, and a commitment is a promise to hit a target. Vendors that collapse those three into one number are not doing estimation — they are quoting.
For a founder or CTO hiring a partner, the practical question is narrower: “What’s the cheapest thing I can do right now to move my estimate from a range I can’t bank on to a range I can?” The rest of this guide answers that question, stage by stage.
Why half of software projects still blow their budget
The numbers haven’t improved much in thirty years. The Standish Group’s CHAOS report consistently finds that only around 31% of projects finish on time and on budget, while 52.7% exceed budget by an average of 189% and about 19% are outright cancelled. PMI’s 2025 Pulse of the Profession puts the global on-budget success rate at roughly 50%. CISQ pegged the cost of poor software quality in the US alone at $2.41 trillion per year, with technical debt another $1.52 trillion.
The underlying failure modes are depressingly consistent across studies: scope creep (affecting 52–70% of projects), poor requirements gathering (cited in ~39% of failures), missing non-functional work (security, accessibility, performance, DevOps), and anchoring — picking a number that matches the budget rather than the scope. Nearly all of them live upstream of the code.
The Cone of Uncertainty — your estimate has a legal range
Barry Boehm first drew, and McConnell popularised, the Cone of Uncertainty: the observation that estimate variability is a function of how much has been decided, not how much has been discussed. At “initial concept” — the first 30-minute call — a rigorous estimate can legitimately land anywhere from 0.25× to 4× the final actual. That is a 16× spread. Pretending otherwise is malpractice.

Figure 1 — The Cone of Uncertainty. Each stage on the x-axis represents a decision, not a date. The cone only closes when decisions get signed.
Two audiences misread this diagram in dangerous ways. Founders sometimes think the cone narrows with time — it does not. A team that has been “scoping” for four weeks with no written user story map, NFRs or acceptance criteria is still at ±4×. Vendors sometimes sell a fixed bid at the “pitch” stage and absorb the variance by padding 2–3×. Either the padding is right and the buyer overpays, or it is wrong and the project dies mid-build.
Reach for the cone when: a vendor quotes a single hard number for a 6-month build after one discovery call. Ask them which stage of the cone they think they’re at, and why.
Three estimate types — match the type to the decision you’re making
Not every estimate needs the same rigour. Spending four weeks on a high-fidelity estimate for a decision worth $30k is a waste. The PMI practice standard categorises estimates by precision, and we map them to buyer decisions below.
| Type | Accuracy | Effort to produce | Use it for | Don’t use it for |
|---|---|---|---|---|
| Rough order of magnitude (ROM) | −25% to +75% | 30–90 min | Deciding whether the idea is worth a proper discovery at all | Signing a fixed-bid contract |
| Budgetary | −15% to +25% | 1–2 weeks | Board approval, CFO budget line, go/no-go | Committing to a calendar launch date |
| Definitive | −5% to +10% | 2–4 weeks of paid discovery | Fixed-bid contract, launch-date commitment | Validating whether to build at all |
| Sprint-level | ±10% per sprint (after 3–4 sprints) | 2–4 hours per sprint | Sprint capacity planning, next-release forecast | Long-horizon commercial commitments |
Nine out of ten founders we meet are asking for a definitive estimate at a stage that only supports a budgetary one. The fix is not more meetings; it is a paid discovery that compresses the cone enough to earn the extra precision.
Six estimation methods a serious vendor still uses
These are the methods we actually reach for inside Fora Soft, with the scars to explain why. Treat them as a toolkit, not a menu — stack at least two per estimate and compare.
Analogous (expert judgement from past projects)
Pull up a finished project that looks architecturally similar, take its actuals, adjust for size, tech stack and team skill, publish the result. Cheap, fast, honest about its own assumptions, surprisingly accurate when historical data exists. Fails when the new project has a truly novel component (first LLM integration, first FDA device) with no analogue in your history. Works well at the pitch / concept stage.
Top-down (decompose from project to epics)
Break the project into 6–10 epics, assign a range to each from analogous history, sum. Great for the CFO view. Poor for sprint-level planning because it hides the 300-story detail that drives actual burn. Good first pass during discovery to sanity-check the bottom-up.
Bottom-up / Work Breakdown Structure
Decompose down to tasks of 4–16 engineer-hours, estimate each, sum with PM/QA/DevOps overhead. Most accurate once you have a complete user story map, acceptance criteria and architecture. Expensive to produce — we charge for this during discovery — and catastrophically wrong if fed incomplete scope. This is what a definitive estimate is built on.
Parametric (COCOMO II, Function Points)
Feed a model (size in KLOC or function points, complexity factors, team experience) and get a mathematical output. Useful as a tie-breaker when top-down and bottom-up disagree. Caveat: the original COCOMO II calibration dates to 1990s waterfall projects; academic studies show uncalibrated COCOMO can produce ~100% error on modern cloud / microservice / AI-assisted stacks. If a vendor cites COCOMO, ask when they last recalibrated against their own actuals.
Planning Poker (consensus story points)
Mike Cohn’s classic: the whole engineering team plays modified Fibonacci cards (1, 2, 3, 5, 8, 13, 20, 40, 100) for each story; outliers explain, the team re-votes, consensus lands. Catches hidden complexity that one-person estimates always miss, and builds team ownership of the plan. After 3–4 sprints, velocity stabilises and the same cards convert to reliable days. Do not use it at the pitch stage — you don’t have a team yet.
3-Point PERT + Monte Carlo
For every bottom-up line, capture three numbers: Optimistic (O), Most Likely (M), Pessimistic (P). Expected duration is (O + 4M + P) / 6; standard deviation (P − O) / 6. Gives you honest confidence bands rather than false-precision point numbers. Monte Carlo runs 10,000+ simulated schedules over those distributions and produces a curve — for example, “50% chance to finish by 14 weeks, 90% chance by 18 weeks.” That curve is how you negotiate the launch-date clause.
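The PERT-plus-Monte-Carlo loop above is simple enough to sketch in a few lines of Python. The task list and week values below are hypothetical, and a triangular distribution stands in for the PERT beta — a common simplification:

```python
import random

def pert_expected(o, m, p):
    """PERT expected duration and standard deviation from 3-point inputs."""
    return (o + 4 * m + p) / 6, (p - o) / 6

def monte_carlo_schedule(task_lines, runs=10_000, seed=42):
    """Simulate total duration over all lines; each line is (O, M, P) in weeks.
    Uses a triangular distribution as a stand-in for the PERT beta."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.triangular(o, p, m) for o, m, p in task_lines)
        for _ in range(runs)
    )
    return totals[runs // 2], totals[int(runs * 0.9)]  # P50, P90

# Hypothetical bottom-up lines: (Optimistic, Most Likely, Pessimistic) weeks
task_lines = [(2, 3, 6), (1, 2, 4), (3, 5, 10), (2, 4, 8)]
p50, p90 = monte_carlo_schedule(task_lines)
print(f"50% chance to finish by {p50:.1f} wk, 90% by {p90:.1f} wk")
```

The P50/P90 pair from the last line is exactly the confidence band you take into the launch-date negotiation.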
Which method fits which stage — a fit matrix
The matrix below maps each method to each decision stage. Darker cells mean stronger fit. Stack at least two methods per estimate — if they disagree by more than 1.5×, your scope is not locked enough yet.

Figure 2 — Which estimation method fits which stage. Darker = stronger fit.
How discovery compresses the estimate, stage by stage
Most of the cost overruns in the Standish data come from one pattern: signing a fixed-bid at stage 1 of the cone and then discovering the scope at stage 4. Our counter is a staged discovery where each deliverable buys a tighter estimate.

Figure 3 — The discovery funnel. Each artefact compresses the cone before you sign the build.
Stages 2 through 4 typically cost 5–8% of total budget — and in our own engagement history they save 30–50% of the downstream build by killing bad assumptions before code is written. See why launching lean beats a full-scope push for how we usually scope stage 2.
What a complete estimate must price (and what 80% of quotes miss)
Most under-quotes happen because the line items for functional features are right but the ones around them aren’t. Here is the checklist we work off internally; ask any vendor to show each of these as an explicit line item.
| Line item | Typical share | Often missing |
|---|---|---|
| Feature engineering (happy path) | 35–45% | — |
| Edge cases, error handling, empty states | 10–15% | Almost always |
| QA (manual + automated + regression) | 15–25% | Often undersized |
| UI / UX design iteration | 8–12% | Quoted only for v1 screens |
| DevOps, CI/CD, infra setup | 5–10% | Often |
| Security / compliance (SOC 2, HIPAA, GDPR) | 5–15% | Almost always |
| Accessibility (WCAG 2.2 AA) | 3–6% | Usually |
| Performance / load testing | 2–5% | Usually |
| PM, architecture, tech lead oversight | 10–15% | Hidden inside “team” |
| Risk buffer / contingency | 10–20% | Often disguised as “buffer” |
If an estimate adds up to roughly the sum of feature engineering hours only, the vendor has quoted 35–45% of the real cost and is either planning to absorb the rest or — more commonly — to recover it through change orders later.
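That gross-up is worth doing explicitly whenever a quote looks suspiciously feature-shaped. A minimal sketch, using the 35–45% feature share from the checklist above (the $80k quote is a hypothetical input):

```python
def gross_up(feature_only_quote, share_low=0.35, share_high=0.45):
    """If feature engineering is only 35–45% of a real budget,
    a feature-only quote implies this full-cost range."""
    return (feature_only_quote / share_high,   # features were a big share
            feature_only_quote / share_low)    # features were a small share

low, high = gross_up(80_000)  # hypothetical feature-only quote
print(f"Implied full cost: ${low:,.0f}-${high:,.0f}")
# A $80k feature-only quote implies roughly $177,778-$228,571 all-in
```

In other words, a feature-only number roughly doubles once the invisible line items are priced in.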
Contract models: who owns the risk when the estimate is wrong
The estimate feeds the contract, and the contract allocates risk. The five shapes below are the ones that actually show up in our MSAs in 2026.
| Model | Best for | Price predictability | Flexibility | Who carries estimate risk |
|---|---|---|---|---|
| Fixed-bid | Tight, proven scope after discovery | High | Low | Vendor (paid for via padding) |
| Time & materials (T&M) | Exploratory, research, fast-changing scope | Low | High | Buyer |
| Capped T&M | Discovery phases, prototype sprints | Medium | High | Shared |
| Discovery T&M + build fixed | Greenfield product — our default | Medium then High | High then Medium | Shared |
| Outcome-based | Measurable KPIs, strong baseline | Variable | Medium | Shared (vendor has upside) |
Reach for “discovery T&M then build fixed” when: you are a founder building a new product from zero. It is the only shape that lets the estimate earn its accuracy before you sign for it.
For a parallel perspective on hiring mechanics see how we select developers for a project.
Want a paid discovery that lands a definitive estimate?
Two-week sprint, fixed price, deliverable is a signed user-story map, architecture, risk register and a ±10% build quote you can take to the board.
Agent Engineering in 2026 — what it actually changes in your estimate
Agentic coding tools have shifted the cost curve, but not uniformly. The headline numbers are encouraging: Anthropic’s 2026 Agentic Coding Trends report puts AI-tool adoption at roughly 90% of working developers, and cites ~41% of newly written code as AI-generated. Our own internal telemetry (across real-time video, AI and edtech projects) shows 25–40% reduction in feature engineering hours on greenfield work where the domain is well-represented in training data.
The catch is downstream. GitHub’s own productivity study and McKinsey’s 2025 State of AI both flag the same pattern: AI-generated code carries ~1.7× more defects in pull-request review if review effort stays flat. Senior engineers extract roughly 5× the value juniors do. Translated to an estimate, that means:
1. Feature engineering shrinks ~25–40% on greenfield. Familiar stacks (CRUD, API glue, test scaffolding, boilerplate) get the biggest boost. Bespoke video pipelines, domain-specific ML, SDK integrations and hardware-adjacent work barely move.
2. QA and code review grow ~15–25%. More code per hour means more review per hour. Cutting review is the single fastest way to ship a 1.7×-defects product.
3. Staff mix leans senior. The senior-lead / mid / junior mix that made commercial sense in 2022 (roughly 1:2:2) now pays better at ~1:2:1, with one or two senior reviewers per pod.
4. Net effect on a typical 12-week build. Our own estimate-vs-actual data says 15–25% net reduction in total cost, not the 40% the marketing decks claim. Any vendor quoting a flat “AI makes it 50% cheaper” without adjusting review lines is inflating their margin.
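The arithmetic behind point 4 is easy to check. A quick sketch of why the net saving always lands below the headline feature cut — the 1,000-hour baseline and its split are hypothetical, loosely following the checklist shares above:

```python
# Hypothetical 1,000-hour baseline, split loosely along the checklist shares
baseline = {"features": 400, "qa_review": 200, "design": 100,
            "devops": 80, "pm_lead": 120, "buffer": 100}

def ai_adjusted_total(base, feature_cut, qa_growth):
    """Apply an AI feature-hour cut and the QA/review growth it demands."""
    adjusted = dict(base)
    adjusted["features"] *= (1 - feature_cut)
    adjusted["qa_review"] *= (1 + qa_growth)
    return sum(adjusted.values())

total = sum(baseline.values())
best = ai_adjusted_total(baseline, 0.40, 0.15)   # biggest cut, smallest growth
worst = ai_adjusted_total(baseline, 0.25, 0.25)  # smallest cut, biggest growth
print(f"net saving: {1 - worst / total:.0%} to {1 - best / total:.0%}")
# The net lands well below the raw 25-40% feature cut; where exactly
# depends entirely on the feature share of the baseline split.
```

Whatever split you assume, cutting only the feature line while the QA line grows guarantees the project-level saving is a fraction of the headline.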
For a real delivery case with numbers attached, see how AI cut 40% off development time on a 1M+ line video streaming platform — that 40% number is on the raw coding hours, not the total project.
Mini case — how a discovery rescued a $420k mis-estimate
Situation. A US-based edtech founder came to us with a fixed-bid quote from another vendor: $420k for a tutoring platform with live video, shared whiteboard, AI lesson-plan generation and LTI integration with a dozen LMS products. The quote was 26 pages of feature bullets, no NFRs, no WBS, no risk register. Build was planned for 14 weeks.
Discovery. We ran a 3-week paid discovery (~$18k). Deliverables: user-story map (147 stories), risk register with 31 entries, reference architecture, NFR brief including FERPA and SOC 2 implications, and a bottom-up + 3-point PERT estimate with Monte Carlo schedule. Two decisions the original quote had not priced: (a) LTI integrations needed to cover 12 LMS variants rather than one, and (b) the AI lesson-plan feature required a human-in-the-loop review flow for FERPA-covered districts.
Outcome. True scope at ±10% landed at $540k / 18 weeks — 28% higher than the original quote and 29% longer. The founder went back to the first vendor, who eventually conceded they would have change-ordered the delta at month 3 or 4. The client signed our fixed build, shipped on the 18-week plan, spent 94% of budget, and avoided the mid-project renegotiation that kills most edtech rounds. See also the ALDA AI course-generator build for a comparable edtech case with 500k+ students.
A directional cost model — typical ranges we see in 2026
These are ranges we have seen in our own pipeline, not benchmarks. Treat them as an analogous-estimation reference, not a quote. Anyone quoting a point number without discovery is guessing.
| Product shape | MVP range (USD) | Timeline | Key risk driver |
|---|---|---|---|
| Internal tool / B2B dashboard | $35k–$90k | 6–12 wk | Integrations with legacy systems |
| Consumer mobile (1 platform) | $60k–$140k | 8–16 wk | Design iteration, store review |
| SaaS with subscription + admin | $90k–$220k | 12–24 wk | Payments, roles, multi-tenancy |
| Real-time video / telehealth | $130k–$320k | 16–28 wk | SFU / transport, compliance |
| AI-assisted product (LLM + RAG) | $90k–$260k | 10–20 wk | Eval harness, inference cost |
| Edtech platform with live sessions | $180k–$480k | 20–32 wk | FERPA, LMS integrations, video |
For deeper breakdowns see our 2026 mobile app cost guide and the video-streaming CTO pricing guide. We use Hetzner AX-series and DigitalOcean for most self-hosted workloads and keep cloud-egress assumptions explicit in every quote.
A decision framework — accept an estimate in five questions
Before you counter-sign any quote, walk this five-question check. If a vendor can’t answer all five in writing in under a day, the estimate is not yet worth your signature.
Q1. Which stage of the Cone of Uncertainty are you quoting from? Expected answer: an explicit stage (“requirements complete, ±1.5×”) and the artefacts that got you there (WBS, user-story map, architecture).
Q2. What are the top five assumptions, and what happens to the number if each one flips? Expected answer: a list with ±hours per assumption. No assumptions = no estimate.
Q3. What is explicitly not in scope? Expected answer: a short exclusion list. “Nothing is excluded” is a red flag.
Q4. What is the confidence band — 50%, 80%, 90%? Expected answer: P50 / P80 pair from PERT or Monte Carlo. If there is only one number, it is P50 with the uncertainty rolled in invisibly.
Q5. What is the change-order process and how is velocity tracked? Expected answer: a written change-order SLA, a sprint-level velocity plan, and a standing weekly rhythm for re-forecast. Silent velocity tracking is how over-runs get discovered in month 4.
Pitfalls — five ways estimation quietly destroys projects
1. Anchoring to the budget. A founder says “we have $120k.” The vendor reverse-engineers a scope that totals $120k. The project ships at $190k. The mistake is conversational: stop stating a number before the scope is discovered. State a decision instead (“we can commit $120k to an MVP, $200k total over 18 months”) and ask the vendor to fit scope to that envelope.
2. Padding without naming it. Hidden contingency (“I’ll just double everything”) is unfalsifiable. If the project ships early, the padding becomes vendor profit. If it overruns, there’s no residual to draw on. Always ask for explicit reserve line items.
3. Ignoring non-functional requirements. Authentication hardening, logging, monitoring, accessibility, security review, GDPR / HIPAA posture, load testing, observability, CI/CD — none of these appear in a feature list, all of them cost weeks. They deserve named line items.
4. Optimism bias on integrations. “There’s an SDK for that” is almost never the whole story. Payment providers, SSO vendors, LMS systems, IoT gateways and SIP stacks all add 3–5 weeks that the SDK docs don’t show. See what not to do when cutting costs for more on this.
5. Estimating once, never re-forecasting. An estimate is a hypothesis. After sprint 2 you have data. If the vendor does not publish an updated forecast against actuals every sprint, the cone never closes. Silent estimate drift is the single biggest predictor of a cancelled project.
KPIs — how to measure estimation health
Quality KPIs. Estimate-Accuracy-Ratio (actual / estimate) per sprint. Target: 0.9–1.1 after sprint 3. Anything outside 0.7–1.3 for two sprints in a row is a scope-creep or under-estimation alarm.
Business KPIs. Cost Performance Index (earned value / actual cost). Target: ≥ 0.95. Schedule Performance Index (earned value / planned value). Target: ≥ 0.90. These two numbers together tell you whether you are burning budget, burning calendar, or both.
Reliability KPIs. Forecast-variance trend. Plot sprint-level estimate-vs-actual as a line; a flat line means the vendor’s estimates are trustworthy, a widening cone means they are not. Change-order frequency — more than one material change order per four sprints usually means the discovery was skipped, not that requirements changed.
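All three KPI families reduce to a few ratios, which makes them cheap to track sprint-by-sprint. A minimal sketch with hypothetical sprint figures:

```python
def estimation_kpis(estimated_h, actual_h, earned_value, actual_cost,
                    planned_value):
    """The three ratio KPIs described above."""
    ear = actual_h / estimated_h        # Estimate-Accuracy-Ratio, target 0.9-1.1
    cpi = earned_value / actual_cost    # Cost Performance Index, target >= 0.95
    spi = earned_value / planned_value  # Schedule Performance Index, target >= 0.90
    return ear, cpi, spi

# Hypothetical sprint: 120 h estimated vs 132 h actual; $50k earned value
# against $54k actual cost and $52k planned value
ear, cpi, spi = estimation_kpis(120, 132, 50_000, 54_000, 52_000)
alarms = []
if not 0.7 <= ear <= 1.3:
    alarms.append("scope-creep / under-estimation alarm")
if cpi < 0.95:
    alarms.append("burning budget")
if spi < 0.90:
    alarms.append("burning calendar")
print(alarms)  # this hypothetical sprint is over cost but on schedule
```

A vendor who publishes these three numbers every sprint is closing the cone; one who does not is letting it drift.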
AI in the estimation process itself
LLMs are now genuinely useful during discovery: generating first-draft user story maps from a product brief, converting rough Figma screens into acceptance criteria, flagging likely NFR gaps, and spotting inconsistencies between a requirements doc and a wireframe in minutes rather than days. We use them routinely at Fora Soft — combined with human review — and they compress stage 2–3 of the discovery funnel by roughly 30%.
What they cannot do yet: replace the judgement call on whether a given scope is a 3-week or a 9-week piece of work. That judgement is still analogous estimation by a senior engineer with domain scars. For how we run AI inside our pipeline see AI in the software development process and AI in software architecture design.
When not to run heavy estimation
Heavy estimation is not always the answer. If you are buying a two-week prototype to test a hypothesis, the right answer is a capped T&M engagement with a written kill-switch — not a 3-week discovery and bottom-up WBS. If the work is maintenance on a mature product with a stable team, velocity history is more accurate than any re-estimation exercise.
As a rule of thumb: when the project is under ~8 weeks or under ~$40k, skip the heavy estimation and buy a capped sprint instead. Above that, the math of discovery pays for itself every time.
Estimating your first video, AI or edtech build?
That’s what we’ve done since 2005 — 625+ shipped products, 21 years of real-time and AI scars. Tell us the one-sentence version and we’ll come back with a ROM you can bank on.
FAQ
How accurate should a software estimate actually be?
It depends on the stage. A rough-order-of-magnitude estimate at pitch time is honestly −25% to +75%. A budgetary estimate after a product brief is −15% to +25%. A definitive estimate after a 2–4 week paid discovery is −5% to +10%. Anyone offering ±5% without discovery is padding.
Is fixed-bid or time-and-materials better for a startup MVP?
Neither in pure form. The most reliable shape we see is a paid discovery on capped T&M (2–4 weeks, known ceiling), followed by a fixed-bid build with a written change-order SLA. You buy certainty with the discovery and preserve flexibility inside each sprint with an agreed change-budget.
Should I accept an estimate based only on Planning Poker?
Not for a commercial commitment. Planning Poker is excellent inside a formed team for sprint-level forecasts, but it assumes a backlog of well-defined stories and measured velocity. For a pre-build estimate, stack it with bottom-up WBS and 3-point PERT; if the three methods agree within 1.5× you have a defensible number.
How much should the discovery phase cost?
Roughly 5–8% of total project budget for a greenfield product. On a $200k build that is $10k–$16k for two to four weeks of senior time, delivering a signed user-story map, architecture, NFR brief, risk register and a definitive build estimate. It routinely pays back 4–6× by killing bad scope before code is written.
How does AI coding affect my software cost estimate in 2026?
Net reduction of roughly 15–25% on a typical 12-week build, not the 40%+ shown in marketing. Feature engineering drops 25–40% but QA and review grow 15–25% because AI-generated code carries ~1.7× more defects under flat review. Senior-heavy staff mix outperforms junior-heavy mix by a wide margin.
What is the Cone of Uncertainty in simple terms?
It is the observation that early software estimates can legitimately be wrong by 4× in either direction, and that the range only narrows as real decisions (scope, architecture, UI, acceptance criteria) are signed. Time alone doesn’t close the cone — decisions do. A four-week-old unsigned scope is just as fuzzy as a four-day-old one.
Can I use COCOMO II or Function Points for a modern cloud product?
Only as a cross-check, not as the primary method. Uncalibrated COCOMO II (whose reference data is from 1990s waterfall projects) can produce errors around 100% on cloud, microservice and AI-assisted stacks. Function Points are better but still need local calibration. Use bottom-up plus 3-point PERT as your primary, parametric models as a sanity check.
What are the red flags in a vendor’s estimate?
A single point number, no assumptions list, no exclusions list, no confidence band, no QA or DevOps line, no PM oversight line, no change-order process. Any two of these together mean the estimate is a quote, not an estimate — the risk will land on you in month three.
What to read next
Process
The 2026 Seven-Phase Product Development Playbook
How estimation, discovery and build fit into a single end-to-end delivery process.
Cost guide
Video Streaming App Development Cost — a 2026 CTO Pricing Guide
Real line items and ranges for the most-miscounted category of software products.
Budget
What you should do to cut costs on your software project
The four moves that actually reduce burn without degrading the product.
Case study
How AI cut 40% off development time on a 1M+ line video platform
The raw AI-productivity delta and what it does to an estimate in practice.
MVP
Why cut features and launch the product early
MVP thinking is the cheapest route to an estimate you can bank on.
Ready to turn a wobbly quote into a signed plan?
Software estimation is, in the end, a disciplined conversation between scope, certainty and risk. The tools (analogous, bottom-up, PERT, poker, Monte Carlo) matter less than the habits: name the Cone stage, price the invisible line items, keep the contract shape honest, and re-forecast every sprint. Do those four and your estimate stops being a guess and becomes a plan you can sell to a CFO.
At Fora Soft we run this playbook every week — on real-time video platforms, AI products and edtech builds — and we are happy to apply it to the estimate you are holding right now. The 30-minute Calendly slot below gets you a live walk-through of your existing quote against the five-question decision framework. Bring the PDF; we will mark it up.
Get a second opinion on your software estimate
Free 30-minute review — five questions, marked-up PDF, honest call on whether your current quote is defensible.