Tokens consumed, prompts shipped, models trained — none of those tell you whether AI is actually working. The single number that does, how to measure it without lying to yourself, and the trajectory a real AI-First team hits over four quarters.

Walk into any AI-curious company in 2026 and ask "is the AI working?" You'll get a tour of dashboards. Tokens consumed this quarter. Number of prompts shipped. Models fine-tuned. Internal NPS on the AI assistant. None of it answers the question.

The question that matters is brutally simple: are you doing more with the same people, or the same with fewer? Everything else is theatre. The metric that captures it is the AI Leverage Ratio, and it is — if you measure it honestly — the only AI number we've found that resists being gamed on a quarterly review.

The definition

AI Leverage Ratio = tasks completed per week ÷ hours of human input

That's it. Pick a recurring workflow — invoice triage, lead qualification, support tier-1, content brief-to-draft, pharmacy reorder. Count the tasks that came through in a week. Divide by the human hours that actually touched the work end-to-end — including reviewers, exception handlers, prompt edits, everything.

A traditional team running a workflow purely on human labour sits at a Leverage Ratio of around 1.0 — one hour of human input produces roughly one completed unit of work. A team six months into the playbook regularly hits 5–10×, sometimes higher on the most rule-shaped workflows.

Why it survives the "real or theatre" test

Three properties make it un-fakeable:

The denominator is hard to hide. You can pretend you shipped more prompts. You can't pretend your team worked fewer hours — payroll, calendars, Slack activity, ticket assignments all give it away.
Both directions matter. A high numerator with a high denominator (lots of work, lots of humans) is just a busy company. A low denominator with a low numerator (few humans, barely anything shipping) is a stalled company. Only the ratio is honest.
It self-corrects when an agent is silently failing. If an agent ships output that later requires a human to redo, the redo time lands in the denominator. The ratio drops. You notice.

How to actually measure it without lying to yourself

Most teams who try this give themselves a flattering number on day one. Three rules keep you honest:

1. Count tasks at the customer-visible unit. An "invoice triaged" is a triage decision delivered to the person who acts on it — not a prompt that ran. If the customer never sees the output, the task didn't happen.

2. Include every human minute, including reviewers. The trap most companies fall into: counting only the operator who clicks "send." If a manager reviews 10% of agent outputs, their time counts. If an engineer fixes a broken prompt twice a week, that counts too. Otherwise you've just hidden the cost.

3. Measure a single workflow, not the whole company. Aggregate Leverage Ratios across departments are meaningless — a 10× ratio in support and a 1× ratio in sales averages to 5.5× which tells you nothing useful. Track per-workflow, then roll up by reporting the distribution, not the mean.

The trajectory of a real AI-First team

Across the deployments we've run — Hope Hospital's pharmacy reorder agent, DrmHope's clinical decision support, BNI 121's chapter operations, Linkist NFC's lead flow — the curve looks almost identical:

Quarter	Target ratio	What's happening
Q1	1.5–2×	First agent live on one workflow. Humans still doing most of the work, but the agent handles the easy 30–40% unaided.
Q2	3–5×	Agent handles the boring 60–70%. Humans focus on the weird cases. Confidence scoring catches enough errors that the review loop tightens.
Q3	5–8×	Three or four workflows running on agents. Cross-workflow patterns harvested into reusable skills. The Business Brain is the operating doc.
Q4	8–10×+	Intelligence layer running daily. Humans review summaries, not tasks. Net headcount unchanged; throughput tripled or more.

Companies that try to compress this to two quarters mostly fail — not because the technology can't move that fast, but because the humans haven't built the muscle to trust agent output yet. The curve is paced by review cadence and trust-building, not model capability.

The four metrics that look like leverage but aren't

Cost saved per month. If you fire your support team, costs drop and quality drops faster. Ratio captures both; cost savings captures only one.
Tickets closed per agent. Easy to inflate by redefining what counts as "closed." The closed-but-customer-came-back-the-next-day problem is invisible.
Average response time. AI is always fast. That isn't the question. The question is whether the response was right.
Tokens or API calls per day. A pure cost number dressed up as a productivity number. More tokens just means more spending — possibly on better outcomes, possibly on a model loop that hallucinates 3× before settling on an answer.

Hope Hospital, by the numbers

The pharmacy reorder workflow is the cleanest illustration. Before: one pharmacist spent ~9 hours a week reviewing stock levels across 1,071 SKUs, generating reorder drafts, chasing supplier confirmations. The throughput was ~110 reorder decisions a week. Leverage Ratio: ~12 decisions per human hour.

After: the reorder agent reads the stock feed, applies the policy (par levels, supplier preference, narcotic-class rules), and drafts a reorder. The pharmacist reviews the agent's draft, edits the edge cases, signs off. Time down to ~1.4 hours a week. Same throughput — ~110 decisions. Leverage Ratio: ~78 decisions per human hour. A 6.5× increase, on a single workflow, in the first quarter.

That pharmacist did not lose her job. She picked up two other workflows that previously had no owner. The hospital's total throughput went up; her career got more interesting; her hours stayed the same.

Where to track yours

The AI Leverage Ratio dashboard inside the playbook tracks this per-workflow automatically — it pulls task counts from your operational tools and time-on-task from a combination of calendar data and the agent audit log. Most teams start seeing a real number by Week 6, in the first quarter of the playbook.

If you haven't started the playbook yet, walk the wizard for free to see where Leverage Ratio fits in the 10-step path. For a refresher on what "AI-First" actually means before measuring leverage against it, see the definition post.

One final caution

A high Leverage Ratio is necessary but not sufficient. A team can hit 10× by shipping fast, slop-quality output that customers tolerate for one quarter before they leave. The Ratio caught the quantity. It missed the cliff.

That's why the playbook pairs the Ratio with a customer-side trust score — agent outputs that go through a human reviewer until the trust score hits 9/10, only then does the agent ship unsupervised. The Ratio is the leverage. The trust score is the safety belt. You want both pinned to the wall.

AI Leverage Ratio: the only AI metric that matters