AI Can't Do Math (and Why That Matters)

By William McKee
Last Friday, our AI for Agencies peer group saw a demonstration that exposed a fundamental problem with how most of us are using AI for business analysis.

Pete Caputa and members from the Databox product team were walking through their new MCP (Model Context Protocol) connector when Tadej from the engineering team made an offhand comment that caught my attention:

"The data is never calculated in the LLM. It's always calculated in Databox."

The context was a discussion about consistency. Tadej confirmed that their architecture prevents calculation errors by keeping the math out of the LLM entirely.

The Prediction Trap

Large language models don't do math. They predict what math should look like.

When you ask ChatGPT or Claude to analyze a spreadsheet or calculate your conversion rate, it is not running formulas. It is pattern matching: the model generates output that resembles what a correct answer would probably look like, based on its training data.

If you ask for "2 + 2," it gets it right because it has seen that pattern billions of times. If you ask it to calculate the month-over-month variance of three specific marketing channels, the pattern is unique. The model guesses the next likely token. Sometimes it is right. Often it is close. Occasionally it is confidently, convincingly wrong.
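
To make the distinction concrete, here is what "actually calculated" means for that same question, expressed as a few lines of Python. The channel figures below are made up purely for illustration; the point is that executed code returns the same answer every time, while a token predictor offers no such guarantee.

```python
# Month-over-month variance for three channels, computed deterministically.
# All figures are hypothetical, for illustration only.
october = {"organic": 48_200, "paid_search": 31_500, "email": 12_900}
november = {"organic": 51_700, "paid_search": 28_400, "email": 14_100}

for channel, oct_value in october.items():
    nov_value = november[channel]
    change_pct = (nov_value - oct_value) / oct_value * 100
    print(f"{channel}: {change_pct:+.1f}% month over month")
```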

Ignore the headlines about AI winning Math Olympiads. Those are specialized research models. The tools we use for business are built for language, not logic.

For drafting emails or summarizing documents, that works fine. For forecasting revenue or modeling ad spend impact, it creates a dangerous blind spot. You are making real decisions based on numbers that were never actually calculated.

The Code Interpreter Workaround

There is a fair counterargument here. Savvy users know that you can force an AI to do math correctly if you ask it to write code.

If you use tools like ChatGPT’s Advanced Data Analysis, the AI does not try to predict the number. Instead, it writes a Python script, executes that code, and reads the result back to you. It is effectively "checking its work" by using a calculator rather than its own brain.

This is a valid workaround for individuals. It bridges the gap between language and logic. However, it is slow and often fragile. It requires the user either to trust that the code was written correctly or to have the technical skill to verify it. That friction makes it hard to build automated, reliable business workflows on top of it.
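
For reference, the throwaway script an interpreter writes tends to look like the sketch below. The file name and column names are hypothetical stand-ins for your own data; what matters is that the answer comes from the executed code, not from the model's next-token guess.

```python
# The kind of throwaway script a code interpreter generates and runs.
# "leads.csv" and its columns are hypothetical stand-ins for your data.
import pandas as pd

df = pd.read_csv("leads.csv")  # columns: visits, conversions
conversion_rate = df["conversions"].sum() / df["visits"].sum() * 100
print(f"Overall conversion rate: {conversion_rate:.2f}%")
```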

You cannot run a business on a workaround that requires a "code check" for every query.

Separating Thought from Calculation

What Databox demonstrated was an architecture that industrializes that workaround. It separates the "thinking" from the "calculating" at the infrastructure level.

Their MCP connector lets you query your business data through Claude or other LLM clients. However, when you ask a question like "What is driving revenue this quarter?" or "How did our campaigns perform in October versus November?" the LLM does not attempt the math. It does not even need to write its own throwaway code.

Instead, it passes the request to Databox's dedicated analytics engine. That engine runs the actual calculations against your verified data. The LLM then simply interprets and presents those results.
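
To see what that separation looks like in code, here is a minimal sketch of the pattern, in the spirit of an MCP tool call. Nothing here is Databox's actual API: the function name, the stored figures, and the return shape are all hypothetical stand-ins for a vendor's analytics backend.

```python
# A minimal sketch of the separation, not Databox's actual connector.
# Everything named here (the engine, the stored figures) is hypothetical.

def analytics_engine(metric: str, start: str, end: str) -> dict:
    """Stand-in for the dedicated analytics engine. Every calculation
    happens here, against stored data, never inside the language model."""
    stored = {"revenue": {"2025-10": 198_000, "2025-11": 214_000}}
    values = stored[metric]
    change = (values[end] - values[start]) / values[start] * 100
    return {"metric": metric, "values": values, "change_pct": round(change, 1)}

# The LLM's role is reduced to choosing arguments from the user's
# question and narrating the result the engine hands back.
result = analytics_engine("revenue", "2025-10", "2025-11")
print(result)  # the model describes this dict; it never computes it
```

The design choice is the same either way: the model picks the arguments and narrates the output, while the arithmetic runs as deterministic code against verified data.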

The AI describes the data. Databox does the math.

This distinction removes the risk of hallucination from the numbers. It gives you the conversational interface of an LLM with the reliability of a CFO’s spreadsheet.

The Strategic Stakes

This separation is likely where the industry is heading. We will see AI used for interpretation and interface, while dedicated analytics engines handle the calculation.

The infrastructure layer beneath our tools is evolving quickly. The decisions being made now about architecture will determine which tools remain reliable as the stakes get higher. The models will keep getting smarter, but for business decisions that depend on accurate numbers, smart alone is not enough. You need architecture that knows its own limitations.


William McKee

As a managing partner of Knowmad, William creates sustainable growth for the agency by leading its future vision, driving new revenue, and empowering team member productivity and well-being.
