Token Churn: Why Traditional Software Workflows Are Bleeding Your AI Budget

If you're building agentic systems and wondering why your token costs are spiraling out of control, the answer might surprise you: you're probably using the wrong workflow. After spending the last several years building agent orchestration systems in property tech, I've learned that one of the most expensive mistakes you can make is taking traditional software engineering processes and applying them directly to AI agents without modification. What works beautifully for human developers becomes a token-burning disaster when you hand those same workflows to LLMs.

[Figure: abstract illustration of token churn in agentic systems, contrasting repeated, expensive LLM review loops with cheap deterministic verification steps]

The problem I want to talk about today is something I call token churn—the massive, often hidden cost that happens when we naively port our standard development loops into agentic systems. And if you're moving any significant volume of tickets through an AI-assisted development pipeline, this is costing you far more than you realize.

The Traditional Engineering Loop and Why It's Expensive for Agents

Let's start with the workflow we all know. In traditional software engineering, we take a ticket, execute the work, and then go through a review process where a human examines the code. We might have some CI steps that run automated tests. If anything fails in CI or we get feedback in the code review, we take that feedback, make changes, and go through the loop again—CI again, code review again, iterating until everything passes. Finally, the code gets approved and merged. This loop has served us well for decades because human time is expensive and we want to catch issues before they reach production.

Now, when we take that exact same loop and apply it to an agentic system, the economics flip. A code review that is relatively cheap when a human does it once or twice becomes extraordinarily expensive when an LLM burns tokens to conduct it thoroughly. And we do want thorough reviews; that's the whole point of having a review step in the first place. So we start with a well-defined ticket, the agent executes the work, and then we run an expensive, token-heavy code review. Anything the review finds gets fed back into the system, the agent fixes it, and we run another expensive, token-heavy code review. The cycle continues until approval.

This might not seem like a big deal if you're only processing a handful of tickets. But if you're moving significant volume through your system, or if tickets fail repeatedly, this expensive code review step adds up fast. Consider a pull request that loops five times before it's approved. That's five full reviews, each one burning tokens. Reported figures suggest that agent loops can burn roughly 15x the tokens of a single chat interaction, and much of that cost comes from these repeated, expensive verification steps. The auto-loop tax is real, and it's eating your budget.
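To make the five-loop arithmetic concrete, here's a minimal sketch of the cost model. The token counts are made-up placeholders, not measurements, and `review_loop_cost` is a hypothetical helper; plug in your own numbers.

```python
def review_loop_cost(loops: int, review_tokens: int, fix_tokens: int) -> int:
    """Total tokens when every loop pays for a full LLM review.

    The last loop ends in approval, so it has a review but no fix pass.
    """
    return loops * review_tokens + (loops - 1) * fix_tokens


# Hypothetical per-step costs, for illustration only.
REVIEW_TOKENS = 40_000
FIX_TOKENS = 15_000

one = review_loop_cost(1, REVIEW_TOKENS, FIX_TOKENS)
five = review_loop_cost(5, REVIEW_TOKENS, FIX_TOKENS)
print(one, five, five / one)  # 40000 260000 6.5
```

Under these assumptions, a ticket that loops five times costs 6.5x one that passes on the first review, and the review step alone accounts for most of the total.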

Why Token Burn Matters More Than You Think

The token burn problem isn't just about direct costs, though those add up quickly enough. It's also about the hidden costs of retrieval thrash (fetching the same context over and over) and path-dependent variance (the same ticket costing very different amounts depending on which path the agent takes), which make it nearly impossible to predict or control your expenses. When an agent loops multiple times, it's not just repeating the same operations. It's often retrieving context repeatedly, rebuilding its understanding of the task, and sending vastly more tokens than a linear execution path would require. A ten-turn loop can send roughly 50 times the tokens of a single linear call, and most of that variance is driven by retrieval patterns that standard observability tools don't even surface.
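One way to see where a multiplier of that size can come from: assume every turn adds the same amount of new context and the full history is re-sent on each call, with no caching. Both assumptions are simplifications, and `tokens_sent` is a hypothetical helper, but the growth pattern is the point.

```python
def tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when each turn re-sends the whole history so far."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one turn's worth
        total += context            # the entire history is sent again
    return total


single = tokens_sent(1, 2_000)
ten = tokens_sent(10, 2_000)
print(ten / single)  # 55.0
```

The total grows with the square of the turn count (1 + 2 + ... + 10 = 55 units of context), which is why a loop costs far more than ten times a single call.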

This matters because it fundamentally changes the economics of building production agentic systems. What works in a prototype or demo environment—where you might process a few dozen tickets and token costs are negligible—becomes unsustainable at scale. The transition from prototype to profit requires engineering token-efficient workflows that are fundamentally different from what we're used to in human-driven development. You can't just throw more tokens at the problem and hope costs stay manageable. You need structural changes to how the workflow operates.

The Contract-Based Approach: Shifting Cost to Where It Matters

So here's what I'm working on right now to address this problem. Instead of doing expensive review loops at the end of the process, I'm moving the verification burden earlier and making it scriptable wherever possible. The modified workflow starts with a ticket spec gate, just like before—we still need well-defined requirements. But immediately after the ticket spec gate, we create what I call a ticket contract.

The contract is emitted in a specific, structured format (think of a JSON blob) that can be validated with a script rather than an LLM call. It defines the criteria the ticket's work must meet to count as successful, in a way that's programmatically verifiable: exactly what the agent is expected to do, which files it should modify, which functions it should create or change, and what the expected outcomes are. This contract becomes the source of truth for whether the work was done correctly.
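As a rough illustration of what such a contract might look like, here's a sketch using only the standard library. The field names (`ticket_id`, `files_to_modify`, `functions`, `expected_outcomes`) and the example values are hypothetical, chosen to mirror the description above; your contract shape will likely differ.

```python
import json

# Hypothetical contract fields and the JSON type each one must have.
REQUIRED_FIELDS = {
    "ticket_id": str,
    "files_to_modify": list,
    "functions": list,
    "expected_outcomes": list,
}


def validate_contract(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the contract is well-formed."""
    try:
        contract = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(contract, dict):
        return ["contract must be a JSON object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in contract:
            problems.append(f"missing field: {field}")
        elif not isinstance(contract[field], expected_type):
            problems.append(f"{field} must be a {expected_type.__name__}")
    return problems


example = json.dumps({
    "ticket_id": "TICKET-123",
    "files_to_modify": ["billing/invoice.py"],
    "functions": [{"file": "billing/invoice.py", "name": "apply_discount"}],
    "expected_outcomes": ["tests/test_invoice.py passes"],
})
print(validate_contract(example))  # []
```

Validating the contract's shape before it ever reaches the executor means a malformed contract fails in milliseconds instead of after an expensive agent run.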

Then we pass both the well-specced ticket and the contract into the executor agent. The executor does its work, writing code and making changes according to the specification. But here's where the workflow diverges from the traditional approach: instead of immediately going to an expensive LLM-based code review, we first go through a contract verification step. This is just a script—no LLM required—that validates whether the work satisfies the contract. Did it modify the right files? Did it create the expected functions? Does the structure match what was specified? This verification step is essentially free compared to an LLM review, and it catches a huge percentage of the issues that would otherwise require expensive review loops.
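Here's a minimal sketch of what a contract verification script could check, assuming the agent's output is Python source and that the contract carries hypothetical `files_to_modify` and `functions` fields. The function names are illustrative; a real verifier would check whatever your contract specifies.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Names of all functions defined anywhere in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def verify_contract(contract: dict, changed_files: dict[str, str]) -> list[str]:
    """Compare the agent's changes to the contract. No LLM call involved.

    changed_files maps each modified path to its new source text.
    """
    problems = []
    allowed = set(contract["files_to_modify"])
    touched = set(changed_files)
    for path in sorted(touched - allowed):
        problems.append(f"unexpected change: {path}")
    for path in sorted(allowed - touched):
        problems.append(f"expected change missing: {path}")
    for spec in contract["functions"]:
        source = changed_files.get(spec["file"], "")
        try:
            names = defined_functions(source)
        except SyntaxError:
            problems.append(f"does not parse: {spec['file']}")
            continue
        if spec["name"] not in names:
            problems.append(f"missing function: {spec['name']} in {spec['file']}")
    return problems


contract = {
    "files_to_modify": ["billing/invoice.py"],
    "functions": [{"file": "billing/invoice.py", "name": "apply_discount"}],
}
good = {"billing/invoice.py": "def apply_discount(total, pct):\n    return total * (1 - pct)\n"}
print(verify_contract(contract, good))  # []
```

In practice the `changed_files` mapping could be built from the output of `git diff --name-only` plus the file contents on disk. An empty result means the work moves on to CI; anything else goes straight back to the executor without spending review tokens.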

After contract verification passes, we run local CI—execute the test suite, make sure there are no test failures, verify that nothing broke. Again, this is standard automated testing, no tokens required. Only after the work has passed both contract verification and local CI do we finally make an LLM call for what I call a semantic review. But this review is fundamentally different from the traditional code review. We're not doing a thorough examination of every line of code—we already verified through the contract that the agent did what it was told to do and nothing more. Instead, the semantic review is a high-level sanity check: Does this actually accomplish what the requirements requested? Is the logic correct? Are there any obvious security issues in the modified code?

This final semantic review is much cheaper because its scope is narrowly defined. We're not asking the LLM to catch structural issues or verify that the right files were changed—we already know that from the contract. We're just asking for a final correctness and security check on work that's already been validated at the structural level. If that passes, we're done. One semantic review, not five. The token savings are enormous.
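The ordering of the gates is what produces the savings, so it's worth sketching. The three callables below are stand-ins for whatever your system uses for contract verification, local CI, and the LLM-backed semantic review; `GateResult` and `run_gates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    passed: bool
    stage: str      # the stage that produced this result
    llm_calls: int  # LLM calls spent on this attempt


def run_gates(
    verify_contract: Callable[[], bool],
    run_local_ci: Callable[[], bool],
    semantic_review: Callable[[], bool],
) -> GateResult:
    """Run the cheap deterministic gates first; pay for the LLM only at the end."""
    if not verify_contract():
        return GateResult(False, "contract", 0)
    if not run_local_ci():
        return GateResult(False, "ci", 0)
    # Only work that passed both free gates reaches the token-spending step.
    return GateResult(semantic_review(), "semantic_review", 1)


# A failing test suite stops the attempt before any tokens are spent on review.
print(run_gates(lambda: True, lambda: False, lambda: True))
```

An attempt that fails at the contract or CI stage costs zero review tokens, so retries only become expensive once the work is already structurally sound.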

The Broader Principle: Workflows Must Match Economics

Now, the specific implementation I just described isn't the main point here. You might need a different approach based on your particular constraints, your existing tooling, or the nature of the work your agents are doing. The actual point—the thing I want you to take away from this—is that when we're building agentic workflows, we can't always directly mimic the workflows we've been using in traditional software engineering.

Those traditional workflows evolved to optimize for human constraints. Human time is expensive, human attention is limited, and human review is most effective when it happens at specific gates in the process. But agentic systems have different constraints. Token usage is the primary cost driver, and tokens scale with the number and size of LLM calls. What's relatively cheap for humans (a review pass or two) is expensive for agents. What's tedious for humans (mechanical, repeated checking) is essentially free when scripts and automated checks do it.

This means we need to think cleverly about how to structure these agentic loops. We need to be mindful of where we're spending tokens and ask whether each LLM call is truly necessary or whether we can accomplish the same verification through deterministic means. Building agents that don't crash, loop endlessly, or burn through tokens requires discipline and intentional workflow design, not just better prompts or more powerful models.
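A simple starting point for seeing where the tokens go is to total LLM usage per workflow step. This sketch assumes you already log each call with a step name and a token count; the record format and the numbers are hypothetical.

```python
from collections import defaultdict


def tokens_by_step(calls: list[dict]) -> dict[str, dict[str, int]]:
    """Total call count and token spend for each workflow step."""
    summary = defaultdict(lambda: {"calls": 0, "tokens": 0})
    for call in calls:
        entry = summary[call["step"]]
        entry["calls"] += 1
        entry["tokens"] += call["tokens"]
    return dict(summary)


# Hypothetical log for one ticket that looped twice.
calls = [
    {"ticket": "T-1", "step": "execute", "tokens": 12_000},
    {"ticket": "T-1", "step": "review", "tokens": 40_000},
    {"ticket": "T-1", "step": "execute", "tokens": 6_000},
    {"ticket": "T-1", "step": "review", "tokens": 41_000},
]
summary = tokens_by_step(calls)
for step, stats in sorted(summary.items(), key=lambda kv: -kv[1]["tokens"]):
    print(step, stats)
# review {'calls': 2, 'tokens': 81000}
# execute {'calls': 2, 'tokens': 18000}
```

Steps that show both a high call count and a high token total are the first candidates for replacement with a deterministic check.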

Building Sustainable Agentic Systems

As I continue developing agent orchestration systems and working through these problems in production environments, I'm convinced that the winners in the AI-assisted development space won't be the ones with the best models or the most sophisticated prompts. They'll be the ones who figure out the economics—who build workflows that accomplish the same verification and quality goals while minimizing token churn.

This is still an evolving space, and I'm learning new things every week about what works and what doesn't. But the fundamental principle is clear: your workflows must match your cost structure. When you're working with agentic systems, that means moving verification work away from expensive LLM calls wherever possible, frontloading specification work so agents have clear contracts to execute against, and reserving your token budget for the places where semantic understanding truly matters.

If you're building production agentic systems and you haven't thought carefully about token churn, now's the time to start. Look at your workflows, identify where you're making repeated expensive LLM calls, and ask yourself: could this verification happen earlier, cheaper, or in a deterministic way? The answer might just transform your economics from unsustainable to profitable.