[{"data":1,"prerenderedAt":111},["ShallowReactive",2],{"blog-telemetry-token-bloat":3},{"id":4,"title":5,"body":6,"date":102,"description":103,"extension":104,"meta":105,"navigation":106,"path":107,"seo":108,"stem":109,"__hash__":110},"blog/blog/telemetry-token-bloat.md","The Hidden Cost of Token Bloat: What Telemetry Taught Me About AI Tool Optimization",{"type":7,"value":8,"toc":91},"minimark",[9,13,20,25,28,31,35,38,41,45,48,51,55,58,61,65,68,71,75,78,81,85,88],[10,11,12],"p",{},"After adding telemetry to my AI application Sandstorm, I discovered something that completely changed how I think about tool integrations: what seemed like a simple task was silently consuming 350,000 tokens, and the culprit wasn't where I expected. When you're building software that interacts with AI systems, you operate on assumptions about efficiency. You trust that the tools and protocols you've implemented are doing their job without creating unnecessary overhead. But after 25 years of building software, I've learned that assumptions are where problems hide. This is the story of how fine-grained telemetry exposed a compound problem that was quietly destroying my application's efficiency, and what I did to fix it.",[10,14,15],{},[16,17],"img",{"alt":18,"src":19},"Telemetry dashboard showing token usage spikes from compound MCP calls","/blog/telemetry-token-bloat.png",[21,22,24],"h2",{"id":23},"the-investigation-what-telemetry-actually-revealed","The Investigation: What Telemetry Actually Revealed",[10,26,27],{},"I recently added detailed telemetry into Sandstorm specifically to get a granular understanding of token usage within the app. My goal was simple: understand exactly where tokens were being consumed so I could optimize performance and keep costs under control. What the data revealed wasn't a single smoking gun but rather a compound issue where multiple problems were collaborating to create massive token bloat. 
The telemetry didn't just show me numbers; it showed me patterns, and those patterns told a story about architectural decisions that seemed reasonable in isolation but became problematic when they compounded together.",[10,29,30],{},"The primary culprit was actually a gap in my implementation. I was missing compound MCP (Model Context Protocol) functions that could handle multiple operations in a single call. Instead of having one MCP call that could accomplish a complex task, the system had to make four separate calls to achieve the same result. This wasn't an oversight in the traditional sense; it was the natural evolution of building features incrementally without stepping back to see the bigger picture. Each individual MCP call made sense when I built it, but I hadn't created the higher-level abstractions that would allow them to work together efficiently. The result was predictable in hindsight: the system was doing more work than necessary, and each additional call came with its own overhead.",[21,32,34],{"id":33},"the-compounding-problem-when-architecture-fights-efficiency","The Compounding Problem: When Architecture Fights Efficiency",[10,36,37],{},"Here's where things got really interesting. Each MCP call would blow up the context window significantly. A simple task that should have been straightforward ended up consuming massive amounts of tokens because it required multiple sequential calls, and each call carried substantial bloat. The mathematics of this problem are brutal: if you need four calls instead of one, you're not just paying four times the cost of the operation itself. You're paying for all the overhead four times over, including all the contextual information that needs to be maintained across those calls. The context window isn't just carrying your data; it's carrying the entire protocol structure, metadata, and response formatting for every single interaction.",[10,39,40],{},"But it wasn't just the extra calls that were problematic. 
The very nature of MCP itself and the shape of its responses were contributing to the bloat. Even well-designed MCP calls carry inherent overhead because of how they're structured. The protocol requires certain metadata, formatting, and structural elements that ensure reliability and consistency, but all of that comes at a token cost. When you make one MCP call, that overhead is a reasonable trade-off for the functionality you get. When you're making multiple calls for what should be a single logical operation, that overhead multiplies into something that fundamentally undermines your application's efficiency. I was watching simple tasks balloon to 300,000 or even 350,000 tokens, and the telemetry data made it crystal clear that this wasn't sustainable.",[21,42,44],{"id":43},"the-migration-from-mcp-to-skills","The Migration: From MCP to Skills",[10,46,47],{},"Looking at the telemetry data, I made a decision that might seem counterintuitive: I decided to migrate away from MCP entirely and move back to skills, which is actually where Sandstorm originated. This wasn't a step backward; it was a strategic pivot based on concrete performance data. The key advantage of skills is their simplicity and precision. With skills, you can define a very precise name and description for each capability, and then the skill itself can just execute a script. There's no protocol overhead, no complex response shapes to maintain, and no unnecessary context bloat. The skill knows what it needs to do, it does it, and it reports back with exactly the information required.",[10,49,50],{},"I systematically moved all the MCPs over to skills and completely ditched MCP because of the bloat it was causing. The results were immediate and dramatic. Instead of every single MCP call inflating the context by 8,000 tokens, the skills-based approach kept things under 1,000 tokens in most cases. 
The skill itself doesn't have the MCP overhead to carry along with it; it's just the functional code that accomplishes the task. An entire skill in its totality—not just the header or metadata, but the complete implementation—comes in at less than a thousand tokens. When you're dealing with operations that might be called dozens or hundreds of times in a session, that difference compounds into massive savings.",[21,52,54],{"id":53},"the-results-from-350000-tokens-to-95000","The Results: From 350,000 Tokens to 95,000",[10,56,57],{},"The performance improvement was staggering. A task that originally would blow up to 300,000 or 350,000 tokens now only used 95,000 tokens after migrating to skills. That's a reduction of roughly 70%, achieved purely through architectural changes with no loss of functionality. Could it still be optimized further? Absolutely, and I'm sure there are additional gains to be found. But going from 350,000 to 95,000 tokens represents a huge win already, both in terms of cost and performance. More importantly, it validated the entire approach of using detailed telemetry to guide architectural decisions rather than relying on assumptions or best practices that might not apply to your specific use case.",[10,59,60],{},"This kind of optimization doesn't happen by accident. It requires visibility into what's actually happening in your system, not what you think is happening. After working remotely for 25 years and building custom applications across different domains—from my own software consultancy to EdTech to property tech—I've seen countless examples of performance problems that hide in plain sight. The difference between good software and great software often comes down to whether you have the observability infrastructure in place to identify these issues before they become critical. 
Telemetry isn't just a nice-to-have; it's the foundation of intelligent optimization.",[21,62,64],{"id":63},"looking-forward-proactive-skill-identification","Looking Forward: Proactive Skill Identification",[10,66,67],{},"Now that I have telemetry observability in place, I can start refining my workflows and creating additional skills that will improve token usage even further. But I'm not stopping at reactive optimization. I have plans for something more ambitious: proactive skill identification based on patterns recognized within the telemetry data. Since I now have comprehensive data on tool usage and token consumption, I can analyze how I'm actually interacting with Sandstorm and identify repeated patterns that suggest opportunities for new skills. If I'm consistently performing the same sequence of operations, that's a candidate for a compound skill that can handle the entire workflow more efficiently.",[10,69,70],{},"This proactive approach excites me because it transforms telemetry from a diagnostic tool into a generative one. Instead of just showing me what's wrong, it can suggest what could be better. By analyzing interaction patterns—particularly how the system interacts with Sandstorm stacks—I can identify optimization opportunities that wouldn't be obvious from a purely code-centric view. The telemetry shows me not just what the system is doing, but how it's being used in practice, and that usage data is invaluable for architectural decisions. It's the difference between optimizing in theory and optimizing for reality.",[21,72,74],{"id":73},"the-broader-lesson-dont-take-optimization-for-granted","The Broader Lesson: Don't Take Optimization for Granted",[10,76,77],{},"Here's something we don't think about enough: we take for granted how much optimization has gone into the tooling we use daily. 
Whether it's development frameworks, cloud platforms, or AI APIs, there are teams of engineers who have spent countless hours squeezing every bit of efficiency out of these systems. Those optimizations are invisible until you build something that sits at the intersection of multiple tools, and suddenly you're responsible for the optimization at that integration layer. If we're not careful, things can explode on us quickly, especially when dealing with token-based pricing models where inefficiency has a direct monetary cost.",[10,79,80],{},"Building serious software means being serious about performance, and being serious about performance means having the observability infrastructure to understand where your resources are actually going. The telemetry I added to Sandstorm was a relatively small investment of development time, but it paid immediate dividends by exposing a problem that was costing me hundreds of thousands of unnecessary tokens per task. As someone who's spent years building custom applications and leading remote teams, I've learned that the best time to add observability is before you think you need it. By the time a performance problem becomes obvious to users, you've already lost ground.",[21,82,84],{"id":83},"conclusion-telemetry-as-a-competitive-advantage","Conclusion: Telemetry as a Competitive Advantage",[10,86,87],{},"The journey from 350,000 tokens to 95,000 tokens wasn't just about cost savings, though that's certainly valuable. It was about building software with intentionality, using data to drive decisions rather than intuition alone. The migration from MCP to skills was only possible because I had the telemetry data to prove it was necessary and to validate that it worked. 
Without that observability, I'd still be operating on assumptions, watching my token usage climb without understanding why.",[10,89,90],{},"As I continue working on Sandstorm and exploring proactive skill identification, the telemetry foundation I've built will only become more valuable. Every interaction generates data, and that data informs the next round of optimizations. This is how serious builders approach software: with measurement, iteration, and a commitment to understanding not just what the code does, but how it performs in the real world. The tools and protocols we choose matter, but what matters more is having the visibility to know when those choices are working and when it's time to change course.",{"title":92,"searchDepth":93,"depth":93,"links":94},"",2,[95,96,97,98,99,100,101],{"id":23,"depth":93,"text":24},{"id":33,"depth":93,"text":34},{"id":43,"depth":93,"text":44},{"id":53,"depth":93,"text":54},{"id":63,"depth":93,"text":64},{"id":73,"depth":93,"text":74},{"id":83,"depth":93,"text":84},"2026-04-20","After adding telemetry to Sandstorm, I discovered a compound MCP problem was silently ballooning context to 350,000 tokens per task. Migrating to skills cut it to 95,000 — a 70% reduction with no loss of functionality.","md",{},true,"/blog/telemetry-token-bloat",{"title":5,"description":103},"blog/telemetry-token-bloat","rJjVVOASAMsbJxKub1XNrN2ytM4TTvN9GrA2VJ3wERg",1776700909143]