Building at the Frontier: How I Built My Own AI Agent Orchestrator and Finished a Two-Week Sprint in Three Days

I just completed an entire two-week sprint's worth of tickets in three days, and yesterday I had ten AI agents working simultaneously on code, committing changes, and reviewing work, all orchestrated through a tool I built myself. If that sounds like science fiction, welcome to what Addy Osmani calls "Level 8" AI-assisted coding: building your own orchestrator. This isn't some distant-future possibility or a theoretical framework; it's what I've been using in production for the past two weeks, both at my day job and on personal projects. Let me walk you through what it means to build and work at the frontier of AI-assisted development.

From CLI to Cross-Platform: The Evolution of Sandstorm

A few weeks ago, I wrote about Sandstorm, my agent orchestrator: a CLI tool I'd built to manage stacks of AI agents. The concept was solid: spin up agent stacks to handle tasks, then tear them down when complete. But in practice, the CLI had serious limitations that became increasingly apparent the more I used it. Visibility into what was happening during execution was minimal at best, making it hard to tell when things went wrong or how resources were being consumed. The user interface was functional but clunky, and honestly, the scripts powering everything weren't as reliable as they needed to be for serious day-to-day use. I knew I could do better, so I evolved it into Sandstorm Desktop, a full Electron app that runs cross-platform on macOS and Linux, and would probably build fine for Windows if I took the time to set it up.

The transformation from CLI to desktop application wasn't just about adding a pretty interface. It fundamentally changed what was possible with agent orchestration. With a proper desktop application, I could build real-time observability into stack performance, create intuitive controls for managing multiple concurrent agents, and implement features that would be painful or impossible in a command-line environment. Yes, the development has been intense—I'm making numerous changes daily, constantly evolving and improving the application. There's definitely some instability that comes with moving this fast; sometimes I add something that breaks other features, and I'm slowly working through those issues. The codebase is moving quickly as I'm actively improving both the architecture and the testing framework to make it more robust.

Intelligence Where It Matters: Smart Model Selection and Token Optimization

Here's where things get really interesting from a practical standpoint. One of the most valuable features I've implemented is intelligent model selection. When I spin up a stack with a ticket—whether from GitHub Issues or Jira—the system reviews the ticket and intelligently selects which AI model the stack should use based on the complexity of the task. This isn't just a nice-to-have feature; it's absolutely essential for sustainability. When you're spinning up multiple stacks throughout the day, token usage becomes a serious constraint. I was burning through my token allocation at an unsustainable rate until I implemented this feature. Simple tasks don't need the best models; they can run perfectly well on less expensive models, which gives me significantly more runway with my token budget.
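To make the idea concrete, here's a minimal sketch of what complexity-based model selection can look like. The tier names, labels, and thresholds below are illustrative assumptions, not Sandstorm's actual configuration:

```typescript
// Hypothetical model tiers; names and thresholds are illustrative,
// not Sandstorm's actual configuration.
type ModelTier = "haiku" | "sonnet" | "opus";

interface Ticket {
  title: string;
  body: string;
  labels: string[];
}

// Naive complexity score: longer tickets and certain labels suggest
// harder work that justifies a more capable (and more expensive) model.
function scoreComplexity(ticket: Ticket): number {
  let score = ticket.body.length / 500; // rough proxy for spec size
  if (ticket.labels.includes("refactor")) score += 2;
  if (ticket.labels.includes("architecture")) score += 4;
  if (ticket.labels.includes("chore")) score -= 2;
  return score;
}

// Map the score onto a tier: cheap model by default, escalating only
// when the heuristic says the task is genuinely complex.
function selectModel(ticket: Ticket): ModelTier {
  const score = scoreComplexity(ticket);
  if (score >= 4) return "opus";
  if (score >= 1.5) return "sonnet";
  return "haiku";
}
```

In practice you'd want the orchestrator's own LLM pass over the ticket rather than a keyword heuristic, but even a crude default-to-cheap policy like this captures most of the token savings.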

The observability features I've built in have proven just as valuable as the intelligent model selection. I can now see in real-time how many tokens each stack is consuming, which lets me identify when a particular task is chewing through tokens at an unexpected rate. I've built a comprehensive history system that tracks all the stacks I've launched and torn down, giving me insights into patterns and usage over time. I can group my stacks by ticket, which means if I'm working on a GitHub issue or Jira ticket, I can see the complete token consumption and agent activity associated with that specific piece of work. This level of visibility has been transformative for understanding not just what my agents are doing, but how efficiently they're doing it and where I might need to optimize my approach.
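The per-ticket grouping boils down to a simple roll-up over stack history records. A sketch, with field names that are assumptions rather than Sandstorm's actual schema:

```typescript
// Illustrative shape of a per-stack usage record; field names are
// assumptions, not Sandstorm's actual schema.
interface StackUsage {
  stackId: string;
  ticketId: string; // GitHub issue or Jira key the stack was launched for
  inputTokens: number;
  outputTokens: number;
}

// Roll usage up by ticket so one issue's total token spend is
// visible at a glance, across every stack launched for it.
function tokensByTicket(history: StackUsage[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of history) {
    const spent = u.inputTokens + u.outputTokens;
    totals.set(u.ticketId, (totals.get(u.ticketId) ?? 0) + spent);
  }
  return totals;
}
```

The same aggregation, run over a live event stream instead of stored history, is what powers the real-time view: spotting a task that's chewing through tokens is just noticing one ticket's total climbing faster than the others.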

Production-Ready Results: Real Work, Real Impact

Let me be concrete about what "production-ready" actually means in this context. I've been using Sandstorm Desktop for two full weeks now across both my professional work at a property tech company and my personal projects. The results have exceeded my expectations. At work, I finished all the tickets allocated for our entire two-week sprint in just three days. Not by cutting corners or reducing quality—by orchestrating AI agents to handle the repetitive, time-consuming aspects of software development while I focused on architecture, code review, and the complex decision-making that still requires human judgment. I'm now pulling larger work from the backlog because I've cleared my immediate commitments so efficiently.

Yesterday's work session perfectly illustrates the power of this approach. I spun up ten stacks running concurrently, each one writing code, committing changes, and even reviewing code. Everything ran smoothly, with all ten agents working in harmony through the orchestration layer I'd built. This isn't about replacing developers; it's about multiplying what a skilled developer can accomplish. I'm still making all the important decisions about architecture, reviewing the critical changes, and ensuring everything aligns with our team's standards and the product requirements. But I'm no longer spending hours on boilerplate code, routine refactoring, or straightforward feature implementations that follow established patterns. The agents handle that work while I handle the work that actually requires my 25 years of experience building software.
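The concurrency pattern itself is simple. A sketch of the fan-out, with `runStack` stubbed out (in the real app each stack is a container driving an agent, which is not shown here):

```typescript
type StackResult = { ticketId: string; ok: boolean };

// Hypothetical runner for one stack; stubbed for illustration. In the
// real app this would launch a container, stream agent output, and
// resolve when the agent finishes its ticket.
async function runStack(ticketId: string): Promise<StackResult> {
  return { ticketId, ok: true };
}

// Launch every stack at once and collect results without letting one
// failure abort the others; this mirrors the ten-concurrent-stacks flow.
async function runAll(ticketIds: string[]): Promise<StackResult[]> {
  const settled = await Promise.allSettled(ticketIds.map(runStack));
  return settled.map((s, i) =>
    s.status === "fulfilled" ? s.value : { ticketId: ticketIds[i], ok: false }
  );
}
```

`Promise.allSettled` rather than `Promise.all` matters here: with ten agents in flight, one crashed stack shouldn't take down the report on the other nine.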

Building at Level 8: What Frontier Development Actually Looks Like

Addy Osmani's framework for AI adoption in software development provides helpful context for where this work sits in the broader landscape. In his LinkedIn post about AI-assisted programming, he outlines a ladder of adoption levels, with Level 8 being "build your own orchestrator." Most developers are still at levels 3-5. Level 8 represents the frontier: developers building custom orchestration systems that manage multiple AI agents working in concert on complex, real-world software development tasks.

This is where Sandstorm Desktop operates. It's not just about generating code snippets or getting help with a function. It's about orchestrating multiple AI agents, each potentially running a different model optimized for its specific task, all working together through Docker containers to accomplish real software development objectives. The technical architecture pairs Claude's capabilities with Docker Desktop, with agent orchestration as the foundational premise. It's open source, which means other developers can experiment with this approach and contribute to pushing the frontier forward. I've marked my old Sandstorm CLI repository as deprecated because the desktop application has completely superseded it in both capabilities and reliability.

The Reality of Building on the Frontier

I want to be honest about what working at this level actually feels like because it's not all smooth sailing. The development velocity is intense—I'm making changes multiple times per day, constantly iterating based on what I learn from using the tool in real work situations. This rapid iteration means there's inherent instability; new features sometimes break existing functionality, and I spend time fixing issues that my speed of development creates. The codebase is still evolving toward a more stable architecture, and the testing framework is a work in progress. But here's the key insight: even with these growing pains, the tool is already more productive than anything else available for agent orchestration. The value it delivers far outweighs the occasional friction from rapid development.

Building your own orchestrator isn't for everyone, and that's okay. It requires comfort with ambiguity, willingness to debug your own tools while using them for production work, and the technical depth to build and maintain this kind of system. But if you're a serious builder of software—someone who sees programming as both craft and leverage—exploring this level of AI integration is worth the investment. You're not just using AI; you're architecting systems that multiply your impact as a developer. You're building the tools that will define how the next generation of software gets made.

Why This Matters Beyond Personal Productivity

The implications of orchestration-level AI assistance extend far beyond individual productivity gains. When a single developer can complete a two-week sprint in three days, that fundamentally changes the economics of software development. It shifts the bottleneck from implementation to decision-making and architecture. It means small teams can accomplish what previously required large organizations. As someone who founded a small software consultancy and led remote teams building custom applications, I can see clearly how this changes the game for independent developers and small consulting shops. You can take on more ambitious projects, deliver faster, and compete with larger organizations on capabilities rather than headcount.

The open source nature of Sandstorm Desktop is intentional. I'm not trying to build a commercial product here; I'm exploring the frontier and sharing what I learn. If you're interested in agent orchestration that works with Docker locally, the repository is available for you to experiment with. Fair warning: it's evolving rapidly, the documentation is probably behind the current feature set, and you'll need to be comfortable with some rough edges. But if you want to work at Level 8, if you want to see what's possible when you build your own orchestration system tailored to your specific workflow and needs, this is a real, working example that's being used in production every day.