Quality Gates: Why Your AI Agents Are Only As Good As Your Tickets
I'm not going to sugarcoat it: not every pull request my AI agents have produced has been great.
Over the past few weeks, I've been spinning up tons of agents and watching them crank out pull requests at a pace I could never match manually. It's been exciting to see the potential of AI-assisted development in action. But as the PRs started piling up, I noticed something troubling: the quality was wildly inconsistent. Some were spot-on, ready to merge with minimal tweaking. Others missed the mark entirely, requiring significant rework or just getting scrapped altogether. After 25 years of building software and leading remote teams, I've learned that when you see patterns in problems, you need to stop and figure out what's actually going wrong before you keep charging forward.
So I took a step back and did what any experienced engineer should do: I analyzed the failures. I looked at every problematic pull request and tried to identify where things went sideways. What I discovered wasn't a flaw in the AI agents themselves, but rather two critical quality control points that were missing from my workflow. These weren't new problems—they're the same issues that plague human development teams when communication breaks down and context gets lost. The difference is that AI agents expose these weaknesses faster and more obviously because they can't fill in gaps with institutional knowledge or tap someone on the shoulder to ask a clarifying question.
The Two Quality Gates That Actually Matter
The first pattern I noticed was undeniable: the tickets that produced the best results were the ones that had been thoroughly thought through from the beginning. These tickets left no room for ambiguity. They didn't require the agent to make judgment calls about business logic or fill in gaps about the intended implementation approach. The requirements were crystal clear, the acceptance criteria were specific, and there was enough context that the agent could simply follow the ticket and produce exactly the right result. When I compared these successful tickets to the failures, the difference was night and day—it wasn't that the AI was smarter on those tickets, it was that the instructions were actually complete.
The second pattern emerged when I looked at the code review process itself. I had been running code reviews in a fairly standard way: either asking for a generic review or using a code review agent with some basic rules baked in. The problem with this approach became clear once I started examining the feedback these reviews were generating. Most code review tools and agents look at code in isolation—they evaluate whether it's clean, whether it follows best practices, whether it has obvious bugs. What they don't do, at least not by default, is evaluate the code against the original ticket. They don't ask whether the code actually solves the problem that was specified, or whether it solves it in the way that was intended.
This disconnect between ticket context and code review creates a dangerous feedback loop. When a code review lacks the full context of the ticket—including the business logic, the reasoning behind certain decisions, and the specific constraints that were outlined—it can reach incorrect conclusions about the code. A reviewer might suggest changes that seem reasonable from a pure code quality perspective but actually move the implementation away from the original intent. If you then iterate based on that feedback and run another generic code review, you can end up with code that looks clean and professional but completely misses the point of what you were trying to build. I've seen this happen on teams with human developers, and it happens even faster with AI agents that are optimizing for code quality metrics without understanding the "why" behind the ticket.
Why Context-Aware Reviews Change Everything
The realization hit me: a code review without proper context is worse than no review at all because it gives you false confidence. It tells you that your code is good when it might be solving the wrong problem entirely. What we actually need is for code reviews to understand the context in which they're reviewing the code. The question shouldn't just be "does this code look good?" but rather "does this code solve the problem that was specified in the ticket?" and "does it solve it in the way that was intended?" These are fundamentally different questions, and they require the reviewer—whether human or AI—to have access to the full context of the ticket.
This led me to a framework I'm calling quality gates. Think of them as checkpoints where you verify that you're still on the right track before you invest more time and resources moving forward. The first quality gate sits at the beginning of the process: it verifies that tickets are properly specced out before any development work begins. This gate asks whether there's enough information, whether the requirements are clear and unambiguous, whether the business logic is explained, and whether an agent (or developer) could reasonably execute on this ticket without having to make guesses. If a ticket can't pass this gate, it needs more work before anyone starts writing code.
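In code terms, the first gate amounts to a pre-flight check on the ticket itself. Here's a minimal sketch of the idea; the field names, checks, and thresholds are my illustrative assumptions, not Sandstorm Desktop's actual rules:

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    title: str
    description: str
    acceptance_criteria: list = field(default_factory=list)
    business_context: str = ""  # the "why" behind the work

def spec_gate(ticket: Ticket) -> list:
    """Return the list of gaps that block this ticket from passing the gate.
    An empty list means the ticket is ready for an agent to pick up."""
    gaps = []
    if len(ticket.description.split()) < 20:  # arbitrary threshold for illustration
        gaps.append("description too thin to execute without guessing")
    if not ticket.acceptance_criteria:
        gaps.append("no acceptance criteria")
    if not ticket.business_context:
        gaps.append("business logic / reasoning is missing")
    return gaps

def passes_gate(ticket: Ticket) -> bool:
    return not spec_gate(ticket)
```

In practice the analysis would be done by an LLM rather than keyword checks, but the contract is the same: a ticket either passes, or it comes back with a concrete list of what's missing.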
The second quality gate sits at the pull request stage, but it's fundamentally different from a traditional code review. This gate takes the original ticket intention into full consideration during the review process. It's not just checking code quality in a vacuum—it's evaluating whether the implementation matches what was requested, whether it handles the edge cases that were mentioned in the ticket, and whether it aligns with the reasoning and business logic that were laid out upfront. This context-aware review catches problems that generic code reviews miss entirely, particularly cases where the code is technically sound but functionally wrong.
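One way to picture the difference between the two kinds of review is in how the review prompt gets assembled. A rough sketch, assuming hypothetical ticket fields—this is not Sandstorm Desktop's actual prompt:

```python
def build_review_prompt(diff: str, ticket: dict = None) -> str:
    """Assemble a code review prompt. With no ticket, this degrades to the
    generic 'does the code look good?' review; with a ticket, the diff is
    evaluated against the original intent."""
    sections = ["Review the following pull request diff:", diff]
    if ticket:
        sections += [
            "Original ticket (review the diff AGAINST this, not in isolation):",
            ticket["description"],
            "Acceptance criteria:",
            "\n".join("- " + c for c in ticket.get("acceptance_criteria", [])),
            "Answer: does the diff solve the problem as specified, "
            "and in the way the ticket intended?",
        ]
    else:
        # The generic review that evaluates code in a vacuum.
        sections.append("Check style, obvious bugs, and best practices.")
    return "\n\n".join(sections)
```

The design point is that context isn't optional extra flavor for the reviewer; it changes what question the review is answering.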
Building Better Gates in Sandstorm Desktop
Based on these insights, I built a ticket quality gate check directly into Sandstorm Desktop, the tool I've been developing to streamline my AI-assisted development workflow. This feature serves two distinct purposes, both aimed at improving outcomes by improving inputs. The first purpose is helping me generate higher quality tickets from the start. Instead of just speaking ideas and sending them raw to Claude—which sometimes works great but often sends things off on tangents—the quality gate check helps me define tickets through a structured conversation. It has sensible defaults that can be overridden, but the core function is to eliminate ambiguity before development starts.
The second purpose is even more valuable: it reviews tickets that already exist. Sometimes I'm not the one creating the ticket—it comes from someone else on the team, or it's been sitting in the backlog for a while. These tickets might look good on the surface, but when the quality gate check analyzes them, it often identifies gaps and ambiguities that would cause problems during implementation. When a ticket can't pass the quality gate, the system asks for clarifications in conversational mode: "Why is this approach being taken?" "What's the expected behavior in this scenario?" "What constraints should be considered?" It keeps asking questions until it's satisfied with the ticket quality, and then it updates the ticket description to capture all of that context.
The result is that when a ticket finally gets sent to an agent to begin work, the agent can operate much more autonomously and produce a much higher quality result. Sandstorm Desktop already had code review built in—there's a review loop that checks the code before finalizing pull requests. But when tickets were poorly specified, even that review loop couldn't produce quality output because it was checking against an ambiguous ticket that could reasonably be interpreted multiple ways. The refinement loop would sometimes go off the rails, iterating toward a solution that looked good technically but missed the actual intent. Now, with the quality gate at the beginning ensuring ticket clarity, and the review agent checking the code against the ticket context, both sides of the process have improved dramatically.
Garbage In, Garbage Out Still Applies
Last week was a learning experience. I was excited about the potential of massively parallel AI agents, so I spun up too many of them and generated a flood of pull requests. The volume was impressive, but the quality was inconsistent enough that I knew something fundamental needed to change. This isn't about the limitations of AI—it's about the timeless principle that garbage in leads to garbage out. If you feed ambiguous, incomplete requirements into any system, whether it's an AI agent or a team of experienced developers, you're going to get inconsistent results that require significant cleanup.
What I'm hoping is that these quality gates improve performance going forward by addressing the root cause rather than just the symptoms. By ensuring ticket quality upfront, I'm setting agents up for success rather than asking them to compensate for poor inputs. By incorporating ticket context into code reviews, I'm making sure the feedback loop actually moves implementations toward the intended goal rather than just toward generic code quality metrics. These changes aren't revolutionary—they're actually pretty obvious once you step back and analyze where things are breaking down. But obvious doesn't mean easy to implement, and it definitely doesn't mean they happen automatically.
The beauty of building your own tools is that you can create exactly the quality control mechanisms you need. After two and a half decades in this industry, I've learned that the best solutions are usually the ones that acknowledge human (and AI) limitations and build systems to compensate for them. We're not going to write perfect tickets every time. Agents aren't going to produce perfect code every time. But if we put the right gates in place—the right checkpoints that catch problems early and ensure context flows through the entire process—we can dramatically improve the overall quality of what we're building. That's what these quality gates are really about: not perfection, but consistent improvement through better process design.