Code review is dead, long live code review
During my first job out of undergrad, an esteemed colleague taught me a lesson that’s stuck with me for almost fifteen years.
Pending code reviews represent blocked threads of execution. [Thus, you should] prioritize code review above your other work (except for outages and customer service).
To paraphrase, an open pull request means that a colleague is unable to continue their task until someone reviews their work. Prioritizing code review optimizes a team’s throughput by reducing the amount of time people spend blocked.
I still want to live by this sage advice. But it’s getting harder as code is generated by agents. As the volume of generated code increases, it becomes untenable to review it in the classical way.
Several schools of thought are emerging about how to adapt to this changing environment.
One school of thought would have you throw out code review as a practice altogether. Just embrace the vibes. We can’t let feature velocity be held back by quality-control bureaucracy. And code is so trivially easy to produce now that we can always fix bugs quickly in post.
Call me old fashioned, but this feels like a recipe for outages and an unmaintainable codebase.
Another school of thought, one I find more compelling, is the approach advocated by the folks at Every.
This is code review done the compound engineering way: Agents review in parallel, findings become decisions, and every correction teaches the system what to catch next time.
[…]
A security expert spots authentication gaps but misses database issues. A performance specialist catches slow queries but ignores style drift. I needed specialists working in parallel, each focused on what they’re good at. Together, they catch what I might miss from a manual review.
Basically, for each hat that a code reviewer might wear (separation of concerns? data model robustness? API design? performance? security?), best practices for that aspect of review are written down in a Markdown file. Each file is then provided as context to an agent, and the whole collection of agents is spun up and thrown at each pull request.
So your PR might be evaluated by a dozen+ agents which collectively produce a list of suggestions that you then work through.
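The orchestration behind this is conceptually simple. Here’s a minimal sketch of what it might look like, assuming a hypothetical run_review_agent helper that wraps whatever agent framework you happen to use; the review-guidelines directory layout is my own invention, not a description of Every’s actual setup.

```python
import asyncio
from pathlib import Path


async def run_review_agent(guidelines: str, diff: str) -> list[str]:
    """Hypothetical placeholder: ask one concern-specific agent to review
    the diff against its guidelines and return a list of suggestions."""
    raise NotImplementedError("wire this up to your agent framework of choice")


async def review_pull_request(diff: str, guidelines_dir: str = "review-guidelines") -> list[str]:
    # One Markdown file per reviewer "hat": security.md, performance.md, api-design.md, ...
    guideline_files = sorted(Path(guidelines_dir).glob("*.md"))

    # Spin up one agent per guideline file, all running in parallel.
    tasks = [run_review_agent(path.read_text(), diff) for path in guideline_files]
    results = await asyncio.gather(*tasks)

    # Flatten every agent's findings into a single list for the PR author to work through.
    return [suggestion for findings in results for suggestion in findings]


# Usage (once run_review_agent is implemented):
#   suggestions = asyncio.run(review_pull_request(pr_diff))
```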
This seems strictly better than the YOLO approach. But it still puts the onus of judgement on the PR author, who has to decide whether to heed or ignore each of the agents’ suggestions.
I can imagine this working well for seasoned engineers who have learned through experience what to look out for. But less so for more junior engineers who haven’t yet built up this degree of judgement.
Or for a seasoned engineer making changes to a part of their company’s codebase that they aren’t familiar with. Like so much of the AI coding zeitgeist, this stuff feels optimized for greenfield work and less thought out for large codebases involving many people and moving parts.
Maybe a hybrid approach will emerge, where PRs are first pored over by a herd of concern-specific agents, and then a colleague is required to approve how the author handled the agents’ suggestions.
