
The LLM code review problem


After months of working with people using AI to assist them in their work, I can say it has been pretty rough. What should be making developers more productive is actually making my job much harder.

You cannot tell what is wrong anymore #

I have gotten pretty good at code reviews over the years. Usually I can scan some code and understand not just what is broken, but how the developer got there and what they are trying to do.

With AI-generated code it is a completely different story. The code looks clean and thoughtful on the surface, but it is often broken in non-obvious ways. All the normal signs that tell you about a developer’s thought process are gone. Instead you get code that looks impressive but does not actually work, or works suboptimally. You cannot simply copy and paste it into your project and expect it to work reliably, unless of course you take the time to actually read through the code, understand its logic and fine-tune it.

The conversation disappears #

Good code review used to be a back and forth. I would point out an issue, explain the reasoning, and we would discuss the fix together. The dev would learn something and sometimes I would too.

Now when I leave a comment, I just get “good catch” or “I will fix that”. Since they did not write the original code, they cannot engage with the feedback. They go back to ChatGPT or Claude, get a different approach and submit that. Often it is built in completely new ways that they do not fully understand.

This reminds me of something I noticed early on: instead of fixing issues in the original PR, I would often get a completely different solution in response to my first review. The usual culture of slowly iterating a PR from “a bit broken” to “actually good” just does not work anymore.

I feel like the AI gives them false confidence that they have found the optimal approach, when often there are much simpler or more maintainable ways to solve the same problem.

The work gets flipped upside down #

Here is what really gets me: some devs now spend less time on their code than I spend reviewing it. They can get out a complex solution in an hour that takes me half a day to properly understand and fix.

I had one developer who would completely rewrite their approach after every review instead of fixing the original problems. Each version looked polished but had different subtle bugs. The time I spent on these PRs was significantly higher than what the developer had invested.

Tests don’t help as much #

When people started using AI to write their tests as well, the safety net I thought we had disappeared. When both your implementation and your tests come from AI, you are not really getting the protection you think you are.

I mean, it is better than no tests at all, but the tests might look thorough while actually testing the wrong things or missing edge cases that a regular developer would naturally consider.

I found myself reviewing not just the implementation but also validating whether the tests were actually useful.

Pressure to review #

The worst part is the cultural pressure this creates. When developers appear highly productive because they are shipping a lot of code, pushing back on quality becomes really difficult. You hear things like “80% good enough is fine”, “you are being a perfectionist” and “stop blocking”.

These might be reasonable in normal circumstances, but not when you are dealing with code that looks good but might be completely broken underneath.

What now? #

I am not saying we should abandon AI assistance entirely. But I think we need to be more honest about the trade-offs we are making. The productivity gains everyone talks about might be an illusion if other developers are drowning in review work.

Otherwise you end up burning out experienced engineers while junior developers feel like they are being incredibly productive.