Stop Shipping Slop

March 2, 2026


Code has never been cheaper to write, but that's not the good news you think it is. Most of what these models produce is slop, and slop spreads.

These models are nondeterministic. Dario Amodei said it himself: there's a basic unpredictability to them that we have not solved. They never fail the same way twice, which means the only thing standing between your codebase and total entropy is you and the standards you enforce.


slop multiplies

Bad code grows exponentially while good code grows only linearly, and the gap compounds faster than you'd expect.

You let bad code in because it was convenient. It was the fast fix, the easy copy-paste, the "we'll fix it later" that everyone knows means never. And because that code was convenient to write, it's also convenient to copy, which means it quickly becomes the dominant pattern in your codebase.

Agents make this dramatically worse because they can't distinguish between your best code and your worst. They just scan for the most common pattern and copy it. If that pattern is slop, every new feature the agent writes will contain more slop, and it'll produce it faster than any human ever could.

The bad code wouldn't have made it in if it weren't convenient, and convenient code is the kind that gets copied the most. That's why the ratio only ever moves in one direction.

Every codebase has a quality ceiling that you tend to hit around six months in. After that the patterns are locked and you're building on whatever foundation you laid down in those first months. If that foundation was vibe-coded with early models that could barely resolve a merge conflict, you're stuck with it.


kill it

When you find slop in your codebase, don't try to fix it. Delete it and rewrite it. Don't look at the git history. Don't ask who wrote it. Don't schedule a refactor for next sprint. Kill it now.

Rewriting 5,000 lines of code used to take weeks, which is why we'd patch and refactor and work around the mess instead. That math has completely changed now, and rewriting is cheap enough that there's no excuse to keep bad code around. If any part of you is telling you something should probably be deleted, you should have deleted it a month ago.

We went through this at Cleve. We vibe-coded the first version of our web app and shipped it fast. It worked for a while, but eventually it got slow, the architecture started to smell, and features that should have taken hours were taking days. So we killed it and rebuilt from scratch on a proper mf2 stack with real standards. One month, a hundred Linear tickets, over 100,000 lines of code. About half of that was boilerplate from a good template that set the right conventions from day one, 40% was agent-written against those conventions, and 10% was us. The template matters because it gives agents good patterns to copy from the start instead of letting them invent bad ones.

Now we can swap our entire pricing strategy in an afternoon to test whatever we want. A clean codebase isn't some abstract ideal to aspire to; it's the reason you move fast.


prototype, then build properly

There's an interesting pattern from the game Vampire Survivors that I think about a lot. The original version was written in Phaser.js for the browser, basically a sloppy prototype that the creator used to test ideas and iterate on gameplay. Once the game proved itself, a separate team ported the whole thing to C++ for the console release. The creator still prototypes new features in the messy Phaser.js codebase, and once something works well, the team rebuilds it properly in the production codebase.

We ended up doing something similar at Cleve without planning to. The old vibe-coded app was effectively our prototype: we used it to figure out what the product should be, how features should work, what the UX should feel like. Once we understood all of that, we rebuilt it properly. The old codebase was a throwaway exploration and the new one was built with care.

I think we're going to see more of this going forward. Instead of "measure twice, cut once," it's more like "build a throwaway version to figure out what you actually want, then build the real version with proper engineering." Code is cheap enough now that maintaining two versions of something is no longer as insane as it used to sound. The key is knowing which version you're working in, and never shipping the prototype as if it were production code.


plan before you build

Most people open their editor, type "build me X," and let the agent run. That's vibe coding, and it produces code that works in isolation but rots in context.

I spend two to three times more time planning than I do coding. I use Monologue to talk to the model with voice instead of typing, because voice carries a lot more context and nuance than text does. I explain what I want and why I want it, what I've tried before and why it failed, what the constraints are, what docs are relevant. This is where your experience as a builder actually matters: the model doesn't know about your weird auth edge case or that you tried this exact approach last month and it broke everything. You do, and pouring that knowledge into the plan is the most valuable thing you can do.

By the end of that back-and-forth I have a document I can read and judge on its own merits. If it holds up, I let the agent build. If something feels off, I keep iterating on the plan until it doesn't.

Then I always clear the context before coding starts. Context rots. The longer a conversation goes, the more noise the model carries: abandoned approaches, old file states, corrections from three tangents ago. All of that pollutes the next thing the agent writes. The agent should be executing a well-defined plan on a fresh window, not carrying the messy conversation that produced it.


interrogate everything

When the agent writes code, ask it why it made the choices it made. Why this structure? Where did this pattern come from? Why this library and not that one? What happens when the input is empty?

If the model copied a bad pattern from somewhere in your codebase, you've just found something to kill. If it invented something that doesn't exist in your project, make sure it actually should.

It doesn't matter whether a PR comes from a model or a person. Same review, same bar. If you can't explain the code you're about to merge, don't merge it.

And if you can't tell whether code looks good on a first read, that's on you. Good code is obvious when you see it: it's simple, performant, maintainable, and does one thing clearly. Bad code is equally obvious once you know what to look for. That intuition is a skill, and if you don't have it yet, you need to build it before you hand more of your codebase to agents.

Here's a useful test: ask the agent to add a simple feature to your codebase. If it does it quickly and cleanly, your architecture is probably in good shape. If it takes forever or makes a mess, the codebase is the problem and the model is just showing you.


choose your stack carefully

Not all languages perform equally well with AI models, and the differences are bigger than most people realize.

Type-safe languages give the model a feedback loop that actually works. A compiler that rejects bad code is worth more than any amount of documentation, because the model can run the compiler, see the error, and fix it. TypeScript over JavaScript, Rust over C++, anything with a real type checker over anything without. The compiler essentially acts as free, instant code review on every change the agent makes.
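Here's a minimal sketch of what that feedback loop looks like in practice. The names (`Invoice`, `total`) are illustrative, not from any real codebase: the point is that a field-name guess gets rejected at compile time instead of surfacing as a runtime bug three features later.

```typescript
// A typed interface the compiler can enforce on every agent edit.
interface Invoice {
  id: string;
  amountCents: number; // integer cents avoid floating-point money bugs
}

function total(invoices: Invoice[]): number {
  return invoices.reduce((sum, inv) => sum + inv.amountCents, 0);
}

// If an agent guesses a field name, `tsc` rejects it instantly:
//   total([{ id: "a", amount_cents: 100 }]);
//   // compile error: 'amount_cents' does not exist in type 'Invoice'
// In plain JavaScript the same mistake would silently produce NaN.

console.log(total([{ id: "a", amountCents: 100 }, { id: "b", amountCents: 250 }]));
```

The agent runs the compiler after every change, reads the error, and self-corrects without you in the loop.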

Languages with documentation standards built into the source code do even better. Elixir generates docs from code comments, and every package in its ecosystem ships with them as a result. C# bakes XML documentation into the syntax itself. Tencent's AutoCodeBench showed Elixir hitting 97.5% and C# hitting 88.4%, while TypeScript and Rust both sat around 61%. The models write better code in languages where the code describes itself, where there are fewer ways to solve a given problem, and where the documentation lives right next to the implementation.
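TypeScript gets partway there with TSDoc-style comments, which editors and doc generators read straight from the source. A small sketch, with a hypothetical function, of what docs-next-to-implementation looks like:

```typescript
/**
 * Splits a full name into given and family parts.
 *
 * @param fullName - A name like "Ada Lovelace"; extra whitespace is ignored.
 * @returns The first word as `given` and the remaining words joined as `family`.
 */
export function splitName(fullName: string): { given: string; family: string } {
  const parts = fullName.trim().split(/\s+/);
  return { given: parts[0] ?? "", family: parts.slice(1).join(" ") };
}

console.log(splitName("Ada Lovelace"));
```

A model reading this file sees the contract and the implementation in one place, so it doesn't have to guess what the function promises.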

Colocation matters for the same reason. When your types, docs, and logic all live in the same file, the model reads them together and understands the full picture. When you scatter config across dashboards and external services that the model can't access, it has to guess, and it will guess wrong. This is one of the reasons I built mf2 the way I did: everything is in .ts files, every package exports typed interfaces, and there's nothing hidden in a dashboard that the agent can't see.
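To make the colocation point concrete, here's a hypothetical example of the kind of config that often lives in a billing dashboard, kept instead in a typed .ts file. None of these names are the real mf2 API; it's a sketch of the pattern.

```typescript
// pricing.config.ts — illustrative only; config the agent can read and type-check.
export interface PricingTier {
  name: string;
  monthlyCents: number;
  seatLimit: number;
}

export const pricingTiers: PricingTier[] = [
  { name: "free", monthlyCents: 0, seatLimit: 1 },
  { name: "pro", monthlyCents: 1900, seatLimit: 5 },
  { name: "team", monthlyCents: 4900, seatLimit: 25 },
];

// Picks the cheapest tier that supports the requested seat count.
export function tierFor(seats: number): PricingTier {
  const tier = pricingTiers.find((t) => seats <= t.seatLimit);
  if (!tier) throw new Error(`no tier supports ${seats} seats`);
  return tier;
}
```

Because the whole pricing model is a file, "swap the pricing strategy in an afternoon" is an ordinary, type-checked code change rather than an archaeology session across dashboards the agent can't see.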


what to do

If you take one thing from this post, let it be this: do not tolerate slop. Not from agents, not from your team, not from yourself. If you see bad code, kill it immediately. Don't ask who wrote it, don't put it on the backlog. Kill it now, before it becomes the pattern that everything else copies.

  • Plan two to three times longer than you code. Talk to the model using voice. Go back and forth until the plan is something you'd actually bet on.
  • Always clear context after planning. Execute on a fresh window so the agent starts clean.
  • Kill bad code on sight. Rebuilding is cheap now, so rebuild.
  • Ask why. If the model can't justify its choices, reject them.
  • Same bar for every PR. Agent or human, no exceptions.
  • Use type-safe languages with real compilers. They give the model feedback loops that catch mistakes before you have to.
  • Put everything in files. If the model can't read it, it can't use it correctly.
  • Read what gets merged. If you're not reading the code that goes into your codebase, you don't get to complain when it rots.

Code is cheap to write now. So stop shipping the first draft.

