Yep. The only people I've heard saying that generated code is fine are those who don't read it.
The problem is that the mitigations offered in the article also don't work for long. When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it's small, like the selection of a data structure. You can tell the agent what the constraints are with something like "Views do NOT access other views' state", as the post does.
Except, eventually, you'll want to add a feature that clashes with that invariant. At that point there are usually three choices:
- Don't add the feature. The invariant is a useful simplifying principle and it's more important than the feature; it will pay dividends in other ways.
- Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.
- Go back and change the invariant. You've just learnt something new that you hadn't considered, it puts things in a new light, and it turns out there's a better approach.
Often, only one of these is right. Often, one of these is very, very wrong, and with bad consequences.
Picking among them isn't a matter of context. It's a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.
Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered, and if you don't read what the agent does very carefully - more carefully than human-written code - you will end up with the same "code that devours itself".
> Picking among them isn't a matter of context. It's a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.
Yeah, I've already been working for several months on a harness that wraps Claude Code, Codex, etc. to ensure that these types of invariants are captured and enforced (after the first few harness attempts failed), and - while it's possible - it slows down the workflow significantly and burns a lot more tokens. In addition to requiring more human involvement, of course.
I suspect this is the right direction, though, as the alternatives inevitably lead any software project to devolve into a spaghetti-mess maintenance nightmare.
I agree with this. I've been writing a new internal framework at work and migrating consumers of the old framework to the new one.
I had strong principles at the outset of the project and migrated a few consumers by hand, which gave me confidence that it would work. The overall migration is large and expensive enough that it has been deferred for nearly a decade. Bringing down the cost of that migration made me turn to AI to accelerate it.
I found that it was OK at the more mechanical and straightforward cases, which are 80% of the use cases, to be fair. The remaining 20% need changes to the framework. Most of them need very small changes, such as an extra field in an API, but one or two require a partial conceptual redesign.
To oversimplify the problem: the backend for one system can generate certain data in 99% of cases. In a few critical cases, it logically cannot, and that data must be reported to it. Some important optimizations were made with the assumption that this would be impossible.
The AI tooling didn't (yet) detect this scenario and happily added migration logic assuming it would work properly.
Now, because of how this is being rolled out, this wasn't a production bug or anything (yet). However, asking the right questions to partner teams revealed it and unearthed that some others were going to need it as well.
Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.
Over time, this may change. Validation tooling I built may make a future migration of this kind easier to vibe code even if AI functionality doesn't continue to improve. Smarter models with more context will eventually learn these problems in more and more cases.
The code it generates still oscillates between beautiful and broken (or both!), so for now my artistic sensibilities make me keep a close eye on it. I think of the depressed robot from The Hitchhiker's Guide to the Galaxy as the intelligence behind it. Maybe one day it'll be trustworthy.
"The only people I've heard saying that generated code is fine are those who don't read it." Are you sure these people aren't busy working rather than chatting? (haha)
But in all seriousness, it depends on what you're doing with it. Writing a quick tool using an LLM is much easier than context-switching to write it yourself. If you need the tool, that's very valuable.
Also, as a webdev: it writes basic CRUD pretty well. I am tired of having to build forms myself and the LLMs are usually really good at that.
Been building a new app with lots of policies and whatnot, and instructing an LLM is just much faster than doing the same repetitive shit over and over myself.
And the solution is the same as when work was outsourced: the "patch" was to fix it by writing a spec. Thus I conclude my TED talk with the statement: LLMs are the new outsourcing, and they run into the same problems.
The swindle goes like this: AI on a good codebase can build a lot of features. You think it's faster; it even seems safer and more accurate at times, especially in domains you don't know everything about.
This goes on for a while as the codebase gets bigger, exploration takes longer, and the failure rate increases. You don't want it to be true and try harder, so you only stop after it has become practically impossible to make any changes.
You look at the code again and there is so much of it that spaghetti is an understatement; it's the Great Wall of China.
You start working... and you realize what was going on.
I deleted 75,000 of 140,000 lines of code, and I honestly feel like the 3 months I went hard into agentic coding were wasted. I failed my users by building useless features, increasing bugs, losing the mental model of my code, and not finding the problems I didn't know about - the kind of hard decisions you only see when you're in the code, the stuff that wanders in your mind for days.
I find it interesting that this outcome is a surprise. I don't want this to sound smug, I'm genuinely curious what the initial expectations are and where they come from.
They seem to be different for LLMs, because would anyone be surprised if they handed summary feature descriptions to some random "developer" you've only ever met online, and got back an absolute dung pile of half-broken implementation?
For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.
I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?
I've never relied on an LLM to build a large section of code but I can see why people might think it is worth a try. It is incredible for finding issues in the code that I write, arguably its best use-case. When I let it write a function on its own, it is often perfect and maybe even more concise and idiomatic than I would have been able to produce. It is natural to extrapolate and believe that whatever intelligence drives those results would also be able to handle much more.
It is surprising how bad it is at taking the lead given how effective it is with a much more limited prompt, particularly if you buy in to all the hype that it can take the place of human intelligence. It is capable of applying an incredible amount of knowledge while having virtually no real understanding of the problem.
I think this is true, but I imagine there's a workflow solution to this which isn't to drop AI.
E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc. And integrating it in a more manual workflow.
There's a range from single-shot prompts to inline code generation, and which end makes more sense depends on the problem and where in the code base it is.
Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation, and be more systematic -- always maintaining a reasonably good mental model at this layer.
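To make that concrete, a hypothetical Rust sketch (all names invented): the hand-written side owns a small trait, and the generated module only touches the rest of the system through it, so everything behind the boundary can be regenerated or thrown away wholesale.

```rust
// Hand-written boundary: this trait is the contract the rest of the
// code base programs against.
pub trait InvoiceStore {
    fn save(&mut self, id: u64, total_cents: i64);
    fn total(&self, id: u64) -> Option<i64>;
}

// "Immediately legacy": everything in this module may be opaque,
// regenerated, or replaced, as long as the trait's tests keep passing.
mod generated {
    use super::InvoiceStore;
    use std::collections::HashMap;

    #[derive(Default)]
    pub struct MemStore {
        totals: HashMap<u64, i64>,
    }

    impl InvoiceStore for MemStore {
        fn save(&mut self, id: u64, total_cents: i64) {
            self.totals.insert(id, total_cents);
        }

        fn total(&self, id: u64) -> Option<i64> {
            self.totals.get(&id).copied()
        }
    }
}

fn main() {
    // Callers only rely on the interface, never the generated internals.
    let mut store = generated::MemStore::default();
    store.save(1, 999);
    assert_eq!(store.total(1), Some(999));
}
```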
That workflow just sounds exhausting to me. Would I always need to consider how much of a blast radius my AI-generated code might have? Sounds like there's so much extra management going into these micro decisions that it ultimately defeats the purpose of generating code altogether.
I could see value in using it during the prototyping phase, but wouldn't like to work like you described for a serious project for end users.
I just don't like to type code anymore. If I can accomplish the same by describing the code, and get the same results as if I typed it myself, I'll opt for not typing so damn much. I've done so much typing in my career that typing ~80% less to get the same results makes a pretty big difference in how likely I am to set out to accomplish something.
I care more about code quality now, because typing no longer limits whether I feel it's worth refactoring something or not.
And you have discovered the job of managers! There has always been a lot of hate for managers. Wonder if the robots hate us just as much? (I often feel a weird guilt when I tell an agent to do something I know I am going to throw away but will serve as an interesting exploration...I know if I did that to a human they would be pissed...)
> treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc.
This is good advice regardless of whether you're using AI or not, yet in real life "let's have well-defined boundaries and interfaces" always loses against "let's keep having meetings for years and then duct-tape whatever works once the situation gets urgent".
Were you auto-committing everything without reading the generated code? And if you read it but didn't understand it, why not just ask for detailed comments for each output? Knowing that a larger codebase causes it to struggle means the output needs to be increasingly scrutinized as it becomes more complex.
I don't think it's about what the code does. I think it's more about how the code fits in its whole context. How useful it is in solving the overarching problem (of the whole software). How well it follows the paradigm of the platform and the codebase.
You can have very good diffs and then find that the whole codebase is a collection of slightly disjointed parts.
I haven't had the chance to work on large codebases, but isn't it possible to somehow adapt the workflow of Working Effectively with Legacy Code, building islands of higher quality code, using the AI to help reconstruct developer intention and business rules and building seams and unit tests for the target modules?
AI doesn't necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that will support either later hand-crafted code or an agentic implementation.
I don't think you truly captured the worst part:
There comes a realization, to many engineers' horror, that AI won't be able to save them and that they will have to manually comprehend and possibly write a ton of code by hand to fix major issues, all while upper management is breathing down their necks, furious as to why the product has become a piece of shit and customers are leaving for competitors.
The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.
But the solution doesn't come. They realize there is nothing they can do. It's over.
I've set a few rules for working with coding agents:
1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).
2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.
3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.
Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.
I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.
While this is a legitimate set of rules to follow for maintaining code sanity and a solid mental model of how a codebase may grow, it's always challenging to stick to them in a workplace where expectations around delivery speed have changed drastically with the onset of AI. The sweet spot lies in striking a balance between staying connected to the codebase and not becoming a limiting factor for the team at the same time.
That's kind of what I figured, sadly. I haven't experienced it personally yet since I got let go from my last job about 14 months ago, but it makes so much sense given how management is so willing to sacrifice quality for speed.
Another frustrating thing that has emerged from this is where managers "vibe code" half-baked ideas for a couple of hours and then hand them off as if they've meaningfully contributed to the implementation. Suddenly you're expected to reverse engineer incoherent prompts, inconsistent code, and random abstractions that nobody fully understands.
In their mind they've already done the "architectural heavy lifting" and accelerated the team.
More often than not it just adds cognitive overhead where you spend more time deciphering and cleaning up garbage than actually building the thing properly from scratch.
I am lucky to have never worked in a team where my manager wouldn't expect strong push back in this scenario. Many of the corporate environments described on here seem dystopian, this included.
Vouching for this comment because my friend confided in me a week ago that her manager also does this and is like "oh yeah, here's 80% done, you just do the rest so we can ship it" when a large part of it is slop that needs to be rewritten, due to not enough guidance and pushback during generation.
Unless the tests are written against logic that is in and of itself subtly wrong, and even the structure of the code and what methods exist is wrong - so even your unit tests would have to be rewritten because the units are structured badly.
It's a valid direction to look in; it just doesn't address the root issue of throwing slop over the wall while also having unrealistic expectations due to not knowing any better.
Yep. It's very healthy to be suspicious of code. Any code. Whether generated or not. That's where the bugs are.
If there's one thing that's disturbing about AI proponents, it's how trusting they are of code. One change in the business domain and most of the code may turn from useful to actively harmful. Which you then have to rewrite. Good luck doing that well if you're not really familiar with the code.
This is fine if it's more enjoyable for you; that's what's important in personal projects most of the time.
But we don't apply the same standard to dependencies, the work of colleagues, external services, all the layers down to the silicon, when trying to work.
Why is AI suddenly different?
We just have to do this by risk and reward. What's the downside if it's wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it's all fine? This is the same for libraries and external services.
A complex financial set of rules in a non-updatable crypto contract with no testing?
A viewer for your internal log data to visualise something?
It is and has always been immensely helpful to understand what you are doing in any context.
There are some programmers who treat the job as just plumbing together what is to them completely incomprehensible black boxes, who treat the computer as a mystery machine that just does things "somehow", but these programmers will almost always be hacks that spend their entire career producing mediocre code.
There are things such a programmer can build, but they are very limited by their lack of in depth understanding, and it is only a tiny fraction of what a more competent programmer can put together.
To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth. You don't have to be an expert at these things by any means, but you do need to understand them and be comfortable treating them as transparent boxes that you may have to go in and fiddle with at some point to get where you need to go. Sometimes you need to vendor a dependency and change it. Sometimes you need to drop it entirely and replace it with something more fit for purpose you built yourself.
AI is different because it's a tool, and the user of the tool is responsible for the work performed.
An outsourced developer isn't a "tool". They're a human being, and responsible for their actions. They're being paid, and they either act responsibly or they get replaced.
A vibe coder is a human using a tool. The human is responsible for code quality, and if it's not good enough, they need to keep using the tool to make it better. That means understanding the tool's output.
If an artist used Photoshop to create a billboard ad that was ugly, they don't get to blame Photoshop. They have to keep using the tool until their output is good.
I'd think that depends on the model of responsibility at play.
For example, suppose I hire a building contractor to build a house, and the electrician he subcontracts makes a mistake.
From my perspective, the prime contractor is equally responsible for that mistake regardless of whether he used a subcontractor, or did the work himself but used a broken tool.
This doesn't make the electrician any less of a "person" in the deeply important ways, but it's not a distinction that's relevant to my handling of the problem.
I was trying to follow similar rules, until one day I had to solve a hard mathematical problem. Claude is a PhD-level mathematician; I am not. I, however, know exactly the properties of the desired solution and how to test that it's correct. So I decided to keep Claude's solution over my basic, naive one. I mentioned that in the pull request and everyone agreed that was the right call. Would you make exceptions like that in your rules? What if AI becomes so much better at coding than you, not just at doing advanced mathematics? Would you then stop writing code by hand completely, since that would be the less optimal option, despite losing your ability to judge the code directly at that point (though, as in my example, you can still judge tests, hopefully)? I think these are the more interesting questions right now.
Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them--which I think is pretty much all the OP is saying.
Trust isn't a binary, and I can trust things I don't understand enough that I can use them. OP was talking about needing to understand, which is quite a bit above the level of being able to validate enough to use for a task.
I definitely wouldn't put math I didn't understand in my code just because Claude says so. I am not astonished that everyone agreed; that's why shit is going to hit the fan pretty badly pretty soon due to AI coding.
There is one exception to this: If the AI also delivers the proof of why the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof). Then I would use it without hesitation.
I always found it weird when helping people with excel formulas how few people even try to check maths they don't understand, let alone try to understand it.
I struggle to remember even relatively simple maths like working out "what percentage of X is Y" so if I write a formula like that I'll put in some simple values like 12 and 6 or 10,000 and 2,456 just to confirm I haven't got the values backwards or something. I've been shown sheets where someone put a formula in that they don't understand, checked it with numbers they can't easily eyeball and just assumed it was right as it's roughly in their ball park / they had no idea what the end result should be.
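The same spot-check habit carries over to code; a toy sketch with made-up values:

```rust
// "What percentage of X is Y" is easy to write backwards, so feed it
// values whose answer you can eyeball before trusting it on real data.
fn percent_of(x: f64, y: f64) -> f64 {
    y / x * 100.0
}

fn main() {
    assert_eq!(percent_of(12.0, 6.0), 50.0); // 6 is half of 12, so 50
    println!("{}", percent_of(10_000.0, 2_456.0)); // prints 24.56
}
```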
Then again I've also seen sheets where a 10% discount column always had a larger number than the standard price so even obviously wrong things aren't always checked.
I don't disagree, but whoever never put math they don't fully understand in their code gets to throw the first stone.
I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.
I don't see using gen AI for the (not so) "brute force" exploration of the solution space as that different from trial and error and post-hoc rationalization.
How did you test that the solution is correct? Is the set of possible inputs a low-ish finite number?
Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
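As a toy illustration (the functions are mine): when the input domain is as small as a u8, running every input really is a proof that two implementations agree.

```rust
fn parity_bitwise(x: u8) -> bool {
    x & 1 == 0
}

fn parity_modulo(x: u8) -> bool {
    x % 2 == 0
}

fn main() {
    // Only 256 possible inputs, so this loop is an exhaustive proof
    // that the two implementations agree - no sampling involved.
    for x in 0..=u8::MAX {
        assert_eq!(parity_bitwise(x), parity_modulo(x));
    }
    println!("all 256 inputs agree");
}
```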
You do realize you can ask Claude about the things you don't understand?
"PhD level" just means you finished a bachelor and masters degree and are now doing a bit of original research as an employed research assistant.
Claude isn't "PhD level" anything. This shows a complete lack of understanding here. Claude has read every single text book in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).
Since Claude has infinite patience, you can just keep asking until you get it.
I've also heard it being called "comprehension debt," which I like a little more because I think it's more precise: the specific debt being accrued is exactly a lack of comprehension of the code.
Comprehension debt just sounds like there are things you don't (yet) understand.
Cognition debt means your lack of understanding compounds and the cognition "space" required to clear it increases accordingly.
An increasing comprehension debt that can be paid off one bit at a time within reasonable cognition space takes linear time to clear.
Cognition debt takes exponential time to clear the more of it you have. If it reaches a point where you simply don't have the space for the cognition overhead required to understand the problem, you probably need to start over from your specifications.
I like that too. However, "cognitive debt" points to the possibility of cognitive overload, that the code can become so complex and inscrutable that it may become impossible to comprehend. "Comprehension debt" sounds a bit weaker in that respect, that it's just a matter of catching up with one's comprehension.
This is great until the "gun to your head" is your skip-level manager demanding that a feature be implemented by the end of the week, and they know you can just "generate it with AI" so that timeline is actually realistic now whereas two years ago it would have required careful planning, testing, and execution.
Your manager is unknowingly helping you create a form of job security for yourself, with all the technical debt and bugs being accumulated.
He might not understand it, and it might not be the type of work you want to do, but someone is going to have to fix those issues. And the longer they wait, the bigger the task gets.
That isn't new, though. Managers often pushed unrealistic timelines and showed a lack of care about tech debt well before vibe coding; just the timelines were different, and the magnitude will be bigger this time. But we also have LLMs to help clean it up faster, I guess.
The question is: is it a job you actually still want once the poo pile reaches critical mass, you are the only one with a shovel, and the deadline is "yesterday"?
That is absolutely true. Unfortunately, this ship has sailed and we are not closing Pandora's box anymore. We'll have to adapt.
But we still hold good cards in hand.
Do they want their pile of steaming slop fixed, or not? Because no amount of complaining about the deadline being "yesterday" is going to change the fact that time will be needed to fix the accrued technical debt, whether they like it or not. And if AI dug you in that deep to start with, the solution is not to dig deeper.
I suspect some companies are going to find that out the hard (costly) way.
If the manager is unreasonable, you were always going to have a problem with them, eventually. Nothing you can do will fix this.
If the manager is reasonable, you can explain to them that there isn't time to check the work of the AI, that it frequently makes obscure mistakes which need to be properly checked, and that that takes time.
At this point, if they still insist you just hand over the AI's work, they've made a decision that is their fault. You've done what you can.
And when the shit hits the fan, we're back to whether they're reasonable or not. If they are, you explained what could happen and it did. If they force responsibility on you, they aren't reasonable and were never going to listen to you. That time bomb was always going to go off.
Does that matter that much in practice? I bet lots of customers are okay with software that crashes 10x as much if it costs 10x less. There already is a ton of shitty software that still sells.
I agree with this, though it also depends on the nature of the project.
I had a project idea which I coded with the help of AI, and it became quite large, to the point that I was starting to have uncharted areas in the code. Mostly because I reviewed it too shallowly or moved too fast.
That was fine, as the project never took off, but if I were to do such a thing on my breadwinning project I would lose the joy.
I just had a Claude episode. Instead of trying to fix the bug, it edited the data to hide the bug in the sample run. This kind of BS behavior is not rare. Absolutely, if you do not understand every bit of what's going on, you end up with a pile of BS.
It's not worth fighting it at work. If the idiots you work for want everything vibe coded and delivered at 5 * 2025 speed then just vibe code and try to leave the company ASAP. That's where I am right now. Of course I might end up somewhere just as ridiculous or maybe not be able to even find another job. Shitty times we live in right now.
This is about how I use it. I initially use it to carve out an architecture and iterate through various options. That saves a lot of time for me having to iterate through different language features and approaches. Once I get that, I have it scaffold out, and I go in and tidy things up to my personal liking and standards. From there, I start iterating through implementations. I generally have been implementing stuff myself, but I've gotten better at scaffolding out functions/methods through code instead of text. Then I ask it to finish things off. That falls into your first category of letting it implement stuff that I already know I could do. Not sure if it's faster. But it's lower cognitive load for me, since I can start thinking about the next steps without being concerned about straightforward code.
This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.
For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.
I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.
I always find these kinds of posts interesting, to compare the velocity that people seem to get with AI vs what I get by just coding by hand.
Coincidentally, I've been working on a project for about 7 months now: it's a 3d MMO. Currently it's playable, and people are having fun with it - it has decent (but needs work) graphics, and you can cram a few hundred people into the server easily currently. The architecture is pretty nice, and it's easy to extend and add features onto. Overall, I'm very happy with the progress, and it's on track to launch after probably a year's worth of development.
In 7 months of vibe coding, OP failed to produce a basic TUI. Maybe the feature velocity feels high, but this seems unbelievably slow for building a basic piece of UI like this - this is the kind of thing you could knock out in a few weeks by hand. There are tonnes of TUI libraries that are high quality at this point, and all you need to do is populate some tables with whatever data you're looking for. It's surprising that it's taking so long.
There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice. This seems to be backed up by the available productivity data, where AI users feel faster but produce less
This struck me as odd too. 7 months? It wouldn't take that long to write it in a new language.
Another thing I don't see mentioned is code quality.
Vibe-coded code bases are an excellent example of why LLMs aren't very good at writing code: the model will often correct its own mistakes only to make them again immediately after, and its use of patterns is inconsistent.
Recently Claude has been making some "interesting" code style choices, not in line with the code base it's currently supposed to be working on.
> There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice.
This metric highly depends on who uses the AI to do what, where strong emphasis is on "who" and "what".
In my line of work (software developer) the biggest time sinks are meetings where people need to align proposed solutions with the expectations of stakeholders. From that aspect AI won't help much, or at all, so measuring the difference of man hours spent from solution proposal to when it ends up in the test loops with and without AI would yield... very disappointing results.
But for troubleshooting and fixing bugs, or actually implementing solutions once they have been approved? For me, I'm at least 10x'ing myself compared to before I was using AI. Not only in pure time, but also in my ability to reason around observed behaviors and investigating what those observations mean when troubleshooting.
But I also work with people who simply cannot make the AI produce valuable (correct) results. I think if you know exactly what you want and how you want it, AI is a great help. You just tell it to do what you would have done anyway, and it does it quicker than you could. But if you don't know exactly what you want, AI will be outright harmful to your progress.
Seems to be baked into the GPT: produce text, i.e., producing language and code is its life and purpose. So the whole system is inherently biased towards "roll your own everything" unless spoken to in "senior-dev" language that prevents these repetitions.
It's more complex than that. I think the reality is that there's a lot of code that's just not that deep, bro. I have some purely personal projects with components that I don't understand anymore; I wrote that shit by hand, they still work, but I haven't touched that shit in years. There's a lot of code like that that AI can write for me, the stuff I would forget about even if I wrote it by hand. I think you have to have discipline in its use; it's a tool like any other.
AI, and especially agentic AI, can make you lose situational awareness over a codebase, and when you're doing deep work that SUUUUCKS, but it's not useless; you just have to play to its strengths. Though my favorite hill to die on is telling people not to underestimate its value as autocomplete. Turns out 40 gigabytes of autocomplete makes for a fucking amazing autocomplete. Try it with llama.vim + qwen coder 30b; it feels like the editor is reading your mind sometimes, and the latency is so low.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules.
That's the hard part of coding. If you have an architecture then writing the code is dead simple. If you aren't writing the code you aren't going to notice when you architected an API that allows nulls but your database doesn't. Or that it does allow that, but you realize some other small issue you never accounted for.
I do not know how you can write this article and not realize the problem is the AI. Not that you let it architect, but that you weren't paying attention to every single thing it does. It's a glorified code generator. You need to be checking everything it does.
The hard part of software engineering was never writing code. Junior devs know how to write code. The hard part is everything else.
Yes, I think there are two kinds of developer: those who think the code is the hard part, and those that don't.
The developers that think coding is hard are the ones that absolutely love AI coding. It's changed their world because things they used to find hard are now easy.
Those that think coding is easy don't have such an easy time because coding to them is all about the abstractions, the maintainability and extensibility. They want to lay sensible foundations to allow the software to scale. This is the hard part. When you discover the right abstractions everything becomes relatively easy. But getting there is the hard part. These people find AI coding a useful tool but not the crazy amazing magical tool the people who struggle with coding do.
The OP is definitely in the second camp since they could spot and realise the shortcomings of the AI. They spotted the problem, and that problem is that the AI can't do the hard bit.
I'd say there's another camp: the camp of people who know that code isn't the hard part, but that it's still time consuming to write code. AI coding is pretty useful for that, when you can nail the design but you just need a set of hands to implement it.
I'm classing that as the second camp. Because you don't find it hard to do, it's just time consuming. It means you still know what you're doing and you're just using AI as a tool to accelerate your delivery. That's the optimal way to use in my experience if you want to actually deliver well architected software.
But isn't AI doing the same thing to project management as to coding?
PMs can now cross reference and organize tickets with just a few keystrokes. Organisational knowledge, business knowledge, design systems and patterns, etc all of it is encoded in LLM consumable artefacts. For PMs it is the same switch - instead of having to do it by hand you direct lower level employees to handle the details and inconsistencies and you just do vibe and vision.
When all of the pieces successfully connect and execute reliably, what is left for humans to do? Just direct and consume?
And AI companies with their huge swaths of data are soon gonna be in the situation of being able to do the directing themselves
Such a person is just pushing a giant pile of cleanup work onto their colleagues. Unless they actually checked, the "cross references" are probably wrong in places or just entirely made up. Lower level employees by definition don't have the experience to correct the more subtle inconsistencies, so you've basically just constructed a high pass filter that lets only the worst failures through. Moreover, you're absolutely guaranteed to lose the respect of those lower level employees--forcing someone else to clean up your sloppy work is just cruel, and people resent being treated cruelly.
This pretty much sums up what I feel about AI currently. It made my life significantly easier for most tasks I already breeze through, yet tasks I used to struggle with are still just as difficult.
I agree with what you're saying, but I think we do have a problem right now with definitions where there's a lot of people basically getting supercharged tab completions or running a chatbot or two in a parallel pane, but still clearly reviewing everything; and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas. The second group are not, and those are the ones that I find a bit more worrisome.
> The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas.
I can't speak for others, but I'd go further and say that LLMs allow me to go deeper on the design side. I can survey alternative data structures, brainstorm conversationally, play design golf, work out a consistent domain taxonomy and from there function, data structure and field names, draft and redraft code, and then rewrite or edit the code myself when the AI cost/benefit trade off breaks down.
That's a little bit of a No True Scotsman. Yes there are people who do not review anything; but even people who are reviewing every line from an LLM do not have the same understanding as someone who wrote it themselves.
I'm not making a judgement call about which is better, but it was widely accepted in tech before the advent of LLMs that you just fundamentally lack a sense of understanding as a reviewer vs an author. It was a meme that engineers would rather just rewrite a complicated feature than fix a bug, because understanding someone else's code was too much effort.
> and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of
I find it useful to not listen to people who just talk.
That blog post is surreal. It's like cryptocurrencies and the whole web3 nonsense. Cryptocurrencies basically don't work, so there have been a hundred aimless attempts at fixing self inflicted problems caused by deficiencies of cryptocurrencies with no actual goal that has any impact on the real world.
It's the same thing here. AI has dropped the cost of software development, so developers are now fooling themselves into producing low or zero value software. Since the value of the software is zero or near zero, it doesn't really matter whether you get it right or not. This freedom from external constraints lets you crank up development velocity, which makes you feel super productive, while effectively accomplishing less than if you had to actually pay a meaningful cost to develop something.
Like, what is the purpose of Gas Town? It looks to me like the purpose of Gas Town is to build Gas Town.
> The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas
I worry about the first group too, because interfaces and data structures are the map, not the territory. When you create a glossary, it is to compose a message that transmits a specific idea. I find invariably that people who focus on code that much often forget the main purpose of the program in favor of small features (the ticket). And that has accelerated with LLM tooling.
I believe most of us that are not so keen on AI tooling are always thinking about the program first, then the various parts, then the code. If you focus on a specific part, you make sure that you have well-defined contracts with the other parts that guarantee the correctness of the whole. If you need to change a contract, you change it with regard to the whole thing, not the specific part.
The issue with most LLM tools is that they're linear. They can follow patterns well, and agents can have feedback loops that correct them. But contracts are multi-dimensional forces that shape a solution. That solution appears more like a collapsing wave function than a linear prediction.
I've noticed that agents almost always fail at the planning vs execution stage.
I follow the plan -> red/green/refactor approach and it is surprisingly good, and the plans it produces all look super well reasoned and grounded, because the agent will slurp all the docs and forums with discussions and the like.
Trouble is, once it starts working, there will inevitably be a point where the docs and the implementation actually differ - either some combination of tools that has not been used in that way, some outdated docs, or just plain old bugs.
But if the goals of the project/feature are stated clearly enough it is quite capable of iterating itself out of an architectural dead end, that is if it can run and test itself locally.
It goes as deep as inspecting the code of dependencies and libraries and suggesting upstream fixes etc. all things that I would personally do in a deep debugging session.
And I'm super happy with that approach, as I'm more directing and supervising rather than doing the drudgery of it.
Trouble is, a lot of my team mates _don't_ actually go this deep when addressing architectural problems; their usual modus operandi is "escalate to the architect".
This will not end well for them in the long run, I feel, but I'm not sure what they can do themselves - the window of being able to run and understand everything seems to be rapidly closing.
Maybe that's not super bad - I don't know exactly what the compiler is doing to translate things to machine code, and I definitely don't get how the assembly itself is executed to produce the results I want at scale - that is a level of magic and wizardry I can only admire (the look-ahead branching strategies and caching on modern CPUs are super impressive - like, how is all of this even producing correct responses reliably at such a scale...)
Anyway - maybe all of this is ok - we will build new tools and frameworks to deal with all of this; human ingenuity and the desire for improvement, measured in likes, references or money, will still be there.
This is the only way for me to use Agents without completely hating and failing at it. Think about the problem, design structures and APIs and only then let AI implement it.
This is what seems to be lost on so many. As someone with relatively little code experience, I find myself learning more than ever by checking the results and what went right/wrong.
This is also why I don't see it getting better anytime soon. So many people ask me "how do you get your claude to have such good output?" and the answer is always "I paid attention and spotted problems and asked claude to fix them." And it's literally that simple but I can see their eyes already glazing over.
Just as Google made finding information easier, it didn't fix the human element of deciphering quality information from poor information.
Looking at code for errors is a hard thing to do well for a large amount of code. A better approach is to ensure tests cover all the important cases and many edge cases. Looking at the code may still be a good idea, but mostly to check the design. I think that once you get Claude to test the code it writes well, trying to find errors in the code is a waste of time. I've made the mistake of thinking Claude was wrong many times despite the tests passing, just to be humbled by breaking the tests with my "improvements"!
And when you get familiar with the other parts, you realize that writing code is the most enjoyable one. More often than not, you're either balancing trade-offs or researching what factors you have missed with the previous balancing. When you get to writing code, it's with a sigh of relief, as that means you understand the problem well enough to try a possible solution.
You can skip all that and go directly to writing code. But that means you've replaced a few hours of planning with a few weeks of coding.
Very well said. When I'm working on a hard problem I'll often spend a few weeks sweating details like algorithms, API shapes, wire formats, database schemas, etc. These things are all really easy to change while they're just in a design document. Once you start implementing, big sweeping edits get a lot more difficult. So better to frontload as much of that as possible in the design phase. AI coding agents don't change this dynamic. However all that frontloaded work pays off big when it does come time to implement, because the search space has been narrowed considerably.
> doing the __design work__ myself, by hand, before any code gets written.
So... Claude still is generating the code I guess?
And seriously, I can't understand that they thought their vibe coded project works fine and even bought a domain for the project without ever looking at source code it generated, FOR 7 MONTHS??
> And the goal of the article is to draw attention to their project.
Additionally, they couldn't even bother to write their own blog post, so it's a little hard to take them seriously when they say they're going to write their own code...
> Claude (c) by Anthropic (R) is the best thing since sliced bread and I'm Lovin' It(tm)! Here's a breakdown of you too can live a code free life for 10 easy payments of $99.99 a month if you subscribe now!
> Step one in your journey to code free life: code the whole damn project and put it together yourself
It's so much fluff and baloney and every single article is identical. And every single one is just over the top praise of Claude that doesn't come off as remotely authentic. There's always mentions of Claude "one shotting"(tm) something.
I bought domains for projects minutes after the idea.
I don't think it's that weird to not look at the code if it's a side project and you follow along incrementally via diffs. It's definitely a different way of working but it's not that crazy.
The article explicitly says that the author looked at the diffs; it distinguishes this from "sitting down and actually reading the code", which they didn't do. So when plastic041 says the author spent 7 months vibe coding "without ever looking at source code", it's not unreasonable for dewey to assume that "looking at source code", in this context, actually means something stronger and excludes just looking at the diffs.
I feel like I'm watching developers speed-run project and product management learnings.
We've moved to seeing that specs are useful and that having someone write lots of wrong code doesn't make the project move faster (lots of times devs get annoyed at meetings and discussions because they hinder the code writing, but often those are there to stop everyone writing more of the wrong thing).
We've seen people find out that task management is useful.
Now I'm seeing more talk of fully doing the design work upfront. And we head towards waterfall-style dev.
Then we'll see someone start naming the process of prototyping, then I'm sure something about incremental features where you have to manage old vs new requirements. Then talk of how really the customer needs to be involved more.
Genuinely, look at what projects and product managers do. They have been guiding projects where the product is code yet they are not expected to read the code and are required to use only natural language to achieve this.
So right. All these guys have never been managers. Do you think humans don't write things that break? Or that teams sometimes take a wrong path and burn a week of work? Or months? Well now you can experience all of that in 30 minutes of vibecoding. As a former tech product manager, it feels EXACTLY the same.
I ran into the same issues quite early with my Rust pet projects: single structs with tons of Option<T> and validation methods, etc.; enums for type fields combined with optional fields in the same layer, so accessor methods all return Option<T>.
I now add a long list of instructions on how to work with the type system, and some do's and don'ts. I don't see myself as a vibe coder; I actually read the damn code and instruct the AI to get to my level of taste.
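To make it concrete with an invented example, this is the shape I keep steering it away from, and the shape I ask for instead:

```rust
// What the agent tends to produce: a type tag plus a bag of Options,
// so every accessor returns Option<T> and a validation method has to
// police which field combinations are legal.
#[allow(dead_code)]
enum JobKind { Import, Export }

#[allow(dead_code)]
struct Job {
    kind: JobKind,
    source_path: Option<String>, // only meaningful for Import
    target_url: Option<String>,  // only meaningful for Export
}

// What I instruct it toward: variant-specific data, so invalid
// combinations can't even be constructed and no validation is needed.
#[allow(dead_code)]
enum JobV2 {
    Import { source_path: String },
    Export { target_url: String },
}

fn main() {} // compiles; the point is the two type shapes above
```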
Would you be interested in sharing your findings? I'm currently experimenting with LLM-generated rust and honestly think it works quite well, however I'm looking for ways to improve the "taste" of the agent.
> Vibe-coding makes you feel like you have infinite implementation budget. You don't. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always.
This is a special case of a general fundamental point I'm struggling with.
Let's assume AI has reduced the marginal cost of code to zero. So our supply of code is now infinite.
Meanwhile, other critical factors continue to be finite: time in a day, attention, interest, goodwill, paying customers, money, energy.
So how do you choose what to build?
Like a genie, the tools give us the power to ask for whatever we want. And like a genie, it turns out we often don't really know what we want.
Right - knowing what to actually build always has been and always will be the limiting factor to actual success. I could spend months and hundreds of dollars generating the absolute BEST todo list that's out there but nobody wants that.
I have vibe coded 3 applications I never had time to code but always wanted.
Now it is different in a way: now I don't have time to use those apps.
That's a joke.
But I do believe it answers the question of "what to build?". If you didn't have time before LLM-assisted coding, you still don't have time for it. You most likely already know, by heart or by some measurements, what is used and what isn't.
Personally, i've taken a serious step back from 'unsupervised' vibe-coding. When the codebase is clean and you want some additional fix or small feature, Claude is quite good at mimicking your style and does a pretty good job.
When asking for a new major feature, despite hard guidelines and context (that eat half your context window), then it quickly ships bloat. The foundations are not very well organized and this is where you acknowledge it is all about random-prediction of the next word-thing.
Overall, i've wasted more time reviewing the PR and trying to steer it properly than I expected. So multi-layer agent vibe coding is no longer the way to go *for me*. Maybe with unlimited tokens and a better prompt, to be investigated...
This reads too much like it was LLM generated. I can't say for sure if it was but I have an allergic reaction to the short snappy know-it-all LLM writing style.
We're still in the early ages and must discern hard what AI is good for, what it can maybe do, what it could potentially do and what it just can't do, and move those threshold marks very conservatively. AI is also cheap enough that it's worth shots of experiments. As long as you don't really rely on AI it's easy to test the capabilities of this new conversational autocomplete, and the random gains it offers can be magnificent (except when they aren't, of course).
What has generally worked for me is paraphrasing the old adage "Write the data structures and the code will follow" over to AI. Design your data, consider the design immutable, and let the AI try to fill in the necessary code (well, with some guidance). If it finds the data structures aren't enough, have it prompt you instead of making changes on its own. AI can do a lot of the low-hanging fruit, and often the harder ones as well, as long as it's bound to something.
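A minimal sketch of what that looks like for me, with invented Rust types - the data design is frozen, and the AI only fills in code against it:

```rust
// The frozen design: the AI may not edit these types. If they turn
// out to be insufficient, it should prompt me instead of changing them.
pub struct Account {
    pub id: u64,
    pub balance_cents: i64,
}

pub enum Transfer {
    Deposit { to: u64, amount_cents: i64 },
    Withdrawal { from: u64, amount_cents: i64 },
}

// The AI-facing contract: fill in this body, bound by the data design.
pub fn apply(account: &mut Account, transfer: &Transfer) -> Result<(), String> {
    unimplemented!("generated code goes here")
}

fn main() {}
```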
Yet, for now, AI at best has been something that relieves me from having to write a long string of boring code: it's not sustainable to keep developing stuff relying on AI alone. It's also great when quality is not an issue; for any serious work AI has not sped me up noticeably. I still need to think through the hard parts, and whatever I gain in generating code I lose in managing the agents. But I can parallelise code generation, trying new approaches and exploring, because AI is cheap. AI is also pretty good at going through the codebase and reasoning about dependencies, whether in the context of adding a new feature or fixing a bug: I often let AI create a proof-of-concept change that does it, then I extract the important bits out of that and usually trim the diffs down to 1/3 or less.
AI further helps with non-work, i.e. tasks that you have to do in order to fulfill external demands and requirements, and not strictly create anything solid and new. I can imagine AI creating various reports and summaries and documentation, perhaps mostly to be consumed and condensed by another AI at the receiving end. Sadly, all of this is mostly things not worth doing anyway.
Overall, I cringe under all the hype that's been laid on AI: it's a new tool that's still looking for its box or niche carveout, not a revolution.
I don't think we're in the early ages... LLM technology has essentially stagnated since GPT-3.5; we just have bigger models that can handle more context. We're trying to cope with the lack of progress in the underlying technology by coming up with contraptions of multiple models stuck together: Mixture-of-Experts, reviewer models, PM models...
I'm moving very slowly into AI coding. I'm not comfortable enough to let Claude do anything big. What I do is this: I set out the general architecture, create function stubs, and add comments on how to implement things. Then I let Claude do 10 minutes of work and I check everything and refactor some of it. It saves me the boring implementation stuff (like: is this an array, move an index here or there, check whether something exists or not, put it in the db).
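A made-up example of what one of those stubs looks like before I hand it to Claude:

```rust
/// Parse a line like "2024-01-03,42.5" into (date, value).
/// Implementation notes for the agent:
/// - split on the first comma only
/// - return None for malformed input instead of panicking
fn parse_line(line: &str) -> Option<(String, f64)> {
    todo!() // Claude fills this in; I review and refactor afterwards
}

fn main() {
    // What I expect once it's implemented:
    // assert_eq!(parse_line("2024-01-03,42.5"),
    //            Some(("2024-01-03".to_string(), 42.5)));
}
```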
A problem often ignored is that while AI is trained on human-written code, how it writes is different in practice.
Will that improve or get worse? One could argue that LLMs in general are drastically more competent now than they were a couple of years ago, and they're also much better at coding. We're likely just now entering the era where they can code but are still not what you'd fully expect - or at least not what someone with absolutely no coding knowledge could use to code at the same level as someone who does know how to code.
Maybe that changes as the models improve, maybe it doesn't; only time will tell.
I'm currently exploring whether I should split up a project into a framework part and the game itself (a 2d idle game).
The framework could be an isolation layer against vibe rot, but I'm not sure it's necessary for my small project - something I always wanted to do and have never done anything with.
For another tool, I will try another approach: start with a deep investigation and a spec written together with AI, then start with the core architecture layout and then add features.
So instead of just prompting "write a golang project with a http server serving xy, and these top 3 features", I will prompt "create a basic golang scaffold for build and test" -> "create a basic http server with a basic library doing xy" -> "define api spec" -> "write feature x".
There is kind of a skill and depth to vibe coding though.
I'm thoroughly enjoying using AI to write code, but it paid off by years of doing things the hard way before. I already was a so called "10x developer" if I speak for myself. I'm doing things even faster now with AI.
So what you really mean is you are going to do better and more detailed skills files so you can get an architecture that you've thought through rather than something random?
Partly, but the order matters. The CLAUDE.md constraints only work if you designed the architecture first. They're just how you communicate it to the AI. The mistake I made wasn't writing bad skills files, it was not designing anything at all and expecting the AI to make coherent structural decisions across 30 sessions.
The rewrite is me sitting down with a blank doc and drawing the boxes before any code exists. Then the CLAUDE.md enforces what I already decided. Whether that actually holds up as the project grows, I genuinely don't know yet.
Are you really saving any time using AI, then? If you have to write the architecture for it, write all the rules you want it to follow, check everything it's written, and then reprompt it because it's not what you want?
Yes. I do all of this and I'd estimate 50-100% coding time savings. A lot of that comes from better multitasking over single-workstream throughput, which I suppose might compromise the gains depending on what you're doing. For me it amplifies the speedup by allowing some of my "coding time" to be spent on non-coding tasks too.
I could be wrong in some subtle way I'm not seeing, but I believe the model we're working in avoids the downsides. I actually think my review bar is slightly higher now, because I don't feel as much pressure to compromise my standards when I know Claude is capable of writing the code I want.
Can't you just ask AI to break up large files into smaller ones, and also to explain how the code works so you can understand it, instead of starting over from scratch?
That was actually the first thing I tried. It did a good job explaining the codebase mess and the architecture. Then I ran 3-4 refactor attempts. Each one broke things in ways that were harder to debug than the original mess. The god object had so many implicit dependencies that pulling one thread unraveled something else. And each attempt burned through my daily Claude usage limit before the refactor was stable.
And I'm sure the rewrite is going to teach me a whole different set of lessons...
Not sure why good coverage wouldn't mitigate risk in a refactor...
My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there...
If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...
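For instance, here's a minimal Go sketch of "test the interface rather than the implementation"; the Store interface and mapStore type are hypothetical stand-ins, but the shape is the point: the test suite takes any implementation, so an agent is free to extract and decompose behind it.

```go
package store

import "testing"

// Store is the contract the tests care about; implementations may be
// restructured freely as long as they keep satisfying it.
type Store interface {
	Put(key, value string)
	Get(key string) (string, bool)
}

// runStoreTests exercises the interface, not any concrete type.
func runStoreTests(t *testing.T, s Store) {
	s.Put("a", "1")
	if v, ok := s.Get("a"); !ok || v != "1" {
		t.Fatalf("Get(a) = %q, %v; want %q, true", v, ok, "1")
	}
	if _, ok := s.Get("missing"); ok {
		t.Fatal("Get(missing) should report absence")
	}
}

// mapStore is one throwaway implementation; a decomposed replacement
// would reuse runStoreTests unchanged.
type mapStore struct{ m map[string]string }

func (s *mapStore) Put(k, v string) { s.m[k] = v }

func (s *mapStore) Get(k string) (string, bool) {
	v, ok := s.m[k]
	return v, ok
}

func TestMapStore(t *testing.T) {
	runStoreTests(t, &mapStore{m: map[string]string{}})
}
```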
I'm currently working on the discovery phase of a larger refactor and have pretty quickly realized that AI can actually often be pretty useless even if you've encoded the rules in an unambiguous, programmatic way.
For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.
And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully, because there are now little discrepancies in the new queries (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.). You run a few more reviewer-agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.
As someone with experience doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically from a series of rules. AI can write the codemod, and AI can help me evaluate the results, but having the AI itself apply all of the few hundred changes directly left me unable to trust the output. And I suspect that will continue to be true for some time.
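As a rough illustration of why the codemod wins on trust - in Go rather than TypeScript, and with made-up identifiers - the whole transform is one mechanical rule applied by a parser, so every one of the few hundred changes is the same reviewable change:

```go
// Sketch of a tiny codemod: rewrite every db.Query(...) call site to
// svc.Query(...). The "db"/"svc" names are invented for illustration.
package main

import (
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"log"
	"os"
)

func main() {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", nil, parser.ParseComments)
	if err != nil {
		log.Fatal(err)
	}
	ast.Inspect(file, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// One rule, applied identically everywhere it matches.
		if id, ok := sel.X.(*ast.Ident); ok && id.Name == "db" && sel.Sel.Name == "Query" {
			id.Name = "svc"
		}
		return true
	})
	if err := format.Node(os.Stdout, fset, file); err != nil {
		log.Fatal(err)
	}
}
```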
This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.
When people talk about codebases being "incomprehensible", it's not always hyperbole. Sometimes the architecture literally cannot be broken up or understood.
When you see some legacy C++ codebase with millions of lines of code, catching cancer and slowly dying from it is more human than trying to unscrew that mess.
A really screwed code base blows out your context window and just starts burning tokens as the AI works out a way to kill -9 itself to escape the hell you're subjecting it to.
While I mostly agree - science is built on truths - code has a large amount of creativity and freedom built into its decisions. Some codebases will be documented and follow rigorous discipline and design decisions. Others will just be an absolute legacy mess of 20 years of odd decisions made by people who may not have known what they were doing. Like an art piece that you don't really "understand".
I don't bother trying to give the LLM a set of dos and don'ts for how to write the code; that becomes a frustrating game of whack-a-mole. I find it a lot more efficient to have it write some code, look it over, and if I'm not happy with some of the decisions, give it specific instructions for how to fix that one part. As a bonus, I end up reinforcing my knowledge of the code base in the process.
I don't understand the people who "get the agent to do everything" for them. It just makes a mess if you do that. Yet if I spend a little bit of time setting a project up properly (including telling my minions exactly what to do) I can then get it to do the boring things for me.
The very worst things you can do in a codebase are (a) not deeply understand how it works (have it be magic) and (b) be lazy and mess up the structure.
How do you fix a problem which happens at 2:00am and takes your system down if you don't have an excellent understanding of how it works?
We're already bad at (a) over time, because most developers hate writing documentation, so that knowledge is invariably lost.
I don't think the prompts the author proposes will actually work. Including final scope and non-scope is good, but it's more of a reaction to what the AI already did. These prompts are suitable for a rewrite, basically, since it's unlikely anyone would have had them ready at the start.
I have found small iterations to have the best results. I'm not giving AI any chance to one-shot it. For example, I won't tell it to "create a fleet view" but something more like "extract key bindings to a service", so that I can reuse them in another view before adding that view. Basically, talk to the AI as an engineer talking to another engineer, at the nitty-gritty level we need to deal with every day, not as a product person wishing for a business selling point to magically happen.
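As a hypothetical sketch of what that "extract key bindings to a service" step might produce (all names invented), the extraction is small enough to review in one sitting, which is exactly the point of iterating this way:

```go
package ui

// Action is what a key press means, independent of any one view.
type Action int

const (
	ActionNone Action = iota
	ActionQuit
	ActionRefresh
)

// KeyMap is the extracted service: one owner for key -> action mapping,
// reusable by the next view instead of being redefined there.
type KeyMap struct{ bindings map[string]Action }

func NewDefaultKeyMap() *KeyMap {
	return &KeyMap{bindings: map[string]Action{
		"q":      ActionQuit,
		"ctrl+r": ActionRefresh,
	}}
}

func (k *KeyMap) Lookup(key string) Action {
	if a, ok := k.bindings[key]; ok {
		return a
	}
	return ActionNone
}
```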
> I'm rewriting k10s in Rust. Not because Rust is better, but because it's the language I can steer. I've written enough of it to feel when something's wrong before I can articulate why. That instinct is the one thing vibe-coding can't replace. The AI hands you plausible-looking code. You need a nose for when it's garbage.
Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.
This post is good to grasp the difference between "vibe-coding" and using the AI to help with design and architectural choices done by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.
Go reads fine whether the architecture is good or bad, and I couldn't tell the difference until I was in trouble. Rust is harder to read but harder to misuse. The borrow checker would have caught that data race at compile time. I've also just written more Rust. That familiarity matters separately.
+1 on Opus 4.7 involving the user a lot more. Right now I'm trying to get to a state where I can codify my design and decision preferences as agent personas and push myself out of the dev loop.
Buddy, that k10s code was never good. Go vs Rust is not the issue here; it's the fact that the project was vibe coded without reading anything. It's hilarious to even think that a god model was caused by anything other than someone letting the bot choose too much.
Good architecture in any language is obvious to someone who is experienced and cares.
Go is actually great for bots to write if youâre actually thinking.
Right, thank you. Personally I think reading all the code that the AI produces is impossible and kind of defeats the purpose of using it. The key is to devise a structured way to interact with it (skills and similar) and use extensive testing along the way to verify the work at all steps.
> Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically
It sounds like the author knows Rust, and might not be as familiar with Go.
A language you are proficient in is always going to be easier to read than one you don't know, even if the latter is objectively easier to read in general.
In a world where juniors (or seniors in new territories) are incentivized to publish or perish, how will any of us gain proficiency any more? I can see the agent assisted journey accelerating some familiarity, but not proficiency.
I've used AI tools to do i18n translations to Spanish and Portuguese (somewhat ashamed to admit this). I've grown more familiar with the structure of these languages, and come to recognize some of the common vocabulary for our agtech domain. If anything, I feel more clueless about both languages now than I did before, when it comes to any sort of proficiency.
I am aware of layoffs that are really caused by AI.
Translation and asset generation teams for enterprise CMS, whose role has now been taken by AI.
Likewise traditional backend development, which was already reduced via SaaS products, serverless, and iPaaS low-code/no-code tooling, and is now further reduced via agent workflow tooling doing orchestration via tools (serverless endpoints).
The ship has sailed on my handcoding at work. The AI is producing stuff that's more bulletproof than what I can do in the same timeframe, and if my competitors are using it, the pressure to ship is that much higher.
Personally, I've taken the time it's freed up to spend more time on mathacademy and reading more theory-oriented books on data structures and algorithms. AI coding systems are at their best when paired with someone with broad knowledge. Knowing what to ask for, and knowing the vocabulary to be specific about what you want built, is going to be a much more valuable job skill going forward.
One example is a small AI-based learning system I have been developing in my free time to help me learn. The MVP stored an entire knowledge graph and progress in markdown files. Being an engineer, I knew this wouldn't scale, so once I proved the concept viable, I moved everything into SQLite with a graph DB. Then I decided to wrap some parts of the functionality in Rust and put everything behind a small Rust layer, with the progress-tracking logic still in Python.
Someone with no knowledge of graph databases or dependency graphs or heuristics would not be able to build this even if they had AI. They simply don't know what they don't know, and AI won't save you there.
That said, I think it's important to also spend time in the dirt. I've recently started picking up Zig as my no-AI language, just to keep those skills sharp.
I've been relying primarily on deepseek-v4-flash for 90% of my work. It sips tokens. That model will run on 128 GB - not a cheap configuration for a consumer, but within the budget of a developer relying on it for work.
I've only been using kimi 2.5 and deepseek pro for reviewing PRs for security issues.
I think the issue is overblown by people who think Claude Code is a good harness and use Opus for everything. opencode is objectively better: it's much more verbose about what it's doing, you have more control when it comes to offloading to subagents with targeted context (crucial for running through larger jobs), and I can swap between Codex and open-weight models.
> I typed :rs pods to switch back to the pods view. Nothing rendered. The table was empty...
> now something was fundamentally broken and I couldn't just prompt my way out of it.
Hey, I don't want to oversimplify (I'm sure it was complicated), but did the author have functional tests for these broken views? As long as there are functional tests passing on the previous commit, I'd have thought Claude could look at the end state and work out how to get the desired feature without breaking the other stuff.
TUIs aren't an exception, it's still essential to have a way to end-to-end test each view.
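For example, here's a minimal sketch of what such a view test can look like, using a hand-rolled Elm-style model rather than any particular TUI framework (the podsView type and its messages are made up): drive the update loop with messages, then assert on the rendered string.

```go
package tui

import (
	"strings"
	"testing"
)

// refreshMsg and podsView are hypothetical stand-ins for a framework's
// message and model types in an Elm-style architecture.
type refreshMsg struct{ rows []string }

type podsView struct{ rows []string }

func (v podsView) Update(msg any) podsView {
	if m, ok := msg.(refreshMsg); ok {
		v.rows = m.rows
	}
	return v
}

func (v podsView) View() string { return strings.Join(v.rows, "\n") }

// The test exercises the same path a user would: send the message,
// then check what actually gets rendered.
func TestRefreshRendersRows(t *testing.T) {
	v := podsView{}.Update(refreshMsg{rows: []string{"pod-a", "pod-b"}})
	if got := v.View(); !strings.Contains(got, "pod-a") {
		t.Fatalf("view after refresh rendered %q; want it to contain pod-a", got)
	}
}
```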
The problem wasn't that the view didn't work. The problem was that the view didn't work after something else had been done.
You can't test every permutation of app usage. You actually need good architecture so you can trust your tests, and so changes stay local with minimal side effects.
I think the answer here is to not use Claude with BubbleTea. I tried the same thing and got the same result. But it seems to be limited to that specific framework, because it's really good at not doing the same thing with SolidJS.
While I felt this in 2025, I do not feel this in 2026. I use Claude and the rest with BubbleTea all the time.
But I will say... you have to know Golang. You have to have at least tried to make a BubbleTea app yourself and try to understand ELM architecture. You have to look at the code and increment with it.
It makes total sense for OP to switch to Rust and Ratatui if they don't know Golang well. But I don't think it's a better language for it. [Ratatui has brought me great inspiration though!]
Independent of framework, the LLMs get the spatial relationships. I say things like "the upper right panel's content is not wrapping inside and the panel's right edge should extend to the terminal edge" and the LLM will fix it. They can see the resultant text; I'm copy-pasting all the time.
TUI code is finicky; one mis-rendered component mucks everything up. The LLMs will decide on their own to make little temporary BubbleTea fixtures to help themselves understand when things aren't right.
The only real problem with LLMs and BubbleTea is that on the first prompt they insist on using BubbleTea v1 versus BubbleTea v2, released in December 2025. But then you just point them to the V2_UPGRADE.md and they get back on track. That will improve as training cutoffs advance.
I vibe-coded this TUI for Mom's last night. I actually started with Grok (which started with v1) and then moved into Claude Code after some iteration.
This is Claude's problem.
Compared to GPT-5.5, Claude Code prefers to take shortcuts. I've tested having GPT-5.5 in the Codex app and Opus 4.7 in Claude Code do the same thing: if made to follow GPT-5.5's level of requirements, Claude Code's execution time for a task would stretch from 5 minutes to 40 minutes.
To solve macro architecture problems, I use Lisp to write the entire program's framework. Lisp replaces architecture documents, because I believe it has high semantic density, syntax restrictions, and checkers for assistance.
This way, at least, I no longer have to rework anything. I've used this method to refactor 20+ of my projects.
I'm not sure we'll ever really be free of the GIGO (garbage in / garbage out) principle. Tools will get better and better, but can never be a substitute for a deep understanding of the thing we want to create.
AI writes what you ask it to write; you need to talk to it about architecture. You should have an architecture doc so the AI can shape the code based on it, and you can get the AI to write the architecture doc too. If you're using Claude, you can use the software architecture mode for this.
A coder typing out code is not solely generating output; it's part of an ongoing thinking process. Without that process, we have no material to keep iterating forward.
What has really let AI coding continue to work as the project got bigger was using speckit. It has been great at keeping the code consistent across features.
I'm just wondering: you know what architecture you want to go to now and you have the tests... can't you just let Claude refactor it to the better architecture?
Also 1600 lines... didn't any agent reviewing the diffs point that out?
You're also adding a lot to CLAUDE.md. I don't know how much that file has grown, but with a big CLAUDE.md containing many instructions, I don't think the AI will be able to remember all those rules.
I try to write one portable shell script per day; using AI would take all the fun out of it, so I never started using it. I honestly find it ridiculous that anyone uses it to write code; it just doesn't make sense to me.
LLMs assist those of us who were apt to take blocks of code from StackOverflow, or wherever, to solve problems quickly and avoid as much of the aggravating and slow toil of trial and error as possible.
That trial and error process is still happening with a LLM, but much faster, and with instantaneous cross-references to various forms of documentation that I would be looking up myself otherwise. It produces code of a quality that is dependent on the engineer knowing what they want in the first place and prompting for it and refining its output correctly.
It's the exact same process of sculpting code that the majority of the industry was doing "by hand" prior to the release of LLMs, but faster, and the harnesses are only getting better. To "vibe code" is to prompt vaguely and ignore the quality of the output. You're coming to a forum full of professionals and essentially telling us that you're getting really frustrated with your Scratch project.
I don't know if you're trying to lead a charge or whatever but good luck with that. As a senior SWE, it is clear to me that this is the new paradigm until something better than LLMs comes along. My workflows and efficiency have been vastly improved. I will admit that I have never really been a "I made a SMTP server in 3k of Rust" kind of guy, though.
It's pretty simple to vibe code for months without producing slop. And it's the same recipe one used before AI:
1. make it work
2. make it pretty
3. make it fast
Omit 2. and 3. long enough -> slop beyond recovery
You don't need to go back to coding by hand if you know how to do it already. There is a middle ground.
If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.
Let the AI do the implementation of your architecture.
Vibe coding works great with test driven development. You can have AI write the tests as well, but you need to confirm yourself because it's lying all the time. AI coding is like when you first started out, it's copying random bits and pieces from the web into your code until it works... Good for one shots and proof of concept. But for any long living project I think you are better off rewriting it from scratch yourself. Abstractions let you work faster, especially when you have it all in your head.
This. I definitely agree with this statement at this point in AI-assisted development. This gets at the "taste" factor that is still intrinsically human, especially in software engineering. If you can construct and guide the overall architecture of an application or system, AI can conceivably fill in the smaller feature bits, and do so well. But it must have a strong architecture and opinionated field in which to play.
My main takeaway, too. I've been using Claude on a side project that I have singlehandedly been working on for three years. It works well initially: you catch all of the AI's mistakes and unfavorable approaches because you know the architecture in and out. But as you stop thinking through new features yourself and start losing touch with all the stuff the AI throws at you, you fail to develop an intuitive feeling for when and how to abstract and introduce architecture.
Another note, for me, was e2e tests: while AI can write them, it never comes up with even the basic organization or abstraction required to manage a large e2e test suite with hundreds of tests. It immediately starts to produce spaghetti code.
Let me preface my comment by saying I also still write a lot of code by hand - especially when it's something I know I need to understand in depth, and in some cases defend.
With that said, this caught my eye:
> AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony.
This is too general. "AI" is used here as a catch-all, but in fact it was the specific model under the specific conditions in which you ran your prompt, including harness, markdown files, PRDs, etc. So it's not fair to say "AI does X!" in this case.
It's also very much up to you. It's very common to have a frontier model plan an architecture before you have another model implement code. If you're just one-shotting an LLM to do everything you get mediocre, more brittle code.
This stuff is still being figured out by a lot of people. But I feel the core of the issue is not using AI well. Scoping, task alignment, validation, are crucial.
The title is just flat out wrong. The author isn't going back to writing code by hand, they're plopping some new stuff into their CLAUDE.md to "fix" the issues they see AI is having.
But in my main work, reverse engineering, LLMs have been a godsend for years now.
You can basically brute-force binary obfuscation thanks to them. And thanks to eager Chinese LLM providers, basically for free.
But I always use the LLM only for the boring work; the rest is for me to do manually, or with scripts of course, but ones made by me. Because I want to learn.
Yes, there are a lot of people using LLMs for full RE automation, since they're selling exploits for profit. No problem with me.
I see a funny future for huge corporations like Adobe, etc.
Imagine the prompt: "Hey Claude, re-implement Adobe Photoshop with a clean-room design." One agent opens a decompiler and outputs complete low-level technical details of how everything is implemented.
A second agent implements the new Photoshop based on that.
They will be mad and I like this.
You will own nothing, and you will be happy, corpos.
Another behavior I noticed is that even if you plan with an agent, a lot of business logic leaks into the code.
Some states, for example, are meant to be inferred from the shape of the data rather than from explicit state fields, but damn do they like adding a state field.
>"I'm doing the design work myself, by hand, before any code gets written."
This is what I was doing right from the beginning. AI just fills out methods and does other low-intelligence work. Both of us are happy. My architectures and code are really mine, easy to read and reason about. AI gets paid and does not get a chance to fuck me in the process. At no point have I felt any temptation to leave the "serious" parts to AI.
I don't really think OP is writing code themselves, since they admit they still use agents for codegen. I've really scaled back how much I use agents, though, because in the medium to long term I haven't been getting good results with them. And it's not enjoyable. That's enough for me. I'll do whatever for a job, because who cares; if the company wants slop, I will gladly give them that. But for my own shit I've gone back to circa 2024 and am mostly just using them as a chatbot.
Inb4 "you're gonna be replaced": god damn it, I hope so. I do not want to spend the rest of my life behind a computer screen...
I feel like this article was circling a point it never actually got to. All the advice in here (except controlling scope creep) is specific to a TUI with an Elm-like architecture.
But here's the thing: you almost never know what the architecture is up front. If you do, you probably aren't the one writing the actual code anymore. Writing the code, with or without an AI, is part of the design process. For most people, it isn't until they've tried several times, fucked it up a bunch, and refactored or rewrote even more that they actually know what the architecture needs to be.
7 months ago was early November. Coding assistants were getting very good back then, but they were still significantly poorer at making good architectural decisions in my experience. They tended to just force features into the existing code base without much thought or care.
Today I've noticed assistants tend to spot architectural smells while working and will ask you whether they should try to address it, but even then they're probably never going to suggest a full refactor of the codebase (which probably is generally the correct heuristic).
My guess is that if you built this today with AI, you wouldn't run into so many of these problems. That's not to say you should build blind, but the first thing that stood out to me was that you started building 7 months ago, when coding assistants were only just becoming decent, and undirected they would still generally generate total slop.
Does "writing code by hand" mean you're not going to use compilers to generate assembly?
Now, I do feel lucky that I started learning coding about four years before the LLM revolution, but these things are really just natural-language compilers, aren't they? We're just in that period - the 1980s, the greybeards tell me - where companies charged thousands of dollars per compiler instance, right? And now, I myself have never paid for a compiler.
This whole investor bubble will blow up in the face of the rentier-finance capitalists, and I'll be laughing my head off while it happens.
If you're coding by hand, then you're that carpenter before IKEA came along. Now the market wants bland machine-built functional furniture that gets replaced every five to ten years. If every tenth piece is broken or slightly off, doesn't matter, mass production has lowered the price that a replacement is available for free and you're still making a profit.
Time to become a "product engineer" and watch the hyper-agile agents putting up digital post-it notes on digital pin-boards, discussing how much each post-it is worth in digital scrum meetings. Meanwhile the agents keep wasting more and more time, so that their owners make less and less of a loss, until eventually a profit is made.
Until the costs become prohibitive and humans become cheaper than the agents that replaced them. Once the agents are replaced by the humans, the next hype bubble awaits around the bend.
Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.
Over time, this may change. Validation tooling I built may make a future migration of this kind easier to vibe code even if AI functionality doesn't continue to improve. Smarter models with more context will eventually learn these problems in more and more cases.
The code it generates still oscillates between beautiful and broken (or both!), so for now my artistic sensibilities make me keep a close eye on it. I think of the depressed robot from The Hitchhiker's Guide to the Galaxy as the intelligence behind it. Maybe one day it'll be trustworthy.
"The only people I've heard saying that generated code is fine are those who don't read it." Are you sure these people aren't busy working rather than chatting? (haha)
But in all seriousness, it depends on what you're doing with it. Writing a quick tool using an LLM is much easier than context-switching to write it yourself. If you need the tool, that's very valuable.
Also, as a webdev: it writes basic CRUD pretty well. I am tired of having to build forms myself, and the LLMs are usually really good at that.
I've been building a new app with lots of policies and whatnot, and instructing an LLM is just much faster than doing the same repetitive shit over and over myself.
And the solution is the same as when development was outsourced: the "patch" was to fix it by writing specs. Thus I conclude my TED talk with the statement: LLMs are the new outsourcing, and they run into the same problems.
Don't outsource either then
How about we outsource it to Pakistan and they use LLMs? That way we do what the LLM people do: many agents, stacked on top of each other.
That's the same story I had.
The swindle goes like this: AI on a good codebase can build a lot of features. You think it's faster; it even seems safer and more accurate at times, especially in domains you don't know everything about.
This goes on for a while as the codebase gets bigger, exploration takes longer, and the failure rate increases. You don't want it to be true and try harder, so you only stop after it has become practically impossible to make any changes.
You look at the code again, and there is so much of it that spaghetti is an understatement; it's the Great Wall of China.
You start working... and you realize what was going on.
I deleted 75,000 of 140,000 lines of code, and I honestly feel like the 3 months I went hard into agentic coding were wasted. I failed my users by building useless features, increasing bugs, losing the mental model of my code, and not finding the problems I didn't know about: the kind of hard decisions you only see when you're in the code, the stuff that wanders in your mind for days.
I find it interesting that this outcome is a surprise. I don't want this to sound smug, I'm genuinely curious what the initial expectations are and where they come from.
They seem to be different for LLMs, because would anyone be surprised if they handed summary feature descriptions to some random "developer" they'd only ever met online, and got back an absolute dung pile of half-broken implementation?
For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.
I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?
I've never relied on an LLM to build a large section of code but I can see why people might think it is worth a try. It is incredible for finding issues in the code that I write, arguably its best use-case. When I let it write a function on its own, it is often perfect and maybe even more concise and idiomatic than I would have been able to produce. It is natural to extrapolate and believe that whatever intelligence drives those results would also be able to handle much more.
It is surprising how bad it is at taking the lead, given how effective it is with a much more limited prompt, particularly if you buy into all the hype that it can take the place of human intelligence. It is capable of applying an incredible amount of knowledge while having virtually no real understanding of the problem.
I think this is true, but I imagine there's a workflow solution to this which isn't to drop AI.
E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc., and integrating it into a more manual workflow.
There's a range from single-shot prompts to inline code generation, and where on that range makes sense depends on the problem and where in the codebase it sits.
Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation and be more systematic, always maintaining a reasonably good mental model at this layer.
That workflow just sounds exhausting to me. Would I always need to consider how much of a blast radius my AI-generated code might have? Sounds like there's so much extra management going into these micro decisions that it ultimately defeats the purpose of generating code altogether.
I could see value in using it during the prototyping phase, but wouldn't like to work like you described for a serious project for end users.
I just don't like to type code anymore. If I can accomplish the same by describing the code, and get the same results as if I typed it myself, I'll opt for not typing so damn much. I've done so much typing in my career, that typing ~80% less to get the same results, makes a pretty big difference in how likely I am to set out to accomplish something.
I care more about code quality now, because typing no longer limits if I feel like it's worth to refactor something or not.
And you have discovered the job of managers! There has always been a lot of hate for managers. Wonder if the robots hate us just as much? (I often feel a weird guilt when I tell an agent to do something I know I am going to throw away but will serve as an interesting exploration...I know if I did that to a human they would be pissed...)
> treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc.
This is good advice whether or not you're using AI, yet in real life "let's have well-defined boundaries and interfaces" always loses against "let's keep having meetings for years and then duct-tape whatever works once the situation gets urgent".
Were you auto-committing everything without reading the generated code? And if you read it but didn't understand it, why not just ask for detailed comments on each output? Knowing that a larger codebase causes it to struggle means the output needs to be increasingly scrutinized as it becomes more complex.
I don't think it's about what the code does. I think it's more about how the code fits into its whole context: how useful it is in solving the overarching problem (of the whole piece of software), and how well it follows the paradigm of the platform and the codebase.
You can have very good diffs and then find that the whole codebase is a collection of slightly disjointed parts.
I haven't had the chance to work on large codebases, but isn't it possible to somehow adapt the workflow of Working Effectively with Legacy Code, building islands of higher quality code, using the AI to help reconstruct developer intention and business rules and building seams and unit tests for the target modules?
AI doesn't necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that supports either later hand-crafted code or agentic implementation.
I don't think you truly captured the worst part:
There comes a realization, to many engineers' horror, that AI won't be able to save them, and that they will have to manually comprehend, and possibly write, a ton of code by hand to fix major issues, all while upper management breathes down their necks, furious as to why the product has become a piece of shit and customers are leaving for competitors.
The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.
But the solution doesn't come. They realize there is nothing they can do. It's over.
I've set a few rules for working with coding agents:
1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).
2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.
3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.
Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.
I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.
While this is a legitimate set of rules to follow for maintaining code sanity and a solid mental model of how a codebase may grow, it's always challenging to stick to them in a workplace where expectations around delivery speed have changed drastically with the onset of AI. The sweet spot lies in striking a balance between staying connected to the codebase and not becoming a limiting factor for the team at the same time.
That's kind of what I figured, sadly. I haven't experienced it personally yet since I got let go from my last job about 14 months ago, but it makes so much sense given how management is so willing to sacrifice quality for speed.
Another frustrating thing that has emerged from this is managers who "vibe code" half-baked ideas for a couple of hours and then hand them off as if they've meaningfully contributed to the implementation. Suddenly you're expected to reverse engineer incoherent prompts, inconsistent code, and random abstractions that nobody fully understands.
In their mind they've already done the "architectural heavy lifting" and accelerated the team. More often than not it just adds cognitive overhead, where you spend more time deciphering and cleaning up garbage than actually building the thing properly from scratch.
I am lucky to have never worked in a team where my manager wouldn't expect strong push back in this scenario. Many of the corporate environments described on here seem dystopian, this included.
Vouching for this comment, because my friend confided in me a week ago that her manager also does this and is like "oh yeah, here's 80% done, you just do the rest so we can ship it", when a large part of it is slop that needs to be rewritten due to not enough guidance and pushback during generation.
That's when you ask it to write tests to a good coverage, and then have it reimplement everything with the tests still passing...
Unless the tests are written against logic that is in and of itself subtly wrong, and even the structure of the code and its methods is wrong, so that even your unit tests would have to be rewritten because the units themselves are structured badly.
It's a valid direction to look in; it just doesn't address the root issue of throwing slop across the wall and also having unrealistic expectations due to not knowing any better.
Yep. It's very healthy to be suspicious of code. Any code. Whether generated or not. That's where the bugs are.
If there's one thing that's disturbing about AI proponents, it's how trusting they are of code. One change in the business domain and most of the code may turn from useful to actively harmful. Which you then have to rewrite. Good luck doing that well if you're not really familiar with the code.
This is fine if it's more enjoyable for you; that's what's important in personal projects most of the time.
But we don't apply the same standard to dependencies, the work of colleagues, external services, all the layers down to the silicon we rely on to work.
Why is AI suddenly different?
We just have to weigh this by risk and reward. What's the downside if it's wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it's all fine? This is the same as for libraries and external services.
A complex financial set of rules in a non-updatable crypto contract with no testing?
A viewer for your internal log data to visualise something?
It is and has always been immensely helpful to understand what you are doing in any context.
There are some programmers who treat the job as just plumbing together what is to them completely incomprehensible black boxes, who treat the computer as a mystery machine that just does things "somehow", but these programmers will almost always be hacks that spend their entire career producing mediocre code.
There are things such a programmer can build, but they are very limited by their lack of in depth understanding, and it is only a tiny fraction of what a more competent programmer can put together.
To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth. You don't have to be an expert at these things by any means, but you do need to understand them and be comfortable treating them as transparent boxes that you may have to go in and fiddle with at some point to get where you need to go. Sometimes you need to vendor a dependency and change it. Sometimes you need to drop it entirely and replace it with something more fit for purpose you built yourself.
AI is different because it's a tool, and the user of the tool is responsible for the work performed.
An outsourced developer isn't a "tool". They're a human being, and responsible for their actions. They're being paid, and they either act responsibly or they get replaced.
A vibe coder is a human using a tool. The human is responsible for code quality, and if it's not good enough, they need to keep using the tool to make it better. That means understanding the tool's output.
If an artist used Photoshop to create a billboard ad that was ugly, they don't get to blame Photoshop. They have to keep using the tool until their output is good.
> An outsourced developer isn't a "tool".
I'd think that depends on the model of responsibility at play.
For example, suppose I hire a building contractor to build a house, and the electrician he subcontracts makes a mistake.
From my perspective, the prime contractor is equally responsible for that mistake regardless of whether he used a subcontractor, or did the work himself but used a broken tool.
This doesn't make the electrician any less of a "person" in the deeply important ways, but it's not a distinction that's relevant to my handling of the problem.
I was trying to follow similar rules, until one day I had to solve a hard mathematical problem. Claude is a PhD-level mathematician; I am not. I, however, know exactly the properties of the desired solution and how to test that it's correct. So I decided to keep Claude's solution over my basic, naive one. I mentioned that in the pull request and everyone agreed it was the right call. Would you allow exceptions like that in your rules? What if AI becomes that much better at coding than you, not just at advanced mathematics? Would you then stop writing code by hand completely, since that would be the less optimal option, even though you'd lose the ability to judge the code directly at that point (though, as in my example, you can still judge the tests, hopefully)? I think these are the more interesting questions right now.
> Claude is a PhD-level mathematician
Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them--which I think is pretty much all the OP is saying.
To be fair, many of the proof attempts that mathematicians do also have major flaws. Most get caught before getting published.
Trust isn't binary, and I can trust things I don't understand enough to use them. OP was talking about needing to understand, which is quite a bit above the level of being able to validate something enough to use it for a task.
I definitely wouldn't put math I didn't understand in my code just because Claude says so. I am not astonished that everyone agreed; that's why shit is going to hit the fan pretty badly, pretty soon, due to AI coding.
There is one exception to this: If the AI also delivers the proof of why the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof). Then I would use it without hesitation.
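As a toy illustration in Lean of that division of labor - the statement is the part you have to understand, and the checker takes care of believing the proof:

```lean
-- The claim is readable on its own; the proof term is the machine-checked
-- part, and its readability no longer matters once the checker accepts it.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```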
I always found it weird, when helping people with Excel formulas, how few people even try to check maths they don't understand, let alone try to understand it.
I struggle to remember even relatively simple maths, like working out "what percentage of X is Y", so if I write a formula like that I'll put in some simple values like 12 and 6, or 10,000 and 2,456, just to confirm I haven't got the values backwards or something. I've been shown sheets where someone put in a formula they don't understand, checked it with numbers they can't easily eyeball, and just assumed it was right because it's roughly in their ballpark / they had no idea what the end result should be.
Then again, I've also seen sheets where a 10% discount column always had a larger number than the standard price, so even obviously wrong things aren't always checked.
I don't disagree, but only those who have never put math they don't fully understand into their code get to throw the first stone.
I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.
I don't see using gen AI for the (not so) "brute force" exploration of the solution space as that different from trial and error plus post-hoc rationalization.
How did you test that the solution is correct? Is the set of possible inputs a low-ish finite number?
Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
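When the input space genuinely is small and finite, exhaustive testing does stand in for a proof. A sketch in Go, checking a made-up example (the classic overflow-free byte average) against all 65,536 input pairs:

```go
package mathcheck

import "testing"

// avg computes the floor average of two bytes without overflowing,
// using the identity a+b = 2*(a&b) + (a^b).
func avg(a, b uint8) uint8 { return (a & b) + ((a ^ b) >> 1) }

// TestAvgExhaustive verifies the identity over the entire input domain,
// which for uint8 pairs is cheap enough to be a real proof-by-cases.
func TestAvgExhaustive(t *testing.T) {
	for a := 0; a < 256; a++ {
		for b := 0; b < 256; b++ {
			want := uint8((a + b) / 2)
			if got := avg(uint8(a), uint8(b)); got != want {
				t.Fatalf("avg(%d, %d) = %d; want %d", a, b, got, want)
			}
		}
	}
}
```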
How do you know what it spat out is correct though?
If it's beyond our ability to review and we blindly trust it's correct based on a limited set of tests... we're asking for trouble.
> Claude is a PhD-level mathematician; I am not
I'm going to guess that this is Gell-Mann amnesia more than anything, and it's going to get a lot of organizations into a lot of weird places.
> Claude is a PhD-level mathematician
... that can't even count.
You do realize you can ask Claude about the things you don't understand?
"PhD level" just means you finished a bachelor and masters degree and are now doing a bit of original research as an employed research assistant.
Claude isn't "PhD level" anything. This shows a complete lack of understanding here. Claude has read every single text book in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).
Since Claude has infinite patience, you can just keep asking until you get it.
I've also heard it being called "comprehension debt", which I like a little more because I think it's more precise: the specific debt being accrued is exactly a lack of comprehension of the code.
I think it's both, in fact.
Comprehension debt just sounds like there are things you don't (yet) understand.
Cognition debt means your lack of understanding compounds, and the cognition "space" required to clear it increases accordingly.
An increasing comprehension debt that can be paid off one bit at a time, within reasonable cognition space, takes linear time to clear.
Cognition debt takes exponential time to clear the more of it you have. If it reaches a point where you simply don't have the space for the cognition overhead required to understand the problem, you probably need to start over from your specifications.
I like that too. However, "cognitive debt" points to the possibility of cognitive overload: the code can become so complex and inscrutable that it may become impossible to comprehend. "Comprehension debt" sounds a bit weaker in that respect, as if it's just a matter of catching up on one's comprehension.
Yeah I like that better too, gonna start using that
"You can outsource your thinking, but not your understanding."
This is great until the "gun to your head" is your skip-level manager demanding that a feature be implemented by the end of the week, and they know you can just "generate it with AI" so that timeline is actually realistic now whereas two years ago it would have required careful planning, testing, and execution.
Well, that's nice.
Your manager is unknowingly helping you create a form of job security for yourself, with all the technical debt and bugs being accumulated.
He might not understand it, and it might not be the type of work you want to do, but someone is going to have to fix those issues. And the longer they wait, the bigger the task gets.
That isn't new, though. Managers often pushed unrealistic timelines and showed a lack of care about tech debt well before vibe coding; the timelines were just different, and the magnitude will be bigger this time. But we also have LLMs to help clean it up faster, I guess.
The question is: is it a job you actually still want once the poo pile reaches critical mass, you are the only one with a shovel, and the deadline is "yesterday"?
That is absolutely true. Unfortunately, this ship has sailed and we are not closing Pandora's box anymore. We'll have to adapt.
But we still hold good cards in hand.
Do they want their pile of steaming slop fixed, or not? Because no amount of complaining about the deadline being "yesterday" is going to change the fact that time will be needed to fix the accrued technical debt, whether they like it or not. And if AI dug you in that deep to start with, the solution is not to dig deeper.
I suspect some companies are going to find that out the hard (costly) way.
If the manager is unreasonable, you were always going to have a problem with them eventually. Nothing you do will fix this.
If the manager is reasonable, you can explain to them that there isn't time to check the work of the AI, that it frequently makes obscure mistakes that need to be properly checked, and that checking takes time.
At this point, if they still insist you just hand over the AI's work, they've made a decision that is their fault. You've done what you can.
And when the shit hits the fan, we're back to whether they're reasonable or not. If they are, you explained what could happen, and it did. If they force responsibility onto you, they aren't reasonable and were never going to listen to you. That time bomb was always going to go off.
I hate this current trend of managers deciding what tools developers have to use. Hopefully it ends soon.
Time will tell whether outages and defect-resolution times skyrocket, or whether AI can deal with it.
Does that matter that much in practice? I bet lots of customers are okay with software that crashes 10x as much if it costs 10x less. There is already a ton of shitty software that still sells.
I already mostly followed those rules with StackOverflow, before AI.
I agree with this, though it also depends on the nature of the project.
I had a project idea which I coded with the help of AI, and it became quite large, to the point that I was starting to have uncharted areas in the code. Mostly because I reviewed it too shallowly or moved too fast.
That was fine, as that project never floated, but if I were to do such a thing on my breadwinning project, I would lose the joy.
I just had a Claude episode. Instead of trying to fix the bug, it edited the data to hide the bug in the sample run. This kind of BS behavior is not rare. Absolutely, if you do not understand every bit of what's going on, you end up with a pile of BS.
That's why I love Gemini - none of this bullshit ever happens.
I do not think Gemini can relieve a developer from knowing what he is doing.
There's some really weird and unusual posts glazing Google in here today. Bot accounts out in force!
It's not worth fighting it at work. If the idiots you work for want everything vibe coded and delivered at 5x the 2025 speed, then just vibe code and try to leave the company ASAP. That's where I am right now. Of course, I might end up somewhere just as ridiculous, or maybe not even be able to find another job. Shitty times we live in right now.
This is about how I use it. I initially use it to carve out an architecture and iterate through various options. That saves a lot of time for me having to iterate through different language features and approaches. Once I get that, I have it scaffold out, and I go in and tidy things up to my personal liking and standards. From there, I start iterating through implementations. I generally have been implementing stuff myself, but I've gotten better at scaffolding out functions/methods through code instead of text. Then I ask it to finish things off. That falls into your first category of letting it implement stuff that I already know I could do. Not sure if it's faster. But it's lower cognitive load for me, since I can start thinking about the next steps without being concerned about straightforward code.
This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.
For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.
I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.
I always find these kinds of posts interesting, to compare the velocity people seem to get with AI vs. what I get by just coding by hand.
Coincidentally, I've been working on a project for about 7 months now: it's a 3D MMO. Currently it's playable, and people are having fun with it - it has decent (but needs work) graphics, and you can easily cram a few hundred people into the server. The architecture is pretty nice, and it's easy to extend and add features onto. Overall, I'm very happy with the progress, and it's on track to launch after probably a year's worth of development.
In 7 months of vibe coding, OP failed to produce a basic TUI. Maybe the feature velocity feels high, but this seems unbelievably slow for building a basic piece of UI like this - it's the kind of thing you could knock out in a few weeks by hand. There are tonnes of TUI libraries that are high quality at this point, and all you need to do is populate some tables with whatever data you're looking for. It's surprising that it's taking so long.
There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice. This seems to be backed up by the available productivity data, where AI users feel faster but produce less
This struck me as odd too. 7 months? It wouldn't take that long to write it in a new language.
Another thing I don't see mentioned is code quality.
Vibe-coded code bases are an excellent example of why LLMs aren't very good at writing code. They will often correct their own mistakes only to make them again immediately after, and their pattern use is inconsistent.
Recently Claude has been making some "interesting" code style choices, not in line with the codebase it's currently supposed to be working on.
> There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice.
This metric highly depends on who uses the AI to do what, where strong emphasis is on "who" and "what".
In my line of work (software developer) the biggest time sinks are meetings where people need to align proposed solutions with the expectations of stakeholders. From that aspect AI won't help much, or at all, so measuring the difference of man hours spent from solution proposal to when it ends up in the test loops with and without AI would yield... very disappointing results.
But for troubleshooting and fixing bugs, or actually implementing solutions once they have been approved? For me, I'm at least 10x'ing myself compared to before I was using AI. Not only in pure time, but also in my ability to reason around observed behaviors and investigating what those observations mean when troubleshooting.
But I also work with people who simply cannot make the AI produce valuable (correct) results. I think if you know exactly what you want and how you want it, AI is a great help. You just tell it to do what you would have done anyway, and it does it quicker than you could. But if you don't know exactly what you want, AI will be outright harmful to your progress.
This was made in two days of vibe coding. It has flaws, but it's impressive as hell:
https://tinyskies.vercel.app/
This ran at ~1fps for me
Seems to be baked into the GPT: producing text - i.e., producing language and code - is its life and purpose. So the whole system is inherently biased towards "roll your own everything" unless spoken to in a "senior-dev" language that prevents these repetitions.
It's more complex than that. I think the reality is that there's a lot of code that's just not that deep, bro. I have some purely personal projects with components that I don't understand anymore; I wrote that shit by hand, they still work, but I haven't touched them in years. There's a lot of code like that which AI can write for me - the stuff I would forget about even if I wrote it by hand. I think you have to have discipline in its use; it's a tool like any other.
AI, and especially agentic AI, can make you lose situational awareness over a codebase, and when you're doing deep work that SUUUUCKS. But it's not useless, you just have to play to its strengths. Though my favorite hill to die on is telling people not to underestimate its value as autocomplete. Turns out 40 gigabytes of autocomplete makes for a fucking amazing autocomplete. Try it with llama.vim + qwen coder 30b; it feels like the editor is reading your mind sometimes, and the latency is so low.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules.
That's the hard part of coding. If you have an architecture, then writing the code is dead simple. But if you aren't writing the code, you aren't going to notice when you've architected an API that allows nulls but your database doesn't. Or that it does allow them, but you realize there's some other small issue you never accounted for.
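For what it's worth, a minimal hypothetical sketch (invented names, Rust for illustration) of that kind of nulls mismatch slipping past someone who never reads the generated code:

```rust
// Hypothetical sketch: the API type says the field is nullable,
// but suppose the table was declared elsewhere with
// `display_name TEXT NOT NULL`.
struct CreateUser {
    email: String,
    display_name: Option<String>, // API allows null...
}

// ...so the mismatch only surfaces when display_name is None at
// runtime - easy to miss if you only ever skim diffs.
fn insert_user(req: &CreateUser) -> Result<(), String> {
    let name = req
        .display_name
        .as_deref()
        .ok_or("display_name is NOT NULL in the schema")?;
    // Stand-in for the real INSERT call.
    println!("INSERT INTO users VALUES ({}, {})", req.email, name);
    Ok(())
}
```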
I do not know how you can write this article and not realize the problem is the AI. Not that you let it architect, but that you weren't paying attention to every single thing it does. It's a glorified code generator. You need to be checking everything it does.
The hard part of software engineering was never writing code. Junior devs know how to write code. The hard part is everything else.
Yes, I think there are two kinds of developers: those who think the code is the hard part, and those who don't.
The developers who think coding is hard are the ones who absolutely love AI coding. It's changed their world, because things they used to find hard are now easy.
Those that think coding is easy don't have such an easy time because coding to them is all about the abstractions, the maintainability and extensibility. They want to lay sensible foundations to allow the software to scale. This is the hard part. When you discover the right abstractions everything becomes relatively easy. But getting there is the hard part. These people find AI coding a useful tool but not the crazy amazing magical tool the people who struggle with coding do.
The OP is definitely in the second camp since they could spot and realise the shortcomings of the AI. They spotted the problem, and that problem is that the AI can't do the hard bit.
I'd say there's another camp: the camp of people who know that code isn't the hard part, but that it's still time consuming to write code. AI coding is pretty useful for that, when you can nail the design but you just need a set of hands to implement it.
I'm classing that as the second camp. Because you don't find it hard to do, it's just time consuming. It means you still know what you're doing and you're just using AI as a tool to accelerate your delivery. That's the optimal way to use in my experience if you want to actually deliver well architected software.
But isn't AI doing the same thing to project management as to coding?
PMs can now cross reference and organize tickets with just a few keystrokes. Organisational knowledge, business knowledge, design systems and patterns, etc all of it is encoded in LLM consumable artefacts. For PMs it is the same switch - instead of having to do it by hand you direct lower level employees to handle the details and inconsistencies and you just do vibe and vision.
When all of the pieces successfully connect and execute reliably, what is left for humans to do? Just direct and consume?
And AI companies with their huge swaths of data are soon gonna be in the situation of being able to do the directing themselves
Such a person is just pushing a giant pile of cleanup work onto their colleagues. Unless they actually checked, the "cross references" are probably wrong in places or just entirely made up. Lower level employees by definition don't have the experience to correct the more subtle inconsistencies, so you've basically just constructed a high pass filter that lets only the worst failures through. Moreover, you're absolutely guaranteed to lose the respect of those lower level employees--forcing someone else to clean up your sloppy work is just cruel, and people resent being treated cruelly.
This pretty much sums up what I feel about AI currently. It made my life significantly easier for most tasks I already breeze through, yet tasks I used to struggle with are still just as difficult.
I agree with what you're saying, but I think we do have a problem right now with definitions where there's a lot of people basically getting supercharged tab completions or running a chatbot or two in a parallel pane, but still clearly reviewing everything; and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas. The second group are not, and those are the ones that I find a bit more worrisome.
> The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas.
I can't speak for others, but I'd go further and say that LLMs allow me to go deeper on the design side. I can survey alternative data structures, brainstorm conversationally, play design golf, work out a consistent domain taxonomy and from there function, data structure and field names, draft and redraft code, and then rewrite or edit the code myself when the AI cost/benefit trade off breaks down.
That's a little bit of a No True Scotsman. Yes, there are people who do not review anything; but even people who are reviewing every line from an LLM do not have the same understanding as someone who wrote it themselves.
I'm not making a judgement call about which is better, but it was widely accepted in tech before the advent of LLMs that you just fundamentally lack a sense of understanding as a reviewer vs. an author. It was a meme that engineers would rather just rewrite a complicated feature than fix a bug, because understanding someone else's code was too much effort.
> and on the other side of things is freaking Steve Yegge pitching a whole new editor that lets you orchestrate a dozen or more agents all vibing away on code you're apparently never going to read more than a line or two of
I find it useful to not listen to people who just talk.
That blog post is surreal. It's like cryptocurrencies and the whole web3 nonsense. Cryptocurrencies basically don't work, so there have been a hundred aimless attempts at fixing self inflicted problems caused by deficiencies of cryptocurrencies with no actual goal that has any impact on the real world.
It's the same thing here. AI has dropped the cost of software development, so developers are now fooling themselves into producing low or zero value software. Since the value of the software is zero or near zero, it doesn't really matter whether you get it right or not. This freedom from external constraints lets you crank up development velocity, which makes you feel super productive, while effectively accomplishing less than if you had to actually pay a meaningful cost to develop something.
Like, what is the purpose of Gas Town? It looks to me like the purpose of Gas Town is to build Gas Town.
> The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas
I worry about the first group too, because interfaces and data structures are the map, not the territory. When you create a glossary, it is to compose a message that transmits a specific idea. I invariably find that people who focus that much on code often forget the main purpose of the program in favor of small features (the ticket). And that has accelerated with LLM tooling.
I believe most of us who are not so keen on AI tooling are always thinking about the program first, then the various parts, then the code. If you focus on a specific part, you make sure that you have well-defined contracts with the other parts that guarantee the correctness of the whole. If you need to change a contract, you change it with regard to the whole thing, not the specific part.
The issue with most LLM tools is that they're linear. They can follow patterns well, and agents can have feedback loops that correct them. But contracts are multi-dimensional forces that shape a solution. That solution appears more like a collapsing wave function than a linear prediction.
I've noticed that agents almost always fail at the planning vs. execution stage.
I follow the plan -> red/green/refactor approach and it is surprisingly good, and the plans it produces all look super well reasoned and grounded, because the agent will slurp all the docs and forums with discussions and the like.
Trouble is once it starts working there would inevitably be a point where the docs and the implementation actually differ - either some combination of tools that have not been used in that way, some outdated docs, or just plain old bugs.
But if the goals of the project/feature are stated clearly enough it is quite capable of iterating itself out of an architectural dead end, that is if it can run and test itself locally.
It goes as deep as inspecting the code of dependencies and libraries and suggesting upstream fixes etc. all things that I would personally do in a deep debugging session.
And I'm super happy with that approach, as I'm more directing and supervising rather than doing the drudgery of it.
Trouble is, a lot of my teammates _don't_ actually go this deep when addressing architectural problems; their usual modus operandi is "escalate to the architect".
This will not end well for them in the long run, I feel, but I'm not sure what they can do themselves - the window of being able to run and understand everything seems to be rapidly closing.
Maybe that's not super bad - I don't know exactly what the compiler is doing to translate things to machine code, and I definitely don't get how the assembly itself is executed to produce the results I want at scale - that is a level of magic and wizardry I can only admire (look-ahead branching strategies and caching on modern CPUs are super impressive - how is all of this even producing correct responses reliably at such a scale...)
Anyway - maybe all of this is ok - we will build new tools and frameworks to deal with all of this, human ingenuity and desire for improvement, measured in likes, references or money will still be there.
This is the only way for me to use Agents without completely hating and failing at it. Think about the problem, design structures and APIs and only then let AI implement it.
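To make that concrete, here's a minimal sketch (hypothetical types, loosely Elm-style like the article's TUI) of designing the structures and APIs first, so the agent only fills in implementations:

```rust
// Hypothetical sketch: the human fixes the message type and the view
// contract before any prompting; the agent only fills in match arms.

/// Every state change flows through one message type, decided up front.
enum Msg {
    KeyPressed(char),
    RowsLoaded(Vec<String>),
    Error(String),
}

/// The contract each view implements; ownership of state is explicit.
trait View {
    fn update(&mut self, msg: &Msg);
    fn render(&self) -> String;
}

struct PodsView {
    rows: Vec<String>,
    selected: usize,
}

impl View for PodsView {
    fn update(&mut self, msg: &Msg) {
        match msg {
            Msg::RowsLoaded(rows) => {
                self.rows = rows.clone();
                self.selected = self.selected.min(self.rows.len().saturating_sub(1));
            }
            _ => {} // remaining arms are implementation work for the agent
        }
    }

    fn render(&self) -> String {
        self.rows.join("\n")
    }
}
```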
This is also why I don't see it getting better anytime soon. So many people ask me "how do you get your claude to have such good output?" and the answer is always "I paid attention and spotted problems and asked claude to fix them." And it's literally that simple but I can see their eyes already glazing over.
Just as google made finding information easier, it didn't fix the human element of deciphering quality information from poor information.
How do you know what good output should look like with little code experience?
Reading code looking for errors is a hard thing to do well over a large amount of code. A better approach is to ensure tests cover all the important cases and many edge cases. Looking at the code may still be a good idea, but mostly to check the design. I think that once you get Claude to test the code it writes well, trying to find errors in the code by eye is a waste of time. I've made the mistake of thinking Claude was wrong many times despite the tests passing, just to be humbled by breaking the tests with my "improvements"!
And once you get familiar with the other parts, you realize that writing code is the most enjoyable one. More often than not, you're either balancing trade-offs or researching which factors you missed in the previous balancing. When you get to writing code, it's with a sigh of relief, as it means you understand the problem well enough to try a possible solution.
You can skip all that and go directly to writing code. But that means you've replaced a few hours of planning with a few weeks of coding.
Very well said. When I'm working on a hard problem I'll often spend a few weeks sweating details like algorithms, API shapes, wire formats, database schemas, etc. These things are all really easy to change while they're just in a design document. Once you start implementing, big sweeping edits get a lot more difficult. So better to frontload as much of that as possible in the design phase. AI coding agents don't change this dynamic. However all that frontloaded work pays off big when it does come time to implement, because the search space has been narrowed considerably.
Title says
> back to writing code by hand
But what they are doing is
> doing the __design work__ myself, by hand, before any code gets written.
So... Claude still is generating the code I guess?
And seriously, I can't understand that they thought their vibe-coded project worked fine, and even bought a domain for it, without ever looking at the source code it generated - FOR 7 MONTHS??
In short, it is simply a click-bait title.
And the goal of the article is to draw attention to their project.
> And the goal of the article is to draw attention to their project.
Additionally, they couldn't even bother to write their own blog post, so it's a little hard to take them seriously when they say they're going to write their own code...
It's the same thing every time.
> Claude (c) by Anthropic (R) is the best thing since sliced bread and I'm Lovin' It(tm)! Here's a breakdown of how you too can live a code-free life for 10 easy payments of $99.99 a month if you subscribe now!
> Step one in your journey to a code-free life: code the whole damn project and put it together yourself
It's so much fluff and baloney and every single article is identical. And every single one is just over the top praise of Claude that doesn't come off as remotely authentic. There's always mentions of Claude "one shotting"(tm) something.
I bought domains for projects minutes after the idea.
I don't think it's that weird to not look at the code if it's a side project and you follow along incrementally via diffs. It's definitely a different way of working, but it's not that crazy.
> I don't think it's that weird to not look at the code if it's a side project and you follow along incrementally via diffs.
Its not weird to not look at the code, as long as you're looking at the code? (diffs?)
Uh, ok
The article explicitly says that the author looked at the diffs; it distinguishes this from "sitting down and actually reading the code", which they didn't do. So when plastic041 says the author spent 7 months vibe coding "without ever looking at source code", it's not unreasonable for dewey to assume that "looking at source code", in this context, actually means something stronger and excludes just looking at the diffs.
I feel like I'm watching developers speedrun project and product management learnings.
We've moved to seeing that specs are useful, and that having someone write lots of wrong code doesn't make the project move faster (devs often get annoyed at meetings and discussions because they hinder the code writing, but those are frequently there to stop everyone writing more of the wrong thing).
We've seen people find out that task management is useful.
Now I'm seeing more talk of doing the design work fully up front. And we head towards waterfall-style dev.
Then we'll see someone start naming the process of prototyping, then I'm sure something about incremental features where you have to manage old vs. new requirements. Then talk of how really the customer needs to be involved more.
Genuinely, look at what projects and product managers do. They have been guiding projects where the product is code yet they are not expected to read the code and are required to use only natural language to achieve this.
So right. All these guys have never been managers. Do you think humans don't write things that break? Or that teams sometimes take a wrong path and burn a week of work? Or months? Well now you can experience all of that in 30 minutes of vibecoding. As a former tech product manager, it feels EXACTLY the same.
Except it isn't the same because the cost is different, which allows discovery that we couldn't afford previously.
Yes x 1000. I find it amazing.
I ran into the same issues quite early with my Rust pet projects: single structs with tons of Option<T> and validation methods, etc.; enums for type fields combined with, say, optional fields in the same layer, so accessor methods all return Option<T>.
I now add a long list of instructions on how to work with the type system, and some do's and don'ts. I don't see myself as a vibe coder. I actually read the damn code and instruct the AI to get to my level of taste.
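For anyone unfamiliar with the anti-pattern being described, a small hypothetical Rust sketch - a type tag plus a pile of Options, versus pushing the data into enum variants so invalid combinations can't be represented:

```rust
// The anti-pattern: a type tag plus a pile of Options, so every
// accessor has to return Option<T> and validation lives in methods
// instead of the type system.
enum ResourceKind { Pod, Service }

struct ResourceBad {
    kind: ResourceKind,
    pod_ip: Option<String>,    // only meaningful for Pod
    service_port: Option<u16>, // only meaningful for Service
}

// The fix: move per-variant data into the variants, so an invalid
// combination (a Service with a pod_ip) can't even be constructed.
enum Resource {
    Pod { ip: String },
    Service { port: u16 },
}

fn describe(r: &Resource) -> String {
    match r {
        Resource::Pod { ip } => format!("pod at {ip}"),
        Resource::Service { port } => format!("service on port {port}"),
    }
}
```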
Would you be interested in sharing your findings? I'm currently experimenting with LLM-generated rust and honestly think it works quite well, however I'm looking for ways to improve the "taste" of the agent.
So you're not actually writing code by hand? I'm very confused by the difference between the title and the conclusion here.
The point was to come up with a sensationalistic headline that HN eats up, so the post flies to the front page.
I wonder whether the title was generated/suggested by an AI?
> Vibe-coding makes you feel like you have infinite implementation budget. You don't. You have infinite LINE budget (the AI will generate as much code as you want). But you have the same finite complexity budget as always.
This is a special case of a general fundamental point I'm struggling with.
Let's assume AI has reduced the marginal cost of code to zero. So our supply of code is now infinite.
Meanwhile, other critical factors continue to be finite: time in a day, attention, interest, goodwill, paying customers, money, energy.
So how do you choose what to build?
Like a genie, the tools give us the power to ask for whatever we want. And like a genie, it turns out we often don't really know what we want.
Right - knowing what to actually build always has been and always will be the limiting factor for actual success. I could spend months and hundreds of dollars generating the absolute BEST todo-list app out there, but nobody wants that.
I have vibe coded 3 applications I never had time to code but always wanted.
Now it is different in a way: now I don't have time to use those apps.
That's a joke.
But I do believe it answers the question of "what to build?". If you didn't have time for it before LLM-assisted coding, you still don't have time for it. You most likely already know what is used and what is not, by heart or by some measurement.
Personally, i've taken a serious step back from 'unsupervised' vibe-coding. When the codebase is clean and you want some additional fix or small feature, Claude is quite good at mimicking your style and does a pretty good job.
When asking for a new major feature, despite hard guidelines and context (which eat half your context window), it quickly ships bloat. The foundations are not very well organized, and this is where you acknowledge it is all about next-word prediction.
Overall, i've wasted more time reviewing the PR and trying to steer it properly than I expected. So multi-layer agent vibe coding is no longer the way to go *for me*. Maybe with unlimited tokens and a better prompt, to be investigated...
I'm going back to writing algorithms on paper.
This reads too much like it was LLM generated. I can't say for sure if it was but I have an allergic reaction to the short snappy know-it-all LLM writing style.
AI;DR
Writing code by hand, but the blog posts are written by LLMs?
yeah, it set off my llm radar too
We're still in the early ages and must carefully discern what AI is good for, what it can maybe do, what it could potentially do, and what it just can't do, and move those threshold marks very conservatively. AI is also cheap enough that it's worth taking shots at experiments. As long as you don't really rely on AI, it's easy to test the capabilities of this new conversational autocomplete, and the random gains it offers can be magnificent (except when they aren't, of course).
What has generally worked for me is carrying the old adage "Write the data structures and the code will follow" over to AI. Design your data, consider the design immutable, and let the AI try to fill in the necessary code (well, with some guidance). If it finds the data structures aren't enough, have it prompt you instead of making changes on its own. AI can do a lot of the low-hanging fruit, and often the harder ones as well, as long as it's bound to something.
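"Design your data first" can look something like this minimal sketch (hypothetical types, not a prescribed workflow): the model is written and frozen by hand, and the AI only fills in the todo!() bodies:

```rust
// Hand-written, frozen data model: the AI does not get to touch this.
struct Cluster {
    name: String,
    nodes: Vec<Node>,
}

struct Node {
    hostname: String,
    ready: bool,
}

// Hand-written signatures; the AI fills in the bodies and must
// prompt back if the data model turns out to be insufficient.
fn ready_nodes<'a>(cluster: &'a Cluster) -> impl Iterator<Item = &'a Node> + 'a {
    cluster.nodes.iter().filter(|n| n.ready)
}

fn summary(cluster: &Cluster) -> String {
    todo!("one line per ready node, prefixed with the cluster name")
}
```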
Yet, for now, AI at best has been something that relieves me from having to write a long string of boring code: it's not sustainable to keep developing stuff relying on AI alone. It's also great when quality is not an issue; for any serious work AI has not sped me up noticeably. I still need to think through the hard parts, and whatever I gain in generating code I lose in managing the agents. But I can parallelise code generation, trying new approaches and exploring, because AI is cheap. AI is also pretty good at going through the codebase and reasoning about dependencies, whether in the context of adding a new feature or fixing a bug: I often let AI create a proof-of-concept change, then I extract the important bits out of that and usually trim the diffs down to 1/3 or less.
AI further helps with non-work, i.e. tasks that you have to do in order to fulfill external demands and requirements rather than to create anything solid and new. I can imagine AI creating various reports and summaries and documentation, perhaps mostly to be consumed and condensed by another AI at the receiving end. Sadly, all of this is mostly work not worth doing anyway.
Overall, I cringe under all the hype that's been laid on AI: it's a new tool that's still looking for its box or niche carveout, not a revolution.
I don't think we're in the early ages... LLM technology has essentially stagnated since GPT-3.5; we just have bigger models that can handle more context. We're trying to compensate for the lack of progress in the actual technology by coming up with contraptions of multiple models stuck together: Mixture-of-Experts, reviewer models, PM models...
I'm moving very slowly into AI coding. I'm not comfortable enough to let Claude do anything big. What I do is this: I set out the general architecture, create function stubs, and add comments on how to implement things. Then I let Claude do 10 minutes of work and I check everything and refactor some of it. It saves me the boring implementation stuff (like: is this an array, move an index here or there, check whether whatever exists or not, put it in the db).
A problem often ignored is that while AI is trained on human-written code, how it writes is different in practice.
Will that improve or get worse? One would argue that LLMs in general are drastically more competent now than they were a couple of years ago, and they're also much better at coding. We're likely just now entering the era where they can code but are still not what you'd fully expect - or at least not what someone with absolutely no coding knowledge could use to code at the same level as someone who does know how.
Maybe that changes as the models improve, maybe it doesn't; only time will tell.
I'm currently exploring whether I should split a project into a framework part and the game itself (a 2D idle game).
The framework could be an isolation layer against vibe rot, but I'm not sure it's necessary for my small project - one I always wanted to do and have never done anything with.
For another tool, I will try another approach: start with a deep investigation and a spec written together with the AI, then start with the core architecture layout and then add features.
So instead of just prompting "write a golang project with a http server serving xy, and these top 3 features", I will prompt "create a basic golang scaffold for build and test" -> "create a basic http server with a basic library doing xy" -> "define api spec" -> "write feature x".
There is kind of a skill and depth to vibe coding, though.
I'm thoroughly enjoying using AI to write code, but it paid off by years of doing things the hard way before. I already was a so called "10x developer" if I speak for myself. I'm doing things even faster now with AI.
So what you really mean is you are going to do better and more detailed skills files so you can get an architecture that you've thought through rather than something random?
Partly, but the order matters. The CLAUDE.md constraints only work if you designed the architecture first. They're just how you communicate it to the AI. The mistake I made wasn't writing bad skills files, it was not designing anything at all and expecting the AI to make coherent structural decisions across 30 sessions.
The rewrite is me sitting down with a blank doc and drawing the boxes before any code exists. Then the CLAUDE.md enforces what I already decided. Whether that actually holds up as the project grows, I genuinely don't know yet.
Are you really saving any time at all using AI at all then? If you have to write the architecture for it, write all the rules you want it to follow, check everything it's written, and then reprompt it because it's not how you want it?
Yes. I do all of this and I'd estimate 50-100% coding time savings. A lot of that comes from better multitasking over single-workstream throughput, which I suppose might compromise the gains depending on what you're doing. For me it amplifies the speedup by allowing some of my "coding time" to be spent on non-coding tasks too.
But even if coding time is reduced by half, is that worth the downsides? Coding has never really been a major percentage of my time.
I could be wrong in some subtle way I'm not seeing, but I believe the model we're working in avoids the downsides. I actually think my review bar is slightly higher now, because I don't feel as much pressure to compromise my standards when I know Claude is capable of writing the code I want.
Can't you just ask AI to break up large files into smaller ones and also explain how the code works so you can understand it, instead of start over from scratch?
That was actually the first thing I tried. It did a good job at explaining the code base mess and the architecture. Then I ran 3-4 refactor attempts. Each one broke things in ways that were harder to debug than the original mess. The god object had so many implicit dependencies that pulling one thread unraveled something else. And each attempt burned through my daily Claude usage limit before the refactor was stable.
And I'm sure the rewrite is going to teach me a whole different set of lessons...
What's your test coverage like?
Not sure why good coverage wouldn't mitigate risk in a refactor...
My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there...
If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...
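A minimal sketch of what "test the interface rather than the implementation" might mean here (hypothetical types, in Rust): the test only touches the public API, so an agent can decompose the internals without invalidating it:

```rust
// Public interface: the only thing the tests are allowed to know about.
pub struct Store {
    items: Vec<String>,
}

impl Store {
    pub fn new() -> Self {
        Store { items: Vec::new() }
    }

    pub fn add(&mut self, item: &str) {
        self.items.push(item.to_string());
    }

    pub fn contains(&self, item: &str) -> bool {
        self.items.iter().any(|i| i == item)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Tests only the behavior; `items` could become a HashSet,
    // a database, or three extracted modules without touching this.
    #[test]
    fn added_items_are_found() {
        let mut store = Store::new();
        store.add("pod-1");
        assert!(store.contains("pod-1"));
        assert!(!store.contains("pod-2"));
    }
}
```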
I'm currently working on the discovery phase of a larger refactor and have pretty quickly realized that AI can actually often be pretty useless even if you've encoded the rules in an unambiguous, programmatic way.
For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.
And at first, it will appear to have Just Workedâ˘. You are feeling the AGI. Right up until you start to review the output carefully. Because there are now little discrepancies in the new queries written (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.) You run a few more reviewer agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.
As someone with experience in doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically using a series of rules. AI can write the codemod, AI can help me evaluate the results, but actually having it apply all of the few hundred changes automatically led to a lack of my ability to trust the output. And I suspect that will continue to be true for some time.
This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.
A rewrite following a new architecture plan could get finished pretty quickly, treating the original as a prototype.
When people talk about codebases being "incomprehensible", it's not always hyperbole. Sometimes the architecture literally cannot be broken up or understood.
I find that really hard to believe. It's not like curing cancer
When you see some legacy C++ codebase with millions of lines of code, catching cancer and slowly dying from it is more human than trying to unscrew that mess.
A really screwed code base blows out your context window and just starts burning tokens as the AI works out a way to kill -9 itself to escape the hell you're subjecting it to.
While I mostly agree - science is built on truths. Code has a large amount of creativity and freedom built into its decisions. Some codebases will be documented and follow rigorous conventions and design decisions; others will just be an absolute legacy mess of 20 years of odd decisions made by people who may not have known what they were doing. Like an art piece that you don't really "understand".
No, but it can be a Rube Goldberg machine of insanity.
I don't bother trying to give the LLM a set of dos and don'ts for how to write the code, that becomes a frustrating game of whack-a-mole. I find it a lot more efficient to have it write some code, look it over, and if I'm not happy with some of the decisions give it specific instructions for how to fix that one part. as a bonus I end up reinforcing my knowledge of the code base in the process.
I don't understand the people who "get the agent to do everything" for them. It just makes a mess if you do that. Yet if I spend a little bit of time setting a project up properly (including telling my minions exactly what to do) I can then get it to do the boring things for me.
The very worst things you can do in a codebase are (a) not deeply understand how it works (have it be magic) and (b) be lazy and mess up the structure.
How do you fix a problem which happens at 2:00am and takes your system down if you don't have an excellent understanding of how it works?
Over time we're already bad at (a) because most developers hate writing documentation so that knowledge is invariably lost over time.
That's a strange definition of "code by hand"
I don't think the prompts the author has proposed will actually work. Including final scope and non-scope is good, but it's more of a reaction to what the AI already did. These prompts are suitable for a rewrite, basically, since it's unlikely anyone would have had them ready at the start.
I have found small iterations to have the best results. I'm not giving AI any chance to one shot it. For example, I won't tell it to "create a fleet view" but something more like "extract key binding to a service" so that I can reuse it in another view before adding another view. Basically, talk to the AI as an engineer talking to another engineer at the nitty gritty level that we need to deal with everyday, not a product person wishing for a business selling point to magically happen.
> I'm rewriting k10s in Rust. Not because Rust is better, but because it's the language I can steer. I've written enough of it to feel when something's wrong before I can articulate why. That instinct is the one thing vibe-coding can't replace. The AI hands you plausible-looking code. You need a nose for when it's garbage.
Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.
This post is good to grasp the difference between "vibe-coding" and using the AI to help with design and architectural choices done by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.
Go reads fine whether the architecture is good or bad, and I couldn't tell the difference until I was in trouble. Rust is harder to read but harder to misuse. The borrow checker would have caught that data race at compile time. I've also just written more Rust. That familiarity matters separately.
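For readers who don't write Rust, a tiny illustration (hypothetical code, not the project's actual bug) of what "caught at compile time" means here:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let rows: Vec<String> = vec!["pod-1".into()];

    // This version would not compile: a second thread mutating `rows`
    // while the main thread still uses it is rejected by the borrow
    // checker before the program ever runs.
    //
    // thread::spawn(|| rows.push("pod-2".into()));
    // rows.clear();

    // The checker forces an explicit choice instead, e.g. Arc + Mutex:
    let rows = Arc::new(Mutex::new(rows));
    let handle = {
        let rows = Arc::clone(&rows);
        thread::spawn(move || rows.lock().unwrap().push("pod-2".into()))
    };
    handle.join().unwrap();
    println!("{:?}", rows.lock().unwrap());
}
```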
+1 on Opus 4.7 involving the user a lot more. Right now I'm trying to get to a state where I can codify my design and decision preferences as agent personas and push myself out of the dev loop.
Buddy, that k10s code was never good. Go vs. Rust is not the issue here; it's the fact that the project was vibe-coded without reading anything. It's hilarious to even think that a god object was caused by anything other than someone who let the bot choose too much.
Good architecture in any language is obvious to someone who is experienced and cares.
Go is actually great for bots to write, if you're actually thinking.
Gotcha, that implies you are going to read the code that the AI produces anyways.
> Go reads fine whether the architecture is good or bad
Were you reading the Golang code all along and got fooled or did you review it after it failed? Sorry I admit I didn't read the whole article.
He was NOT reading the code: "For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote."
Right, thank you. Personally I think reading all the code that the AI produces is impossible and kind of defeats the purpose of using it. The key is to devise a structured way to interact with it (skills and similar) and use extensive testing along the way to verify the work at all steps.
> Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically
It sounds like the author knows Rust, and might not be as familiar with Go.
A language that you are proficient in is always going to be easier to read than one you aren't, even if the latter is objectively easier to read in general.
In a world where juniors (or seniors in new territories) are incentivized to publish or perish, how will any of us gain proficiency anymore? I can see the agent-assisted journey accelerating some familiarity, but not proficiency.
I've used AI tools to do i18n translations to Spanish and Portuguese (somewhat ashamed to admit this). I've grown more familiar with the structure of these languages, and come to recognize some of the common vocabulary for our agtech domain. If anything, I feel more clueless about both languages now than I did before, when it comes to any sort of proficiency.
I am still mostly coding by hand, other than meeting the KPIs of AI use at the company, required trainings, use of agents and whatever.
Eventually, like with every hype wave, the dust will settle, and let's see where we stand.
By now all the AI companies have consumed all human knowledge so they either learn to actually think for themselves, or that is it.
Either way, that won't change the ongoing layoffs while trying to pursue the AI dream from management point of view.
> Either way, that won't change the ongoing layoffs while trying to pursue the AI dream from management point of view.
I think most companies doing layoffs are bloated to begin with, AI is just the scapegoat to do the layoffs.
I am aware of layoffs that are really caused by AI.
Translation and asset generation teams for enterprise CMS, whose role has now been taken by AI.
Likewise traditional backend development, which was already reduced via SaaS products, serverless, iPaaS, and low-code/no-code tooling, and is now further reduced via agent workflow tooling doing orchestration via tools (serverless endpoints).
The ship has sailed on my handcoding at work. The AI is producing stuff that's more bulletproof than what I can do in the same timeframe, and if my competitors are using it, the pressure to ship is that much higher.
Personally, I've taken the time it's freed up to spend more time on mathacademy and reading more theory-oriented books on data structures and algorithms. AI coding systems are at their best when paired with someone with broad knowledge. Knowing what to ask for, and knowing the vocabulary to be specific about what you want built, is going to be a much more valuable job skill going forward.
One example is a small AI-based learning system I have been developing in my free time to help me learn. The MVP stored an entire knowledge graph and progress in markdown files. Being an engineer, I knew this wouldn't scale, so once I proved the concept viable, I moved everything into SQLite with a graph DB. Then I decided to wrap some parts of the functionality in Rust and put everything behind a small Rust layer, with the progress-tracking logic still being in Python.
Someone with no knowledge of graph databases or dependency graphs or heuristics would not be able to build this even if they had AI. They simply don't know what they don't know, and AI won't save you there.
That said, I think it's important to also spend time in the dirt. I've recently started picking up Zig as my NO AI language, just to keep those skills sharp.
> the ship has sailed on my handcoding at work.
I'm really curious if we'll seesaw once AI costs go up 10x.
I've been relying primarily on deepseek-v4-flash for 90% of my work. It sips tokens. That model will run on 128 GB - not a cheap configuration for a consumer, but within the budget of a developer relying on it for work.
I've only been using kimi 2.5 and deepseek pro for reviewing PRs for security issues.
I think the issue is overblown by people who think Claude Code is a good harness and use Opus for everything. opencode is objectively better: it's much more verbose about what it's doing, you have more control when it comes to offloading to subagents with targeted context (crucial for running through larger jobs), and I can swap between Codex and open-weight models.
And they will.
> I typed :rs pods to switch back to the pods view. Nothing rendered. The table was empty...
> now something was fundamentally broken and I couldn't just prompt my way out of it.
Hey, I don't want to oversimplify - I'm sure it was complicated - but did the author have functional tests for these broken views? As long as there are functional tests passing on the previous commit, I'd have thought that Claude could look at the end situation and work out how to get the desired feature without breaking the other stuff.
TUIs aren't an exception, it's still essential to have a way to end-to-end test each view.
The problem wasn't the view didn't work. The problem was the view didn't work after something else had been done.
You can't test every permutation of app usage. You actually need good architecture so you can trust your tests, and so changes stay local with minimal side effects.
When the title stands in opposition to the actual post, I'm not gonna engage with that author again.
I used to write code by hand.
I still do, but I used to, too.
I think the answer here is to not use Claude with Bubble Tea. I tried the same thing and got the same result. But it seems to be limited to that specific framework, because it's really good at not doing the same thing with SolidJS.
While I felt this in 2025, I do not feel this in 2026. I use Claude and the rest with BubbleTea all the time.
But I will say... you have to know Golang. You have to have at least tried to make a BubbleTea app yourself and tried to understand the Elm architecture. You have to look at the code and increment with it.
It makes total sense for OP to switch to Rust and Ratatui if they don't know Golang well. But I don't think it's a better language for it. [Ratatui has brought me great inspiration though!]
Independent of framework, the LLMs get the spatial relationships. I say things like "the upper right panel's content is not wrapping inside, and the panel's right edge should extend to the terminal edge", and the LLM will fix it. They can see the resultant text; I'm copy-pasting all the time.
TUI code is finicky; one mis-rendered component mucks everything up. The LLMs will decide on their own to make little, temporary BubbleTea fixtures to help themselves understand when things aren't right.
The only real problem with LLMs and BubbleTea is that upon the first prompt, they insist on using BubbleTea v1 versus BubbleTea v2, released in December 2025. But then you just point them to the V2_UPGRADE.md and they get back on track. That will improve as training cutoffs expand.
I vibe-coded this TUI for Mom's last night. I actually started with Grok (who started with v1) and then moved into Claude Code after some iteration:
https://gist.github.com/neomantra/1008e7f2ad5119d3dd5716d52e...
This is Claude's problem. Compared to GPT-5.5, Claude Code prefers to take shortcuts. I've tested having the Codex app (GPT-5.5) and Claude Code (Opus 4.7) do the same thing - if made to follow GPT-5.5's requirements, Claude Code's execution time for a task would stretch from 5 minutes to 40 minutes. To solve macro architecture problems, I use Lisp to write the entire program's framework. Lisp replaces architecture documents, because I believe it has high semantic density, syntax restrictions, and checkers for assistance. This way, at least, I haven't had to rework anything anymore. I used this method to refactor my 20+ projects.
I'm not sure we'll ever really be free of the GIGO (garbage in / garbage out) principle. Tools will get better and better, but can never be a substitute for a deep understanding of the thing we want to create.
Research also makes similar claims: https://arxiv.org/html/2603.24755v1
AI writes what you ask it to write; you need to talk to it about architecture. You should have an architecture doc so the AI can shape the code based on it, and you can get the AI to make the architecture doc too. If using Claude, you can use the software architecture mode for this.
A coder typing in code is not solely generating output; it's part of an ongoing thinking process. Without this ongoing process, we have no material to keep iterating forward.
This is the wrong take. If you keep "vibe" coding and end up with bad results, you should probably question your ability.
When he mentions "I push commits at work for as long as my tokens last", I can understand that. Managing tokens has become an important skill.
What has really made AI coding be able to continue to work as the project got bigger was using speckit. It has been great at keeping the code consistent across features.
https://github.com/github/spec-kit
I'm just wondering: you know what architecture you want to go to now and you have the tests... can't you just let Claude refactor it to the better architecture?
Also 1600 lines... didn't any agent reviewing the diffs point that out?
You're also adding a lot to CLAUDE.md. I dunno how much that file has grown, but with a big CLAUDE.md file with many instructions, I don't think the AI will be able to remember all those rules.
> can't you just let Claude refactor it to the better architecture?
In my experience, no. These tools suck at refactoring, mostly choosing to add more code instead.
It absolutely looks like AI psychosis.
I try to write one portable shell script per day; using AI would take all the fun out of it, so I never started using it. I honestly find it ridiculous that anyone uses it to write code; it just doesn't make sense to me.
So how are people writing the specifications for AI?
Do they write empty functions and let AI fill them in?
Or do they use some kind of specification language?
Are people designing those languages?
LLMs assist those of us who were apt to take blocks of code from StackOverflow, or wherever, to solve problems quickly and avoid as much of the aggravating and slow toil of trial and error as possible.
That trial and error process is still happening with a LLM, but much faster, and with instantaneous cross-references to various forms of documentation that I would be looking up myself otherwise. It produces code of a quality that is dependent on the engineer knowing what they want in the first place and prompting for it and refining its output correctly.
It's the exact same process of sculpting code that the majority of the industry was doing "by hand" prior to the release of LLMs, but faster, and the harnesses are only getting better. To "vibe code" is to prompt vaguely and ignore the quality of the output. You're coming to a forum full of professionals and essentially telling us that you're getting really frustrated with your Scratch project.
I don't know if you're trying to lead a charge or whatever but good luck with that. As a senior SWE, it is clear to me that this is the new paradigm until something better than LLMs comes along. My workflows and efficiency have been vastly improved. I will admit that I have never really been a "I made a SMTP server in 3k of Rust" kind of guy, though.
It's pretty simple to vibe code for months without producing slop. And it's the same recipe one used before AI:
1. Make it work
2. Make it pretty
3. Make it fast
Omit 2 and 3 long enough -> slop beyond recovery.
You don't need to go back to coding by hand if you know how to do it already. There is a middle ground.
If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.
Let the AI do the implementation of your architecture.
Vibe coding works great with test-driven development. You can have the AI write the tests as well, but you need to confirm them yourself because it's lying all the time. AI coding is like when you first started out: it's copying random bits and pieces from the web into your code until it works... Good for one-shots and proofs of concept. But for any long-lived project, I think you are better off rewriting it from scratch yourself. Abstractions let you work faster, especially when you have it all in your head.
Outright lie clickbait. As he states himself, he's doing the design work by hand, and will likely still use AI to write code.
Good luck finding a job. All the decision-making business people I know see only two types of "technical people".
The ones who are "AI-pilled" and the contagious lepers.
Strict SDD might help to constrain and harness the process.
> tl;dr: AI writes features, not architecture.
This. I definitely agree with this statement at this point in AI-assisted development. This gets at the "taste" factor that is still intrinsically human, especially in software engineering. If you can construct and guide the overall architecture of an application or system, AI can conceivably fill in the smaller feature bits, and do so well. But it must have a strong architecture and opinionated field in which to play.
My main takeaway, too. I've been using Claude on my side project, which I have singlehandedly been working on for three years. It works well initially; you catch all of the AI's mistakes or unfavorable approaches because you know the architecture in and out. But as you stop thinking about the new features and start losing touch with all the stuff AI throws at you, you fail to develop an intuitive feeling for when and how to abstract and introduce architecture.
Another note for me was e2e tests: while AI can write them, it never comes up with even the basic organization or abstraction required to manage a large e2e test suite with hundreds of tests. It immediately starts to produce spaghetti code.
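A minimal sketch of the kind of shared abstraction meant here (hypothetical helpers, sketched as a Rust integration test): one small driver that hundreds of tests reuse instead of each reinventing setup inline:

```rust
// Hypothetical shared layer - the part the AI tends not to invent:
// every test goes through one driver instead of duplicating setup.
pub struct App {
    log: Vec<String>,
}

impl App {
    pub fn start() -> Self {
        // In a real suite this would boot the app under test.
        App { log: Vec::new() }
    }

    pub fn login(&mut self, user: &str) -> &mut Self {
        self.log.push(format!("login:{user}"));
        self
    }

    pub fn goto(&mut self, view: &str) -> &mut Self {
        self.log.push(format!("goto:{view}"));
        self
    }

    pub fn assert_visible(&self, needle: &str) {
        assert!(self.log.iter().any(|l| l.contains(needle)));
    }
}

#[test]
fn pods_view_renders_after_login() {
    let mut app = App::start();
    app.login("admin").goto("pods");
    app.assert_visible("pods");
}
```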
Let me preface my comment by saying I also still write a lot of code by hand - especially when it's something I know I need to understand in depth, and in some cases defend.
With that said, this caught my eye:
> AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony.
This is too general. "AI" is used here as a catch-all, but in fact, it was the specific model under the specific conditions you ran your prompt, including harness, markdowns, PRDs, etc. So it's not fair to say "AI does X!" in this case.
It's also very much up to you. It's very common to have a frontier model plan an architecture before you have another model implement code. If you're just one-shotting an LLM to do everything you get mediocre, more brittle code.
This stuff is still being figured out by a lot of people. But I feel the core of the issue is not using AI well. Scoping, task alignment, validation, are crucial.
nuts
The title is just flat out wrong. The author isn't going back to writing code by hand, they're plopping some new stuff into their CLAUDE.md to "fix" the issues they see AI is having.
I also code by hand.
But in my main work, reverse engineering, LLMs are godsend, for years now.
You can basically bruteforce binary obfuscation thanks to them. And thanks to eager chinese LLM providers, basically for free.
But I always use LLMs only for the boring work; the rest is for me to do manually - or with scripts, of course, but made by me. Because I want to learn.
Yes, there are a lot of people using LLMs for full RE automation, since they're selling exploits for profit. No problem with me.
I see a funny future for huge corporations like Adobe, etc.
Imagine the prompt: "Hey Claude, re-implement Adobe Photoshop with clean-room design." One agent opens a decompiler and outputs complete low-level technical details of how everything is implemented.
Second agent implements new Photoshop based on that.
They will be mad and I like this.
You will own nothing, and you will be happy, corpos.
Another behavior I noticed: even if you plan with an agent, a lot of business logic leaks into the code.
Some states, for example, are meant to be derived from the data shape rather than from explicit state fields - but damn, do they like adding a state field.
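A small hypothetical illustration of that leak in Rust: the status is already implied by the data shape, but agents reach for an extra field that then has to be kept in sync:

```rust
// What the agent tends to write: a redundant field that can drift.
struct ViewBad {
    rows: Vec<String>,
    is_empty: bool, // must be updated on every mutation
}

// The state is already implied by the data shape.
struct View {
    rows: Vec<String>,
}

impl View {
    fn is_empty(&self) -> bool {
        self.rows.is_empty()
    }
}
```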
Writing code by hand is an oxymoron. You don't write code with AI; AI doesn't write, it generates.
Wow ok, I will too then. Fuck AI!
This doesn't make much sense - the article itself is AI-written.
It would have been easy to run a few AI agents to review the code, find these issues, and architect it clearly.
Luddite.
Seems to be an unstated assumption that the Ludds were wrong.
Alternate title: "I did not understand the current limitations of AI and assumed it could do large software design and it generated spaghetti slop"
Yeah, that's why engineers are still very important for now (until models can do this kind of longer-term design and stick to it).
>"I'm doing the design work myself, by hand, before any code gets written."
This is what I was doing right from the beginning. AI just fills out methods and does other low-intelligence work. Both of us are happy. My architectures and code are really mine: easy to read and reason about. AI gets paid and doesn't get a chance to fuck me over in the process. At no point have I felt any temptation to leave the "serious" parts to AI.
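Concretely, the division of labor looks something like this sketch (my illustration; the RateLimiter interface and its parameters are hypothetical). The human writes the contract and invariants; the model is only asked to fill in the method body:

    // Human-authored contract: names, types, and invariants are
    // decided up front; nothing here is left to the model.
    interface RateLimiter {
      // Returns true if the caller may proceed, false if throttled.
      allow(key: string, nowMs: number): boolean;
    }

    // The model fills in the body against a fixed-window spec chosen
    // by the human; the human reviews it like any other diff.
    class FixedWindowLimiter implements RateLimiter {
      private windows = new Map<string, { start: number; count: number }>();

      constructor(private limit: number, private windowMs: number) {}

      allow(key: string, nowMs: number): boolean {
        const w = this.windows.get(key);
        if (!w || nowMs - w.start >= this.windowMs) {
          this.windows.set(key, { start: nowMs, count: 1 });
          return true;
        }
        if (w.count < this.limit) {
          w.count++;
          return true;
        }
        return false;
      }
    }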
I don't really think OP is writing code themselves, since they admit they still use agents for code gen. I've really scaled back how much I use agents, though, because in the medium to long term I haven't been getting good results with them. And it's not enjoyable. That's enough for me. I'll do whatever for a job, because who cares; if the company wants slop I will gladly give them that. But for my own shit I've gone back to circa 2024 and am mostly just using them as a chatbot.
Inb4 "you're gonna be replaced". God damn it, I hope so; I do not want to spend the rest of my life behind a computer screen…
I feel like this article was circling a point it never actually got to. All the advice in here (except controlling scope creep) is specific to a TUI with an Elm-like architecture.
But here's the thing: you almost never know what the architecture is up front. If you do, you're probably not the one writing the actual code anymore. Writing the code, with or without an AI, is part of the design process. Most people don't actually know what the architecture needs to be until they've tried several times, fucked it up a bunch, and refactored or rewritten even more.
> I learned over these 7 months
7 months ago was early November. Coding assistants were getting very good back then, but in my experience they were still significantly worse at making good architectural decisions. They tended to just force features into the existing codebase without much thought or care.
Today I've noticed assistants tend to spot architectural smells while working and will ask whether they should try to address them, though even then they're probably never going to suggest a full refactor of the codebase (which is generally the correct heuristic).
My guess is that if you built this today with AI, you wouldn't run into so many of these problems. That's not to say you should build blind, but the first thing that stood out to me was that you started building 7 months ago, when coding assistants were only just becoming decent and, undirected, would still generally generate total slop.
Does "writing code by hand" mean you're not going to use compilers to generate assembly?
Now I do feel lucky that I started learning to code about four years before the LLM revolution, but these things are really just natural-language compilers, aren't they? We're just in that period - the 1980s, the greybeards tell me - when companies charged thousands of dollars per compiler instance, right? And now I myself have never paid for a compiler.
This whole investor bubble will blow up in the face of the rentier-finance capitalists, and I'll be laughing my head off while it happens.
Nondeterministic natural language compilers
Just because the trajectory is chaotic doesn't mean it's not deterministic.
So C++ doesn't count as code now.
If you're coding by hand, then you're that carpenter before IKEA came along. The market now wants bland, machine-built, functional furniture that gets replaced every five to ten years. If every tenth piece is broken or slightly off, it doesn't matter; mass production has lowered the price so much that a replacement is free and you're still making a profit.
Time to become a "product engineer" and watch the hyper-agile agents putting up digital post-it notes on digital pin-boards, discussing how much each post-it is worth in digital scrum meetings. Meanwhile the agents keep wasting more and more time so that their owners make less and less of a loss, until eventually a profit is made.
Until the costs become prohibitive and humans become cheaper than the agents that replaced them. Once the agents are replaced by the humans, the next hype bubble awaits around the bend.
/s
We should go back to designing UML diagrams for programs before we write them /s
TL;DR: AI wrote tech-debt slop because I vibed for 7 months; now I am taking a hybrid approach of defining strict constraints before vibing…
Not sure if it's just me, but this post feels AI-written?
You are absolutely right.
Feels a bit too long-winded to be AI-generated.