The code it generated was awful. It was the kind of garbage that people who don’t know any better would ship: it looked right and it worked, but it was instantly a maintenance dead end. Still, I converged effortlessly on a design I couldn’t have produced on my own (I’m not a designer). Then, with a reference design in hand, I implemented it manually with better code (the part I am good at).
You could find plenty of designers who would say the opposite: "Claude Design Studio generated this garbage UI that I fixed manually, but it wrote great code to make it work, code I never could have written."
These systems should allow rapid iteration on discovery and thinking. One can now build in a day a prototype that would have taken a week. That means we should be able to converge on a much better design in the time it would have taken to build a v0 that turns out to have systemic flaws.
AI should scale our understanding of systems, not just shovel out half-baked features and apps.
In the Tailwind thread the other day I was explicitly told that the intended experience of many frameworks is "write-only code", so maybe this is just the way of the future that we have to learn to embrace. Don't worry about how it's all hooked up: if it works, it works, and if it stops working, tell the AI to fix it.
It's kind of liberating I guess. I'm not sure if I've reached AI nirvana on accepting this yet, but I do think that moment is close.
At the moment, we understand the basic tech and could reasonably DIY, but choose not to, knowing full well there's a mess of understandable code somewhere that we could clean up but don't want to. We accept fast iterations because we know roughly the shape of how it "should be" and can guide an automated framework towards that. This is especially true on our own projects or something we built originally! Tony Stark knew where to go and made the moves; the suit assisted by adding momentum.
We're riding our "knowledge momentum".
If companies can hold out long enough, that knowledge completely fades, and the tool is all you have. At that point, you're locked in. Then it's not Iron Man, it's an iron lung (couldn't resist!).
Prototypes are practically free now. You can ask the AI to try each architectural or stylistic option and just see which code you like better.
To your point, another interesting note is that the models are also very good at rewriting and rearchitecting.
One pattern I like is to vibe code a set of solutions, pick the approach, then backfill tests and do major refactors to make it maintainable.
Here the skill is knowing what good architecture looks like, and knowing how to prompt and validate (e.g., what level of testing will speed up the feedback cycle or make the LLM’s changes legible to me).
To be fair, the “ready, fire, aim” approach of rapid prototyping has been known for a long time, but IMO you needed to be quite quick at coding in the old world for it to work well.
- first, I created a skill describing what the system's architecture should look like
- I tell the LLM to follow the guidelines; it won't do that 100%, but it will be good enough
- I go through what it produced and align it with the template; if I like something (either because I hadn't thought about the problem that way, or simply forgot), I add it to the skill template
- rinse and repeat
This applies not only to the system's architecture but also to when (and how) to write backend and frontend code, e2e tests, and docs. I know what I want to achieve, so I know how the code should be organized and how it should work, and I know how tests should be written. LLMs let me eliminate the tedium of following the same template every time. Without these guardrails it switches patterns so often that it creates unmaintainable crap.
Bear in mind: the output requires constant supervision. The LLM will touch something I told it not to touch, or not do what I told it to do. The volume of output can also be overwhelming at times (so peer review is still needed), but at this point I can iterate on what the LLM produces, both with it and with another LLM, and then hand it to a human once it all makes sense.
I'm not a designer either, but I've been around designers long enough to recognize when something is bad, even if I don't know what's needed to make it better. I've taken the time to find well-designed sites and recreate them by hand-coding the HTML/CSS, to the point that I now consider myself pretty decent at CSS. I don't need libraries or frameworks, and my CSS/HTML is much lighter than what those frameworks produce. I still wouldn't call myself a designer, but my pages look like they were designed by a mediocre designer rather than an engineer :shrug:
I've found that Claude can write code that's on par with what an average SE can produce, but you have to guide it. Write instructions, but also ask it to propose and assess multiple options, even ask it to run a code review and refactoring every so often.
Yes, the code is unlikely to be approved by Linus Torvalds, but the same goes for plenty of human-written code that still gets merged and shipped.
that’s the problem
It doesn't really make sense to say that AI can build something now that works correctly and, at the same time, that it's unmaintainable. It is maintainable with AI.
The real question is whether you're happy to ship to production AI-generated code that you can't modify without AI. Few developers are there yet; plenty of non-tech people are there already. I don't know which group is actually wrong.
Which I think is perfectly worthy of exploration. Some people want to check in the prompts. Or even better, check in a plan.md or evenest betterest: some set of very well-defined specifications.
I'm not sure what the answer will be. Probably some mix of things. But today it is absolutely imperative that the code I write for the case I wrote it in is good quality and can be maintained by more than just me.
I suppose you could solve that in two ways. Manually rewrite it as you did. Or formalize an architecture and let the AI rewrite it with that in mind. I suspect that either works.
AI is great at creating slop that almost works.
But, my god, it is terrible at following clear-as-day instructions on how to clean up slop.
In 2 months it wrote 150k lines of code that almost work. It's taken 1 month to delete about 2000 lines of broken architecture and fix it, and it still hasn't gotten it done, despite nonstop repeated efforts to do something that isn't that hard.
I definitely could've fixed it in less time than I've spent prompting at this point (but there's no way I'd have gotten the other 150k lines). But doing it myself is not the point. The point is to see whether it can actually scale.
The answer is yes... But my god is it agonizing.
The creating garbage part that almost works is fun.
The inevitable cleanup is not.
And unfortunately I don't see this aspect materially improving in the short term.
If you want it to write you 5-10k lines of code for something that's already been done 1000 times before, or something only slightly different, it's great.
Most people want more than that.
Sort of like how LLMs can be prompted to write blah blah "in the style of Charles Bukowski".
Maybe "in the style of core FreeBSD code" to make things tighter, or in the style of a specific git author who checks things in.
The power comes from creating a machine you can steer. Treat AI like an overeager college intern: you need to hand-hold it, but it does get tasks done.
I gave up on this recently. It achieves the goal now, and in a year or two, when you actually want to add whatever feature, the SOTA AI will probably be able to clean it up as it does so. What does "maintain" even mean anymore?
If you don't agree, how many years into the future do we need until you would agree?
And people have been saying this exact thing for years now. Someone said this very thing two years ago, and we're still at the "maintenance dead end" stage. So let me flip it back on you: how many years are we going to pour an obscene amount of resources into this thing that will always be able to clean up its own messes "in a year or two" before we realize it's a dead end (at best) and that we need to be using those resources elsewhere? And, similarly, what happens to you when the SOTA AI in two years can't clean up the code it wrote for you two years ago, but people are depending on it and you're still on the hook for maintaining it?
I feel what you write, but then again: every now and then I write small Greasemonkey scripts to remove annoyances from webpages, and to do so I have to look at the HTML, and the kind of trash you describe is already there.
Iron Man created Jarvis, whose capabilities are way beyond any model we'll see in the near future. So it wasn’t an Iron Man moment.
(And on a personal note, I'm glad we don't have a publicly released Jarvis before we get our act together about the use.)
I keep asking myself the same questions, and the conclusion I keep coming to is that the clean, modeled structure we want to see is for humans to maintain and extend; the AI doesn't need it.
There's definitely an efficiency angle here where it's faster for AI to go from a clean modeled solution to the desired solution because it's likely been trained on cleaner code. Is this really going to matter though?
The best argument I can come up with is that the clean, modeled solution is better for existing development tools, because they're less likely to get confused by the patchwork of vibes throughout the code; but this feels like it ultimately becomes an efficiency concern as well.
This just might be the new reality, and we need to stop looking behind the curtain and accept what the wizard presents us.
Sure, Claude Code and Codex can write (most of) the code for me - but the amount of technical knowledge I need to decide what and how to build remains enormous.
As an example: I'm working on a system right now that works like Claude Artifacts, allowing custom HTML+JS apps to safely run in an iframe sandbox inside a larger application.
Just understanding why that's a useful thing that can be built requires deep knowledge of sandboxing, security threats, browser security models, and half a dozen different platform features that have been evolving over a couple of decades.
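To make the sandboxing idea a little more concrete, here is a minimal sketch, assuming the host page embeds each untrusted app through the iframe `srcdoc` attribute. It is not the commenter's implementation, and `sandboxed_iframe` is a hypothetical helper. The key choice is `sandbox="allow-scripts"` without `allow-same-origin`: the app's JavaScript runs, but the frame gets an opaque origin, so it can't read the host's cookies, storage, or DOM and has to talk to the host via `postMessage`. Adding `allow-same-origin` alongside `allow-scripts` for same-origin content would let the frame remove its own sandbox, which is why it is left out.

```python
import html


def sandboxed_iframe(untrusted_html: str, title: str = "artifact") -> str:
    """Return an <iframe> tag that embeds untrusted HTML+JS via srcdoc.

    Hypothetical helper for illustration only. sandbox="allow-scripts"
    lets the app's scripts run, while omitting allow-same-origin gives
    the frame an opaque origin, cutting it off from the host page's
    cookies, localStorage, and DOM.
    """
    # Escape &, <, >, " and ' so the payload stays inside the srcdoc
    # attribute value instead of breaking out into the host page markup.
    escaped = html.escape(untrusted_html, quote=True)
    safe_title = html.escape(title, quote=True)
    return (
        f'<iframe sandbox="allow-scripts" title="{safe_title}" '
        f'srcdoc="{escaped}"></iframe>'
    )


if __name__ == "__main__":
    print(sandboxed_iframe('<script>alert("hi")</script>'))
```

The browser decodes the attribute value back into the original HTML before rendering it inside the frame, so escaping protects the host markup without changing what the app sees. A real system would likely layer a Content Security Policy and validation of incoming messages on top of this.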
A vibe coder without that technical understanding would have zero chance of prompting such a thing into existence, no matter how much guidance the LLMs gave them.
It really saddens me to see some developers talk about literally quitting their careers over AI, right when the benefits of existing deep technical experience have never been more valuable.
There's an interesting repository with 63,600 stars on GitHub (1). Its developer is No. 1 on GitHub's trending contributors list (2). However, the application doesn't seem to be what it's described as (3), and the developers, for their part, can't clearly answer whether it's real or not, because it's just messy LLM output.
Proof that the suit alone doesn't make anyone Iron Man.
1. https://github.com/ruvnet/RuView
It's a reducer of time.
For less experienced developers, it's an immediate reduction at the start of a project. But they will almost certainly run into problems later, when their initial decisions come back to haunt them.
For senior devs, it's like having a junior or mid-level dev that will instantly do things within their capability, so long as it's explained to them well enough. This junior dev will do things fairly smartly, but any important decisions left to them will be wholly or subtly wrong. And the subtle ones are the worst ones, because they're so hard to detect.
But if that senior dev sets the guidelines well enough, and notices the problems, development is so, so, so much faster. It's wild.
Better headline: "Why AI Multiplies Developer Skills Rather Than Replacing Them"
This is the "view on web" link designed for people to read my emails if they don't display correctly in their email client. It’s not really intended for a broader audience.
What folks seem to overlook is that a junior (in ANY subject) can LEARN so much faster with an AI research assistant, and that becoming an expert has accelerated for those with the personal stamina to dig deep (that requirement hasn't changed). I spend just as much time asking my AI tooling questions as I do asking it to "build" or "fix" things: "How does this work?", "Can you suggest other tools?".
I think some people always think about AI as an input/output relationship, when a lot of the time the fiddling in between, with or without AI, was always the important part. Yes, people will suck in the beginning, but then, they always did. I think the good folks, though, will suck for a MUCH shorter time than I did getting into things.
A lot of people will drop out and get discouraged. That happened before too. Learning things requires persistence. I think the only real case to be made is that AI's immediate gratification can steer people away from ever running into friction. AI natives likely won't understand friction, and will question it.
What’s not clear to me is: if writing more code per engineer is possible, does that result in fewer engineers or just more software, especially in areas that traditionally got squeezed: UX, testing, DevEx, documentation, etc. Perhaps the bar just gets raised?
Me: Isn’t it crazy that X is better than Y.
Claude: what an insightful critique, Y is better than X because of x, y, and z reasons.
And this answer from Claude was good. Thoughtful, well-reasoned. But it was the opposite of the point I wanted to make, so I said:
Me: “oh, you heard me say Y is better than X, I actually was being counter/intuitive and said X is better than Y”
And Claude responded:
Claude: oh you’re absolutely right, X is better than Y for the following reasons (and Claude again provided a well reasoned response here)
And this is sort of that dumb smart genius meme.
“It’s just autocomplete” “No it’s way more than that it has a model in its mind” “It’s just autocomplete”
I liken it to the Library of Babel. All the genius in the world, but only if you have the right index keys.
Someone needs to watch iron man 3...
I like to think of it as a normal distribution: the further to the right of the mean a programmer is, the more they benefit. It's almost like the multiplier is the square of the number of standard deviations they sit from the mean (z²). So someone like Matt Perry (as OP mentioned), who for argument's sake is a >99.99th-percentile programmer and therefore four standard deviations from the mean, gets a (4×4) 16x multiplying effect on their productivity.
Someone who is a slightly above average programmer might see a 2 or 3x boost on their productivity, which is huge(!) and might also make them fear for their job. Which tracks with the level of moral panic we are seeing and experiencing. This math kinda still holds up for "bad programmers" too (i.e. left of the mean), as in they still see a boost to their productivity (negative squared is a positive number)... but there's something iffy about their results. The technical debt is unmaintainable and because they don't _understand_ the systems that they're operating in, they end up in the "3 hour" prompt loops that the OP refers to.
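The back-of-the-envelope model above can be written down directly. This is only an illustration of the commenter's heuristic (multiplier = z², where z is the number of standard deviations from the mean), not a measured relationship, and the function names are hypothetical. It uses Python's `statistics.NormalDist` to turn a percentile into z, which shows that the 99.99th percentile actually sits at roughly 3.72σ, so the "16x" figure is the rounded-up case.

```python
from statistics import NormalDist


def multiplier(z: float) -> float:
    """Productivity multiplier under the comment's heuristic: z squared.

    Squaring means a programmer z deviations below the mean gets the same
    multiplier as one z deviations above it, as the comment points out.
    """
    return z * z


def z_for_percentile(p: float) -> float:
    """Standard-normal z-score for a percentile p in the open interval (0, 1)."""
    return NormalDist().inv_cdf(p)


for z in (4.0, 1.5, -2.0):
    print(f"z = {z:+.1f} -> {multiplier(z):.2f}x")
# z = +4.0 -> 16.00x
# z = +1.5 -> 2.25x
# z = -2.0 -> 4.00x

z = z_for_percentile(0.9999)
print(f"99.99th percentile: z = {z:.2f} -> {multiplier(z):.1f}x")
# 99.99th percentile: z = 3.72 -> 13.8x
```

A "slightly above average" programmer at z ≈ 1.5 lands at about 2.25x, consistent with the 2-3x range in the comment.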
> Similarly, if Matt Perry handed me the keys to the Motion repository and told me to take over, I wouldn’t have the same results even though I have access to the same set of LLM tools.
The question is: how long is this multiplier going to exist for? Some people would wager "for the foreseeable long-term future"; some people think it will widen further; and some people think it will diminish or, god forbid, even collapse. It feels like most arguments at the moment (like this article's) are that the humans who "know what they are doing" will be able to batten down the hatches and avoid being usurped by ever-more-capable models. I saw it in a café yesterday: someone was using a coding agent to build a marketing website for their project, getting more and more frustrated at not getting the outcome they wanted. Their friend typed a couple of sentences on their keyboard and, a minute or so later, got a "Dude! How did you do that? That was sick!" "I used to build websites," the friend said. The friend 'knew what they were doing'.
How much longer is knowing what you're doing going to be a moat?
I work with two pretty green developers. The rate at which they can make a mess is now phenomenal. And the confidence the tools give them with early successes means any experience I might have to offer counts for less now. Which is OK; I’m not going to be that “my experience has to be useful to you so I still feel relevant” old guy. But I do find myself curious about how the “lessons are learned” that lead to greater and greater tool exploitation in this brave new world.
It's still worth repeating that AI is a net negative in team settings but a booster for lone wolves like me.
True full-stack makers (not only the tech parts, but everything from marketing to product to boring IT and financials) now see insane speedups, IF they previously did everything properly by hand and found ways to be good at it solo. Often the only scarce resource was your own time (or, more precisely, attention/focus hours per day) for execution and research. Now offloading work to agents is supercharged, especially since you know the domain and its shortcomings by heart, and when (not) to trust AI.
People fail to reap the benefits at organizational scale, and mostly fail completely: they try to somehow make AI fit into a spaghetti theater of human processes, whereas a lone wolf can change himself entirely in days, adapt, and go AI-first if he wants to. Just like that. No coordination needed beyond rewriting personal scripts, which is fast with LLMs too.
1. AIs aren't yet good at architecture.
2. AIs aren't yet good at imagining technically exciting stuff to build.
And I agree that there's still space there to build a career in the short to medium term (plus Jevons Paradox). When both those points are no longer true, we are certainly much closer to, dare I say it, AGI. I suspect that (1) will be solved for somewhat limited domains in the near future using harnesses. And it could snowball from there.
Those are people who weren't making it to the MVP stage before LLMs.
There is no doubt that highly technical people are getting A LOT more out of LLMs than people without dev experience, in an absolute sense. I think it's less clear in a relative sense.
A question I also ask myself a lot: What are the skills I'm leveraging, exactly, as a highly experienced developer that's now doing a lot of vibe coding?
1) I'm choosing good technology for the task, and thinking about what LLM-agents are good at and choosing technology that they can work well with.
2) I'm choosing good workflows for the LLM-agent, starting a new context at the right time, having it test things, making sure it has logging that it can inspect, making sure it can operate the application in a way that it can debug and inspect it.
3) I'm thinking about the code even though I'm not looking at it, I'm telling it how I want things implemented, I'm telling it how to debug things.
I think these are all hard things for non-developers to do, but I also think non-developers will be able to replicate a large chunk of #1 and #2 relatively quickly. I only have to figure out once that it's valuable to tell the LLM-agent to use Playwright when working on web page visuals, and then I can tell you to do that too. Or the coding agents will come with that knowledge built in (to the model, as a built-in skill, or whatever). Knowledge around this will accumulate and become easier for non-developers to access, and in many cases be built into the models or harnesses.
It can let a skilled engineer multiply the effect of their existing skills, HOWEVER it would take away their ability to question, think, and improve themselves. Syntax highlighting in editors is a good example: most engineers can't work without it, but it supports a static skill that doesn't need constant improvement, so it's an acceptable dependency risk.
I've found I can prevent the LLM, in many cases, from thrashing on a bug/feature for long periods of time by switching into plan mode and, even in the middle of a conversation, having it reassess the structure around the problem, first. If you keep prompting about the same bug, it may keep producing variations of the problem code. But forcing it to stop and 'think' for a bit, has yielded much better results.
But the general argument of 'we will need skilled operators' still holds.
For every 'junior' displaced by AI, there will be some other kind of relevant role they're needed for.
Agentic workflows, integration, all the data science stuff, new UX paradigms.
I don't think the job numbers will dwindle, just shift.
The software is a tool designed specifically around my requirements for managing lectures that need to be prepared and managed, with presentations, grading, etc. I wanted one big space where I can quickly access all related data in a workspace, fold and unfold important aspects, and edit and move content across multiple days/lectures.
The first version is a VS Code plugin, which I've been using for about 4 months, with no or only minor modifications, to manage my lectures and private data. The second version is a standalone application that improves on the ideas of the first version and goes a few steps further.
AI can quickly make you something that looks like it's running. But when you try to finish it, it takes way longer than you'd think. You need to specify every little detail. You need to keep it KISS and DRY, etc. You have it analyze the application structure, simplify, and clean up nearly as often as you add features. While fixing bugs you might need to run the same thing multiple times and revert any unrequested changes. You need to think about a good level of debug logging and ways the program can help you find errors and report them quickly.
I hope my project will be ready in about 2 to 3 months. According to a quick analysis, the current version is over 850 files with 250,000 lines of code.
I've spent about $2000 on AI subscriptions in that time: $200 a month for Claude for a while, down to $100 a month now; $20 a month for OpenAI, which is very important for architecture and reviews; and $20 on trying out other AIs, though I rarely use them in my workflow. I also spent $1500 on 2 × 3090s to hopefully run a local AI agent in the future.
I spend about 2 to 4 hours each day (including weekends) checking the app and writing prompts.
I would never have been able to create such a large and complex project alongside my other tasks, and I am very confident that the final product will be good enough for productive work.
It also has a multiplying effect on technical deficits.
If you habitually demonstrate poor attention to detail when developing software, AI will amplify that too.
LLMs project who we are back at us, amplified, for good or ill, and I’m starting to wonder exactly how deep that runs.
You cannot hold a computer liable for any of those reasons. You can, however, sue the human who built or used the AI. So those concerns shouldn't be any different with or without AI; the same problems will be here either way. If you really care about those problems, you would demand that your representatives in government actually enshrine those things in law, with some teeth, to ensure companies prevent them. If you don't do something about those problems (with or without AI), then your actions make it clear that ethical/environmental/safety concerns aren't actually that important to you.
AIs have skills humans aren’t good at, like nerding out on technical details.
That’s not a perfect map because I’m spitballing. However there is a symbiosis.
I'm not sure I'm productive with AI anymore: I'm up to 125 repos and agents, most of which are tools for managing AIs, and things break so frequently that it feels like spinning plates.
I spent two months, November and December last year, writing by hand a foundational library to constrain how the AIs build CLIs. That did make things move a lot faster, but for those two months I felt the slowness.
I think it will always be like this. It’s the nature of paradigm shift to shift.
The guy uses Claude Code, same as me… It’s like a highly skilled mason with a chisel versus me with a chisel. I’m not going to produce the same masterpiece, because subject-matter expertise (SME) underpins the accelerant.
The problem is just that the question is not whether "human developers will be necessary in the near future", it's "how many human developers will be necessary in the near future" - managers wanting to exploit the efficiency gains by deciding that fewer developers can now do more work "thanks" to AI.
I understand the need to make a living, but it's hard to take this stuff seriously/sincerely with the "and buy my course!" angle.
Now I can just get to it. I know what I want and can organize the codebase, whatever the code is I can generate it.
I think most understand that their jobs are going away because we will need fewer engineers to build the software their companies currently need.
Now - huge opportunity for new companies and markets, but not sure if they will be as profitable.
- fewer engineers needed overall -> less demand for human engineers -> lower compensation
- insufficient training at junior levels.
- longer time to productive human engineering skill.
These are playing out right now and are a concern for all engineers in the industry. Iron Man amplification doesn't address any of the above.
I'm not the most talented developer, but this has pretty much been my experience as well. Just keep it under control, know what it's doing and why at every step, read the code, and it will boost your productivity.
If you know what good looks like, the tools are incredible. If you don't, they help you produce plausible wrong things faster.
* it cheats at verification. Even with specific instructions how to verify, it still cheats.
* it generates UX (for a CLI tool) that is absolute garbage and inconsistent, even with specific instructions to minimize unnecessary flags, use convention over configuration, etc.
* it absolutely will not go 'above and beyond' to solve problems: if a task hits a permission or dependency barrier, it'll likely cheat or handwave the problem away. (gpt 5.5 xhigh)
There is maybe this hope/hubris that we can figure out just the right incantations or agent workflows to eliminate these issues. I was optimistic about this too, but after trying for a while and seeing them not only not go away but in some cases regress with newer models, I am less sure.
One is augmentation and the other is whack-a-mole.
Excellent read.
Yes, this is what I have been experiencing as well. I kind of live in a bubble around my own technical expertise, but it seemed only natural that you would have to guide the model. I've only been using AI since February. Someone suggested I use it to upgrade my Hugo site, and that was all I needed. Now I am converting a CRM to Obsidian and rewriting some Obsidian plug-ins to do what I want them to do.
For example, I am using an Obsidian plug-in, "Advanced Obsidian Slides", to do presentations from my markdown files, but it styles the slides using absolute styling. Every time I tried to prompt the model to improve display elements in that styling, I spent a lot of time asking it to go back and try something else to get the results I wanted.
Finally I decided to rewrite the plugin with a fluid design within the fixed boundary of the slides. Instead of using floats I switched to flexbox, and all my problems were solved in one shot.
The moral of the story is: don't expect the model to do what you want without the proper framework. And research the correct terms; you'll save yourself a lot of time prompting and writing instruction docs for the LLM to implement.
Guitars do not think. AI does. The analogies that try to paint AI as "just another inanimate tool" are way off base, and so are the conclusions of this article.
This sentiment will stray further from the truth as time goes on.
Sure, it's a multiplier for those who are already skilled, but for those who are unskilled, it is capable of taking you from 0 -> 1+.
The ones currently benefiting from AI are the ones who (i) have a general understanding of how an AI works and experience with using it and (ii) have a very generic understanding of what it is they're trying to do (programming, most likely) and know the limits of their tools, but don't know how to actually do anything meaningful.
The whole point of AI is to open the door of complexity to normies; they are the ones benefiting most from it. For a skilled developer, it may turn a 1-hour task into 5 minutes; for a normie, it turns something that was utterly impossible into something within reach. The difference is simply more life-changing for normies.
If you think of skilled developers as the ceiling and normies as the floor, AI raises the floor higher by giving normies more capability, which makes the ceiling seem less impressive. But eventually the floor will surpass the ceiling, and then it'll be a matter of who can operate AI better/how good AI is.
I didn't think this 6 months ago, but today, after what I've seen these models debug and accomplish in established, messy production monoliths, I'm fully convinced that even the worst vibe coders are only a year or two away from being able to actually create something from scratch and have it not blow up 50 files in.
So I guess I take the totally opposite stance, today's AI is the worst AI will ever be at coding, and I believe the vested interests behind AI do not plan on making it any worse at this task, so...
Everything these days is either the greatest thing ever or the worst thing ever. All the stuff in the middle has vanished. Very few it seems acknowledge AI as being a useful tool. It's either "We're all being replaced" or "The technology is all slop" and everyone talks over each other like it's the Super Bowl and their teams are battling it out.
It would be nice if we could just look to the opportunities this tech offers and focus on that.
I used to be a PM and am technically literate enough but can only very minimally write code. I have been using LLMs to build (or try to, at least) internal tools for my business since GPT-4.
In the early days, I'd get a little ways, then the LLM would start breaking things, and I'd try but fail to get it to fix things. But over successive generations, I was increasingly able to get it unstuck by offering suggestions on where it may have gone wrong. With Opus 4.7, I don't even really have to do that - if something isn't working it's usually sufficient to just tell it what's broken. It can figure out how to fix it without my input. And of course fewer things are broken in the first place.
So I think I'm very well positioned to understand how these things are improving - better able to get the LLM to do what I want than the post OP quoted from /vibecoding (though I am 99% sure that post is actually AI slop), but less so than most of the people posting in this thread. As they've improved, whatever ability I have to guess at the causes of problems based on my experience having seen things go wrong with products I've PMed has become less necessary to getting the right outcome.
I expect that trend to continue - increasingly the LLM won't need the guidance of people with a great deal of technical expertise. I basically no longer have to attempt to diagnose problems in order to get them fixed, though with the caveat that I am building internal tools for which I am the only user, so certainly much simpler in scope than the stuff OP is talking about.
> Without guidance, LLMs tend to paint themselves into a corner, because they’re generating code to solve individual prompts, not thinking holistically about an application’s architecture.
The crux of what I'm trying to say here is that I absolutely believe that this line is 100% true today, but I would be deeply cautious about assuming that it will continue to be true given the improvements in LLMs over the past few years.
When you see rising inequality, don't just cheer because you happen to be winning for now. Maybe think about the future, and about others too.
What can Tony Stark do if all is done by the suit? What can he do after a year, two years, five years?
Y’all sound the same:
> Let’s start with an uncomfortable truth: AI models have become shockingly good at completing a wide variety of programming tasks. They’re certainly not perfect, but in many cases, they’re good enough. I’m not happy about this, for a wide variety of ethical/environmental/safety reasons, but it is what it is.
More Inevitabilism posting with the “not happy with” but is-what-it-is washing of your hands. At a distance you all look the same: an army of posts insisting on the obvious, the inevitable; who knows why you all need to sound the same and say the same thing, but I guess it is to keep it top-of-mind for us all. It is what it is.
> [...] It’s never been easier to learn about new topics, with tools like ChatGPT that can answer any questions you have. But that only works when you know what questions to ask. My course offers a curated curriculum that will introduce you to all sorts of new techniques. I think you’ll be amazed at what you can build after taking the course.
Okay, sure. I ask these LLMs things too (cf. outright --be coding), so that’s not necessarily incongruent with the stance of being not-happy-about-this.
Seemingly every AI-pilled programmer who writes a blog post on AI's impact on software engineering makes the same philosophical argument, and its wording changes slightly every 6-12 months to reflect the newest models' capabilities.
In 2023 it was: "AI is just autocomplete. It can't code whole blocks on its own."
In 2024 it was: "AI is only good for scaffolding new projects or boilerplate code. It can't write the application wholesale."
Since November 2025 it's been: "AI is only writing the code for us. It can't manage architecture, or do the long-term planning required for real-world applications."
In 6-12 months, when AI is doing an increasing amount of the architecture and high-level planning, what will AI-pilled programmers fall back on then?
So while the author's points are completely true and valid, an executive will say "True, but Claude will get smarter faster than these problems and in 3 years it'll fix everything" and there's absolutely nothing you can say or do in response to this.
Maybe not the same agency you would expect from a human being, but if you put them in a Ralph loop they can go far, far away, mostly because of how we built our world in the pre-LLM era: do you need to order something (or want to hire a hitman)? You can do it on a website, via WhatsApp, or by calling some API.
The point is they mostly wind up somewhere stupid, and it takes expertise to spot and correct that. (Maybe that changes with further development.)
Where I’m at when building personal applications for my home / life is: does the code execute and perform the desired task?
If so, what do I care how shitty it is? I’m not publishing these projects (for the most part… I have one joke application up at songshift.reachnick.co), so efficient, clean, secure code is not really a priority for me.
That is to say - it's entirely possible to have a design that a layperson looks at and goes "wow that's beautiful", and then A/B test it in the real world and your revenue goes down X% because (for example) certain important sections now require more clicks to access.
Or to use a real-world example: you could redesign a train station and make it more beautiful while also increasing the number of people who get lost, because it's now more difficult for people with poor eyesight to find the right track.
https://en.wikipedia.org/wiki/Michael_Crichton#:~:text=%5B14...
People are never perfectly even in intelligence across all possible disciplines.
Gell-Mann's observation was a sincere and thoughtful caution about the way we transmit information about complicated ideas. Crichton's "amnesia effect" is an excuse to ignore media you dislike.
You're suggesting that (a) their UI skills are lacking (based on what? isn't UI exactly what they were iterating on and trying to improve?), and (b) that a real UI expert would've somehow felt the UI they were working on was consistently garbage, despite how many times they iterate on it?
Which means you're saying you don't believe anyone can actually produce high quality (to an expert) output with AI on the same target they're working on, and if they think they are, that just means they don't have a good sense of quality?
the llm produced something the operator thought was garbage for the design too, and the operator iterated it from garbage to good.
they could also have the llm iterate the underlying code from garbage to good, if they wanted.
most likely a specialist would say it's neither good nor bad, since it's not considering the right things and hasn't collected the right usability feedback. but making straightforward designs isn't that hard, and counting clicks and interactions and avoiding hidden functionality are all measurable things
AI pixel art looks particularly bad because most users don’t even go through the effort of downscaling and then upscaling it using something as simple as nearest-neighbor scaling, which by itself will squash out a lot of high-frequency noise that manifests in the form of terrible looking "fringing". Proper grid alignment also makes a big difference. It’s not perfect by a long shot, but it helps.
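A minimal sketch of that downscale-then-upscale cleanup, assuming the image is a plain grid of pixel values (ints standing in for palette indices or RGB tuples) and that the intended pixel-grid size is already known. The function names are hypothetical; in practice you'd use an image library's nearest-neighbour resize, such as Pillow's `Image.resize` with nearest-neighbour resampling.

```python
def nearest_downscale(img, factor):
    """Shrink by `factor`, keeping one sample per factor x factor cell (nearest neighbour)."""
    h, w = len(img), len(img[0])
    off = factor // 2  # sample near the centre of each cell
    return [
        [img[y * factor + off][x * factor + off] for x in range(w // factor)]
        for y in range(h // factor)
    ]


def nearest_upscale(img, factor):
    """Grow by `factor`, repeating every pixel into a factor x factor block."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))  # independent copies of each row
    return out


def snap_to_grid(img, factor):
    """Round-trip through the true pixel grid so every 'big pixel' becomes one flat colour."""
    return nearest_upscale(nearest_downscale(img, factor), factor)


# A 2x2 "logical" sprite rendered at 2x, with two stray noise pixels (9 and 7).
noisy = [
    [9, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 7, 4],
    [3, 3, 4, 4],
]
print(snap_to_grid(noisy, 2))
```

Because only one sample per cell survives, within-cell noise is simply discarded; if the sampled pixel itself happens to be noise it survives instead, which is why a majority vote per cell is a common alternative.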
findfantasyxviii.com
Well when you put it that way ... monetizing the Dunning-Kruger effect does actually sound like a very good business idea.
I think that it is extremely tempting to just let Claude run freely over the codebase and turn it into unmaintainable slop. My hot take is that this is fine.
It doesn't take very long to come up with a list of simple yes-or-no style rules (e.g. modules should be descoped to simple files, each module must have tests at filename_test, a reasonable reader should find comments to be concise and not extraneous, etc). It doesn't take very long to set up a precommit hook that starts a short Claude session to check each rule, block if any are failed, and explain the issues.
After that? I've found that it's pretty easy to get good code written, and even easier to maintain it. Obviously anything with a value judgement is an avenue for issues, but even a non-frontier model can generally do a passable job answering questions. As long as you eventually read the code to decide on big-picture refactorings, you'll be in a great place.
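The comment uses a short Claude session inside the hook to judge each rule. As a minimal sketch of just the mechanical part, here is a deterministic check for one rule ("each module must have tests at filename_test"), assuming Python modules and a `<name>_test.py` naming convention. The function names are hypothetical; the git commands are standard.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def missing_tests(staged, tracked):
    """Return a message for each staged .py module without a sibling <name>_test.py."""
    problems = []
    for path in map(PurePosixPath, staged):
        # Only check Python modules, and don't demand tests for the tests themselves.
        if path.suffix != ".py" or path.stem.endswith("_test"):
            continue
        expected = path.with_name(f"{path.stem}_test.py")
        if str(expected) not in tracked:
            problems.append(f"{path}: expected tests at {expected}")
    return problems


def git_lines(*args):
    """Run a git command and return its stdout as a list of lines."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def main():
    # Added/copied/modified files in the index, plus every file in the index.
    staged = git_lines("diff", "--cached", "--name-only", "--diff-filter=ACM")
    tracked = set(git_lines("ls-files"))
    problems = missing_tests(staged, tracked)
    for message in problems:
        print(message, file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit blocks the commit
```

Saved as `.git/hooks/pre-commit` (executable, with a `#!/usr/bin/env python3` line and a final `sys.exit(main())`), a failing check aborts the commit. Rules involving a value judgement, like "comments are concise", still need the LLM review step the comment describes.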
basically the AI-slop version of food, yet still they thrive
There is an enormous untapped market for crappy low-effort apps which previously weren't worth the time, but now that the effort to put together a simple dashboard or one-off tool is so low, they become much more attractive.
First of all internal tools just prove my point. Second can't wait to hear a story of a health care production database blown away because someone was playing with generated tools that "pokes the right things".
We NEED end-user experiences that don't suck and don't keep getting shittier like they are now. Not being able to use a CLI for internal tools is a skill issue; not everything needs a shitty UI that taps into OS system calls and blows up as soon as the CLI responds with something unexpected.
For example, the testing tool I built explicitly doesn't work in production environment. That was part of my design spec and I manually verified the code and behavior.
Which is probably why so many random buttons in microsoft/apple/spotify just stop working once you get off the beaten path or load the app in some state which is slightly off base
The number of edge cases in a piece of software is not fixed at all. One of the largest markers of competence in software development is being able to keep them to a minimum, and LLMs tend to push that number higher than humanly possible.
The people pushing AI _over_ humans never thought they were. They just don't care about 'good' or 'bad', only 'time-to-market'. A bad app making money is better than a good one that isn't deployed yet. And who cares about anything past the end of the quarter? That's the next guy's problem.
In terms of "junior dev following" it would be the model trying to think and write it as a Senior or Staff Level engineer would.
Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.
Maybe a more idealized training set could improve things, but at least for today’s SOTA, you have to get the shitty first draft out and then improve it.
Harnessing makes a difference, but it’s only shuffling around when and where the tokens get generated. It can trade being slower by doing a hidden first draft and only showing the output after doing a self review. But the models still need to generate it all explicitly.
I did an experiment on this a few weekends ago and Codex for example was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"
so... perl?
Granted, the load bearing thing here is whether we’re actually getting good at rebuilding up to any sort of standard of quality. Or if the tooling is even structurally capable of doing that rather than just introducing new baskets of problems with each build.
I love the Iron lung reference. Perfect.
It's tempting to move out a layer and try making prompts and plan.md the "source code", and then the generated actual-source-code becomes just another ephemeral form of "intermediate representation" in the toolchain while building the final executable product. But then how are you versioning the toolchain and maintaining any reasonable sense of "stability" (in terms of features/bugs/etc) in the final output?
Example: last week, someone ran our "LLM inputs" source code through AgentCo SuperModel-7-39b, and produced a product output that users loved and it seemed to work well. Next week, management asks for a new feature. The "developer" adds the new feature to the prompting with a few trial iterations, but the resulting new product now has 339 new subtle bugs in areas that were working fine in last week's build, owing to the fact that, in the meantime, AgentCo has tweaked some weights in SuperModel-7-39b under the hood because of some concern about CSAM results or whatever, and this had subtle unrelated effects. Or better yet: next month, management has learned that OtherCo MegaModel-42.7c seems to be the new hotness and tells everyone to switch models. Re-building from our "source" with the new model fixes 72 known bugs filed by users, fixes another 337 bugs nobody had even noticed yet, and creates 111 new bugs that are as yet unknown.
If you treat the output source code as a write-only messy artifact, and you don't have stable, repeatable models, and don't treat model updates/changes as carefully as switching compiler vendors and build environments, this kind of methodology can only lead to chaos.
And don't even get me started on the parallel excuses of "Your specifications should be more perfect" (perfection is impossible), or "An expansive test suite should catch and correct all new bugs" (also impossible: testing is only as good as the imperfect specification, and it layers in its own finite capabilities to boot).
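One way to read "treat model updates as carefully as switching compiler vendors" is a lockfile that records which model snapshot and which spec produced the current code, so a build flags either change instead of silently mixing them. A minimal sketch, assuming the vendor exposes immutable snapshot identifiers (pinning a model *name* does not help if weights change underneath it, as in the example above); the lock format and the `@snapshot-1` naming are hypothetical.

```python
import hashlib
import json


def spec_digest(spec_texts):
    """Hash the prompt/spec 'source' files in a fixed order."""
    h = hashlib.sha256()
    for text in spec_texts:
        data = text.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix keeps file boundaries unambiguous
        h.update(data)
    return h.hexdigest()


def write_lock(model_snapshot, spec_texts):
    """Serialise the model snapshot and spec hash that produced the current generated code."""
    return json.dumps(
        {"model": model_snapshot, "spec_sha256": spec_digest(spec_texts)}, sort_keys=True
    )


def check_lock(lock_json, model_snapshot, spec_texts):
    """List every way this build differs from the locked one (empty list = same inputs)."""
    lock = json.loads(lock_json)
    reasons = []
    if lock["model"] != model_snapshot:
        reasons.append(f"model changed: {lock['model']} -> {model_snapshot}")
    if lock["spec_sha256"] != spec_digest(spec_texts):
        reasons.append("spec changed: review the regenerated code like any other code change")
    return reasons


lock = write_lock("SuperModel-7-39b@snapshot-1", ["feature A", "feature B"])
print(check_lock(lock, "MegaModel-42.7c@snapshot-1", ["feature A", "feature B", "feature C"]))
```

Reporting the two causes separately is the point: a build where both the spec and the model changed is exactly the case where nobody can say which change introduced a regression.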
I never tried spec-driven development myself, but if I review others' MRs I am typically exhausted after the first 10 lines.
And there are hundreds of lines, nearly always with major inaccuracies.
For myself I always found the plan mode to work well. Once the implementation is done, the code is the source of truth. If it works, it works.
When I want to add more functionality or change it, I just tell the agent what I want changed.
I doubt walls of semi-accurate existing specs are going to be beneficial there, but maybe my work differs from yours.
I mitigate this with a few things:
1. Checkpoints every few days to thoroughly review and flag issues. Asking the LLM to impersonate someone (Linus Torvalds is my favorite) yields different results.
2. Frequent refactors. LLMs don't get discouraged from throwing things out the way humans do, so I ask for a refactor when enough stuff accumulates.
3. Verbose, typed languages: C# on the backend, TypeScript on the frontend.
Does it produce quality code? Locally, yes; architecturally, I don't know. It works so far, I guess. Anyway, my alternative isn't writing this software better, it's not writing it at all for lack of time, so even if it's subpar it still brings business value.
Respectfully, I asked first. ;)
> before we realize its a dead end (at best)
You've declared the future, which doesn't leave much room for a conversation. So, cheers!
This is talk and talk is cheap. Prove it, otherwise it's still a million dollar question... unanswered.
HN is notoriously mentally deficient when it comes to AI. They were wrong about self-driving cars (I sit in AI cars daily), and they were wrong about AI getting used for coding (I don't use an IDE or type code anymore as a SWE). So I have to say, unless there's something evidence-based or substantial here, given HN's track record most answers here will likely end up being yet another wrong, baseless and overconfident take.
I'm looking for legit answers not confidently biased statements with no evidence.
only in the bubble you live in
> So I have to say unless there's something evidence based or substantial here it's likely given HN track record that most people here will end up being another wrong, baseless and over confident answer.
Again, only in the bubble you live in. You want evidence? There are zero killer-app products out there. Not even one. The ones that seem impressive, like Bun going from Zig to Rust, are just experiments, and we'll see how badly they unravel.
And the existing products? They get shittier day by day at a rate not seen before.
Or are u building one via codex inside the ai car while u eat a burrito?
> Iron Man created Jarvis whose capabilities are way beyond any models in the near future. So it wasn’t an Iron Man moment.
Like an LLM, you misunderstood the context. The voyeuristic experience doesn't require fiction to be reality.
This does not match my experience. I do a lot of AI-assisted coding at this point, and what I've seen is that when the AI is asked to extend or modify existing code, it does a much better job on clean, well-structured and well-abstracted code.
I think the reason is simple, and tracks for humans as well: well-structured code is simply easier to understand and reason about, and takes a smaller amount of working-set memory. Even as LLMs get better with coding, I expect that they would converge on the same conclusion, namely that good structure + good abstractions make for code that is more efficient to work with.
I think it’s all about the structure you use to work in and how you use the model. We are shipping better, more human-friendly code, with fewer bugs than we ever did before, and doing it at 1/10 the cost of before LLMs.
But we are definitely not vibe coding, and the key seems to be devs with years of experience managing teams, managing the LLM instead. Basically you create the same kind of formal specifications, conventions, and documentation that you would develop for a project with two or three teams, then use that to keep the project on the rails recursively looping back through the docs as you go along. I’ve only had to back out of a couple of issues over the last year, and even though that cost a couple of hours, it was still extremely cheap.
Meanwhile we are shipping at 4x speed with 1/4 the labor, and the code is better than it was because the “overhead” of writing maintainable, self documented code has inverted into the secret ingredient to shipping bug free code at unprecedented speed.
If you just explain the standards to which you want the code written, use a strict style guide, have a separate process that ensures test coverage (not in the same context) you can get example quality code all the way through. Turns out that’s also in the training data.
So yeah, it’s imminent. Let’s see how demand shifts in response in the future.
I think we eventually end up at the tool approach via vendors providing the tools to other companies, but it still feels like there's a long road ahead to get there.
That's not true. The LLM performance will degrade as the codebase gets messier as well. You get to a point where every fix breaks something else and you can't really make forward progress.
Yes, you might be able to get a bit further with a messy codebase just because the LLM won't complain and will just grind through fixing things, but eventually it will just start disabling failing tests instead of actually fixing things.
Of course that just leads to: what’s the best way to achieve that goal? Through elegant code or adding lots of tests? Which is a debate from long before LLMs existed.
LLMs have a limit to how deep they can understand and refactor architectural issues.
That limit is far, far lower than a human's.
This is how societies become shittier. People who are ostensibly responsible for doing their jobs not giving a damn about quality.
1. Because the experience of interacting with AI is miserable. I like writing code. I don't like finding the magic incantation that gets the machine to write the correct code. I don't like correcting the machine when it gets things wrong. I don't like any of this, it's awful and I would never have gone into this field if someone had told me that it would be like this one day.
2. I cannot condone the means by which these tools were created, which is, as far as I am concerned, theft. I think it's unethical to use them at all, because they were created unethically. I dislike using stolen work, I think it's wrong, and I think everyone who uses it is making the world worse and normalizing theft. If continuing in my career means that I have to compromise my ethics, I wouldn't do it even if I loved this stuff, and see point (1).
3. Is anyone going to pay me more for my "more valuable" skills? Doesn't seem like it, engineering salaries on the whole are going down right now. You can believe they'll go up eventually if you like, but there's no evidence that will happen, or that it's happening. If my employer captures all the value, why should I care whether I'm creating more of it?
I'm your exact opposite.
I've felt like code is 1960s punch-card tech my entire career. I've always wanted to do more.
So much of coding is plumbing. Or paying attention to tiny little details. Or hunting down stupid bugs. Or changing requirements and refactoring. That shit sucks. All of it.
I've never had so much fun with software. It's starting to feel like magic. And because we possess deep understanding, we are uniquely positioned to take advantage of this.
The AST is not the objective. The finished product is. Our DNA is by all accounts filled with garbage. Let your feelings about code purity and sanctity go. It's the job to be done that matters.
Code is not holy. In 100 years people will look at our ephemeral artifacts as silly little things. Treat it that way today. Means to an end.
Perfection is achieved when there is nothing left to take away.
> In 100 years people will look at our ephemeral artifacts as silly little things
Whereas they'll totally admire the hamster wheels in which people shoveled product? Well, I don't care either way. Craftsmanship and care have their own rewards, and shape the person engaging in them for the better.
But using the DNA example- perfect shouldn’t be the enemy of good. Our bodies are far from perfect but they’re functional and effective. If the biological imperative was perfect genomes and not functional genomes, there would be no life at all.
I’m not a developer, I’m a consumer of digital products. I couldn’t care less, or even have the ability to notice, if code is perfect. I’m here to achieve a goal through software. If it achieves that goal, what is the problem from my end?
That's like, the entire point and the entire reason any of it works with any sense of reliability. Did they not do the "tell me how to make a sandwich" gag to show why thinking about the details matters? Ignoring them is how you end up with borderline unusable applications slower than they were with 10 fewer years of hardware development. I guess I shouldn't be surprised.
No offense, but this sounds like you just don't like anything about writing code and you don't have any LLM superpowers, because those are the technical skills that make you good at being a software engineer regardless of whether you're using an agent.
> Code is not holy. In 100 years people will look at our ephemeral artifacts as silly little things. Treat it that way today. Means to an end.
I don't give a shit about code as an artifact. Writing code to solve problems is fun. Prompting an AI to solve problems makes me want to eat a gun. That's a real difference and it's not something I can just change about myself.
Tedium absolutely exists in coding. And is usually a sign of bad interfaces and/or architecture.
For most of us it wasn’t really about getting the user to do X. It’s getting the user to do X at 1/10th of the price, 10x the speed, and the user is left absolutely amazed.
Magic is for the user to experience. Not for the user of the programming language.
But that's not what senior executives think, and that's all that matters. If they think that AI can replace engineers, so be it. I mean, since when do senior executives know shit about what quality means? They only care about revenue and profit. So yeah, you're right, but that's not gonna happen (sadly).
One, I think the talk about AI replacing developers is tripe. We’re still correcting the post-Covid hiring binge.
Two, even if that level is breached, I’d consider your skillset more broadly than what you can literally do right now. Organizing people and technical systems is hard. And the article highlights how that doesn’t seem to be something AI is focused on improving on right now. (Would take larger context windows. Which would make inference more expensive.)
I don't intend this to read as pure snark, but someone's abstract value isn't much good to them if the job market itself can't / won't recognize it.
I posted something along these lines here not long ago: there is a huge amount of craftsmanship between a great idea and a great product. It received a lot of downvotes.
LLMs don’t change this in reality. That’s why we haven’t seen an explosion of high-value products. Creating a product that creates value in the first place is VERY, VERY HARD! The arrogance of people dismissing it just because LLMs exist is hilarious.
And it's not just the product design either - the number of steps it takes to get a piece of software in front of other people, correctly deployed, with backups and analytics and monitoring and a domain name...
Even with all the LLM help in the world you still need to know a whole lot about how web software works to pull that off.
So, a nonfunctional project is created by AI, and AI is used to attest to its nonfunctionality.
What a brave new world.
AI creates a delusional product, people don't trust their own opinion regarding it and follow it, another AI is needed to prove that the product is unreal.
In the loop.
I do believe that is a running theme in the Iron Man comics and movies.
I'm not sure. It's a reducer on time to refine and adopt skills, if you're interested. I know that's somewhat pedantic, but there really is a skill multiplier if that's what you want out of it.
I'm far, far better at using AWS than I was after years of using it (for better or worse). I use the command line more effectively than ever. These are real skills that have come to me as a result of that reduction of time to get the answers I wanted. This applies to all kinds of things in my work, and it's been quite liberating to have this incredible tightening of temporal scales in order to get where I'd like to be, resulting in actual differences in my own outputs and capabilities.
Still, I agree that time really is a key facet. I could have found this information before, too. It just took so long to do it.
I wanted to run a stupid little webserver on one of my raspis, so I just asked Gemini to write me the code, along with a bash script to set up the proper configs to run it as a systemd service. Nothing I couldn't basically do in my sleep, but it takes some amount of time and focus to do.
In the time it took me to write this comment, it wrote exactly what I needed. Not that impressive in isolation, but now I've been doing all the little home automation things I never had the energy for with other responsibilities.
Fewer developers required to achieve the same things means a lot of people are going to be unemployed
It also means that the people who remain will likely be paid less. Why would you pay a senior salary when you could pay a junior salary plus AI subscription and get "the same result"?
I think Software Devs are in for a rough time. I've been doing this for 15 years now, and I'm not looking forward to it. I'm honestly thinking about re-skilling to a different industry. Even if it pays less, it's probably worth it to sit out this shitshow.
I don’t think that's a foregone conclusion. Every company I’ve worked for has had a huge list of tasks we'd do if we had more engineering resources. There's never been a shortage of worthwhile things we could do, it's always been ruthless prioritization to find the 10% of tasks that are the most important.
Look up Jevons' Paradox. This is a thing that has happened a bunch of times before.
It's the same thing that happened to every other skilled profession that was automated in the past. That's why unions became a thing and they started busting heads until their employers paid them more.
Edit: the only way I can see software developer salaries staying the same is if the amount of work available for them expands dramatically.
Hypothetically if half of software developers are laid off and replaced by AI, there will need to be twice as many software development jobs in order for salaries to not go down
Also keep in mind that even if salaries remain flat, inflation means you're making less.
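The inflation point is simple compounding: a frozen nominal salary S is worth S / (1 + i)^t in today's money after t years at annual inflation i. A minimal sketch with hypothetical numbers (a salary frozen at 100,000 and 3% inflation per year):

```python
def real_value(nominal, annual_inflation, years):
    """Purchasing power of a fixed nominal amount after `years`, in today's money."""
    return nominal / (1 + annual_inflation) ** years


# Hypothetical: salary frozen at 100,000, inflation steady at 3% per year.
for year in range(6):
    print(year, round(real_value(100_000, 0.03, year)))
```

After five years the frozen salary buys roughly what 86,000 buys today, a real pay cut of about 14% without any change on the payslip.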
Plastering walls used to be a great-paying skilled job, and when drywall came out, everyone thought that meant less time making boring flat walls and more time doing fancy plasterwork in corners and edges. But the fancy corners and edgings disappeared: they took too long compared to the rest of the wall plane, and the people who did them still wanted decent pay for maintaining or building that skill. And even for plain drywallers, production demands went up while wages stagnated. These days most drywall is seamed like trash, and most guys doing it are desperate and/or addicts. The only thing that earns money now in drywall and plaster is meth-head production speed and a lack of complaints about the work.
[Mom]: We have Terry Davis at home.
[znnajdla]: In fact I am actually tempted to build my own OS, something I wouldn't have dared to think of before AI.
I’m not seeing this. And based on what we’re seeing at the university level, I’m not expecting to.
(The preliminary research so far supports this: using AI to do the hard assignments produces poor learning outcomes, but using AI as a tutor, or even just for help with the hard assignments, produces slightly better learning outcomes.)
I think what you're seeing is the effect of the incentives of the system. The system uses simplistic numbers like grades as proxies for actual learning, and these grades heavily influence students' job prospects, and so you're simply seeing Goodhart's Law in action. Given how easy current methods of skill assessment are to game with AI, my guess is the entire system has to be overhauled.
Source? The few people I’ve seen try to do this wind up with a terrible understanding of the material, with large knowledge gaps and one or two fundamental fuckups. In every case, an introductory textbook would have been better. (It would also have been harder.)
https://www.mdpi.com/2076-3417/14/10/4115 -- probably the earliest one of its kind, finds over-reliance degrades critical skills but supplementary use is mostly harmless.
https://arxiv.org/html/2601.20245v2 -- Anthropic's study, same as above except supplementary use (like clarifying concepts) can actually be beneficial.
https://scale.stanford.edu/ai/repository/ai-meets-classroom-... -- "Students who use LLMs as personal tutors by conversing about the topic and asking for explanations benefit from usage. However, learning is impaired for students who excessively rely on LLMs to solve practice exercises for them and thus do not invest sufficient own mental effort." Interestingly, they found simply disabling copy-paste on the chatbot interface resulted in better outcomes!
Beyond coding, I recently came across this new meta-study; largely positive findings (which it admits may be biased) but does highlight evidence of negative effects of over-reliance: https://www.sciencedirect.com/science/article/pii/S2666920X2...
(Multiple studies find that the outcome depends on how AI is used. Surprisingly, incorrect guidance / unreliability / hallucinations appear to be a bigger problem than over-reliance! That could also explain poor performance in some cases.)
My intuition, supported by these studies, is that as long as students are willing to do the hard cognitive work -- for which there is no substitute, really -- having LLM assistance is a boon. Which makes sense, it's comparable to having a tutor explain difficult concepts. This is why in my mind the real problem is that the incentives to use AI as a crutch are just too strong.
The analogy is unlimited typing in Gmail won’t make you a better writer or typesetter on its own.
I've seen this work well at a job when there's a feedback loop for juniors that incentivized them to learn with more scope and compensation
This is key, I think, and gets overshadowed by people being offended by seeing bad vibecode or claims of 10x speeds, etc.
The most important learning that happens is not when we ask and get the answer to our question right away. It's when we stretch ourselves to seek out the answer, fail a few times, think deeply, then perhaps after a nap, solve the problem. That kind of knowledge is priceless because it not only gets you an answer it gets you some errant paths you can use to avoid problems in future problem solving as well as getting you increased trust in your own thinking.
If the next generations skip this step, they'll always think answers are supposed to be easy to find and will find themselves more and more dependent on AI and less and less confident in their own brains.
This seems like a very polite way of saying they will become less intelligent and less capable
You don't learn by reading, you learn by doing.
In this case, simply reading the output of an LLM isn't going to substantially educate you.
Classy.
> if you think reading code isn’t worth it.
I didn't say that.
With anything you learn, sure, you need to read it, but you haven't actually learned it until you try to do it.
As millions of teenagers find out in high school, it is not possible to "learn" trigonometry or calculus by reading the problems; they actually have to drill problems to pass.
> Do you think novelists just write novels from nothing? They read books.
Excellent example! Even novelists and professional authors only get better by writing. Face it: millions of people read just as much as (or even more than) bestselling authors, and yet those millions are unable to produce anything of note.
> When was the last time you read the code for the best open source software in your industry?
All the time; how else would I know that simply reading is not sufficient to learn something?
I'm surprised that this point is even in contention; it is almost common knowledge that you can't learn from reading alone; it's the practice that results in learning, not the reading.
If anything, it allows people to be as lazy as possible. I have not seen anyone digging deeper with the AI tools.
This is a testable hypothesis with a severe lack of citations. Intuition would argue the opposite. We learn by using our brains; if we offload the thinking to a machine and copy its output, we don’t learn. A child does not learn multiplication by using a calculator, and a language learner will not learn a new language by machine translating every sentence. In both cases, all they’ve learnt is how to use a tool to do what they skipped learning.
1. AI is for cheating and doing the work for you. Obviously it won't help you learn faster because you won't have to do any thinking at all.
2. AI is an always-available question answering machine. It's like having a teaching assistant who you can ask about anything at any time. This means you can greatly accelerate the process of learning new things.
I'm in team 2, but given how many people are in team 1 (and may not even acknowledge team 2 as a possibility), I suspect there may be some core values or different-types-of-people factors at play here.
But even with category 2, I think AI is still not absolved of being a cheating machine. Doing research is a skill, and if you ask AI to do the research for you, that is a skill a junior developer simply never learns.
"The expertise reversal effect is present when instructional assistance leads to increased learning gains in novices, but decreased learning gains in experts."
There's a whole lot of depth to the question of how AI tools support or atrophy learning for different levels of expertise.
There is even preliminary research evidence for this, e.g. https://www.mdpi.com/2076-3417/14/10/4115 and https://www.sciencedirect.com/science/article/pii/S2666920X2...
For such a person, I believe AI can be very empowering for learning. Like Google, Wikipedia, Stack Overflow, and arXiv before it, AI tools give access to a lot of information. They allow you to quickly dig deep into any topic you can imagine. And yes, the quality is variable, so one needs to find ways to filter and synthesize from imperfect info. But that was also the case before. Furthermore, AI tools can be used to find holes in an argument or a paper. And through coding, one can use them to test things out in practice. These are also powerful (albeit imperfect) learning tools. But they will not apply themselves.
Companies with AI will move faster than those without.
AI itself could subsume what we collectively consider as Engineering Taste.
AI is faster at what it does. So even if a junior on his own costs less than AI, paying extra for AI means gaining a first-mover advantage.
Only if AI feeds on more taste than garbage.
This is a contradictory statement imo.
Digging deep still takes the same amount of time it used to. AI accelerates the surface level (badly, tbh), it doesn't accelerate digging deep. Becoming an expert still takes time and effort, there really aren't shortcuts.
To torture the Iron Man metaphor a bit. If you're not an expert without the AI, then you're not an expert with it.
But, to get the best out of them, you really have to consider what that means in the small. They are predicting, to a first order of approximation, what you want or expect them to answer.
Its response to your first prompt is hilarious, because the LLM completely misunderstood you and based its prediction on what it thought you wrote. Its response to your second prompt further cements that its goal is to predict what you want or expect to see.
It's also well known that LLMs are prone to hallucinations. One of the biggest triggers for hallucinations is when the LLM's interpretation of your expectations doesn't match reality.
Because the LLM will try to make reality match what it perceives to be your expectations.
One of the best ways to reduce hallucinations is to work hard to remove any assertions from your prompts.
For example "Isn’t it crazy that X is better than Y." contains an explicit assertion. The LLM misunderstood the direction of the assertion, but certainly understood that an assertion was there, and so it gave you reasons why reality matched its understanding of your assertion.
When you clarified the assertion, it switched, and again gave you reasons why reality matches its understanding of your assertion.
Lawyers often get into trouble for made-up citations. "Claude, find me case law that shows X" is a recipe for disaster. Instead "Claude, what is the case law on X?" is probably a better starting point.
On the getting-work-done front, if what you’re trying to do is remotely subjective, then you really need to be sure you’re asking it the right thing and not expecting it to correct you and provide capital-T Truth.
On the social element: wow, using it as a therapist, a mentor, or something to bounce ideas off of. What a huge trap if you expect it to correct you with a semblance of objective reality.
Yes, as a few cases have shown, people can go off the deep end, and the LLM goes right there along with them.
The LLM has no understanding of objective reality. It's even worse than any one of the blind men trying to describe an elephant, because it has no true experience of either the thing, or the other thing it is trying to compare it to.
For a looooonnnnngggg time, unless there's massive progress in AI research.
Fundamentally, next token prediction is limited. Granted, I'm pretty amazed at how well it's done, but if you can't activate the right parts of the models (with your prompts), then you're not going to get good results.
And to be fair, for lots of things this doesn't matter. Steve in Finance or Mindy in Marketing can create dashboards that actually help them, and the code quality mostly doesn't matter.
For stuff that needs to be shipped, monitored and maintained you still need to know what you're doing.
I don't see how this will ever stop being an advantage. All software requires constraints. Some of those constraints might be objective (scale, performance, etc.), but a lot of them are subjective and require active decision making (architecture, UI, readability).
So if there were only one way to do something or only one desired output, then yes, I think models would surpass humans. But like art, I don't think there is an objective truth to software, and because of that, humans get the opportunity to play an important role.
Now whether that is valued from a business/industry perspective is a question that I think we all know the answer to unfortunately.
The question that really matters is whether that will continue to be the case. My guess is that technical expertise matters less over time, and the ability to specify the desired outcome is eventually the only thing that becomes important. But I could be wrong! The direction this all goes is pretty fuzzy in my mind.
They're full of noise and distractions. They offer no ergonomics, no proper screens, no nothing.
Anything that happens or doesn't happen there is mostly irrelevant to relevant software at large.
(I think I'm reading this the right way but if not feel free to correct).
In a word: pain.
Until there's a legitimate threat to their well-being (emotional, psychological, or financial), the lessons won't be truly "learned." Until you know the true cost of a decision, you're flying high.
Older engineers have dealt with this organically, so it's kind of encoded into their DNA. The very reason certain things aren't done (or are done a certain way) is that the pain has already been felt and has encouraged learning a better way.
E.g., ‘productivity’ is seemingly increasing, but what is the effect on a firm’s financial position? It’s all speculative and experimental right now.
It would be just as unwise to ignore the progression of LLM agents as it would be to over-index on them.
This is the correct way to code with AI. If you don't understand the code, you can't lean on the model yet: we're not at a point where it can do it all. Well, it can, but not to the point where you can confidently move forward knowing it's been thoroughly built and reviewed by a model that's up to par. Some day maybe, but not currently.
What is the llm equivalent?
The human brain has a wide, parallel, multisystem, real-time, low-wattage execution layer with far more modes than a large language model has.
More importantly, because our brains are real-time, our qualia, plus our spatial and visual reasoning, are superior to an LLM's at understanding "elegance", "code smells", and overall system design: we can imagine ourselves as being the code or the system, and we don't necessarily need to think in language. Well, at least that's how I experience coding in my mind; I imagine other developers similarly bring large parts of themselves into coding.
Feeling the code seems to be much more efficient at reducing complexity than any static analysis I've yet seen.
Finally, humans also empathize with other humans who have all the money. We know what works and doesn't work for humans in the here and now, not 2 years ago when the model training data was last collected. The value of qualia is not to be discounted.
That being said, Sonnet 4.0 was the best model I've used that could express how the code felt, so who knows. If the emotionality wasn't tamped down, and the spatial reasoning improved, and the new algorithms for context engineering and parallelism make it to market, these advantages can be erased.
As I responded to another commenter, as a prediction engine, the LLM is trying to predict what you want. It, at one level, correctly predicts that you want tests to pass.
Maybe try telling the LLM that you're a verification engineer, and you get bonuses for finding bugs?
Think about it. All those security researchers wouldn't be finding real bugs in real programs using LLMs if this were an insurmountable problem.
I admit the analogies aren't perfect, but the analogies are mostly used to help explain the empirical stuff I’m seeing in the real world. Are you seeing something different?
OK so, _realistically_, what can you do that will make any meaningful difference?
It's essentially a "brute force" approach, but in most cases, they only need to succeed once.
The article’s point is that this is not true. They wind up in bullshit attractors where they hit a wall and then get lost within their muddled context window.
> they only need to succeed once
Yet they don’t. Not on their own. Like, you haven’t had an LLM get stuck in a stupid loop where you point out the flaw and then it gets unstuck?
The issue before was that coding is not only difficult and time-consuming to learn, but also, I think, that it requires a particular type of person to fully grasp this new, non-human language.
I see these SOTA LLMs as akin to the digital camera revolution. Suddenly the moat that has kept people from participating in this art form (for film it was the high cost of film stock, processing film, editing the film prior to non-linear editing programs, etc) has disappeared.
Are people producing low-quality video content now because of the cheap and ubiquitous access? Of course, but we’re also exposed to brilliant filmmakers / artists who simply never would have had the opportunity to try their hand.
By the same token, sure there’s lots of garbage code out there now. But it’s also unlocking imaginations by granting access to the mysterious inner workings of a computer to the average person, letting them use their computers more thoroughly than ever before.
I find it exciting. Bummer for the highly-paid SWEs, but such is life. You can only protect a niche to demand high wages for so long.
For example, you can make AI music, but who will listen? If you form an AI band around AI music and execute an AI marketing strategy like it was a real band, probably thousands, hopefully millions.
If you make AI art, who’s going to look at it? If you make AI art in a very specific style and you can crank out 8x upscaled high resolution versions of it for print, well, you just have a business!
And if you make films, you already know that green screen and chroma key models are far superior, that AI enhancements can help you in the editing room, and that LTX2.3 can fill in the VFX shots when the budget is exhausted.
It's only confusing because you don't know the field. Which is kind of the point.
Tell me about it… I was forced to use a program called Farmer’s Wife for a time. What a fucking nightmare of a UX.
It's not the default, because the training data is full of unmaintainable code done wrong with mistakes. People literally complain that LLMs write too many tests or add comments.
If instead of "do it right", you give it specific, actionable advice on how to write code, it does surprisingly well. Newer frontier models also do a great job of mimicking the style and rigor of the surrounding codebase without prompting, if you're working in an established codebase, for better or worse.
You never wrote quick exploratory code? One-off scripts? How is the AI supposed to know unless you tell it?
If you tell another person to write some code, how are they supposed to know? If your boss comes to you and asks you to write some code to do some data analysis, are you going to spend weeks writing unit tests and perfect abstractions? Or do it quickly and get the data and result?
For example, I built a programming language from scratch with Claude; it knows the nuances of my language's syntax and can write code in my language effectively. I did it mostly as a test. It definitely helped that my language is heavily Python-based.
I've been using LLMs daily and have spun up a few spec-driven flows once or twice, but like the person above, I think the code is the source of truth.
Also why wouldn't you use TDD to enforce the 'spec' then?
The issue is that a lot of people form definitive conclusions from blurry data. I'm challenging that type of bias. For example: how the hell do we know LLMs produce code that LLMs can't maintain? Like, did you actually try it? And what about the instances where it worked? I don't think the answer is as clean-cut as yes/no. Even if we had data, it would most likely be contradictory, in the sense that some AI projects worked and some descended into slop.
Oh, you mean the USA bubble? Sorry, I didn't consider Somalia or whatever backwater place you live in.
>Again only in the bubble you live in. You want evidence? There are zero killer app products out there. Not even one, the ones that seems impressive like bun zig to rust, it's just an experiment and will see how bad it will unravel.
It's called Claude Code and Bun and practically every company building a SaaS app. That's evidence.
>And the existing products? They get shittier day by day at rate not seen before.
That's your opinion. Obviously people who make more money than you and are smarter than you and are more powerful than you think otherwise.
>Or are u building one via codex inside the ai car while u eat a burrito?
When you insult someone like this, it's a sign that you've got nothing substantial to say. It's a cowardly move and you're a coward. I'm not trying to insult you here. Just stating a plain and simple fact: you're a little coward. You're afraid to face reality.
You're truly a little coward, and this is not an insult, I'm just objectively stating who you are. You're afraid to face the fact that AI is on a trajectory to be a better programmer than you.
When we call out bloat in software, it's not because we don't know what the code is doing, but the opposite, because we know it better than the author. To compare that to life is kinda as far off the mark as possible.
> perfect shouldn’t be the enemy of good.
No, but when good gets drunk on sheer money and wants to take even the idea of perfection, of craftsmanship, of correctness and care out back, then maybe "perfection" should clap back every now and then.
> If the biological imperative was perfect genomes and not functional genomes, there would be no life at all.
It's not an imperative, it's a fact. The laws of physics are kinda strict that way, a puddle always settles at the lowest point. It doesn't settle at some "good enough" point. From what we know so far that never happens, even once, in this gigantic universe across these unfathomable timespans.
Why aren't you simply copy-pasting your comment and repeating everything 5 times instead of saying it once? We all live by this, to some degree; we all know this intuitively to be true. We can be wasteful on purpose, out of whimsy or depression or whatever, but we're never confused about whether it's better to drink from a glass, where 300ml of water in the glass means 300ml of water in the stomach, or from a sieve, where 99% of the water just goes to waste.
The fact that such simple things are even discussed proves to me the presence of "bad" factors: coercion, greed, fear, confusion, whatever, you name it. When it comes to the "industry", it gives me "ants on top of a grass blade waiting for a cow because a fungus told them to" vibes.
> I’m not a developer, I’m a consumer of digital products. I couldn’t care less, or even have the ability to notice, if code is perfect. I’m here to achieve a goal through software. If it achieves that goal, what is the problem from my end?
None IMO, because you're unlikely to shove your thing into everybody's life, or tell people to just give up and give in because it's all the same, etc. At worst you'd be wasting your own electricity or holding your own data hostage.
Perfection in glue and plumbing?
That's what 99% of software is. Even active-active distributed systems are glue and exist only to bridge ephemeral infrastructure. Everything will eventually be thrown out and rewritten.
Nobody lauds the half-century old banking code written in COBOL. They want it ripped out and replaced.
Nothing is "perfect". Not even close.
> "the sand doesn't matter, only the beach does"? Makes no sense.
The code isn't the sand, it's the sandcastle.
> > In 100 years people will look at our ephemeral artifacts as silly little things
> Whereas they'll totally admire the hamster wheels in which people shoveled product?
They'll hear about "You Tube" and "Face Book", I'm sure. But none of the code that runs either of those things will likely be running or capable of running.
> Nobody lauds the half-century old banking code written in COBOL. They want it ripped out and replaced.
So? That's true for actual plumbing, too. That changes nothing about the fundamental fact that a pipe that achieves a specific thing with N grams of material and N meters of pipe length is better than something that is 100x heavier and goes around the block several times just because why not.
> The code isn't the sand, it's the sandcastle.
Same difference. It makes just as little sense to say "the sand the sandcastle is made of doesn't matter, only the sandcastle does."
> But none of the code that runs either of those things will likely be running or capable of running.
Obviously. Who claimed otherwise?
If this is how well you write prose, I would absolutely hate it if you stopped writing code.
Joke aside, I read your comment and wanted to yell “PREACH!” Pretty sure that’s the first time I felt like I had a use for that word.
I made over $500k TC writing active-active high availability services that moved billions of dollars a day. I've been around the block.
> Magic is for the user to experience. Not for the user of the programming language.
Why are you treating our primitive technology as holy? It's all temporary fucking garbage that is a limitation of our current civilizational abilities.
Do you think the Linux kernel will live forever? I think we'll be done with it before 2050. Seriously.
Everything you think is permanent is just temporary.
I would rather be building star ships and holodecks and engineering 10,000 year human lifespans, brain uploads, and stuff like that than worrying about the craftsmanship of some dumb web service.
I think you should dream more and worry about the current station of SWEs less. We're merely a stepping stone.
You and I are stepping stones. We're dust.
None of what we do today will be relevant in some short decades. And that is a blip on the geologic timescales.
I was born too early for this bullshit. I don't like living with you neanderthals, especially when you don't want to step out of the cave.
Thankfully I don't have to worry about this tech winning. It already is. You can keep up or hold your nose until you're out of a job. There are plenty of other things you could do, I just wouldn't bet on being a truck driver.
What skills people think are valuable and prized, and what skills corporations are willing to pay for, are often very different.
However, the happy path is that as more code is written, more revenue streams will be developed and more sr engineers will be needed and hired to manage those ballooning codebases. That will be a gradual process of growing the pie though.
Are you a researcher in the pedagogical sciences? Regardless, you have to admit that the original claim has very little evidence behind it despite being testable. And the caveat you tag onto the end is a pretty massive caveat: from the sources you provided, it seems that students who use AI in the way you claim has been shown to be effective are in a minority anyway.
Also, “classy.” as a phrase is about as “classy” as my original comment, so I guess we’re both passive-aggressive jerks.
So your first study actually concludes the opposite. It concluded that all AI users performed worse, but the effect was smaller for students who used AI as a tutor.
I don’t quite understand the second meta-analysis. I understand they conclude that using an AI tutor shows significant improvement, but I don’t understand the methodology. I may be misunderstanding, but it seems to simply count papers that show positive outcomes and reach its conclusion that way. I think that methodology is deeply flawed, as it will amplify whichever biases are present in the studies it uses. I also think the lack of control groups is a major issue. If we are comparing an AI tutor to nothing, of course the AI tutor is gonna perform better. We need to compare it to traditional methods. This is especially relevant in our discussion because junior developers usually have excellent access to senior developers (via peer review, pair programming, etc.), much better than students’ access to tutors, for that matter.
So out of the meta-analysis I picked the paper with the strongest claim (trying to steel-man it) which is this one: https://online-journal.unja.ac.id/JIITUJ/article/view/34809/...
It claims the following in the abstract:
> The results indicated that students employing AI tutors shown significant improvements in problem-solving and personalized learning compared to the control group.
Now when I look at the control group it claims this (also in the abstract):
> Participants were allocated to a control group receiving conventional training and an experimental group utilizing AI technology,
But when I look into the methodology section I see this:
> The researchers classified the patients into two groups: MathGPT and Flexi 2.0
MathGPT and Flexi 2.0 are both AI tutors. Now I am confused, where is the control group and how was this “conventional training conducted”?
The methodology section actually tells a different story from the abstract:
> This research utilized a quantitative methodology via a quasi-experimental design.
By quasi-experimental design they mean that they tested the same students before and after the AI intervention and concluded that the AI tutor helped them improve. That is not what a control group means, so the researchers are actually lying by omission in the abstract. This is a spectacularly bad experimental design, and I wondered how it passed peer review, so I looked at the publisher: Jurnal Ilmiah Ilmu Terapan Universitas Jambi. Not exactly a reputable journal.
I still stand by "no evidence for a testable hypothesis". I suspect that your first link is actually correct in that AI is bad for students and just less bad if it is used as a tutor.
That said, there are 80+ other studies listed in the meta-study, which is pretty frank about its limitations. (Note the snippets about positive biases in the conclusion.) It is going more for quantity over quality and is transparent about the statistical findings of each one (or lack thereof; see the count of "Not reported"s.) All these references have a myriad of results, but across the spectrum of well-designed studies at reputable venues to the other end, they follow the same themes, so I don't think this can be dismissed that easily.
But if you want, here's more research (some of which I linked in a sibling comment https://news.ycombinator.com/item?id=48241839) which has similar findings:
https://scale.stanford.edu/ai/repository/ai-meets-classroom-...
https://arxiv.org/html/2601.20245v2 (from Anthropic)
This article summarizes some of the above and more studies and has similar findings: https://maxmynter.substack.com/p/learn-to-code-with-llms-i-r...
This is in the opening of the results section in the meta-analysis:
> In the final screening phase, a rigorous full-text analysis evaluated the methodological robustness and empirical validity of the remaining studies. [...] The final corpus comprised 88 studies that demonstrated robust empirical evidence for LLM applications in educational contexts.
The inclusion of the study I read does not give me confidence that this statement is true. And the fact that they reach their conclusion by simply tallying up the positive vs. negative studies makes me conclude that this meta-analysis is practically useless. They do admit this in the conclusion (which is probably why it passed peer review [assuming the peer reviewer didn’t read the same citation as me as I am 100% certain they would have asked for it to be excluded]). But that pretty much just leaves us with nothing. We are exactly where we started. No evidence that LLMs help students beyond traditional methods.
Now, I am not gonna read that Anthropic study. It reminds me of cigarette companies finding the health benefits of cigarettes. That leaves that excellent 3-study review. In their first study, they found LLMs had negative effects on students (in line with the first link you showed me). In the second study, they found no effect. And in the third study, they found a mixed (nuanced) effect, where using LLMs as a tutor helped students in one aspect but had negative effects on others. This is by far the best study you have presented me, but it still does not change my opinion. There is little evidence that LLMs (even when used as a tutor) help people learn better than traditional methods.
What makes me even more against this sentiment is this quote from the conclusion of the 3-study review paper:
> Our results suggest that students prefer to use LLMs to substitute rather than complement learning activities.
So on their own, students are more likely to use LLMs in a way which is harmful to their learning. I would expect similar behavior of junior developers.
And as we are talking about junior developers, it is safe to assume your conditions (1), (2), and (4) are all true; if any of them were false, why did that person apply for and get a job as a junior developer? As for condition (3), every workplace eventually hires a person who does not fulfill it, and then they either fire that person, or they give them a talk and the developer grows out of it and changes their behavior to fulfill that condition.
Aside: you listed 4 conditions for learning. I am not sure these are actually conditions recognized as such by behavior science. In fact, I doubt they are and that these conditions are just your opinions (man).
sounds like "no moat" to me
If you look at LLM-based coding as another step up in programming abstraction, then it's clear this is the case. Think about the progression of programming languages. Over time, we go further and further from the hardware and closer and closer to specifying the desired outcome. The terminology, structure, and completeness of a user story that guides a coding agent to the desired output, and only the desired output, is the new programming language.
But that entire narrative follows from one, single, very big "If". It is not a given that AIs are a step up in abstraction.
Just as copying the answers on a test isn't considered an abstraction, I don't consider copy-pasting AI output into your codebase an abstraction.
I guess, to take it a step further, you can lay out your requirements in order, with guidance, in a markdown file called 'myprogram.md'. Then tell Claude Code to read that file and do what it says. In that way, myprogram.md, actually your requirements doc, is the programming language being turned into the 1s and 0s the computer understands.