--dangerously-skip-reading-code

hckrnws

--dangerously-skip-reading-code

(olano.dev)

111

16h

by fagnerbrack

facundo_olano
43h
Author here. I'm surprised to see this surfacing now. I just wanted to clarify, since apparently the post doesn't do a good job of it, that what I discussed there is not a methodology I advocate for. The point of the post was: OK, since there are organizations mandating that we maximize speed by reducing time spent typing code (or even mandating that we maximize agent usage), is there a way we can meet that requirement while still preserving the rigor somewhere else?
This was a follow-up to a previous article[1], and the pair tried to express what I still think today (using AI daily at work): every time I use AI for coding, to some capacity I'm sacrificing system understanding and stability in favor of programming speed. This is not necessarily always a bad tradeoff, but I think it's important to constantly remind ourselves we are making it.
[1] https://olano.dev/blog/tactical-tornado/
1. ignoreusernames
  02h
  Don’t you think that the provider of the LLM is also a dimension in these discussions about responsibility? We often talk about the tech itself (LLM-driven development), but how we access it is just as important imo. It’s either locked behind a non-trivial amount of hardware (for open models) or behind some monopolistic provider like OpenAI or Anthropic. In the provider case, it’s not really the LLM that will “own” the code, it’s the provider itself, and we’ll be at the mercy of whatever pricing model they shove down our throats.
2. LelouBil
  02h
  I don't like the premise of the article, but I agree that if you accept the premise, the contents of the article are a good way to do it.
3. Uptrenda
  12h
  nothing like citing yourself for peak credibility
  1. AlexCoventry
    028m
    He was establishing the context of the current blog post. Very unlikely that he was doing it for Google juice.
throwaw12
16h
> my first bet would be specifications and tests
You are missing another dimension: how easy it would be to migrate if adding a new feature hits a ceiling and the LLM keeps breaking the system.
Imagine all tests are passing and the code conforms to the spec, but everything is denormalized because the LLM thought this was a nice idea at the beginning, since no one mentioned that requirement in the spec. After a while you want to add a feature which requires normalized tables and the LLM keeps failing, but you also have no idea how this complex system works.
Don't forget that a very, very detailed spec is actually the code.
1. abalashov
  03h
iloveoof
43h
Software engineering has always worked this way, just not to ICs.
“The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore. But that doesn’t necessarily mean we stop being rigorous, it could mean we should move rigor elsewhere.”
Direct reports, when delegated tasks by managers, produce non-deterministic outputs much faster than team leads/managers can review, understand or approve every diff. Being a manager of software developers has always been a non-deterministic form of software engineering.
1. alabut
  11h
ramoz
2310h
> If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.
The constant urge I have today is for some sort of spec or simpler facts to be continuously verified at any point in the development process; Something agents would need to be aware of. I agree with the blog and think it's going to become a team sport to manage these requirements. I'm going to try this out by evolving my open source tool [1] (used to review specs and code) into a bit more of a collaborative & integrated plane for product specs/facts - https://plannotator.ai/workspaces/
[1] https://github.com/backnotprop/plannotator
ricardobeat
32h
> We can’t leverage agents if our unit of work is still “add a new endpoint to the RESTful API”
Why not? You just make every task faster. Not everything has to be an uncontrollable rocket launch.
> We need a virtually infinite supply of requirements, engineers acting as pseudo-product designers, owning entire streams of work
Why? To build what? You can only build as fast as you understand the business and your users.
1. charcircuit
  21h
  >You can only build as fast as you understand the business and your users.
  It should be possible to go faster by having AI understand the business and users.
nirui
11h
> We can stop reading LLM-generated code just like we don’t read assembly, or bytecode, or transpiled JavaScript; our high-level language source would now be another form of machine code.
My opinion is very close to this. Currently, the reason it's bad not to review/test the code LLMs generate is that LLMs can sometimes generate bad code. But that's a bug that can be fixed. One day you'll have LLMs generating code consistently better than what a human could write. And then you just stop needing to review it. (And that's probably also the time when most programmers/developers get fired.)
Don't be surprised if one day the LLMs start to generate binaries directly. THAT will be impossible to read and will cost more time to analyze.
1. 9029
  01h
zoogeny
16h
>... my first bet would be specifications ... and tests ... If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project.
I've found that adopting RFC Keywords (e.g. RFC 2119 [1]; MUST, SHOULD, MAY) at least makes the LLM report satisfaction. I'd love to see a proper study on the usage of RFC keywords and their effect on compliance and effectiveness.
[1] https://www.rfc-editor.org/info/rfc2119/
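As a rough illustration of what the RFC 2119 keywords buy you in a Markdown spec, here is a minimal sketch that pulls out every line carrying one of them, so each can be reviewed or turned into a test. The function name and the sample spec are made up for the example; matching uppercase keywords only is a simplifying assumption.

```python
import re

# Keywords defined by RFC 2119. Longer phrases come first so that
# "MUST NOT" is matched before its prefix "MUST".
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT", "NOT RECOMMENDED",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
]
PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b")


def extract_requirements(spec_text):
    """Return (line_number, keyword, line) for each line containing a keyword.

    Only the first keyword on a line is reported; matching is case-sensitive.
    """
    found = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            found.append((number, match.group(1), line.strip()))
    return found


if __name__ == "__main__":
    spec = "# Login\n- The API MUST NOT log passwords.\n- Clients SHOULD retry once.\n"
    for number, keyword, line in extract_requirements(spec):
        print(number, keyword, line)
```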
1. kortex
  03h
k3vinw
34h
> We can stop reading LLM-generated code just like we don’t read assembly, or bytecode, or transpiled JavaScript; our high-level language source would now be another form of machine code
This is too weird for me. At least with programming languages I can consult the documentation and if the programming language isn’t behaving as documented, it’s obviously a defect and if you’re savvy enough you often have open channels that accept contributions. Can we say the same for Claude or other AI solutions?
1. Sevii
  24h
DavidVoid
25h
> Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.
This just sounds like typical requirements management software (IBM DOORS for example, which has been around since the 90s).
It's kind of funny how AI evangelists keep re-discovering the need for work methods and systems that have existed for decades.
When I worked as a software developer at a big telecom company, I had no say in what the software was supposed to do. That was up to the software design people: they were the ones responsible for designing the software and defining all the requirements. I was just responsible for implementing that behavior in code.
1. irishcoffee
tyleo
710h
The underlying mechanism is still the same: humans type and products come out.
So something which must be true if this author is right is that whatever the new language is (the thing people are typing into markdown) must be able to express the same rigor in fewer words than existing source code.
Otherwise the result is just legacy coding in a new programming language.
1. SoftTalker
  110h
  > Otherwise the result is just legacy coding in a new programming language.
  And this is why, starting with COBOL and continuing through various implementations of CASE tools, "software through pictures", flowcharts, UML, etc., the tools that were supposed to let business SMEs write software without needing programmers have all failed to achieve that goal.
ninalanyon
29h
> Rework is almost free
Is it? All the electricity and capital investment in computing hardware costs real money. Is this properly reflected in the fees that AI companies charge or is venture capital propping each one up in the hope that they will kill off the competition before they run out of (usually other people's) money?
1. fractaled
  01h
  Even ignoring the AI costs, 'rework' is going to be more expensive as soon as you have customers. For example any sort of data migration. Or UX expectations. Or public API interface. None of these can change without some thought, so one would be leaning on these specs quite a lot.
adelks
059m
"A sufficiently precise spec is code". I've read somewhere here before.
So guardrails, i.e. a sufficiently precise spec and tests, will need to be as strict as the LLM is bad at getting the right context and asking back the right questions. I suppose at that point there's not much difference between a human engineer and it.
jmull
49h
The lesson I've learned from our new AI age is how little a large number of people who've worked in software development their entire careers understand software development.
I suppose all the money floating around AI helps dummify everything, as people glom on to narratives, regardless of merit, that might position them to partake.
What we actually have now is the ability to bang out decent quality code really fast and cheaply.
This is massive, a huge change, one which upends numerous assumptions about the business of software development.
...and it only leaves us to work through every other aspect of software development.
The approach this article advocates is to essentially pretend none of this exists. Simple, but will rarely produce anything of value.
This paragraph from the post gives you the gist of it:
> ...we need to remove humans-in-the-loop, reduce coordination, friction, bureaucracy, and gate-keeping. We need a virtually infinite supply of requirements, engineers acting as pseudo-product designers, owning entire streams of work, with the purview to make autonomous decisions. Rework is almost free so we shouldn’t make an effort to prevent incorrect work from happening.
As if the only reason we ever had POs or designers or business teams, or built consensus between multiple people, or communicated with others, or reviewed designs and code, or tested software, was because it took individual engineers too long to bang out decent code.
AI has just gotten people completely lost. Or I guess just made it apparent they were lost the whole time?
blacob
02h
> Then where does the rigor go? Similar to the Thoughtworks report, my first bet would be specifications (which is not the same as prompts) and tests (which is not the same as TDD).
This is what we're building for at Saldor (https://saldor.com). It's a hard problem, to get a team in the habit of writing good specs. Probably because it's a hard thing to do: thinking of the behavior of your program, especially at the edges. But I agree (biased) that this is probably the way forward for writing code in the near future. I'm excited to see other people thinking about it.
phyzix5761
18h
I wonder if with the speed of iteration with AI the industry will switch back to waterfall. Clear documentation first so the LLM can easily produce what's being asked with a round of testing before going back to the documentation stage and running it again. History does repeat itself.
1. yibers
  08h
  We already switched
humbleharbinger
34h
My Amazon org's leadership has been obsessed with spec-driven development, while individual engineers tell me the only use they have for it is to placate leadership. I'm tired.
1. culi
  24h
  How does spec driven development differ from test driven development?
montroser
112h
This could very well be a pattern that some teams evolve into. Specs are the new source -- they describe the architectural approach, as well as the business rules and user experience details. End to end tests are described here too. This all is what goes through PRs and review process, and the code becomes a build artifact.
1. vips7L
  010h
  It just doesn’t work though. Anthropic couldn’t even get Claude to build a working C compiler which has a way better specification than any team can write and multiple reference implementations.
hombre_fatal
19h
Yeah, this has been my process for months now.
I might even start my own blog to write about things I've found.
1. Always get the agent to create a plan file (spec). Whatever prompt you were going to yolo into the agent, do it in Plan Mode first so it creates a plan file.
2. Get agents to iterate on the plan file until it's complete and thorough. You want some sort of "/review-plan <file>" skill. You extend it over time so that the review output is better and better. For example, every finding should come with a recommended fix.
3. Once the plan is final, have an agent implement it.
4. Check the plan in with the impl commit.
The plan is the unit of work really since it encodes intent. Impl derives from it, and bugs then become a desync from intent or intent that was omitted. It's a nicer plane to work at.
From this extends more things: PRs should be plan files, not code. Impl is trivial. The hard part is the plan. The old way of deriving intent from code sucked. Why even PR code when we haven't agreed on a plan/intent?
This process also makes me think about how code implementation is just a more specific specification about what the computer should do. A plan is a higher level specification. A one-line prompt into an LLM is the highest level specification. It's kinda weird to think about.
Finally, this is why I don't have to read code anymore. Over time, my human review of the code unearthed fewer and fewer issues and corrections to the point where it felt unnecessary. I only read code these days so I can impose my preferences on it and get a feel for the system, but one day you realize that you can accumulate your preferences (like, use TDD and sum types) in your static prompt/instructions. And you're back to watching this thing write amazing code, often better than what you would have written unless you have maximum time + attention + energy + focus no matter how uninteresting the task, which you don't.
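The four steps above can be sketched as a small loop. Everything here is hypothetical: `draft`, `review`, `revise` and `implement` are placeholder callables standing in for whatever agent invocations a team actually uses, and the `max_rounds` cap is an added assumption so the loop cannot run forever.

```python
def run_plan_workflow(prompt, draft, review, revise, implement, max_rounds=5):
    """Draft a plan, iterate on review findings, then implement the final plan.

    All four callables are hypothetical stand-ins for agent calls:
      draft(prompt) -> plan text
      review(plan) -> list of (issue, recommended_fix); empty means clean
      revise(plan, findings) -> updated plan text
      implement(plan) -> implementation
    Returns (plan, implementation) so both can go into the same commit.
    """
    plan = draft(prompt)                   # step 1: plan file first
    for _ in range(max_rounds):
        findings = review(plan)            # step 2: review the plan
        if not findings:
            return plan, implement(plan)   # steps 3 and 4
        plan = revise(plan, findings)
    raise RuntimeError("review limit reached before the plan came back clean")
```

Passing the agent steps in as functions keeps the sketch independent of any particular tool and makes the loop easy to exercise with stubs.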
moritzwarhier
59h
Entertaining flag name!
React team seems to really have set a precedent with their "dangerouslySetInnerHTML" idea.
Or did they borrow it somewhere?
I'm just curious about that etymology. Of course, the idea is not universally helpful: for example, for dd CLI parameters, it would only make a mess.
But when there's a flag/option that really requires you to be vigilant and understand the input and output and all edge cases, calling it "dangerous" is quite a feat!
1. wrxd
  48h
wizzwizz4
1812h
> There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec.
As I understand, this is an unsolved problem.
1. InsideOutSanta
  1110h
  Step 1: solve the halting problem.
debesyla
010h
I found that adding "philosophy" descriptions helps guide the tooling. No specs, just general vibes about what the point is, because we can't make everyone happy and that's not a goal of a good tool (I believe).
Technology and implementation may change, but the general point of "why!?" stays.
Ozzie-D
06h
the irony is that AI is making this exact problem worse. ppl are generating entire codebases now without reading any of it -- the flag might as well be the default. the skill thats actually becoming scarce isnt writing code, its reading code you didnt write and knowing if its correct.
retinaros
16h
markdown became the language I hate the most thanks to LLMs and the specs-driven approach. everything feels so dumb right now in agentic coding. looping blindly and aimlessly until it compiles, then until the playwright server or whatever devtools shows that it somehow works. push the code, have an llm autoreview/autofix, push to prod, run a mythos (perfect name) to identify the bug that opus 4.7 created. loops on loops on loops of some kind of zombie processes running to a "goal" that everyone seems to mystify in talks to just hide the fact that we do nothing anymore. the bottleneck never was code. it was the gate that was keeping away the Elizabeth Holmes and SBF from software engineering and it just opened.
1. abalashov
  03h
testplzignore
09h
> Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules.
LOL. I had to check if this was published on April 1st.
lesscode
18h
Instead of accepting 20,000 lines of slop per PR (and never-ending combinatorial complexity), maybe we should aim to think about abstractions and how to steer LLMs to generate code similar to that of a skilled human developer. Then it could actually be a maintainable artifact by humans and LLMs alike.
1. crnkofe
  05h
  I don't get why every AI article is so hyper-focused on coding speed. If the coding is so fast, doesn't it make sense to invest more time into quality, learning, documentation, testing, refactoring, making a better product? I'm beginning to think that the slopcoders are evaluated by kLOCs written in addition to LLM token usage and they're just maximising the measured metrics. Whether that actually ends up in production or is used by any real person is seemingly irrelevant. Likely the more bugs that are produced, the more agents can be spun up in parallel to simulate busywork.
farmerbb
06h
I legit can't tell if this article is satire, or not.
Ecys
212h
very true. and we already know and agree with this.
user experience/what the app actually does >>> actually implementing it.
elon musk said this a looong time ago. we move from layer 1 (coding, how do we implement this?) to layer 2 thinking (what should the code do? what do we code? should we implement this? (what to code to get the most money?))
this is basic knowledge
1. duskdozer
  110h
  Elon Musk has been saying Teslas would have fully autonomous self-driving within 1-3 years since 2013
Uptrenda
03h
Does this post mark the top of the hype train or is there still more to come?
