hckrnws
Open source Kanban desktop app that runs parallel agents on every card
by vitriapp
I feel 30 minutes of planning and 30 minutes of implementation in my solo side project's repo is too much to review. At minute 5, I may ask the AI to redo stuff even as it's spitting out code.
I’ve looked at the outputs here and there - and holy hell would it never pass review if I were trying to make something robust and anti-fragile. But since I can just have AI spit out a fix for the horrific “code” when it breaks in a totally predictable manner it’s just not worth my time to try to actually sit down and get it done right. Or even fight with AI by providing a good specification and design guidelines.
I imagine this is how things are going in the real world, given 30 years of working with various levels of humans. So long as the output is “good enough”, it is the extreme minority of folks who care about much else. And that’s for mid-level to senior folks who have the experience to know better. Juniors wouldn’t be able to pick out even the most obvious anti-patterns AI tends to spit out, such as putting configuration within code.
Refactoring is in a new world too, one that us olds probably have a hard time with. It’s no longer “examine the code, identify design gaps, find high-leverage places to start fixing”, etc. It’s now “this is broken, rewrite from scratch” when it eventually turns into too much spaghetti.
In some ways, being entirely focused on outcomes is freeing. But man, what’s under the hood is crazy and a whole new world.
For me, strong file structure helps as well. Reviewing a 3,000 line file it just created is abysmal. I wouldn't accept that from human nor machine :) Multiple files in the right places helps reduce cognitive load.
Sometimes I'll also review with the agent interactively, e.g. "What is the most important file to review first?"
I like to stage changes into a "LGTM" pile. Then if I want changes, I'll have the agent "review unstaged changes - I want something different done here."
Product managers never cared about the code. Engineering managers don't care about code as much as they did when they were engineers. Directors couldn't care less about code. CTOs don't know what code looks like anymore. We are at the end of the chain, and somehow we always took pride in well-written and maintainable code because we knew deep inside that good systems are built on good code. But now we are jeopardizing ourselves: it's us, the engineers, who don't care about code anymore, and with AI that problem is amplified.
Personally, I always end up tweaking something the agent produced. I wonder if I should let go of that control...
I'm like a 1-2 chats at a time kind of guy. I just don't see how I could keep my exact vision for the project otherwise.
I gave it the existing modem, and had it build rigging to build test vectors. I had it specify the work in the modem. And to confirm that legacy<>legacy produced the same streams as the new code. I've also recorded test vectors vs. other modems.
I've since launched it on targeted refactoring and code reduction projects.
I am mostly not looking at the code. There's a 100KSLOC lump of code that is much cleaner than a decompilation but a fair bit dirtier than what I would write myself. It is not factored terribly. I have some hope of getting it to trim this down to 70KSLOC that then I can accept in small blocks.
It outperforms the original softmodem, hitting higher RX rates for the same line quality and using less CPU. It also has additional functionality.
So, you know, I would never have written something this large for a hobby myself. And it's cost me $200 and 20-30 minutes per day for a few weeks to get a huge functional surface that I do believe I will be able to trust at the end of the process.
Now when using it for my job... that's a totally different story: I review all the changes, so a single chat session with an agent can lead to a whole day of review. And it's great, sometimes the agent uses patterns and functions I don't know, so I learn a lot.
Maximize providers' profits. What can go wrong.
My most successful autonomous runs have been expanding scrapers across a number of similar but different portals. I had examples and targets and it just kept searching for new ones and adding them.
But even for basic ML auto-research, I have found it surprisingly poor except at trivial but useful augmentation of models. Yes, it can implement things, but somehow I am still needed a lot, even though I set up a lot of framework around it.
My mental model is that it's very good at complex deterministic work like reading bad API docs and getting some connectors to work.
But perhaps I care less about being stuck in a local optimum there.
They do not have any users. Meanwhile, I have to do code reviews and all, otherwise my 12,000+ users will be pissed off if anything in their workflow breaks.
This means I really cannot release more than 1 tiny feature a day. And parallel agents, well, that's good for testing, but I don't think I need to ship that many features to add value.
Personally, the stuff I am working on is maybe 25% non-trivial, and that is enough to give me the same experience you have.
But also, lots of people just don't care about quality, and they might be right as far as their customers/audience go. In these cases, when someone catches a bug, an agent is going to iterate on it and make it (seemingly) go away, bandage applied, who cares again. This has a market, I am sure. Lots of programmer folks are just as bad.
The Vibe Kanban developers unfortunately decided that they didn't see a path to profitability and have stopped investing in the project. It's open source and so you can run it locally / fork it, but it has stopped improving and there are still annoying bugs that need to be fixed (and I don't have time to maintain it personally). This makes me sad because I would be willing to pay for Vibe Kanban, but I didn't need the features their paid plan offered (in retrospect maybe I should have paid anyway).
I'll give Kanbots a go :) I'd recommend liberally copying features from Vibe Kanban. In particular the remote support and "Open in VS Code" button (which in my case opens a local VSCode client pointing to a remote VSCode server) are critical for me.
This tool on the other hand is all about "jam as much work as you can come up with into being created in parallel". Obviously there is no managing of any flow of quality outputs, and no limiting of any work because you just shove everything into the agent and burn tokens like crazy.
Calling this a "Kanban" really irks me ... its like blasphemy or something.
This is table-stakes for me to consider adoption of a tool like this.
If AI is agentic, I would expect it to take an hour of chatting for any PM to integrate some agent Ralph loop with Jira. Jira, Trello, Linear and Basecamp all have APIs, and I guess CLIs any agent can use to talk to them. No developer or SaaS should be needed to make them understand that tasks are checked out when you start work, that they contain instructions, and that when you are done you move the ticket to DONE.
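The check-out / work / move-to-DONE cycle described above is small enough to sketch. This is a minimal, hypothetical sketch: the in-memory Board, the Status names and the agent callable are stand-ins invented for illustration, and no real Jira, Trello or Linear API is used. A real version would swap Board for calls to the tracker's own API or CLI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    TODO = "To Do"
    IN_PROGRESS = "In Progress"
    DONE = "Done"


@dataclass
class Ticket:
    key: str
    instructions: str
    status: Status = Status.TODO


class Board:
    """Hypothetical in-memory stand-in for a real tracker."""

    def __init__(self, tickets: list[Ticket]) -> None:
        self.tickets = {t.key: t for t in tickets}

    def next_todo(self) -> Optional[Ticket]:
        return next((t for t in self.tickets.values() if t.status is Status.TODO), None)

    def move(self, key: str, status: Status) -> None:
        self.tickets[key].status = status


def agent_loop(board: Board, agent: Callable[[str], bool]) -> list[str]:
    """Check out each TODO ticket, hand its instructions to the agent,
    and move it to DONE only if the agent reports success."""
    done: list[str] = []
    while (ticket := board.next_todo()) is not None:
        board.move(ticket.key, Status.IN_PROGRESS)  # "checked out"
        if agent(ticket.instructions):
            board.move(ticket.key, Status.DONE)
            done.append(ticket.key)
        # On failure the ticket stays IN_PROGRESS for a human to pick up,
        # so the loop never spins on the same ticket forever.
    return done


board = Board([Ticket("APP-1", "Add login page"), Ticket("APP-2", "Fix typo in footer")])
finished = agent_loop(board, lambda text: "typo" in text)
```

Leaving failed tickets in progress (rather than retrying them) is a design choice for the sketch; it keeps the loop terminating and surfaces failures on the board.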
For example, if I have a webapp, I want each of the worktrees to spin up its own infrastructure, and be accessible on its own unique local url, so that I can see the changes locally for each worktree, or I can have agents automate visual checks using something like agent-browser.
Currently I use docker for my infrastructure, each service running in its own container. I have a script with an ./app worktree create worktreename command. It creates a worktree named "worktreename" and spins up all of my docker infrastructure with prefixes like "WORKTREENAME", and I can access all my URLs at worktreename.myapp.test (or just myapp.test for the main worktree).
This is working fine for now, but it'd be cool if one of these apps was compatible with this concept so I could move over to that.
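A minimal sketch of what such a per-worktree script might do, assuming the project uses Docker Compose and a reverse proxy that routes *.myapp.test hostnames to containers. git worktree add -b and docker compose -p are real commands/flags; the APP_HOST variable, the sibling-directory layout and the function names are hypothetical choices for illustration, not the commenter's actual script. It defaults to a dry run that only prints the commands.

```python
import os
import re
import subprocess
from dataclasses import dataclass, field

# Compose project names: lowercase letters, digits, '-' and '_', starting
# with a letter or digit. Reusing that rule keeps git and docker names aligned.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")


@dataclass
class WorktreePlan:
    name: str
    path: str
    host: str
    # (working directory, argv) pairs, run in order.
    steps: list[tuple[str, list[str]]] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)


def plan_worktree(name: str, repo_root: str = ".", base_domain: str = "myapp.test") -> WorktreePlan:
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid worktree name: {name!r}")
    path = os.path.join(repo_root, "..", name)
    host = f"{name}.{base_domain}"
    steps = [
        # New branch and worktree next to the main checkout.
        (repo_root, ["git", "worktree", "add", "-b", name, path]),
        # -p namespaces containers, networks and volumes per worktree.
        (path, ["docker", "compose", "-p", name, "up", "-d"]),
    ]
    # APP_HOST is a hypothetical variable the compose file would interpolate,
    # e.g. into the reverse proxy's routing rules.
    return WorktreePlan(name, path, host, steps, {"APP_HOST": host})


def create_worktree(name: str, dry_run: bool = True) -> WorktreePlan:
    plan = plan_worktree(name)
    for cwd, argv in plan.steps:
        if dry_run:
            print(f"(cd {cwd} && {' '.join(argv)})")
        else:
            subprocess.run(argv, cwd=cwd, env={**os.environ, **plan.env}, check=True)
    return plan
```

Separating the plan from its execution makes the naming and URL logic testable without touching git or docker.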
What a fitting first error to run into for vibe coded software.
Having agents on kanban is really a level-up in terms of what you can do and how you can organize.
I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
Cursor seems to pretty much have all the available tools there already (it can already spawn agents to their own worktrees with proper setup scripts, for example). I don't get why they don't do it and instead insist on a buggy and confusing agents experience.
Unfortunately, most attempts at this seem to assume I want a model where "1 task = 1 agent = 1 chat", whereas what I really want is "1 task = 1 worktree = 1 full IDE around it".
With the full IDE I can have multiple agents/conversations, review code thoroughly and also chip in once in a while. I can have multiple models (that I pick) in multiple chats, iterate forwards, backwards, you name it.
I really don't understand why there seems to be this idea that "parallel agents" should live in their own little restricted flow that's limited to a tiny chat interface. I want the full flow for every agent!
I was hoping cursor would do this, but they really seem to be going the direction of turning their absolutely terrible web agents UI (where you can't even CHANGE THE MODEL!!!!) into a desktop thing. Sad, as I've been an Ultra paying customer and might have to leave soon with the direction they're heading.
Just a heads up, the website is extremely choppy on WebKit (Orion Browser) for me when scrolling
edit: not working with Claude Code on Amazon Bedrock, it needs a Claude subscription
I'm a bit anxious about putting myself out there, but I'd be curious if my efforts cross that bar for you or not? https://ouijit.com/ (and the repo is at https://github.com/ouijit/ouijit)
I do not use any of those.
Also, why are all of those vibe-coded websites so slow on mobile?
Also, I do not understand why software developers work so hard to make themselves obsolete. Why? Don't you guys enjoy eating and having a place to sleep?
It makes me stop caring about a potentially good product in no time.
But .. you know something cute? AI makes using Jira fun, again.
Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats, to test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script was showing could have been achieved from certain small tweaks. Only after a totals bet pushed did I notice that the push registered on the output as a win, and only because I was carefully staying on top of it. A single stupid >= instead of >, applied recursively, had caused completely nonsensical results that looked plausible.
Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.
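To make that failure mode concrete, here is a minimal sketch of settling an "over" bet on a game total. It is illustrative only: it assumes a whole-number line (so a push is possible) and standard -110 odds, the five totals are made-up toy inputs, and it does not model the recursive part of the original script. The only difference between the two settlement functions is >= versus >.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    LOSS = "loss"
    PUSH = "push"


def settle_over_buggy(total: int, line: float) -> Outcome:
    # Bug: >= scores total == line (a push) as a win.
    return Outcome.WIN if total >= line else Outcome.LOSS


def settle_over(total: int, line: float) -> Outcome:
    if total > line:
        return Outcome.WIN
    if total < line:
        return Outcome.LOSS
    return Outcome.PUSH  # stake is returned: no profit, no loss


def roi(outcomes: list[Outcome], win_profit: float = 100 / 110) -> float:
    """Profit per unit staked at -110 odds; pushes contribute zero."""
    profit = {Outcome.WIN: win_profit, Outcome.LOSS: -1.0, Outcome.PUSH: 0.0}
    return sum(profit[o] for o in outcomes) / len(outcomes)


totals, line = [9, 8, 10, 8, 7], 8  # toy inputs, two of them pushes
buggy = roi([settle_over_buggy(t, line) for t in totals])
fixed = roi([settle_over(t, line) for t in totals])
print(f"buggy ROI {buggy:.1%}, fixed ROI {fixed:.1%}")
```

Both numbers look like believable ROI figures, which is exactly why the bug is easy to miss without a test that pins down the push case.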
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
My team is using AI for most of the code, but the human review layer is crucial and unavoidable if you're interested in things like reliability, uptime, controlled feature rollouts, the integrity of your users' data, etc.
I don't know many serious software engineers who'd take that approach. The convention was always to actually open up the code, evaluate the quality, see if the authors seem to know what they're doing, then choose the libraries you know work and could be adjusted to fit whatever you wanted. At least for professional development inside companies, not a single library would be included unless you had at least verified that the top-level dependency you pull in actually had code worth pulling in in the first place.
And this approach works just as well today as it used to: you literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built, and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".
Not saying code quality isn't important - it is. But I think what is described as quality code will change.
It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.
I guess you actually review and actively participate in making the plan, you just don't review the code afterwards?
Could you share some more details on the specifics of your workflow? (What models/harnesses? do you use the same or different context windows? How exactly do you run the review, and how do you pass along and act upon the information from the review?) Also, how big are the changes you usually implement with one plan/develop/review cycle?
The changes aren't usually very big, basically what you'd put in one ticket. If I need to make large changes, I do them in self-contained stages, if that's possible, otherwise I will tell the LLM to add specific tests in the plan, and I will test thoroughly after.
I know a lot of engineers who skip the last part. They're overconfident in their original plan. They're overconfident that the agent actually fulfilled the plan.
The answer won't be the same for all software, but you're assuming it will be.
First, this is challenging to scale across large orgs. Even if your plans produce high-quality code, that isn’t true for everyone. I’m definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers, who were told to use their AI subscriptions all at once.
I feel like we should be taking “prompt engineering” more seriously. And when people mail me code to review, it should also include the agentic workflow and plan. That way, when code isn’t up to quality, we can have a discussion about the prompts used to generate it.
My second thought is related to your senior engineer comment. This isn’t surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: “senior” devs who push out buggy code quickly and push accountability downhill to the junior devs.
I've been working for the last week or two on getting my new tool up to parity with VK with additional improvements. I've been posting some screenshots into the Vibe Kanban discord as well. Hopefully it'll be a great fit for your use case when I finally am ready to launch it.
(My tool aims for better features than VK in both the Kanban board and agent workspaces, while adding extra systems like desktop windowing, plugins, in-browser VSCode integration, and htmx-like server-rendered UI. The remote access also works differently - you host the whole thing like OpenClaw and access the remote desktop UI from the browser, rather than run a webserver on your laptop to access remote coding agents.)
Would you say that <word> is a way for Nigerians to sound smart?
jira-cli and hermes, for example.
in fact, wiring hermes up to an existing Jira(/other_PM_system) is, well .. fruitful.
Also, Linear themselves are also working on this.
Honestly, creating these local scripts for automating the dev work was trivial, and they combine well with all other CLI tooling. That's why I haven't tried any of the GUI apps yet. I'm not sure they're able to compete with my custom local setup that works exactly the way I want.
Tell your claude to set this up. Should do it in a single prompt
I am working on exactly this interface for my new tool called Kotkit. You start with kanban board management of workspaces. Each workspace (worktree on one/multiple repos) is a feature-rich IDE interface in a remote-capable in-browser desktop. You can spawn multiple agents with a good UI wrapper and full auditable logs, solve worktree rebase/merge with 1-click AI features, and there is also an embedded VSCode to solve edge cases. It also supports very deep plugin integration like IntelliJ.
Currently dogfooding it on my own projects and will be released sometime soon.
Edit: my original comment was about how git worktrees are used by default in the implementation of these agent orchestration tools. I would prefer jj workspaces.
I think a lot of the problems with the homogeneous outputs of front-end design wouldn't be such a problem if the models naturally made their designs much simpler, but they are LLMs, so they are always going to be overly verbose.
I was curious, so I asked my agent to redesign and recreate your front page for comparison, and it gave me this: https://ouijit-redesign.vercel.app
I open such a page and I immediately know it was Claude that produced it (probably end-to-end). Not that there's anything wrong with that, but it lacks soul… and that makes me kind of sad.
I'd also wager that a far higher percentage of code now gets review coverage, via prompting AI to do it, than it did before.
Most PRs pass as long as they (A) pass checks, (B) don't introduce regressions, and (C) fix a bug or implement a feature. People talk about this era of humans reviewing code with nostalgia... but that never existed at scale.
Let us be honest: for your average dev, the assumption was that the number of GitHub stars or npm/NuGet downloads was a good proxy for quality.
Judging by how they struggle to communicate generally, I can't imagine their prompts are doing much heavy lifting.
About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.