
First, an apology. It has been almost six months since the last article in the series. If you were following along and expecting more, sorry about the silence.
To be honest, a lot happened during that period. Work, personal life, and growing frustration with the project itself. Motivation dropped quite a bit as a result. More on that below.
I did actually start implementing the app using the strategy I described earlier, working mostly through an LLM chat and copy-pasting code back and forth into the IDE.
Initially, progress was great. I got a lot done: intake progress widgets, the home screen, the onboarding flow, settings, and quite a bit more. It felt like things were moving fast.
But the more I built, the more difficult it became to manage everything with that approach. Gemini started hallucinating more often, referencing random versions of classes from earlier in the conversation. The codebase quickly became a mess. Every small change turned into a painful process. I constantly had to step in and fix things myself just to keep the project moving.
Then I started working on reminders and local notifications for both platforms. I knew this was going to get trickier the moment I crossed into native functionality, but I don’t think I was fully prepared for how painful it would be.
There was an endless amount of back and forth just to get notifications working, only for a small change to break them again. Trying to implement something that covered all use cases on both iOS and Android felt almost impossible.
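For context, cross-platform local notifications in Flutter typically go through the flutter_local_notifications plugin. A minimal sketch of the kind of setup involved, assuming that plugin plus the timezone package; channel ids and the icon name here are illustrative, and the exact parameters of zonedSchedule have shifted between plugin versions:

```dart
// Hypothetical sketch using flutter_local_notifications and timezone.
// The channel id/name and icon are placeholders, not from the real app.
import 'package:flutter_local_notifications/flutter_local_notifications.dart';
import 'package:timezone/data/latest.dart' as tzdata;
import 'package:timezone/timezone.dart' as tz;

final _plugin = FlutterLocalNotificationsPlugin();

Future<void> initNotifications() async {
  // Time zone data is needed before any zoned scheduling.
  tzdata.initializeTimeZones();
  const android = AndroidInitializationSettings('@mipmap/ic_launcher');
  const ios = DarwinInitializationSettings();
  await _plugin.initialize(
    const InitializationSettings(android: android, iOS: ios),
  );
}

Future<void> scheduleReminder(int id, String title, DateTime when) {
  return _plugin.zonedSchedule(
    id,
    title,
    null, // no body text for this sketch
    tz.TZDateTime.from(when, tz.local),
    const NotificationDetails(
      android: AndroidNotificationDetails('reminders', 'Reminders'),
      iOS: DarwinNotificationDetails(),
    ),
    // Exact-alarm behavior on Android needs its own permission handling.
    androidScheduleMode: AndroidScheduleMode.exactAllowWhileIdle,
  );
}
```

Even a sketch this small hides most of the real work: iOS permission prompts, Android 13+ notification and exact-alarm permissions, and behavior differences when the app is killed are where the platform-specific back and forth tends to live.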
I got blocked. Combined with everything else going on at the time, motivation dropped even further and eventually everything stopped.
Around that time, we started using coding agents a lot more at work. Initially, as a manager, I mostly used them to explain code to me, help me understand why something behaved a certain way, or navigate unfamiliar codebases. Even that was already impressive.
But then I actually implemented a feature myself in a work project, using two languages I had never used before. Or more accurately, I planned the work, reviewed the changes, and shipped the feature while barely writing any of the code myself. The ease of it was honestly surprising: planning the work, getting code well aligned with the project’s rules and guidelines, getting it merged and deployed.
All of it happened far more smoothly than I expected.
That’s when I decided to subscribe to Claude Pro to start using Claude Code and give this project another go. At work I was using Cursor, mostly with the Claude 4 model, so trying Claude Code directly felt like a natural next step.
The first thing I did was ask Claude to write a rules file for the project. A Flutter app, written in Dart, targeting both Android and iOS. I specifically asked for guidance around architecture, testability, separation of concerns, and clean code practices. Then I asked for a review of the entire project against those rules. Want to guess the outcome? 😄
It produced a plan with eight different priorities. Acting on that plan resulted in over 4,000 lines of code added or removed. There was a lot of refactoring, but when I reviewed the changes myself, everything made sense.
With the Gemini chat approach, to be honest, any kind of consistent architecture was almost impossible unless I pasted the entire project into the prompt every single time. What I ended up with was a codebase where many things looked similar and behaved similarly, but were implemented in completely different ways. So I was not that surprised by how much refactoring was required to get things into a good state.
What really stood out with Claude Code was the accuracy. After every major change, I ran the app to test the flows. Every single time, everything was working exactly as expected.
It went through those eight priorities almost entirely on its own. I think I used around ten or eleven prompts in total. Eight times just saying “let’s move on to the next priority”, and two or three times to tweak small things that already needed tweaking anyway.
Compare that with the Gemini approach, where I had constant back and forth and still had to touch the code myself. What a difference. With Claude Code, the whole project was refactored, quite a few existing bugs were fixed, and I didn’t write a single line of code myself.
I don’t know if it was the cost, the skepticism, or just me being stubborn, but trying to force myself to build the app using only an LLM chat was the wrong decision. You live and you learn. The important part is that the project is moving again, and I’m back working towards getting it live.
Is it over for LLM chats then? Not really. Not even in this project. They’re probably not the best tool for sustained development, but they’re still great for prototyping and discussing ideas. I’m also fairly sure they’ll be useful again when it’s time to navigate the publishing process, and maybe even promotion.
I’m not sure there’s a lot more to write as part of this series, and that’s okay. The original goal was to explore the experience of using AI to build a product, not to document individual features. At this point, the series itself deserves some closure.
Ironically, with the tools working as well as they do, there is not a lot to document. So I might write more about this project, or I might not. We’ll see how it feels, and whether there are any moments along the way that feel genuinely worth sharing.
I’ll definitely share when it goes live though, so you can give it a try. Stay tuned for that.