I'm using LLMs for many things in my life - while travelling, for day-to-day tasks at home, and for talking through and brainstorming ideas. I started experimenting with programming with AI around 2023, when the first genuinely useful large language models began to feel less like curiosities and more like practical tools.

I was not building serious production systems with them, and I did not trust them with anything especially complex. What they were good for, though, was helping me create toy scripts.

These were the sort of scripts you write to automate a repetitive task, transform a file, clean up a bit of data, or quickly test an idea without investing too much structure or ceremony in the process.

The output could be helpful, but it also had a certain fragility. You could get something working quickly, but not necessarily something you would want to trust as part of a broader engineering effort. So my relationship with AI-assisted coding during that period was exploratory rather than foundational. I was learning where it helped, where it fell short, and where its usefulness ended.

That began to change towards the end of 2024. By then, I found myself using these models in a more thoughtful and structured way. Rather than asking them simply to generate isolated snippets, I began to confront a more interesting question: how could AI fit into the way I already think about systems (and working with a dev team)?

As someone who cares deeply about architecture, trade-offs, boundaries, and long-term maintainability, I started to place AI-generated output alongside my own architectural judgement rather than beneath it or in place of it. In practice, that meant I was increasingly using AI in the context of architectural decisions - not to make those decisions for me, but to react to them, test them, challenge them, and sometimes help me move from a conceptual decision to an initial implementation more quickly.

From there, the next step felt almost inevitable. In 2025, I began programming not just with models, but with agents. That was a bigger shift than it might sound at first.

It was no longer simply a matter of opening a chat window and asking for code. Instead, I began to think in terms of roles, delegation, workflow, and structured collaboration between multiple AI components. The interaction became more deliberate. I was no longer just using AI to produce output on request. I was beginning to build a way of working in which AI could take part in different stages of software creation, while I remained firmly responsible for direction, architecture, and quality.

That progression matters because it reflects how trust is built in engineering: not through hype, but through repeated contact with reality. I needed to see what these systems could actually do, where they added value, and where they still required strong human guidance.

Let's start talking about the current workflow.

Current workflow

Everything starts from a new branch.
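Concretely, that first step is just ordinary Git. A minimal, illustrative sketch (the branch name is hypothetical, and the demo uses a throwaway repository so it runs anywhere):

```shell
# Demo in a throwaway repository; in real use you would run this in your project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
git commit -q --allow-empty -m "initial"
# One branch per task keeps agent-generated changes isolated and easy to review.
git checkout -q -b feature/rate-limiting
git branch --show-current
```

In a real repository this reduces to a single `git checkout -b feature/rate-limiting` before starting OpenCode.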

At the centre of that experience, for me, is OpenCode. That is where I interact with the system as an engineer. That is the surface I touch. That is where the work begins.

What matters most is that I do not really "talk to the models" directly in the abstract. In practice, I talk to agents, each with a role in a broader workflow. Quite often, each of them is a different model from a different vendor.

From my perspective as the user, the key relationship is with the architect agent. That is the only agent I interact with directly. I describe what I want to build, what I want to change, which constraints I care about, which trade-offs matter, and what kind of outcome I am aiming for. The conversation starts there, and it stays there until the shape of the work is properly understood.

The architect agent needs to be very strong, and it can be slow. That is still reasonable, and it offers good value for money, because this stage is mostly discussion. We do not need to exchange too many tokens.

That distinction matters, because the architect is not there to rush into implementation. Its role is to think with me. It helps frame the problem, inspect the codebase, reason through alternatives, and gradually turn an idea into a concrete plan. I use it much more as a design partner than as a code generator. If something feels too broad, too risky, or too vague, this is the point at which it gets refined. The goal is to make sure implementation begins only once the intent is genuinely clear.

One of the most useful guardrails in this workflow is that nothing moves into implementation until I explicitly write "approved". I like that rule because it creates a deliberate pause between thinking and doing. It stops the system from eagerly sprinting ahead on half-formed assumptions, which is one of the easiest ways to end up with impressive-looking but misaligned output. Until that approval happens, the work remains architectural: discussion, clarification, trade-offs, boundaries, scope, and intent.

Once I approve the plan, the system moves into a different mode. Under the surface, the architect writes a Markdown plan that captures the work in a more operational form. That plan is not just a rough note. It serves as the working contract for the next stages. Typically, it includes the goal of the change, the main design decisions, the files or components likely to be affected, the implementation steps, and the expected validation path, including tests or other checks. It turns a conversational design process into something explicit, reviewable, and durable. It also uses my template for these plans, so I can add specific accents where I want them.

That plan then becomes the basis for delegating work to the developer agent, which uses a cheaper but still capable model, well suited to producing a large amount of implementation output. I do not micromanage that hand-off. From my point of view, I stay with the architect, while the architect delegates implementation to a separate developer agent. This separation is one of the things I value most in the setup.

During implementation, the developer also reviews the plan and the technical details. Sometimes it goes back to the architect agent with observations, edge cases, or a different point of view, and they discuss the better solution. Sometimes the architect escalates a decision to me when it is not fully confident about the next steps.

After implementation, the work moves into review, using a few different models, both cheap and expensive. The system passes the result on to reviewing agents, which examine the output against both the plan and the actual changes.

Review is sometimes done by more than one agent, so they can discuss the outcome with the developer and, when a decision is needed, escalate it to the architect. The architect then escalates it to me if it feels important.

Alongside those plan files, I also keep an AGENTS.md file in the repository (as, I guess, does everyone who codes with agents). It gives the agents an operational context: how the repository is structured, which conventions matter, how changes should be approached, which patterns are preferred, how testing is expected to work, and what the general rules of engagement are.

I also keep a file with errors and solutions, so my agents can learn the project-specific quirks, whether they come from configuration, infrastructure, a cloud vendor, or something else.
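The shape of that file is up to you; mine is little more than a running log. A hypothetical template (the filename ERRORS.md and the section names are my own convention, not anything OpenCode requires):

```markdown
# ERRORS.md
## <short error title>
- Symptom: what the agent or the tooling reported.
- Cause: configuration, infrastructure, cloud vendor, or something else.
- Fix: the exact command or change that resolved it.
- Notes: when this is likely to happen again.
```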

In practice, that means the Markdown files in the workflow serve different purposes (plan, project-level context, sometimes typical errors).

At the end of the process, I still review the code, but I do not focus on every single line. Instead, I focus on the architecture, the trade-offs, the classes, and the methods.

What about the boring things?

Another part of the experience that I find especially valuable is that the surrounding tooling takes care of much of the "mechanical" engineering workflow for me: necessary steps such as creating and managing the branch, making commits, pushing changes, and opening a pull request. I can stay focused on direction, judgement, and quality, while the repetitive scaffolding of delivery is handled consistently.
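I do not know the exact commands the tooling issues under the hood, but conceptually the mechanical loop looks roughly like this (the branch name, commit message, and the use of the GitHub CLI are illustrative; the demo runs in a throwaway repository):

```shell
# Demo of the mechanical delivery loop in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
git checkout -q -b feature/plan-demo
echo "change" > file.txt
git add file.txt
git commit -q -m "Implement approved plan (demo)"
git log -1 --pretty=%s
# In a real repository the loop would continue with:
#   git push -u origin feature/plan-demo
#   gh pr create --fill   # GitHub CLI; assumes a configured remote
```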

The same applies to validation. The system can run tests, update them where necessary, and use them as part of the implementation loop.

I am still responsible for the important decisions, but I am no longer doing every mechanical, boring step by hand.

What I like most about this approach is that it preserves the part of software engineering I care about most: intent, architecture, trade-offs, and coherence. I am still deeply involved, but at a different level.

Tech stack

A few points:

  • the main surface is OpenCode. It is where I work when building software with LLMs, and it is where the whole workflow begins. Rather than juggling separate chats and tools for each stage, I use it as the central environment for planning, delegation, implementation flow, and review orchestration;
  • the model I use for architectural reasoning is Opus. I rely on it when I want the strongest thinking around structure, boundaries, trade-offs, and implementation direction. Its role is not to rush into writing code, but to help turn an idea into a deliberate, well-formed plan;
  • the approval gate is the word "approved". Nothing moves into implementation until I explicitly write it. This is one of the most important guardrails in the whole setup because it enforces a clean separation between design and execution and stops the workflow drifting into premature coding before the problem has been properly understood;
  • the architect writes the plan down in Markdown once I approve a direction. These plan files capture the scope of the task, the main design decisions, the expected implementation steps, and the validation path. They form the bridge between conversation and execution, making the workflow more explicit, reviewable, and durable;
  • the developer agent takes over once the architectural plan has been approved. The model I use for implementation work is Sonnet. It suits the execution phase well, where the task is already structured and the main architectural decisions have been made. In that role, it becomes the engine that turns an approved plan into actual code changes;
  • the work is passed on to reviewer agents after implementation. Their job is to examine the result against both the written plan and the actual diff, rather than simply continuing the implementation loop. That gives the workflow a proper review stage that feels much closer to real engineering practice;
  • the default review model is Codex. I use it to inspect the output critically, identify issues, and catch things the implementation path may have missed. I like having a separate reviewer model because review is more useful when it comes from a different perspective than the one that produced the code;
  • the additional reviewer I may involve is Gemini. I use it when I want another angle, another set of instincts, or simply more diversity in the review process. Different models notice different things, and that variety makes the overall setup stronger;
  • the model I sometimes bring back as a reviewer is Opus. For especially important, sensitive, or high-impact changes, I use it less as a planner and more as a senior second opinion. It is particularly useful when I want a final judgement on whether the overall shape of the solution genuinely makes sense;
  • the final manual review happens in my IDE (in good, old-school style). Once the agents have finished their work and the architect comes back to me with a summary, I switch to my IDE to read the diff, inspect the final shape of the change, and decide whether I am happy for it to be committed. I still see this human pass as essential, even in a highly agent-driven workflow. But if the change is very small, I review it via git diff or directly in OpenCode.

How to set it up

My setup is based on macOS.

It will be easiest if you already have:

  • Terminal or iTerm2;
  • Homebrew;
  • Git;
  • an account with model providers, or access through an aggregator or proxy;
  • the repository you want to work on.

Create directories for your working configuration

Example:

mkdir -p ~/code
mkdir -p ~/.config/opencode/agents

Install OpenCode

brew install anomalyco/tap/opencode

Verify the installation:

opencode --version

Connect your model providers

Start OpenCode from anywhere:

cd ~/code
opencode

In the OpenCode interface, add providers by typing:

/connect

Add at least:

  • Anthropic - for Opus and Sonnet;
  • OpenAI - for Codex;
  • Google - for Gemini.

To check the list of available models in OpenCode, type:

/models

Warning: the providers may not work the first time. If that happens, try again or use another method, such as entering the API keys manually.

Set up the project

Go to your repository:

cd ~/code/your-project
opencode

Run the initialisation command:

/init

This "tells" OpenCode to analyse the project and create an AGENTS.md file in the repository directory. It is worth re-running this initialisation from time to time, as a routine, so the file keeps up with the project.

Define the agent structure

OpenCode supports:

  • primary agents - the main agents you can switch between;
  • subagents - agents called by other agents, or manually with @.

I suggest setting it up like this - this is how I have it configured:

  • architect = primary;
  • developer = subagent;
  • reviewers = subagent.

OpenCode has an explicit permission.task mechanism for controlling which subagents can be launched by another agent.

Create the global OpenCode configuration file

Create the file:

mkdir -p ~/.config/opencode
vim ~/.config/opencode/opencode.json

My file looks like this:

{
  "$schema": "https://opencode.ai/config.json",

  "model": "anthropic/claude-opus-4-6",

  "permission": {
    "edit": "ask",
    "webfetch": "ask",
    "bash": {
      "*": "ask",
      "git status*": "allow",
      "git diff*": "allow",
      "git log*": "allow",
      "grep *": "allow",
      "rg *": "allow",
      "find *": "allow",
      "ls *": "allow",
      "pwd": "allow"
    }
  },

  "agent": {
    "architect": {
      "mode": "primary",
      "description": "Plans features and bugfixes, writes plan files, and orchestrates developer and reviewers.",
      "model": "anthropic/claude-opus-4-6",
      "permission": {
        "edit": "ask",
        "webfetch": "ask",
        "bash": {
          "*": "ask",
          "git status*": "allow",
          "git diff*": "allow",
          "git log*": "allow",
          "grep *": "allow",
          "rg *": "allow",
          "find *": "allow",
          "ls *": "allow",
          "pwd": "allow"
        },
        "task": {
          "*": "deny",
          "developer": "allow",
          "reviewer-codex": "allow",
          "reviewer-gemini": "allow",
          "reviewer-opus": "allow"
        }
      }
    },

    "developer": {
      "mode": "subagent",
      "description": "Implements only the approved plan, then asks reviewers to review the diff.",
      "model": "anthropic/claude-sonnet-4-6",
      "permission": {
        "edit": "ask",
        "webfetch": "deny",
        "bash": {
          "*": "ask",
          "git status*": "allow",
          "git diff*": "allow",
          "grep *": "allow",
          "rg *": "allow",
          "find *": "allow",
          "ls *": "allow",
          "pwd": "allow",
          "npm test*": "allow",
          "pnpm test*": "allow",
          "pytest*": "allow",
          "go test*": "allow",
          "cargo test*": "allow"
        },
        "task": {
          "*": "deny",
          "reviewer-codex": "allow",
          "reviewer-gemini": "allow",
          "reviewer-opus": "allow"
        }
      }
    },

    "reviewer-codex": {
      "mode": "subagent",
      "description": "Reviews only the approved plan and the resulting diff. No edits.",
      "model": "openai/gpt-5.3-codex",
      "permission": {
        "edit": "deny",
        "webfetch": "deny",
        "bash": {
          "*": "ask",
          "git diff*": "allow",
          "git log*": "allow",
          "git status*": "allow",
          "grep *": "allow",
          "rg *": "allow",
          "find *": "allow",
          "ls *": "allow",
          "pwd": "allow"
        }
      }
    },

    "reviewer-gemini": {
      "mode": "subagent",
      "description": "Independent code reviewer focused on alternate solutions and blind spots.",
      "model": "google/gemini-flash-latest",
      "permission": {
        "edit": "deny",
        "webfetch": "deny",
        "bash": {
          "*": "ask",
          "git diff*": "allow",
          "git log*": "allow",
          "git status*": "allow",
          "grep *": "allow",
          "rg *": "allow",
          "find *": "allow",
          "ls *": "allow",
          "pwd": "allow"
        }
      }
    },

    "reviewer-opus": {
      "mode": "subagent",
      "description": "Senior reviewer used for important work or for resolving reviewer disagreements.",
      "model": "anthropic/claude-opus-4-6",
      "permission": {
        "edit": "deny",
        "webfetch": "deny",
        "bash": {
          "*": "ask",
          "git diff*": "allow",
          "git log*": "allow",
          "git status*": "allow",
          "grep *": "allow",
          "rg *": "allow",
          "find *": "allow",
          "ls *": "allow",
          "pwd": "allow"
        }
      }
    }
  }
}

Warning: the exact IDs in your /models list may differ slightly. Check the model IDs in OpenCode and correct them in the configuration JSON accordingly, otherwise there is a good chance the setup will not work.

Add the instructions for the agents

Architect

vim ~/.config/opencode/agents/architect.md

File content:

---
description: Senior architect and orchestrator
mode: primary
---
You are the architect.
Your job is to:
- talk to the user until the feature or bugfix is fully understood
- clarify goals, scope, constraints, edge cases, limitations, and trade-offs
- shape the implementation plan together with the user
- keep all high-level architectural choices under tight control
- avoid implementation until the user explicitly writes: approved
Rules:
- Do not start coding before the exact word "approved" appears in the conversation from the user.
- Before approval, only analyse, ask questions, inspect the codebase, and propose architecture.
- Once approved, write a concrete low-level plan into a plan file inside the repository.
- The plan must mention the files to change, the components involved, and the expected tests.
- Then delegate implementation to @developer.
- After implementation, require independent review from reviewers.
- If reviewers disagree, you are the final arbiter.
- Prefer solutions already present in the codebase over introducing novel patterns.
- Optimise for maintainability and coherence with the existing codebase, not theoretical elegance.

Developer

vim ~/.config/opencode/agents/developer.md

File content:

---
description: Implements only approved plans
mode: subagent
---
You are the developer.
Your job is to:
- implement only what is in the approved plan file
- avoid making new architectural decisions unless absolutely necessary
- stay consistent with existing project patterns
- run relevant tests or checks where possible
- once implementation is complete, ask the reviewers to review the plan and the diff
Rules:
- Do not widen scope.
- Do not redesign the architecture.
- If the plan is unclear or conflicts with the codebase, escalate to the architect instead of improvising.
- If reviewers agree on improvements, implement them.
- If reviewers disagree, escalate to the architect with a concise summary of the conflict.

Reviewer: Codex

vim ~/.config/opencode/agents/reviewer-codex.md

File content:

---
description: Strict diff reviewer
mode: subagent
permission:
 edit: deny
 webfetch: deny
---
You are a strict code reviewer.
Review only:
- the approved plan
- the resulting diff
- relevant nearby code if needed
Your review must:
- identify correctness issues
- identify missing edge cases
- identify regressions
- identify places where the implementation diverges from the approved plan
- distinguish clearly between:
 - must fix
 - should fix
 - optional / pedantic
Do not rewrite the code yourself.
Do not praise for the sake of praise.
Prefer concise, concrete review points.

Reviewer: Gemini

vim ~/.config/opencode/agents/reviewer-gemini.md

File content:

---
description: Alternative-perspective reviewer
mode: subagent
permission:
 edit: deny
 webfetch: deny
---
You are an independent reviewer.
Focus on:
- blind spots
- alternative approaches
- surprising edge cases
- places where the chosen solution is locally correct but globally awkward
Be concise and concrete.
Mark each point as:
- must fix
- should fix
- optional

Reviewer: Opus

vim ~/.config/opencode/agents/reviewer-opus.md

File content:

---
description: Senior reviewer for important work
mode: subagent
permission:
 edit: deny
 webfetch: deny
---
You are a senior reviewer.
Review the approved plan and the diff with emphasis on:
- architecture consistency
- maintainability
- hidden risks
- correctness vs implementation effort trade-offs
Your role is especially important when reviewers disagree.
Be decisive and practical.

Set default content of the AGENTS.md file

vim ~/code/your-project/AGENTS.md

File content (only an example; you should probably add more project-specific content):

# AGENTS.md
## Project conventions
- Prefer existing patterns over introducing new abstractions.
- Keep changes minimal and local.
- Avoid speculative refactors.
- Add tests for behaviour changes when practical.
- Do not change public interfaces unless required by the task.
## Workflow
- The architect is the only agent that talks directly to the user.
- No implementation starts until the user writes "approved".
- Every non-trivial change must have an approved plan file.
- The developer implements only the approved plan.
- Reviewers review the plan and the diff.
- Reviewer disagreements go back to the architect.
## Review expectations
- Separate must-fix issues from optional suggestions.
- Optimise for correctness and maintainability.
- Avoid pedantry unless the issue is likely to matter in practice.

Set the plan file convention

Create a directory for plans in the repository:

mkdir -p plans

Create a template file:

# Plan: <feature name>

## Goal
What we are adding or fixing.

## Constraints
Technical and product constraints.

## Decisions
High-level choices that have already been approved.

## Files to change
- path/file1;
- path/file2.

## Planned changes
1. ...
2. ...
3. ...

## Tests / validation
- unit tests;
- integration tests;
- manual checks.

## Out of scope
- ...

How to work day to day

The operational working routine is simple.

Open the repository and start OpenCode

cd ~/code/your-project
opencode

Switch to the architect

If it is not active by default, use agent switching ([tab]).

Talk only about the goal

That is exactly the spirit of the workflow. Just tell the architect what you want to achieve, then refine it through conversation, sometimes for an hour, until you have worked out the architecture and the trade-offs together.

Approve it with a single word

When the plan looks right to you, write:

"approved"

The architect will ask you for this explicitly when it "feels" that you have reached the right moment.

The architect saves the plan

This happens automatically, so there is nothing else you need to do.

The developer implements it

The developer is not there to invent the architecture. Its job is to execute the plan, run sensible tests, and hand over the diff for review.

Review

After the implementation, either you ask for a review yourself or the architect tells the developer to request one. For example:

"Ask @reviewer-codex and @reviewer-gemini to review the approved plan and the current diff.
On important changes, also ask @reviewer-opus."

Integrating feedback

The rule is:

  • if the reviewers agree, the developer makes the changes;
  • if their feedback conflicts, it goes back to the architect.

Default safety rules

Since you are working in your own repository, do not put everything on full autopilot.

I set mine up like this:

  • edit: ask for architect/developer;
  • edit: deny for reviewers;
  • webfetch: deny for reviewers;
  • bash only for safe commands such as git diff, git status, rg, grep, and tests.

Common mistakes with this setup

These are the things most likely to derail this workflow.

Too little control over the architecture

If you do not understand the technology well enough, the model will make bad decisions that you will not catch in time. It then keeps building on those mistakes and adds more layers of mess.

Reviewing with the same model

That undermines the whole point of review, because a model tends to agree with itself. You might get a bit further by giving it a better prompt, such as "try to disagree with this code and look for problems", but in my case that still did not produce good enough results.

What's next

Here is how I plan to extend this approach in future:

  • handling and automating error markdown files;
  • automatically collecting issues from GitHub;
  • multiple OpenCode sessions across different machines;
  • more nitpicky reviews and refactoring - simplifying things, spotting over-engineered solutions, and proactively looking for better practices through the architect model (perhaps periodically, via a scheduler or a simple cron job, as a proof of concept);
  • project manager as a service.
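For the scheduled-review idea, a first proof of concept could be as small as a single cron entry. This is only a sketch: it assumes OpenCode can be driven non-interactively (check `opencode --help` for the right invocation in your version), and the schedule, prompt, paths, and log file are all illustrative:

```
# crontab -e
# Every Monday at 09:00, ask the architect for a nitpicky pass over the project.
0 9 * * 1  cd ~/code/your-project && opencode run "Review the codebase for over-engineered solutions and propose simplifications" >> ~/opencode-weekly-review.log 2>&1
```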