Opus Writes It. Codex Reviews It. They Loop Until It's Clean.

By Charles Jones, Senior Full-Stack Engineer · June 13, 2026 · 6 min read

Every developer using AI to ship code probably agrees on one thing: writing the code stopped being the slow part.

Building my SaaS projects and iOS apps with Claude Code made development a lot faster. Features that used to take a week now land in an afternoon, so I'm opening far more pull requests than I used to. The slow part now is review: reading every diff with the care it deserves doesn't keep up with how fast Claude writes the code.

So I've recently handed the review to a second AI. When a feature is done, Claude opens the PR and CI runs. Codex Code Review, the GitHub bot, reviews the PR the moment it opens. Then Claude works through every comment Codex leaves, and the two loop until the review comes back clean.

As of writing, this whole arrangement runs on a $20/month ChatGPT Plus plan (and my Claude plan), and it has paid for itself many times over. The pairing today is Codex on GPT-5.5 and Claude Code on Opus 4.8; the loop outlasts whichever models are current.

Two models from two labs, reviewing each other

The reason this works has nothing to do with either model being smarter than the other. It works because the reviewer didn't write the code.

When the same model reviews its own output, it carries the same blind spots into the review that it had while writing. It already decided that platform guard was fine. It already convinced itself the empty state was handled. A separate model, trained differently and reasoning differently, doesn't inherit those assumptions. It reads the diff cold and flags the thing the author talked itself out of worrying about.

So the pairing matters in a way that running two Claude sessions wouldn't. Codex reviewing Claude's work is genuine cross-examination. Each model's weaknesses are likely in different places, which means the second pass catches a class of issue the first pass structurally cannot.

And because Codex reviews on GitHub, it doesn't care what language the PR is in. The same loop runs over a TypeScript SaaS backend and a Swift iOS app. One review surface, every project.

The loop, start to finish

Here is the shape of it, generalized from how my PRs go in practice:

   Claude Code
   builds the feature -> commits -> opens the PR
            |
            v
   Codex Code Review
   reviews on PR open -> posts findings tagged P0 / P1 / P2
            |
            v
   +----------------------------------------------+
   |  for each finding, Claude decides:           |
   |    valid  -> fix, commit, push, reply        |
   |    wrong  -> reply with the reason, resolve  |
   +----------------------------------------------+
            |
            v
   all findings handled -> one "@codex review"
            |
            v
   Codex re-reviews the latest commit
            |
       +----+-----------------------------+
       |                                  |
   more findings                  "no major issues"
       |                                  |
       +--> back through the loop         v
                                       I merge
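
The control flow in that diagram can be sketched in a few lines of Python. This is an illustration of the loop's shape, not the skill's source: `Finding`, `Resolution`, `run_review_loop`, and the `max_rounds` safety cap are hypothetical names of my own, and the reviewer and the validity check are passed in as plain functions so the example runs without GitHub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    priority: str  # "P0", "P1", or "P2", as in the diagram
    message: str


@dataclass
class Resolution:
    finding: Finding
    action: str  # "fixed" or "declined"
    reply: str


def run_review_loop(
    review: Callable[[int], List[Finding]],
    is_valid: Callable[[Finding], bool],
    max_rounds: int = 5,  # assumption: a cap so a disagreement cannot loop forever
) -> List[List[Resolution]]:
    """Run review rounds until one comes back with no findings.

    Returns the resolutions made in each round. Merging is left to the caller.
    """
    history: List[List[Resolution]] = []
    for round_number in range(1, max_rounds + 1):
        findings = review(round_number)
        if not findings:
            return history  # clean review: stop here, a human decides on the merge
        resolutions = []
        for finding in findings:
            if is_valid(finding):
                # valid -> fix, commit, push, reply
                resolutions.append(Resolution(finding, "fixed", "Fixed in the latest commit."))
            else:
                # wrong -> reply with the reason, resolve the thread
                resolutions.append(Resolution(finding, "declined", "Leaving as is; reason given in the thread."))
        history.append(resolutions)
        # all findings handled -> exactly one re-review request before the next round
    raise RuntimeError(f"review did not come back clean within {max_rounds} rounds")


# A scripted reviewer standing in for Codex: two rounds of findings, then clean.
scripted = {
    1: [Finding("P1", "empty state not handled"), Finding("P2", "rename this variable")],
    2: [Finding("P1", "the fix skips the last item")],
    3: [],
}
history = run_review_loop(lambda n: scripted[n], lambda f: f.priority != "P2")
```

In this toy run the second round exists only because the first fix introduced a new problem, which is the case the re-review is there to catch.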

The setup

Two pieces: turning Codex Code Review on, and installing the /git-pr-codex-loop skill that drives Claude through the loop.

Codex Code Review

In ChatGPT's Codex settings, under Code review, connect the GitHub app and configure it per repository. The settings I run:

Auto code review: Review all PRs. Set this on each repo you want covered. I leave the personal "Auto review" default off and opt in per repository, so a throwaway repo doesn't get reviewed.
Review trigger: On PR open. Codex reviews automatically the moment the PR opens, with no manual kick needed and no GitHub workflow file or action to configure (unlike Claude Code Review).

Once it's on, @codex mentioned anywhere in the PR requests a fresh review on demand. That mention is the hinge the whole loop turns on.
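
If you want to post that mention from a script rather than the GitHub UI, one option is the GitHub CLI. The sketch below assumes `gh` is installed and authenticated for the repository; the function names are hypothetical, and the comment body is the same "@codex review" text shown in the diagram.

```python
import subprocess
from typing import List

REVIEW_REQUEST = "@codex review"


def build_review_request_command(pr_number: int) -> List[str]:
    """Build the `gh pr comment` invocation that posts the review request."""
    return ["gh", "pr", "comment", str(pr_number), "--body", REVIEW_REQUEST]


def request_codex_review(pr_number: int) -> None:
    """Post the comment; raises CalledProcessError if gh exits non-zero."""
    subprocess.run(build_review_request_command(pr_number), check=True)
```

Keeping the command builder separate from the call that runs it makes it easy to check what would be posted before anything touches the PR.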

Install the skill: `/git-pr-codex-loop`

I've packaged this loop as a Claude Code skill in my ai-git plugin, so you don't have to wire it up yourself. Add my marketplace and install the plugin in Claude Code:

/plugin marketplace add charlesjones-dev/claude-code-plugins-dev
/plugin install ai-git@claude-code-plugins-dev

Then, on any feature branch with Codex Code Review enabled and local changes pending, run:

/git-pr-codex-loop

It opens the PR if there isn't one, watches CI, waits for Codex, resolves every finding (replying on each thread and pushing fixes), requests a single re-review per pass, and loops until Codex comes back clean. It never merges, so that call stays yours. The full source is on GitHub.
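
For a sense of what the first two steps look like if done by hand, here is a rough sketch using GitHub CLI commands. It is not the skill's implementation: `opening_commands` is a hypothetical helper, and it assumes `gh` is installed and that the caller already knows whether the branch has a PR.

```python
from typing import List


def opening_commands(pr_exists: bool) -> List[List[str]]:
    """Return the gh commands for 'open the PR if needed, then watch CI'."""
    commands: List[List[str]] = []
    if not pr_exists:
        # --fill takes the PR title and body from the branch's commits
        commands.append(["gh", "pr", "create", "--fill"])
    # --watch blocks until the PR's checks have finished
    commands.append(["gh", "pr", "checks", "--watch"])
    return commands
```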

What it catches, and what it costs

Most of what Codex catches, I would have shipped. The happy path worked when I tried it, so I moved on. None of it fails a typecheck or a unit test, which is exactly why a second reviewer earns its place.

The loop beats a single pass because each re-review reads the fix, not just the first diff. A fix that introduces its own problem gets caught next round, and when a round comes back clean Codex says so outright ("Didn't find any major issues"), so "done" is a signal I can read instead of a guess. It also helps that Claude pushes back when Codex is wrong, rather than churning the diff to satisfy a bad comment.
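
To show how "done" becomes something a script can read, here is a minimal sketch that checks a review comment for the clean phrase and pulls out a P0 / P1 / P2 tag. The exact layout of Codex's comments is an assumption on my part, so treat the matching rules and the names `is_clean_review` and `finding_priority` as illustrative.

```python
import re
from typing import Optional

CLEAN_PHRASE = "didn't find any major issues"
PRIORITY_TAG = re.compile(r"\bP([0-2])\b")


def is_clean_review(body: str) -> bool:
    """True if the comment contains the clean-review phrase."""
    # Normalize a curly apostrophe so both spellings of "Didn't" match.
    normalized = body.replace("\u2019", "'").lower()
    return CLEAN_PHRASE in normalized


def finding_priority(body: str) -> Optional[int]:
    """Return 0, 1, or 2 for the first P0/P1/P2 tag found, else None."""
    match = PRIORITY_TAG.search(body)
    return int(match.group(1)) if match else None


comments = ["P2: Rename this variable", "P0: Crash on empty input", "P1: Handle the empty state"]
# Work through the most severe findings first.
ordered = sorted(comments, key=lambda c: finding_priority(c))
```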

None of this is hands-off. The loop runs in minutes while I do something else, but I read the conversation and own the merge every time: two models from different labs agreeing is a strong signal, but "both AIs were happy" is not the same as "the code is right." On the $20 plan the reviews share a rate limit that I only bump into on a heavy day. And Codex reviews the diff, not the plan, so whether the feature was worth building stays my call.

A two-model team

I've spent a lot of time being the bottleneck on my own code review. One person reading every diff, getting tired by the fourth PR of the day, skimming the part that turns out to matter. What changed is the kind of attention I pay. I mostly stopped reviewing line by line and started reading a conversation between two models that each catch what the other misses.

Claude writes the code and answers for it. Codex stress-tests it and signs off when it's clean. They go back and forth without me in the middle, and I step in for the calls that are mine to make. For $20 a month I have a code reviewer that never sleeps, never gets bored on PR number forty, and was trained by a different lab than the one that wrote the code.

That loop is one command now: /git-pr-codex-loop. I'd rather ship through it than without it.
