How to Automate Code Review with Claude Code

Learn how to automate code review with Claude Code. This guide covers PR review setup, subagent architecture, and how to add visual QA that catches what diff-based review misses.


Behavior-First PR Verification
When a PR opens, Sai does not just read the diff — it opens your preview deployment, logs into a test account, and clicks through the affected user flows step by step. It screenshots every state transition and flags anything that breaks, giving reviewers visual evidence instead of code comments.
Automated Bug Reproduction from Screenshots
Paste a user's bug screenshot into Sai. It explores the app, finds the exact sequence of clicks that triggers the issue, and generates an engineering-ready ticket with steps to reproduce, expected vs. actual behavior, and annotated screenshots — turning vague reports into actionable context for Claude Code.
Closed-Loop Fix Verification
After Claude Code patches the code, Sai re-runs the same test flow automatically. It captures before-and-after screenshots, checks Sentry for new errors, and posts a structured pass/fail report to Slack or GitHub — so your team never merges a fix without confirming it actually works in the product.

The Code Review Bottleneck Nobody Talks About

Your team ships faster than it reviews.

AI coding agents — Claude Code, Cursor, GitHub Copilot — generate pull requests faster than any human reviewer can read them. A senior engineer who used to review three PRs before lunch now faces twelve. The code looks clean. The tests pass. The linter is quiet.

But the checkout page is broken.

This is the code review gap in 2025: the distance between "the code is correct" and "the product works." Traditional code review — whether human or AI — reads diffs. It checks logic, patterns, and syntax. It does not open the app, click through the checkout flow, apply a coupon, and notice the total drops to negative four dollars.

Most AI code review tools make this gap wider, not smaller. They generate more comments, more suggestions, more noise. Engineers on Reddit describe the pattern: "AI review creates more work than it saves because every comment needs a human to verify whether it's real."

The problem is not that code review is too slow. The problem is that code review is incomplete. It reviews the code. Nobody reviews the product.

This guide walks through three tiers of code review automation:

  1. Manual review — how most teams do it today
  2. Claude Code review — automated diff analysis with /review and GitHub Actions
  3. Behavior-first review — Claude Code reads the code while Sai tests the product

By the end, you will know exactly how to set up each tier, when to use which, and where the real time savings come from.

How Claude Code Reviews Pull Requests

Claude Code is Anthropic's AI coding agent that runs in your terminal. It reads your codebase, understands project context, and can review code at a level that goes well beyond simple linting.

The /review command

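The fastest way to get a Claude Code review is the built-in /review slash command, which you run inside an interactive Claude Code session:

# Start an interactive session in your repository
claude

# Inside the session: ask for a review
/review

# Or point the review at a specific PR
/review 142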

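For deeper reviews, you can split the analysis across specialized subagents. Claude Code lets you define each one as a Markdown file under .claude/agents/ in your project. A typical split looks like this: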

  • Logic Reviewer — checks for correctness, edge cases, and regressions
  • Security Reviewer — scans for vulnerabilities, secret exposure, and injection vectors
  • Style Reviewer — enforces naming conventions, patterns, and readability standards
  • Architecture Reviewer — flags structural issues and pattern violations

Each subagent focuses on its domain and reports independently. The result is a structured review with categorized findings, severity levels, and suggested fixes.
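Here is a minimal Python sketch that makes "categorized findings with severity levels" concrete. It shows one way independent reviewer outputs might be merged into a single report. The Finding type, the four severity names, and the reviewer labels are illustrative assumptions, not Claude Code's internal format:

```python
from dataclasses import dataclass

# Assumed severity scale (illustrative); a lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    reviewer: str  # hypothetical label, e.g. "logic" or "security"
    severity: str  # one of the SEVERITY_RANK keys
    path: str
    line: int
    message: str


def merge_findings(*reports: list[Finding]) -> list[Finding]:
    """Merge independent reviewer reports, drop exact duplicates,
    and order by severity, then file, then line."""
    unique = {finding for report in reports for finding in report}
    return sorted(unique, key=lambda f: (SEVERITY_RANK[f.severity], f.path, f.line))


def summarize(findings: list[Finding]) -> dict[str, int]:
    """Count findings per severity level, e.g. for a summary comment."""
    counts = {level: 0 for level in SEVERITY_RANK}
    for finding in findings:
        counts[finding.severity] += 1
    return counts


logic = [Finding("logic", "high", "src/cart.py", 42, "Discount can exceed subtotal")]
security = [Finding("security", "critical", "src/api.py", 7, "Token written to logs")]
for finding in merge_findings(logic, security):
    print(finding.severity, finding.path, finding.message)
```

The findings are frozen dataclasses, so they are hashable. That is what lets the set comprehension drop a finding reported twice.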

Claude Code as a GitHub Action

For automated PR review on every push, Claude Code offers a GitHub Action:

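name: Claude Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "Review this pull request for correctness, security, and style."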

Once configured, Claude Code:

  1. Receives the PR webhook from GitHub
  2. Clones the repository and checks out the PR branch
  3. Reads the full diff plus surrounding context
  4. Posts inline comments on specific lines
  5. Adds a summary comment with overall assessment

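You can also trigger a review manually by commenting @claude on a PR. This only works if the workflow also listens for issue_comment events, as the fuller workflow in the setup section below does.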

What Claude Code review does well

Claude Code review is genuinely useful for:

  • Catching logic bugs that require understanding the codebase context, not just the changed lines
  • Identifying missing error handling — it reads the surrounding code and notices when a function that can throw is called without a try/catch
  • Spotting regressions — it understands what the code did before and flags when new changes break existing behavior
  • Reducing reviewer fatigue — it handles the mechanical checks so human reviewers can focus on architecture and product decisions

What Claude Code review cannot do

Claude Code review reads the diff; it does not exercise the running product. That means a review alone cannot:

  • Open a browser and test the actual UI
  • Verify that a CSS change looks correct on different screen sizes
  • Check that a payment flow completes end-to-end
  • Notice that a button is now hidden behind another element
  • Reproduce a bug from a user screenshot
  • Access auth-walled tools like Sentry, Datadog, or admin dashboards

This is not a limitation of Claude Code specifically — it is the fundamental limitation of diff-based review. No tool that only reads code can tell you whether the product works.

The Gap: What AI Code Review Still Misses

Here is a real scenario. Your team uses Claude Code review on every PR. It is configured, running, and catching real bugs. Then this happens:

PR #247: Update coupon logic for cart discounts

Claude Code reviews the diff and finds:

  • No logic errors in the discount calculation
  • Proper null checks on coupon object
  • Tests pass for apply/remove coupon
  • No security issues

The PR gets merged.

Two hours later, a user reports: "I applied a $5 coupon to my cart, then removed an item. The total is now negative. I cannot check out."

What happened? The coupon logic was correct in isolation. But the interaction between coupon application and cart item removal created a state that no test covered and no diff reviewer — human or AI — would catch by reading code alone.
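This failure mode takes only a few lines to reproduce. In the Python sketch below, the BuggyCart class is hypothetical, not code from the PR. It keeps a running total. The coupon is subtracted once when applied, and later removals subtract item prices from that already-discounted total, so nothing ever re-checks the discount against the new subtotal:

```python
from decimal import Decimal


class BuggyCart:
    """Hypothetical cart that mutates a running total instead of recomputing it."""

    def __init__(self) -> None:
        self.items: list[Decimal] = []
        self.total = Decimal("0")

    def add(self, price: str) -> None:
        p = Decimal(price)
        self.items.append(p)
        self.total += p

    def apply_coupon(self, amount: str) -> None:
        # Correct in isolation: 6.00 - 5.00 = 1.00
        self.total -= Decimal(amount)

    def remove(self, price: str) -> None:
        p = Decimal(price)
        self.items.remove(p)
        # Bug: subtracts from the discounted total; the coupon is never re-validated.
        self.total -= p


cart = BuggyCart()
for _ in range(3):
    cart.add("2.00")
cart.apply_coupon("5.00")
cart.remove("2.00")
print(cart.total)  # -1.00
```

Each method is correct on its own, and a unit test for apply_coupon or remove in isolation would pass. The bug only appears when the two operations run in sequence.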

This is the class of bugs that grows as codebases get more complex:

  • State interaction bugs — two features that work independently but break together
  • Visual regressions — layout shifts, overlapping elements, broken responsive designs
  • Flow-dependent bugs — issues that only appear after a specific sequence of user actions
  • Environment-specific failures — staging behaves differently from local

These bugs share one characteristic: you can only find them by using the product.

The Full Loop: How Sai + Claude Code Does Code Review

Sai is an AI agent that runs on a cloud desktop. It can open browsers, click through applications, take screenshots, read error logs, and interact with tools like Sentry, Slack, and GitHub — all while running autonomously.

When paired with Claude Code, the review loop changes fundamentally:

Traditional AI review: PR opens → AI reads diff → AI posts comments → Human verifies

Sai + Claude Code review: PR opens → Claude Code reads diff → Sai opens the app → Sai tests the flows → Sai screenshots issues → Claude Code fixes the code → Sai re-tests → Structured report posted

The key difference: Claude Code reviews the code. Sai reviews the product.

How the 8-step loop works

Step 1: Trigger

The loop starts from one of three sources:

  • A GitHub PR is opened or updated (webhook trigger)
  • A user reports a bug ("checkout total is negative after applying a coupon")
  • A Sentry alert fires with a new error

Traditional review starts from the diff. This loop can start from the user experience.

Step 2: Claude Code analyzes the code

Claude Code reads the PR diff, understands the codebase context, and identifies potential issues at the code level — logic errors, missing edge cases, security concerns.

Step 3: Sai opens the preview deployment

While Claude Code reads the code, Sai opens the preview URL in a real browser on its cloud desktop. It logs in with a test account and navigates to the affected area.

Step 4: Sai tests the actual user flows

This is the critical step, and the one that diff-based review skips entirely. Sai:

  • Adds items to the shopping cart
  • Applies the coupon code
  • Modifies quantities and removes items
  • Proceeds to checkout
  • Checks that totals, taxes, and discounts calculate correctly
  • Screenshots every step
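Checking "that totals, taxes, and discounts calculate correctly" comes down to comparing each displayed total against one recomputed from cart state. The Python sketch below ignores tax for brevity. It assumes a flat coupon capped at the subtotal, the same rule the Step 8 report describes. The Step type and the observed values are hypothetical; in practice they would be read off the page at each screenshot:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Step:
    action: str              # human-readable label, e.g. "remove 1 orange"
    subtotal: Decimal        # sum of line items after the action
    coupon: Decimal          # flat coupon in effect (0 if none)
    observed_total: Decimal  # total the UI displayed


def expected_total(subtotal: Decimal, coupon: Decimal) -> Decimal:
    # Assumed business rule: the discount can never exceed the subtotal.
    return subtotal - min(coupon, subtotal)


def check_flow(steps: list[Step]) -> list[str]:
    """Return one failure message per step whose displayed total is wrong."""
    failures = []
    for s in steps:
        want = expected_total(s.subtotal, s.coupon)
        if s.observed_total != want:
            failures.append(f"{s.action}: expected {want}, saw {s.observed_total}")
    return failures


D = Decimal
steps = [
    Step("add 3 oranges", D("6.00"), D("0"), D("6.00")),
    Step("apply SAVE5", D("6.00"), D("5.00"), D("1.00")),
    Step("remove 1 orange", D("4.00"), D("5.00"), D("-1.00")),
]
for failure in check_flow(steps):
    print(failure)  # remove 1 orange: expected 0.00, saw -1.00
```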

Step 5: Sai generates steps-to-reproduce

If something breaks, Sai produces an engineering-ready bug report:

Steps to reproduce:
1. Add 3 oranges at $2 each. Cart total: $6.00
2. Apply coupon code SAVE5. Cart total: $1.00
3. Remove one orange. Cart total: -$1.00
4. Click checkout. Error: "Cannot process negative total"

Expected: Total should recalculate as $4.00 subtotal - $4.00 discount (the $5.00 coupon capped at the subtotal) = $0.00
Actual: Total shows -$1.00
Screenshots: [before_coupon.png] [after_remove.png]
Console errors: None
Sentry: No new errors logged

This is not a vague comment on a diff. This is a QA ticket with evidence.
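Because the report always has the same shape, it maps naturally onto a small data structure. A minimal Python sketch follows; the BugReport type and its plain-text render format are illustrative assumptions, not Sai's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    title: str
    steps: list[str]
    expected: str
    actual: str
    screenshots: list[str] = field(default_factory=list)
    console_errors: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render an engineering-ready ticket body as plain text."""
        lines = [self.title, "", "Steps to reproduce:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += [
            "",
            f"Expected: {self.expected}",
            f"Actual: {self.actual}",
            # Fall back to "None" when a list is empty, as in the example report.
            "Screenshots: " + (" ".join(f"[{s}]" for s in self.screenshots) or "None"),
            "Console errors: " + ("; ".join(self.console_errors) or "None"),
        ]
        return "\n".join(lines)


report = BugReport(
    title="Negative cart total after removing an item",
    steps=["Add 3 oranges at $2 each", "Apply coupon code SAVE5", "Remove one orange"],
    expected="$0.00",
    actual="-$1.00",
    screenshots=["before_coupon.png", "after_remove.png"],
)
print(report.render())
```

Keeping the report as structured fields, rather than free text, means the same object can feed a GitHub comment, a Slack message, or a prompt for the fixing step.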

Step 6: Claude Code fixes the code

Claude Code receives the structured reproduction steps, screenshots, and error context from Sai. Instead of guessing what might be wrong, it knows exactly:

  • Which page is affected
  • What operation sequence triggers the bug
  • What the expected behavior should be
  • What the actual behavior is

It generates a targeted fix — not a speculative suggestion.
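A script can hand this structured context to Claude Code through its non-interactive print mode, claude -p. The minimal sketch below only builds the command. The function name and prompt wording are hypothetical. Whether Claude Code may edit files in this mode depends on your permission settings, so check the CLI reference before running it unattended:

```python
def build_fix_command(page_url: str, steps: list[str], expected: str, actual: str) -> list[str]:
    """Assemble a `claude -p` invocation carrying the reproduction context.

    The prompt wording is an illustrative assumption. To execute, pass the
    returned list to subprocess.run(cmd, check=True).
    """
    prompt = "\n".join(
        [
            f"A bug was reproduced on {page_url}.",
            "Steps to reproduce:",
            *[f"{i}. {s}" for i, s in enumerate(steps, start=1)],
            f"Expected: {expected}",
            f"Actual: {actual}",
            "Find the root cause and make the smallest fix that produces the expected behavior.",
        ]
    )
    return ["claude", "-p", prompt]


cmd = build_fix_command(
    "https://preview.example.com/cart",  # hypothetical preview URL
    ["Add 3 oranges at $2 each", "Apply coupon code SAVE5", "Remove one orange"],
    expected="$0.00",
    actual="-$1.00",
)
print(cmd[2])
```

Passing the command as a list rather than a shell string avoids quoting problems with the multi-line prompt.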

Step 7: Sai re-tests the fix

After Claude Code patches the code, Sai runs the same test sequence again:

  • Apply coupon → remove item → check total
  • Verify the total no longer goes negative
  • Capture before/after screenshots
  • Check Sentry for new errors

Step 8: Structured report to Slack / GitHub

The final output is a structured QA report posted to your team's channel:

Sai QA Review: PR #247 — Coupon Discount Logic

Status: ✅ Fixed and verified

Issue found:
Cart total became negative when removing items after applying coupon.

Root cause:
Coupon discount was applied as a fixed amount without
recalculating against the updated cart total.

Fix applied:
Added cap logic — discount cannot exceed current cart subtotal.

Verification:
- Before fix: Total = -$1.00 after removing item [screenshot]
- After fix: Total = $0.00, coupon capped correctly [screenshot]
- Sentry: No new errors
- Checkout flow: Completes successfully
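Steps 4 through 8 form a bounded retry loop: test, fix, re-test, and stop once the flows pass or an attempt budget runs out. Below is a minimal Python sketch of that control flow. test_flows and apply_fix are hypothetical callables standing in for Sai's browser run and Claude Code's patch; neither product is known to expose such an API. max_fixes is an assumed safety limit:

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class FlowResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


@dataclass
class LoopReport:
    attempts: int
    passed: bool
    history: list[FlowResult]


def review_loop(
    test_flows: Callable[[], FlowResult],       # stands in for Sai testing the preview
    apply_fix: Callable[[list[str]], None],      # stands in for Claude Code patching
    max_fixes: int = 3,
) -> LoopReport:
    history = [test_flows()]
    fixes = 0
    while not history[-1].passed and fixes < max_fixes:
        apply_fix(history[-1].failures)  # hand the reproduction details to the fixer
        fixes += 1
        history.append(test_flows())     # re-run the same flows after the patch
    return LoopReport(attempts=len(history), passed=history[-1].passed, history=history)


# Simulated run: the first test fails, the fix lands, the re-test passes.
results = iter([FlowResult(False, ["negative total after removing item"]), FlowResult(True)])
report = review_loop(lambda: next(results), lambda failures: None)
print(report.passed, report.attempts)  # True 2
```

The attempt cap keeps a fix that never converges from looping forever. When the budget runs out, the history still holds every run for the human reviewer.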

Step-by-Step: Setting Up Claude Code Review with Sai

Prerequisites

  • A GitHub repository with preview deployments (Vercel, Netlify, or similar)
  • An Anthropic API key for Claude Code, stored as the ANTHROPIC_API_KEY repository secret (for code analysis)
  • A Sai account (for visual QA and browser testing)

Step 1: Set up Claude Code GitHub Action

Add the Claude Code review action to your repository:

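# .github/workflows/claude-review.yml
name: Claude Code Review
on:
  pull_request:
    types: [opened, synchronize]
  issue_comment:
    types: [created]

jobs:
  review:
    # Run on PR events, or on "@claude" comments left on a PR (not on a plain issue)
    if: |
      github.event_name == 'pull_request' ||
      (github.event.issue.pull_request && contains(github.event.comment.body, '@claude'))
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      issues: write
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "Review this pull request for correctness, security, and style."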

This gives you automated diff-level review on every PR.
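The job's if: condition decides which events actually start a review. The minimal Python sketch below implements this kind of trigger rule, which is handy for unit-testing before you commit it to YAML. It assumes the standard GitHub webhook payload shape: a comment on a PR arrives as an issue_comment event whose issue object carries a pull_request key. The function name is hypothetical:

```python
def should_review(event_name: str, payload: dict, trigger_phrase: str = "@claude") -> bool:
    """Return True if a GitHub event should start a review run."""
    if event_name == "pull_request":
        return payload.get("action") in {"opened", "synchronize"}
    if event_name == "issue_comment":
        if payload.get("action") != "created":
            return False
        # issue_comment fires for issues and PRs alike; only PR comments carry "pull_request".
        on_pull_request = "pull_request" in payload.get("issue", {})
        body = payload.get("comment", {}).get("body") or ""
        return on_pull_request and trigger_phrase in body
    return False


print(should_review("pull_request", {"action": "synchronize"}))  # True
```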

Step 2: Connect Sai to your GitHub repository

In Sai, set up a webhook workflow that triggers on PR events:

  1. Open Sai → Settings → Workflows
  2. Create a new webhook workflow
  3. Select GitHub as the provider
  4. Choose your repository
  5. Set trigger event to pull_request.opened

Step 3: Define test flows

Tell Sai what to test when a PR touches specific areas:

When a PR modifies files in /src/checkout/:
1. Open preview deployment URL
2. Log in with test account
3. Add 3 items to cart
4. Apply coupon TESTCOUPON
5. Modify quantities
6. Remove one item
7. Proceed to checkout
8. Screenshot each step
9. Report any total that is negative or mismatched
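A rule like "when a PR modifies files in /src/checkout/" is essentially a mapping from path patterns to test flows. The minimal Python sketch below uses fnmatchcase from the standard library. The flow names and patterns are hypothetical, and paths are written repo-relative (no leading slash), as GitHub lists a PR's changed files:

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: glob pattern -> flows to run.
# fnmatch's "*" also matches "/", so nested files under a folder count too.
FLOW_RULES = {
    "src/checkout/*": ["cart_coupon_checkout"],
    "src/auth/*": ["google_sign_in"],
    "src/dashboard/*": ["dashboard_breakpoints"],
}


def flows_for(changed_files: list[str]) -> list[str]:
    """Return the flows triggered by a PR's changed files, in first-seen order, without duplicates."""
    selected: list[str] = []
    for path in changed_files:
        for pattern, flows in FLOW_RULES.items():
            if fnmatchcase(path, pattern):
                selected.extend(f for f in flows if f not in selected)
    return selected


print(flows_for(["src/checkout/coupon.ts", "src/checkout/cart/total.ts", "README.md"]))
# ['cart_coupon_checkout']
```

fnmatchcase is used instead of fnmatch so matching stays case-sensitive on every operating system.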

Step 4: Configure reporting

Choose where Sai sends results:

  • GitHub PR comment — inline with the code review
  • Slack channel — for team visibility
  • Linear ticket — for blockers that need tracking
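For the GitHub option, the report becomes a regular PR conversation comment. GitHub's REST API creates these with POST /repos/{owner}/{repo}/issues/{issue_number}/comments, because pull requests share the issue comment endpoint. The minimal Python sketch below uses only the standard library. It builds the request but leaves sending it (urllib.request.urlopen) to you. The token is assumed to have permission to comment on the repository's pull requests:

```python
import json
import urllib.request


def pr_comment_request(owner: str, repo: str, pr_number: int, body: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts `body` as a PR comment."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical owner/repo; send with urllib.request.urlopen(req) when ready.
req = pr_comment_request("acme", "shop", 247, "Sai QA Review: PR #247\nStatus: Fixed and verified", "<token>")
print(req.get_method(), req.full_url)
```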

Step 5: Run and iterate

The first few PRs will calibrate the system. Sai learns which flows matter, what "correct" looks like, and where false positives occur. After a week, you will have a review pipeline that catches both code-level and product-level issues automatically.

How the three tiers compare:

| Aspect | Tier 1: Manual | Tier 2: Claude Code | Tier 3: Claude Code + Sai |
|---|---|---|---|
| Setup time | None | 15 min (GitHub Action) | 30 min (webhook + flows) |
| Review speed | 30-60 min / PR | 2-5 min / PR | 3-7 min / PR |
| Catches logic bugs | ✅ | ✅ | ✅ |
| Catches visual bugs | ❌ | ❌ | ✅ |
| Tests user flows | ❌ | ❌ | ✅ |
| Provides evidence | Text comments | Inline comments | Screenshots + steps to reproduce |
| Verifies fixes | Manual re-review | ❌ | ✅ Automated re-test |
| Human time / PR | 30-60 min | 10-15 min | 2-5 min |

Five Real-World Scenarios

Scenario 1: E-commerce checkout bug

Trigger: PR updates payment processing logic.

Claude Code finds: Missing error handling for declined cards.

Sai finds: After a declined card, the "Place Order" button remains disabled even when the user enters a valid card. The loading spinner never clears.

Result: Claude Code fixes the error handling. Sai verifies the button re-enables after a successful card entry. QA report posted to Slack.

Scenario 2: Dashboard responsive design break

Trigger: PR refactors the dashboard grid layout.

Claude Code finds: No logic issues. CSS changes look correct.

Sai finds: On tablet viewport (768px), the sidebar overlaps the main content area. Two chart widgets are completely hidden behind the navigation panel.

Result: Sai screenshots the overlap at three breakpoints. Claude Code adjusts the grid breakpoint values. Sai re-tests and confirms the layout is clean at all sizes.

Scenario 3: Authentication flow regression

Trigger: PR updates the OAuth integration for Google Sign-In.

Claude Code finds: Token refresh logic looks correct. Scopes are properly configured.

Sai finds: After signing in with Google, the redirect lands on a 404 page because the callback URL was updated in the code but not in the Google Cloud Console configuration.

Result: Sai screenshots the 404. The team updates the Google Cloud Console. Sai re-tests the full OAuth flow — sign in, redirect, session creation — and confirms it works end-to-end.

Scenario 4: Reproducing a user bug report from a screenshot

Trigger: A user posts a screenshot in Slack: "This page looks broken."

Claude Code alone: Can view the screenshot, but cannot open the live page to reproduce the state. Needs code context to know where to look.

Sai: Opens the same page, identifies the broken layout, clicks through to reproduce the exact state. Generates steps-to-reproduce with three annotated screenshots. Hands Claude Code the file paths, the page URL, and the expected vs. actual behavior.

Result: Claude Code identifies a z-index conflict in a recently merged PR. Fixes it. Sai verifies the page renders correctly.

Scenario 5: API change breaks frontend silently

Trigger: Backend PR changes the response shape of /api/orders — renames total_amount to totalAmount.

Claude Code finds: The API change is consistent with the new naming convention. Backend tests pass.

Sai finds: The frontend order history page shows "$NaN" for every order total. The frontend code still references total_amount.

Result: Sai screenshots the broken order history. Claude Code finds the frontend reference and updates it. Sai re-tests the order history page with real data.

