This week we shipped a full light-theme rollout across 14 marketing landing pages: a real "rise from below" CI banner, WCAG 2.2 AA contrast pass, mobile sticky behavior, animated theme toggle, and per-page light-mode overrides for every dark hardcoded color we'd ever written.
Three PRs. 23 commits. +2,200 lines of HTML/CSS/JS.
Every single commit went through the same pipeline gates we sell to customers. No special-cased "marketing site" exception. The same Alpha gate (AI diff review). The same SAST + prompt-injection + brute-force scans on every push. The same multi-persona review process. Same receipts.
If we don't trust our gates with our own brand pages, why would you?
What "dogfooding" actually means in practice
Most teams say they dogfood. Then you ask "how" and the answer is "we use it internally sometimes." That's not dogfooding — that's occasional sampling.
Real dogfooding means: every PR we open against our own repo runs through the same gates a customer's PR runs through. The bot sometimes blocks our own merges. We argue with it in PR comments. When it's wrong, we improve the prompt. When it's right and we're annoyed, we still fix the code.
Here's what one PR's pipeline result looks like. This isn't a screenshot we built for marketing — it's the real bot comment that gets posted on every PR, including the one that shipped the very page you're reading this on:
- AI Review
- Repo Tests
- SAST findings
- Prompt injections
- Brute-force attempts
The Alpha gate: AI review with personas
The marquee change in this redesign was a sticky CI banner that "rises" from the bottom of the viewport as you scroll past the hero. Easy to describe, surprisingly hard to get right.
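Here's a minimal sketch of the mechanic that eventually shipped; the class names are illustrative, not our production code. The banner sits fixed and translated below the fold, and an IntersectionObserver on the hero flips a class once the hero scrolls out, letting CSS transition the banner up:

```html
<!-- Minimal sketch, class names illustrative -->
<style>
  .ci-banner {
    position: fixed;
    inset: auto 0 0 0;                /* pin to the viewport bottom */
    transform: translateY(100%);      /* start hidden below the fold */
    transition: transform 300ms ease;
  }
  .ci-banner--visible { transform: translateY(0); } /* the "rise" */
  @media (prefers-reduced-motion: reduce) {
    .ci-banner { transition: none; }  /* no entrance animation for reduced-motion users */
  }
</style>
<script>
  const banner = document.querySelector('.ci-banner');
  const hero = document.querySelector('.hero');
  // Reveal the banner only once the hero has scrolled out of view:
  // this is what makes it "rise" instead of being pinned from page load.
  new IntersectionObserver(([entry]) => {
    banner.classList.toggle('ci-banner--visible', !entry.isIntersecting);
  }).observe(hero);
</script>
```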
So before merging the main PR (#473), I ran a 5-persona expert review through the AI review pipeline: five separate Claude agents, each given a different lens, each producing an independent punch list with file:line references and concrete fixes.
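Conceptually, the fan-out is just five independent prompts over the same diff. This sketch is illustrative, not our pipeline code; `reviewDiff` and `diff` are hypothetical:

```js
// Conceptual sketch only. `reviewDiff` is a hypothetical helper that
// sends one persona's system prompt plus the PR diff to a Claude model
// and returns that persona's punch list.
const personas = [
  'accessibility',
  'motion',
  'visual-design',
  // ...plus two more lenses
];

const punchLists = await Promise.all(
  personas.map((persona) => reviewDiff(persona, diff))
);
// No persona sees another's findings, so overlap between punch lists
// is independent agreement rather than anchoring.
```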
The results were brutal. The accessibility persona flagged a WCAG 2.4.11 violation I'd missed — the new sticky CI banner was obscuring focused hero CTAs during keyboard tab. The motion persona pointed out that the "rise" effect didn't actually rise: the banner pinned to the viewport bottom from page load, so there was no animated entrance. The visual designer caught a 1,432-line override file with 246 hardcoded hex literals and 36 !important declarations in direct violation of our own theme-vars rule.
I would have shipped that PR thinking it was great. Three of those five personas would have correctly hated it.
One human reviewer can hold maybe two perspectives at once. Five focused AI personas, each running independently, will catch what one human misses every time. This isn't replacing human review — it's catching the obvious before a human ever has to read the diff.
P0 issues (8) and P1 issues (10) all got fixed before merge. P2 structural debt (the 1,432-line override file refactor) got tracked as a separate concern. Same workflow we recommend to customers.
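One of those P0 fixes is worth sketching. For 2.4.11, a standard remedy is to reserve the banner's height in the scrollport, so anything the browser scrolls into view on keyboard focus lands above the fixed banner; the height below is an assumption, not our actual value:

```css
/* Sketch of one standard remedy for WCAG 2.4.11 (Focus Not Obscured):
   reserve the fixed banner's height in the scrollport so focused
   elements are scrolled clear of the banner. 5rem is assumed. */
html {
  scroll-padding-bottom: 5rem;
}
```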
Security gates on a marketing site? Yes.
Every PR in our repo also runs three security checks: SAST findings, prompt injections blocked, and brute-force attempts detected. The numbers from these scans are what populate the live "0 SAST findings · 0 prompt injections blocked · 0 brute-force attempts detected" line in the new CI banner you can see at the top of the homepage.
You might ask: "It's a marketing page. Why scan it for security?"
Three reasons:
- HTML can leak. Hardcoded API tokens in inline JavaScript. Webhook URLs in script tags. Customer email addresses in test fixtures someone forgot to remove. SAST finds them.
- The pipeline doesn't know it's "just" a marketing page. If we flagged this PR with `skip-security`, we'd be teaching the team that some PRs don't need scanning. That's how things slip in.
- The receipt matters. When the live CI banner says `0 SAST findings`, that number isn't a marketing claim — it's a fact derived from the most recent build of this exact site. If a SAST issue lands tomorrow, the number on the homepage updates. Self-incriminating telemetry is the only kind worth showing.
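The wiring for that receipt can be as thin as a JSON artifact that each build overwrites and the page reads at render time. A sketch, with the path and field names assumed:

```js
// Sketch (inside a module script). The path and field names are
// assumptions; the point is that the page renders whatever the most
// recent build actually measured.
const res = await fetch('/ci/latest-scan.json');
const { sastFindings, promptInjections, bruteForceAttempts } = await res.json();

document.querySelector('.ci-banner__stats').textContent =
  `${sastFindings} SAST findings · ${promptInjections} prompt injections blocked · ` +
  `${bruteForceAttempts} brute-force attempts detected`;
```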
The bot blocked our own PR — and that's the point
Halfway through the redesign work, the QualityMax Test Suite gate started failing on every PR — including this one. The five backend tests it ran (Stripe billing portal, webhook idempotency, GitHub App install, repo import, onboarding consistency) all returned `Unknown error`.
Nothing to do with the landing page. Turned out to be a regression in our own cloud-dispatch path that landed two hours earlier in a different PR — exactly the kind of in-memory-vs-database race that CLAUDE.md warns about. The progress oscillation (0% → 40% → 60% → 0% → 100% failed) was the smoking gun.
The fix was a small workflow change to:
- Add `paths-ignore` so HTML/CSS-only PRs don't trigger backend integration tests they can't possibly affect (sketched below)
- Replace the silent "Unknown error" with the full `/trigger` response and a structured per-test error dump
So the next time the gate fails, the operator gets a real diagnosis instead of a vibe.
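The `paths-ignore` half is a one-line trigger change. A sketch, assuming a standard GitHub Actions workflow file runs the backend integration tests, with illustrative paths:

```yaml
# Sketch of the trigger-level change; the ignored paths are
# illustrative, not our actual repo layout.
on:
  pull_request:
    paths-ignore:
      - '**.html'
      - '**.css'
      - 'site/**'   # marketing pages can't break backend tests
```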
This is what dogfooding catches. We use our own product enough that its weird edges become our edges. We fix them, ship the fix, and the customer who would have hit the same issue six months from now never does.
The compounding loop
Every PR that goes through our pipeline produces three artifacts:
- The merged code
- A bot comment thread (review + reactions + replies) that becomes per-repo memory for the next review
- A row in our internal ledger of what the AI got right and what it got wrong
That third artifact is what makes the system get better over time. When the multi-persona review caught a real WCAG violation, that finding became a calibration signal: future reviews on UI changes get a higher weight on focus management. When the AI flagged a false positive on a deliberate code pattern, a 👎 reaction taught the reviewer to skip it next time (we wrote about that loop in Teaching the Reviewer).
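We haven't published the ledger's schema, but a row only needs enough to recompute persona weights later. An illustrative shape, with every field name hypothetical:

```js
// Illustrative ledger row. The real schema isn't published, so every
// field name here is hypothetical.
const ledgerRow = {
  pr: 473,
  persona: 'accessibility',
  finding: 'WCAG 2.4.11: sticky banner obscures focused hero CTAs',
  severity: 'P0',
  verdict: 'true-positive', // derived from merge outcome and PR reactions
  reaction: '👍',           // a 👎 here teaches the reviewer to skip the pattern
};
```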
Three months from now, the AI reviewer that runs on your repo will have learned from every false positive we hit on this exact landing redesign work. That's the compounding part.
What this proves
Three things, in order of importance:
- Gates that work on marketing sites work on production code. If we can ship a 14-page redesign through SAST + prompt-injection + brute-force + AI review + Playwright in 23 commits without the gates becoming theater, they'll hold up on your real-stakes pull requests too.
- Multi-persona AI review finds what one reviewer can't. Eight P0 + ten P1 findings on a "this looks fine" PR. The accessibility violation alone would have shipped to every screen-reader user. The mobile catastrophe (sticky banner eating 27% of an iPhone viewport) would have shipped to half the traffic.
- Self-incriminating telemetry beats a thousand testimonials. The live `0 failures` badge on our homepage updates when our own build fails. We can't fake the number — and that constraint is what makes it credible.
If the AI reviewer is good enough for the page that pays our bills, it's good enough for yours.
Run our gates on your next PR
Connect QualityMax to your repo in 3 minutes. The Alpha gate (AI review), Gamma (your test suite), Delta (AI-generated framework scripts), and Beta (Playwright against your preview) all run on every PR. Same gates we use on ours.
Get Started