
Two weeks ago I posted about shipping our iPhone app to TestFlight in 4 days. That native app exists as a staff dashboard. But this trip’s real workhorse was the boring one: QualityMax’s own mobile web app in iPhone Safari. Same product, same routes, just rendered responsively on a 6.1-inch screen.

The last week has been a stress test of that idea, but in the other direction. I haven’t just watched from the phone. I’ve been merging from it. Features. Bug fixes. One revert.

A six-day Faroe Islands trip: Berlin → Copenhagen → Tórshavn on Monday May 18, Tórshavn → Berlin on Saturday May 23. Cliffs, ferries, guesthouse Wi-Fi, weather changing every ten minutes, zero days at a proper desk. The work didn’t slow down. If anything it sped up, because the pipeline did the work that I would normally have done by re-reading my own diff at 11pm.

[Photo: Faroe Islands cliffs during a phone-first QualityMax shipping week]
The actual backdrop: Faroe Islands cliffs, a phone, and a CI pipeline doing the heavy lifting.

Here’s what that looked like.

PRs landed: 43 (6 days · git log --since=May 18)
Mix: features · fixes · CI (backend, mobile UI, pipeline plumbing)
Days behind a desk: 0 / 6 (phone + Woodpecker the whole way)
Reverts: 1 (the gates were right; I wasn’t)

What’s actually in my pocket

The phone isn’t writing the code. I’m not pretending you can hammer out a Playwright executor refactor on a 6.1-inch screen. The phone is the observer and the merge button. The writing happens elsewhere — mostly in qmax-code on a laptop session I kicked off before I left, or in Claude Code, or in Codex. Then it lands as a PR, and the pipeline takes over.

Three browser tabs and one app do the work:

And one important thing that isn’t on the phone: any local terminal access. Everything I merge from the phone has to have already passed enough gates that I’m willing to trust it without re-running anything myself. That’s the whole point.

The four QM gates that did the work

Every PR runs through four sequential gates in Woodpecker. Three of them are hard blocks. One is a soft warn. Together they’re the thing that lets me click “Merge” from a ferry terminal.

ALPHA: AI review + SAST. 5-persona structured review, SAST scan, prompt-injection check, secret scan. Hard block on BLOCK verdict.

GAMMA: Native test suite. pytest + Go + Rust + JS. Lint, type-check, unit, integration. The full local suite, in CI.

DELTA: QM-generated scripts. Playwright tests the QM crawler wrote against earlier builds. Soft warn: signal, not gate.

BETA: Preview-deploy E2E. Playwright run against the live Railway preview. Last word before merge.
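A minimal sketch of how the hard-block versus soft-warn split could be expressed, assuming each gate reports a simple pass/fail. The `GateResult` type and `merge_decision` function are hypothetical names for illustration, not the pipeline's actual code:

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class GateResult:
    name: str
    status: Status
    hard_block: bool  # False means a failure only produces a warning


def merge_decision(results: list[GateResult]) -> tuple[bool, list[str], list[str]]:
    """Return (mergeable, blocking gate names, warning gate names)."""
    blockers = [r.name for r in results if r.status is Status.FAIL and r.hard_block]
    warnings = [r.name for r in results if r.status is Status.FAIL and not r.hard_block]
    return (not blockers, blockers, warnings)


# Gates in the order listed above; Delta is the only soft-warn gate.
results = [
    GateResult("alpha", Status.PASS, hard_block=True),
    GateResult("gamma", Status.PASS, hard_block=True),
    GateResult("delta", Status.FAIL, hard_block=False),
    GateResult("beta", Status.PASS, hard_block=True),
]
mergeable, blockers, warnings = merge_decision(results)
print(mergeable, blockers, warnings)  # True [] ['delta']
```

A red Delta shows up as a warning to read, while a red Alpha, Gamma, or Beta removes the merge option entirely.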

The non-gate plumbing matters too: ruff, mypy, pylint, promptfoo, semgrep, supply-chain pytest, prompt-lint, two layers of SPA navigation E2E, fresh HTTP-integration coverage on the AI-crawler service. Each runs in its own Woodpecker step. If any of them is red, I can see exactly which one in the dashboard from the phone.

The thing that lets me trust a phone-merge isn’t any one gate. It’s that there’s nowhere for a bad commit to hide.

Bugs I found because I was actually using the app

The funniest thing about doing your phone work in your own product is that you start finding your own bugs. On a phone. As a phone user. With phone-sized hands and one-handed thumb reach.

A clean batch of mobile-responsive UI fixes landed across the trip’s final weekend:

Production didn’t crash, no one was paged — but these are the paper-cuts visible only to users on the surface I was now living on. The dogfooding loop only works if you actually use the thing on its target surface, and a Faroe Islands roadside stop with uneven signal is a much better mobile testbed than any office.

Backend features I shipped from the same phone

The interesting half of these six days is the backend work that landed while I was nowhere near a backend.

The pattern was always the same: I’d kick off a qmax-code or Claude Code session before leaving, give it a ticket as context, watch it open a PR. Then the four gates would run, the AI review would post, the SAST would scan, the Playwright preview would deploy and assert. By the time I opened the PR on my phone the review was already there.

What actually shipped from the phone:

The revert

The honest part of this post.

On Sunday May 24, the day after I flew back from the islands and still living on the phone, I merged a P0 stuck-crawl fix around lunchtime. The PR bundled two AI-crawl improvements into one. The AI review had been generally positive, but two of the five personas had flagged blockers about completed-step semantics and missing test coverage for one of the integration modes. I read them on the phone, decided they could be follow-ups, hit merge.

They could not be follow-ups.

Within an hour, the stuck-crawl behavior on the exact test case I’d been chasing was worse than before the fix. Bugsink picked up new exceptions. The bug-bot started ticking on them. Beta on subsequent PRs started timing out where it hadn’t before.

This is the receipt:

Sun 12:31: P0 fix merged after I overrode two persona-review blockers. Phone, away from the desk.
Sun 13:24: Bugsink + Beta on the next PR show a regression on the same test case. Mobile Safari surfaces both within seconds.
Sun 13:48: Opened a revert PR on the phone. Pre-commit hooks ran in CI; the four gates passed on a clean revert.
Sun 14:02: Revert merged. Bugsink event rate drops back to baseline.
Mon to Tue: Re-do in three smaller PRs: pipeline resilience, retry budget v2, selector rewrite v2. This time the persona blockers were addressed before merge, not after. Last one landed Tuesday.
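The "event rate versus baseline" signal in this timeline can be sketched as a small comparison over event timestamps. This is an illustration only: it does not use Bugsink's API, and the window sizes, the `factor` threshold, and the `min_events` floor are all hypothetical values.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def events_per_hour(timestamps: list[datetime], start: datetime, end: datetime) -> float:
    """Average number of events per hour in the half-open window [start, end)."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("window must have positive length")
    return sum(1 for t in timestamps if start <= t < end) / hours


def regressed_after_merge(
    timestamps: list[datetime],
    merged_at: datetime,
    baseline_window: timedelta = timedelta(hours=24),  # hypothetical default
    watch_window: timedelta = timedelta(hours=1),  # hypothetical default
    factor: float = 3.0,  # hypothetical threshold
    min_events: int = 5,  # ignore one-off noise
) -> bool:
    """True if the post-merge event rate clearly exceeds the pre-merge baseline."""
    baseline = events_per_hour(timestamps, merged_at - baseline_window, merged_at)
    watch_end = merged_at + watch_window
    recent = sum(1 for t in timestamps if merged_at <= t < watch_end)
    current = events_per_hour(timestamps, merged_at, watch_end)
    return recent >= min_events and current > factor * baseline


merged_at = datetime(2000, 1, 2, 12, 0)  # arbitrary timestamp for illustration
quiet = [merged_at - timedelta(hours=h) for h in range(1, 25)]  # 1 event/hour before
noisy = [merged_at + timedelta(minutes=5 * i) for i in range(10)]  # 10 events after

print(regressed_after_merge(quiet, merged_at))  # False
print(regressed_after_merge(quiet + noisy, merged_at))  # True
```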

The lesson isn’t that I shouldn’t merge from a phone. The lesson is the one the AI review was already telling me: when persona-review flags a blocker, “follow-up” is not an answer. The system was correct. I overrode it. The system then caught the consequences within an hour and gave me a clean way to undo it. Total customer-visible damage: zero, because Bugsink + Beta + the bug-bot triangulated the regression before any user hit it in earnest.
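One way to encode that rule is to make the aggregate verdict depend on open blockers alone, with no majority vote to soften it. A minimal sketch, assuming each persona returns a list of blocker strings; the persona names and the `PersonaReview` and `review_verdict` names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaReview:
    persona: str
    blockers: tuple[str, ...] = ()


def review_verdict(reviews: list[PersonaReview]) -> tuple[str, list[str]]:
    """BLOCK if any persona reports a blocker, however many others approve."""
    open_blockers = [f"{r.persona}: {b}" for r in reviews for b in r.blockers]
    return ("BLOCK" if open_blockers else "PASS", open_blockers)


# Five personas (names are made up), two of them flagging blockers.
reviews = [
    PersonaReview("security"),
    PersonaReview("performance"),
    PersonaReview("maintainability"),
    PersonaReview("correctness", ("completed-step semantics",)),
    PersonaReview("testing", ("missing test coverage for one integration mode",)),
]
verdict, open_blockers = review_verdict(reviews)
print(verdict)  # BLOCK
for blocker in open_blockers:
    print(blocker)
```

Three approvals out of five still yield BLOCK, which is the behavior the revert argues for.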

If you want one sentence on why we run our own pipeline against our own commits: it’s so the system is right even when I’m wrong.

Why Woodpecker, and why all this plumbing

People ask why we run our own CI. Three reasons:

Also: our Woodpecker setup is now battle-tested in the way only production teaches. The hard-won lessons are written into the contributor docs as durable instructions for the next agent that touches the pipeline — including a four-PR silent-pipeline outage that turned out to be a config file-vs-directory ambiguity, and a separate parse-time failure mode where a missing CI secret aborts the whole manifest set before any when: clause is evaluated.
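Both failure modes lend themselves to a cheap preflight check before anything reaches CI. The sketch below is an assumption-heavy illustration: the post does not say which file and directory were involved, so it uses Woodpecker's conventional `.woodpecker/` directory and `.woodpecker.yaml` file, and it finds secret references by scanning for `from_secret:` entries with a regex instead of parsing YAML. `preflight` and `available_secrets` are hypothetical names.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Matches `from_secret: name`, quoted or not, in block or inline style.
FROM_SECRET = re.compile(r"from_secret:\s*['\"]?([\w.-]+)")


def referenced_secrets(manifest_text: str) -> set[str]:
    """Names of all secrets a manifest refers to via from_secret."""
    return set(FROM_SECRET.findall(manifest_text))


def preflight(repo_root: Path, available_secrets: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: list[str] = []
    config_dir = repo_root / ".woodpecker"
    single_files = [
        p
        for p in (repo_root / ".woodpecker.yaml", repo_root / ".woodpecker.yml")
        if p.is_file()
    ]
    if config_dir.is_dir() and single_files:
        problems.append("both a .woodpecker/ directory and a single-file config exist")

    manifests = list(single_files)
    if config_dir.is_dir():
        manifests = sorted([*config_dir.glob("*.yaml"), *config_dir.glob("*.yml")]) + manifests

    for manifest in manifests:
        text = manifest.read_text(encoding="utf-8")
        for name in sorted(referenced_secrets(text) - available_secrets):
            problems.append(f"{manifest.name}: secret '{name}' is not configured")
    return problems


# Usage against a throwaway repo layout that has both problems.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".woodpecker").mkdir()
    (root / ".woodpecker" / "test.yaml").write_text(
        "steps:\n"
        "  - name: e2e\n"
        "    image: python:3.12\n"
        "    environment:\n"
        "      API_TOKEN:\n"
        "        from_secret: api_token\n",
        encoding="utf-8",
    )
    (root / ".woodpecker.yaml").write_text("steps: []\n", encoding="utf-8")
    problems = preflight(root, available_secrets={"deploy_key"})

for problem in problems:
    print(problem)
```

The list of configured secrets has to come from somewhere outside the repo, so in practice `available_secrets` would be maintained by hand or exported from the CI server.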

What this proves

Three things, in order of importance:

We’ll keep building the platform the same way we sold it: every commit through the gates, every bug a future test, every system improvement compounding the next one. The next post is about what happens when the bug-detection bot starts opening PRs against itself.

Run the same gates on your own PRs

Alpha (AI review + SAST + prompt-injection), Gamma (your test suite), Delta (QM-generated scripts), Beta (Playwright on preview). Same pipeline we ran on every PR in this post. Connect your repo and the bot ships the same receipts on your next PR — no Woodpecker required on your side.
