This week we shipped our iPhone app to TestFlight. Day 0 to day 5, signed and ready to share with the team.
But we started as a web E2E testing platform. Web Playwright runs. Web pipeline gates. Web everything. The platform doesn't even have first-class mobile testing yet; that's still on the roadmap.
So how did we build mobile without any mobile-first tooling? And — more interesting — what did we learn from doing it?
Why mobile when you sell web testing?
Same reason any founder ends up with a phone in their hand at 11pm: ops dashboards.
We were already running an internal staff dashboard for live CI run state, bug-detection bot tick progress, error tracking, infra health checks, billing audits. All useful. All built for the desktop. None of it accessible the minute you stepped away from your laptop.
And the kicker — over the last few weeks we'd been actively building more things that demanded immediate attention. A bug-detection bot that opens PRs from production errors. A capability-gated Vibe builder that streams tests as it builds them. Each of those wants you to see what's happening now, not when you next sit down.
A mobile app for our own staff dashboard wasn't a feature. It was infrastructure for the way we already work.
The dogfooding paradox: how do you test mobile if you build web testing?
This was the joke we had to confront on day one. Our platform — the thing we sell to customers — runs Playwright against a live preview environment, gates every PR with five AI personas, and posts findings as a structured comment thread. None of that touches a phone.
So we did the only honest thing: we treated the mobile app as a downstream consumer of the same backend, not as a new testable surface. Everything the mobile app reads — projects, runs, bug-bot status, infra health, chat history — already had server-side tests in the platform repo. Everything the app does to itself — navigation, layout, theme — gets the lightest-possible test treatment because we'd rather catch UI bugs in TestFlight with our own team than build the wrong mobile-testing system before we even know what we want.
The honest answer to "how do you test mobile?" was: we don't yet, and we shipped anyway. The next post in this series is about building mobile testing into the platform — but you can't sensibly do that before you've felt the pain of not having it.
What unblocked us
Two things, both compounding:
(1) We scoped the V1 to ourselves. No public marketing for the app. No App Store push. No customer-facing features. Just the staff dashboard, the chat / Vibe entry point, infrastructure health, and bug-bot status. When the app's only user is the founding team, the bar for "done" is "we'd actually use this from a phone" — measurable in days, not quarters.
(2) We let our own AI tooling do the heavy lifting. The web staff dashboard is HTML + plain JS. Our AI agent has a year of conversational context on that codebase. So porting it screen-by-screen into a React Native + Expo app was, mechanically, a sequence of "here is the web component, mirror its behavior in RN" prompts. Each new screen averaged 30-60 minutes from "I'd like this screen on mobile" to a working build on the simulator.
We've been writing about closed-loop dogfooding for a year now — see how we ran our own marketing redesign through the same review gates — and this is the same pattern at a larger scope. The platform's job is to make us as effective on its surface area as we want our customers to be on theirs. Building a mobile app for ops was a forcing function to find the gaps.
Hero anecdote: the app caught its own production bug
The thing that made everything we'd just built suddenly feel worth it happened the day after we shipped to TestFlight.
I was at a coffee shop. I opened the iPhone app to check Bugsink — our self-hosted error tracker, which the app renders in a dashboard view. And there it was, fresh, five events in the last few minutes:
    UnboundLocalError: cannot access local variable 'datetime' where it is not associated with a value
      at services/chat_service.py:488 in _save_messages_to_db
      during /api/chat/
    events: 5
    first seen: 2 hours ago
I recognized it instantly. We'd added a Vibe builder UI to the mobile app earlier that day. Its first call hit /api/chat/ — the same endpoint as the web Vibe — and that endpoint is supposed to write rows to chat_messages. The Bugsink trace pointed at a classic Python scope trap:
    # module-level: only UTC is imported
    from datetime import UTC

    # …deep in _save_messages_to_db:
    if recent_msg.data and len(recent_msg.data) > 0:
        from datetime import datetime, timedelta  # inline import
        …

    # later, outside that branch:
    if not conversation_id:
        conversation_id = f"conv_{datetime.now(UTC).strftime(...)}_..."
The inline from datetime import datetime, timedelta made datetime a local variable for the entire function at compile time. When the if recent_msg.data branch didn't execute — exactly the path a brand-new user's first chat message takes — the later line tried to use datetime before it had been bound. UnboundLocalError. The outer try/except swallowed it and the endpoint still returned 200. Silently, every new user's first Vibe message was being dropped.
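The trap is easy to reproduce in isolation. Below is a minimal sketch, not the real chat_service.py code: the function names, the has_recent flag, and the strftime format are all hypothetical, and timezone.utc stands in for datetime.UTC (the same object) so the snippet also runs on Python versions before 3.11.

```python
from datetime import timezone  # module level: the name `datetime` is NOT bound here


def make_id_buggy(has_recent: bool) -> str:
    """Hypothetical stand-in for the conversation-id logic."""
    if has_recent:
        # An import inside a function is an assignment, so the compiler marks
        # `datetime` as a local name for the whole function body.
        from datetime import datetime
    # When the branch above is skipped, the local `datetime` was never bound.
    return f"conv_{datetime.now(timezone.utc).strftime('%Y%m%d')}"


def make_id_fixed(has_recent: bool) -> str:
    """Same logic, with the import bound unconditionally before first use."""
    from datetime import datetime
    return f"conv_{datetime.now(timezone.utc).strftime('%Y%m%d')}"


def raises_unbound(func, *args) -> bool:
    """Return True if calling func(*args) raises UnboundLocalError."""
    try:
        func(*args)
    except UnboundLocalError:
        return True
    return False


if __name__ == "__main__":
    print(raises_unbound(make_id_buggy, True))   # branch taken, name is bound
    print(raises_unbound(make_id_buggy, False))  # branch skipped, name unbound
    print(make_id_fixed(False))
```

Moving the import to module level would work just as well as the unconditional in-function import shown here. Either way, the name is bound on every code path before it is read.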
Here's the full receipt:
- The error showed up in Bugsink on the phone: UnboundLocalError.
- The trace pointed to chat_service.py.
- The cause: the inline from datetime import datetime creating the scope shadow.
- After the fix, the /api/chat/ request succeeds end-to-end. Bugsink stops recording the error.
Under 20 minutes, from "noticed on phone" to "fixed in production." The mobile app caught a real bug in our backend before any user got around to complaining about it.
And the loop closed itself. The app was built so we could see what's happening in production from anywhere. The first interesting thing it showed us was a bug in production. The bug fix shipped through the same pipeline gates we'd just rebuilt the app with. The next test the bot tries to write may use the very chat endpoint we just unbroke.
What this proves
Three things, in order of importance:
- Scope-to-self beats scope-to-market. A mobile app for ourselves shipped in 4 days. A mobile app for customers would have taken months. When your dogfooding loop closes tight, you ship the things you actually need now instead of the things you might need later.
- AI agents make capability gaps less expensive. We've never staffed a mobile engineer. We have a year of accumulated context on our own web codebase, which our AI agents use to port that codebase into a new surface. The cost of acquiring a new capability is no longer "hire someone with that capability." It's "find someone who can hold the system in their head and ask the right questions."
- Production observability is the first feature any internal tool needs. If you build something for ops and it doesn't surface production errors immediately, you've built a status page, not an operations tool. The 20-minute fix anecdote above only happens because the app was a Bugsink dashboard from day one — not a Bugsink dashboard we added later.
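The swallowed error in the anecdote shows why surfacing matters: a handler that catches everything and still answers 200 looks healthy from the outside. Here is a rough sketch of the difference, assuming hypothetical handler and function names and a simulated failure. It is not the platform's actual endpoint code, and whether the real fix changed the status code is not something this post states.

```python
import logging

logger = logging.getLogger("chat")  # hypothetical logger name


def save_messages(payload: dict) -> None:
    """Hypothetical persistence step; always fails to simulate the bug."""
    raise RuntimeError("simulated write failure")


def handle_swallowing(payload: dict) -> int:
    """Catches everything and reports success: the caller never learns of the loss."""
    try:
        save_messages(payload)
    except Exception:
        pass  # the message is dropped silently
    return 200


def handle_reporting(payload: dict) -> int:
    """Logs the traceback and returns an error status when the save fails."""
    try:
        save_messages(payload)
    except Exception:
        # logger.exception records the message plus the active traceback,
        # which an error tracker hooked into logging can then pick up.
        logger.exception("failed to save chat messages")
        return 500
    return 200


if __name__ == "__main__":
    print(handle_swallowing({"text": "hello"}))
    print(handle_reporting({"text": "hello"}))
```

The design point is only that a failure should be visible in two places: in the error tracker for the team, and in the response for the caller.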
We're not a mobile testing platform. Yet. But shipping a mobile app changes what we can test next — and that, more than the app itself, is what made this week worth writing about.
Same gates we ran on our own mobile build
The PR that fixed the datetime bug went through the same Alpha (AI review), Gamma (test suite), and SAST + prompt-injection gates we run on every customer PR. Connect QualityMax to your repo and the bot ships the same receipts on your next PR.