Six Failures and a Local Test

How six consecutive CI failures on Fastlane taught me to stop treating GitHub Actions like a REPL, and the release workflow that emerged from the wreckage.

Eyal Harush · 6 min read

I'm staring at a GitHub Actions tab, watching a macOS runner spin up for the sixth time. Five minutes to boot, ten seconds to read a Fastlane error, two minutes to push a fix, five more minutes to boot again. I've been doing this for two hours.

The fix, when it finally lands, will be two lines.

The problem with continuous everything

Up until day 24, every merge to master deployed straight to production. Server, web, iOS. No staging. No TestFlight build. No buffer between "I think this works" and "this is in front of real players."

For a solo dev moving fast in the first two weeks, that was fine. But by day 24 I had a backend with a real database, an economy with real coin balances, and an App Store listing that real humans had downloaded. Pushing directly to production every time I merged a PR was no longer scrappy. It was reckless.

I needed three things: a staging environment to catch server problems before they hit players, a TestFlight channel for beta testing iOS builds before App Store submission, and a release workflow that aligned with Apple's review cycle. Feature-based releases, not continuous deployment.

The design

The system I sketched out in the brainstorming session looked clean on paper. Three tiers:

Local dev: Simulator hits localhost:3000. Fast iteration, no deployment involved.

Staging: release/* branches auto-deploy the server to a staging Cloud Run service and upload iOS builds to TestFlight. Separate staging database on the same Cloud SQL instance. Beta testers validate here.

Production: tags matching the v* pattern deploy the server and web to production and submit iOS builds to the App Store. Master becomes integration-only: CI runs tests, but nothing deploys from a plain push.
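Stripped of YAML, the three tiers amount to one decision: which Git ref triggered the run, and what does that ref deploy? Here is a minimal TypeScript sketch of that mapping, assuming GitHub-style ref strings. The planFor function, the DeployPlan type, and the example branch name are hypothetical, and the post does not say whether web gets a staging deploy, so the sketch assumes it does not.

```typescript
// Hypothetical sketch of the three-tier routing: map the Git ref that
// triggered a CI run to what gets deployed where. Assumes GitHub-style
// refs such as "refs/heads/release/..." and "refs/tags/v0.1.0".
type ServerTarget = "none" | "staging" | "production";
type IosTarget = "none" | "testflight" | "app-store";

interface DeployPlan {
  server: ServerTarget;
  web: ServerTarget;
  ios: IosTarget;
}

function planFor(ref: string): DeployPlan {
  if (ref.startsWith("refs/heads/release/")) {
    // Staging tier: server to the staging Cloud Run service, iOS to TestFlight.
    // Assumption: no staging web deploy, since the post doesn't mention one.
    return { server: "staging", web: "none", ios: "testflight" };
  }
  if (ref.startsWith("refs/tags/v")) {
    // Production tier: server and web to production, iOS submitted to the App Store.
    return { server: "production", web: "production", ios: "app-store" };
  }
  // Everything else, including a plain push to master: CI runs tests, nothing deploys.
  return { server: "none", web: "none", ios: "none" };
}

console.log(planFor("refs/heads/release/example")); // hypothetical branch name
console.log(planFor("refs/tags/v0.1.0"));
console.log(planFor("refs/heads/master"));
```

In GitHub Actions this decision typically lives in each workflow's push trigger filters rather than in code. The function only makes the three tiers explicit.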

The implementation went fast. Created the ayatower-staging database on GCP, added staging secrets, rewired both CI workflows for the three-tier trigger system, added an iOS staging scheme with a build-config-driven API URL (no more #if targetEnvironment(simulator) checks in APIClient.swift), and split the /ship skill into /integrate (merge to master) and /distribute (release to staging or production). That last split was my idea and I'm still happy with it. The two operations have fundamentally different risk profiles.

Then I needed something real to push through the pipeline. The economy rebalance was ready: a new reward formula (v1.1.0), milestone coin bonuses, updated catalog. It touched server, iOS, and the shared packages/catalog package. A perfect first release candidate because it exercised every path.

The shared-type lesson

The first thing that broke wasn't Fastlane. It was TypeScript.

I'd added a milestoneCoins field to a shared interface in the catalog package and only ran the test file for the module I was editing. CI ran the full suite and caught five downstream type errors in the progression service tests. Five files that imported the same interface, all expecting the old shape.

The fix was mechanical: update the types, re-run the tests. But the lesson stuck. After any change to a shared type: tsc --noEmit on the whole project, then the full test suite. Running one file is not enough. Added this rule to the pre-commit checklist the same day.
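A minimal sketch of why one added field fans out into errors across several files. The interface, fixtures, and helper are hypothetical; the post only names the milestoneCoins field. Every object literal typed against the shared interface must now supply the new required property. Only a project-wide tsc --noEmit reports all of them at once, and a fixture that slips past the compiler fails quietly at runtime.

```typescript
// Hypothetical shape of a shared type in packages/catalog after the rebalance.
interface RewardResult {
  coins: number;
  // New required field: every object literal typed as RewardResult must now set it.
  milestoneCoins: number;
}

// A downstream test fixture written against the old shape. Under tsc this is a
// compile error (property 'milestoneCoins' is missing); the directive below
// acknowledges the error so the snippet itself still compiles.
// @ts-expect-error fixture predates milestoneCoins
const staleFixture: RewardResult = { coins: 100 };

// The mechanical fix: update each fixture to the new shape.
const updatedFixture: RewardResult = { coins: 100, milestoneCoins: 25 };

function totalCoins(r: RewardResult): number {
  return r.coins + r.milestoneCoins;
}

console.log(totalCoins(updatedFixture)); // 125
console.log(totalCoins(staleFixture)); // NaN: 100 + undefined
```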

The Fastlane wall

With the economy rebalance merged to a release branch and the staging server deploying cleanly, it was time for the iOS pipeline. Fastlane for TestFlight on release/* pushes, Fastlane for App Store on v* tags. Automated signing with match, automated uploads with pilot.

"FAILED AGAIN !!!!!!!"

That was my reaction after failure number three. Here's the sequence:

Failure 1: Invalid CLI flag. Fastlane's --api_key_path flag doesn't work the way the docs suggest when you're passing it from a GitHub Actions secret. The secret holds the key file's contents, not a path to a file.

Failure 2: Wrong file-path resolution. Switched to key_filepath, but Fastlane resolves paths relative to the Fastfile's location, not the working directory. The key file was being written to one path and read from another.

Failure 3: Switched to key_content (pass the raw JSON string directly). Failed because the git credentials for the certificates repo weren't set up. match needs to clone a private repo to fetch signing certificates, and the macOS runner had no access.

Failure 4: Added MATCH_GIT_BASIC_AUTHORIZATION as a CI secret. Failed because code signing was set to "automatic" in the Xcode project for the staging configuration, but match needs "manual" signing to work.

Failure 5: Switched to manual signing for Staging and Release configs, kept automatic for Debug. Failed because precheck_include_in_app_purchases was set to true, which isn't supported with API key authentication.

Failure 6: Disabled the precheck. Failed because — actually, I don't remember. By failure 6 I had stopped reading the error messages carefully and started guessing at fixes.
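Three of these fixes reduce to small, checkable facts, sketched below in TypeScript for a Node.js runtime. First, a secret that holds content has to be written to a file before anything that expects a path can use it. Second, a relative path names different files depending on the base directory; the sketch assumes the Fastfile sits in a fastlane/ subdirectory, as is conventional. Third, per fastlane's match documentation, MATCH_GIT_BASIC_AUTHORIZATION is the base64 encoding of username:token. The helper names, the /work/repo checkout path, and the placeholder key content are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Failure 1: the secret is the key's content, so write it to a file first.
// Returning an absolute path sidesteps Failure 2 (writer and reader
// resolving the same relative path against different directories).
function writeSecretFile(content: string, fileName: string): string {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ci-secrets-"));
  const filePath = path.join(dir, fileName);
  fs.writeFileSync(filePath, content, { mode: 0o600 }); // owner read/write only
  return filePath; // absolute, because os.tmpdir() is absolute
}

// Failure 2 in miniature (POSIX paths): one relative path, two base
// directories, two different files.
const repoRoot = "/work/repo"; // hypothetical checkout location
const relativeKey = "api_key.json";
const writtenTo = path.resolve(repoRoot, relativeKey);
const readFrom = path.resolve(repoRoot, "fastlane", relativeKey);
console.log(writtenTo); // /work/repo/api_key.json
console.log(readFrom); // /work/repo/fastlane/api_key.json

// Failure 4: match expects MATCH_GIT_BASIC_AUTHORIZATION as base64("user:token").
function basicAuthorization(user: string, token: string): string {
  return Buffer.from(`${user}:${token}`, "utf8").toString("base64");
}

const keyPath = writeSecretFile('{"key_id":"PLACEHOLDER"}', "api_key.json"); // placeholder content
console.log(path.isAbsolute(keyPath)); // true
console.log(basicAuthorization("user", "token")); // dXNlcjp0b2tlbg==
```

The temp directory is left in place here for simplicity. On a throwaway CI runner that is usually acceptable, but a long-lived machine would want to delete it after the lane finishes.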

"are we trying to invent the wheel here? there is NO way this is that difficult. fastlane is insanely popular"

I was right that it shouldn't be that difficult. The fixes were all trivial: a flag name, a path, a credential, a toggle, another toggle. None of them required understanding anything deep about Fastlane or code signing. But each one required a full CI round-trip to validate: push the fix, wait five minutes for the macOS runner to boot, watch it fail in ten seconds, read the error, push another fix. By failure 4, the five-minute wait had turned me into a guesser instead of a reader.

"no way to test this workflow locally?"

There was. Of course there was. I ran bundle exec fastlane staging on my laptop. It worked on the first attempt.

The rule

CI is for validation, not exploration. If a tool can run on your machine, run it there first. Every CI failure is a minimum five-minute round-trip on a macOS runner. Six failures is thirty minutes of staring at a progress bar, and by the fourth failure you've stopped reading error messages and started cargo-culting fixes.
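The arithmetic behind that rule, as a tiny TypeScript sketch using this post's own numbers: roughly five minutes of macOS runner boot per CI attempt, and roughly twenty seconds per local run. The function name is hypothetical. The model deliberately counts only waiting time, not the minutes spent writing each fix, which are the same in both loops.

```typescript
// Total time spent waiting for feedback across a run of attempts.
// Ignores time spent writing each fix, which is identical in both loops.
function waitingSeconds(attempts: number, secondsPerAttempt: number): number {
  return attempts * secondsPerAttempt;
}

const ciSeconds = waitingSeconds(6, 5 * 60); // six macOS runner boots
const localSeconds = waitingSeconds(6, 20); // six local fastlane runs

console.log(ciSeconds / 60); // 30 (minutes)
console.log(localSeconds / 60); // 2 (minutes)
console.log(ciSeconds / localSeconds); // 15
```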

The local test that finally worked took about twenty seconds. Same command, same Fastfile, same credentials from the same .env. Had it failed, I would have seen the error right away, fixed it, and re-run, with no five-minute dead zone between cause and effect.

I added "test locally before pushing to CI" to the project's CLAUDE.md that same evening. It's the kind of rule that sounds obvious written down. It took six failures to actually learn it.

v0.1.0

After the local Fastlane test passed, I pushed the fix and the CI pipeline went green on the first attempt. Tagged v0.1.0. All three workflows triggered in parallel: server deployed to production, web deployed to production, iOS build uploaded to the App Store.

The economy rebalance was live. TestFlight had a staging build. The release workflow was real.

One lingering mystery: the previous App Store build (v0.0.2) was stuck in "Ready for Distribution" but had never actually gone live. Auto-release was enabled, the review had passed, but the build just sat there. I still don't know why. Possibly an agreements issue, possibly an App Store Connect UI bug.

"man you suck," I told Claude after we'd both failed to figure it out.

Some bugs aren't code bugs.

What the pipeline looks like now

The release workflow that survived day 24:

  • Feature branches PR into release/* branches, never directly into master
  • Merging into a release branch auto-deploys the server to staging and uploads a TestFlight build
  • Tagging with v* deploys everything to production and submits the App Store build
  • Master is integration-only: CI validates, nothing deploys
  • /integrate handles feature merges, /distribute handles releases
  • After any shared type change: tsc --noEmit + full test suite before committing
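The last rule in that list can be made mechanical with a small fail-fast gate. Below is a minimal TypeScript sketch for Node.js. It runs the steps in order and stops at the first one that exits non-zero or fails to start, so a type error surfaces before the full suite runs. The runGate name and the exact commands are assumptions: npx tsc and an npm "test" script may need swapping for whatever runner the monorepo actually uses.

```typescript
import { spawnSync } from "node:child_process";

// One step of a hypothetical pre-commit gate: a command and its arguments.
interface Step {
  cmd: string;
  args: string[];
}

// Run steps in order; stop at the first non-zero exit. A command that cannot
// be started reports a null status, which also counts as a failure.
function runGate(steps: Step[]): boolean {
  for (const step of steps) {
    const result = spawnSync(step.cmd, step.args, { stdio: "inherit" });
    if (result.status !== 0) {
      console.error(`gate failed at: ${step.cmd} ${step.args.join(" ")}`);
      return false;
    }
  }
  return true;
}

// The checklist order: whole-project type check first, then every test.
const checklist: Step[] = [
  { cmd: "npx", args: ["tsc", "--noEmit"] },
  { cmd: "npm", args: ["test"] },
];
```

A pre-commit hook could then end with process.exit(runGate(checklist) ? 0 : 1). Because stdio is inherited, tsc and test output appears in the terminal as usual.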

Eight hours of work, thirty minutes of which went to watching macOS runners boot up to tell me things I could have learned in twenty seconds on my own laptop.

This is post 17 of 18 in a series about building Geo Climber with Claude Code. The release pipeline that shipped v0.1.0 was designed in one sitting and debugged over six CI failures. The seventh attempt worked because I ran it on my laptop first. Join the Discord and download Geo Climber on the App Store.
