Track crash-free user rate per app version

The share of daily active users (DAU) who did not hit a crash, computed per app version: 1 − (distinct crashed users ÷ DAU). It is the standard mobile health metric, and splitting it by version surfaces regressions fast.

difficulty intermediate · time to value 2 minutes · tools used 2

Just say this

What's our crash-free rate by app version? That's 1 minus (distinct users who hit any crash ÷ DAU). Surface any version with a sudden drop.

Why this matters

Crash-free user rate is the single number that mobile teams live and die by. A commonly cited target is above 99.5%. Drop under 99% and you start losing users to uninstalls, and the app stores may reduce your visibility based on their stability scoring.

But the rate that matters is per app version, not aggregate. Aggregate looks fine because 80% of your users are on the stable old version. The 20% who upgraded to v4.2 yesterday are hitting a crash on first launch — and your aggregate dashboard hides it because their misery is averaged out.
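To make the averaging effect concrete, here is a small Python sketch that uses the sample figures from the walkthrough below. The rows are the illustrative data from this recipe, not real measurements, and the helper name is made up for this example. Summing crashed users across versions also assumes each user ran only one version during the day.

```python
# Sample (version, DAU, crashed users) rows copied from the walkthrough below.
rows = [
    ("4.2.0", 18420, 442),
    ("4.1.3", 62100, 92),
    ("4.1.2", 41880, 61),
    ("4.1.1", 12340, 18),
    ("4.0.5", 3420, 5),
]

def crash_free(crashed: int, dau: int) -> float:
    """Crash-free user rate: share of active users who did not crash."""
    return 1 - crashed / dau

total_dau = sum(dau for _, dau, _ in rows)
total_crashed = sum(crashed for _, _, crashed in rows)

aggregate = crash_free(total_crashed, total_dau)
per_version = {ver: crash_free(crashed, dau) for ver, dau, crashed in rows}

print(f"aggregate {aggregate:.2%}")           # aggregate 99.55%
print(f"4.2.0 {per_version['4.2.0']:.2%}")    # 4.2.0 97.60%
```

The aggregate clears a 99.5% target while v4.2.0 sits more than two points below it, which is exactly the case an aggregate dashboard hides.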

Agentry computes per-version crash-free rate as a single query. Pin it to a Routine that runs every hour during a rollout, and you’ll catch a bad version before it reaches the 50% rollout step.
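The hourly check such a Routine would run reduces to a threshold test over the query rows. The sketch below only illustrates that logic in Python. The function name is hypothetical, the 99% threshold and 50-user floor are taken from this recipe's prompt and query, and no Agentry API is used.

```python
from typing import Iterable

def versions_below_threshold(
    rows: Iterable[dict], threshold: float = 0.99, min_dau: int = 50
) -> list[str]:
    """Return versions whose crash-free user rate is under the threshold.

    Versions with little traffic are skipped, because a single crash
    among a handful of users swings the rate too much to act on.
    """
    flagged = []
    for row in rows:
        if row["dau"] <= min_dau:
            continue
        rate = 1 - row["crashed"] / row["dau"]
        if rate < threshold:
            flagged.append(row["ver"])
    return flagged

sample = [
    {"ver": "4.2.0", "dau": 18420, "crashed": 442},
    {"ver": "4.1.3", "dau": 62100, "crashed": 92},
    {"ver": "4.3.0", "dau": 12, "crashed": 3},  # hypothetical low-traffic build
]
print(versions_below_threshold(sample))  # ['4.2.0']
```

The low-traffic build is ignored even though 3 of its 12 users crashed, mirroring the `a.dau > 50` filter in the query.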

What you get

  • Crash-free user rate per app version, sorted by version (newest first)
  • DAU per version (so you can ignore versions with no traffic)
  • A clear delta — which versions are below your target (typically 99.5%)
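One caveat on "sorted by version": if versions are stored as strings, as the sample output in this recipe suggests, a plain string sort places 4.10.0 below 4.2.0. Here is a minimal Python sketch of a numeric sort, assuming purely numeric dotted versions. The helper name is hypothetical, and pre-release suffixes such as -beta are not handled.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '4.10.0' into (4, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

versions = ["4.2.0", "4.10.0", "4.1.3", "4.0.5"]

print(sorted(versions, reverse=True))
# ['4.2.0', '4.10.0', '4.1.3', '4.0.5']  (string order: 4.10.0 is misplaced)

print(sorted(versions, key=version_key, reverse=True))
# ['4.10.0', '4.2.0', '4.1.3', '4.0.5']  (numeric order, newest first)
```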

Walk through it

You

What’s our crash-free rate by app version? That’s 1 minus (distinct users who hit any crash ÷ DAU). Flag any version that’s below 99% over the last 24h.

Agent

One HogQL pass: distinct users who had a crash, distinct users who used the app, grouped by version.

agentry_analytics_query {
  project_id: "default",
  query: "
    with active as (
      select properties.app_version as ver,
        count(distinct distinct_id) as dau
      from events
      where event = 'app_open'
        and timestamp > now() - interval 24 hour
      group by ver
    ),
    crashed as (
      select properties.app_version as ver,
        count(distinct distinct_id) as crashed_users
      from events
      where event = '$exception'
        and timestamp > now() - interval 24 hour
      group by ver
    )
    select
      a.ver,
      a.dau,
      coalesce(c.crashed_users, 0) as crashed,
      1 - (coalesce(c.crashed_users, 0) / a.dau) as crash_free_rate
    from active a
    left join crashed c on c.ver = a.ver
    where a.dau > 50
    order by ver desc
  "
}

{
  "rows": [
    { "ver": "4.2.0", "dau":  18420, "crashed": 442, "crash_free_rate": 0.9760 },
    { "ver": "4.1.3", "dau":  62100, "crashed":  92, "crash_free_rate": 0.9985 },
    { "ver": "4.1.2", "dau":  41880, "crashed":  61, "crash_free_rate": 0.9985 },
    { "ver": "4.1.1", "dau":  12340, "crashed":  18, "crash_free_rate": 0.9985 },
    { "ver": "4.0.5", "dau":   3420, "crashed":   5, "crash_free_rate": 0.9985 }
  ]
}

Agent

Found it. v4.2.0 is at 97.6% crash-free — that’s a regression from your normal 99.85%. 442 users crashed in 24h out of 18,420 active. Let me look at what’s crashing.

agentry_list_cases {
  project_id: "default",
  matching: "",
  filter: { property: "app_version", value: "4.2.0" },
  window_hours: 24
}

{
  "cases": [
    {
      "case_id": "c_mob_1",
      "fingerprint": "f_42_a",
      "error_type": "NullPointerException",
      "message": "Attempt to invoke virtual method 'getString' on null PushTokenManager",
      "events_count": 380,
      "users_affected": 380,
      "introduced_in_deploy": "4.2.0"
    },
    {
      "case_id": "c_mob_2",
      "fingerprint": "f_42_b",
      "error_type": "OutOfMemoryError",
      "message": "Failed to allocate 64MB during ImageDecoder init",
      "events_count":  62,
      "users_affected":  62,
      "introduced_in_deploy": "4.2.0"
    }
  ]
}

Agent

Two issues, both introduced in 4.2.0:

  1. NullPointerException in PushTokenManager (380 users — accounts for 86% of the crash budget). Looks like push registration runs before Application.onCreate finishes. Easy fix.
  2. OutOfMemoryError in ImageDecoder (62 users). A 64MB allocation points to larger images or a missing downsample step. Most likely to affect low-RAM Android devices.

Recommendation: halt the v4.2.0 rollout now. It has 18k DAU so far; at the current 2.4% crash rate, a full rollout to the roughly 138k daily users in this report would mean about 3,300 users crashing every day. Push 4.2.1 with these two fixes.

Want me to set up a Routine that auto-pages oncall if any new version drops below 99%?

The output

Crash-free user rate by app version (24h)

Version      DAU   Crashed users   Crash-free   Target (99.5%)
──────────────────────────────────────────────────────────────
4.2.0     18,420             442       97.60%   FAIL ★
4.1.3     62,100              92       99.85%   PASS
4.1.2     41,880              61       99.85%   PASS
4.1.1     12,340              18       99.85%   PASS
4.0.5      3,420               5       99.85%   PASS

★ v4.2.0 regression: -2.25 pp vs baseline

Crash budget for v4.2.0 (24h):
  Total crashes:           442
  Top contributor (86%):   NullPointerException in PushTokenManager
  Second (14%):            OutOfMemoryError in ImageDecoder

Recommended actions (urgent):
  1. HALT v4.2.0 rollout in App Store / Play Console
  2. Patch the two fingerprints
  3. Ship 4.2.1 once verified in staging
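
The figures in this report follow directly from the query rows and the two cases. Here is a short Python sketch of the arithmetic, using the sample numbers above. The 99.85% baseline is the rate of the earlier versions in the sample, and the variable names are made up for this example.

```python
TARGET = 0.995     # target shown in the report header
BASELINE = 0.9985  # crash-free rate of the earlier versions in the sample

dau, crashed_users = 18420, 442  # v4.2.0, last 24h
rate = round(1 - crashed_users / dau, 4)

status = "PASS" if rate >= TARGET else "FAIL"
delta_pp = round((rate - BASELINE) * 100, 2)  # difference in percentage points

# Users affected per case, from the case list in the walkthrough.
cases = {"NullPointerException": 380, "OutOfMemoryError": 62}
shares = {name: round(100 * n / crashed_users) for name, n in cases.items()}

print(status, f"{rate:.2%}", f"{delta_pp:+.2f} pp")  # FAIL 97.60% -2.25 pp
print(shares)  # {'NullPointerException': 86, 'OutOfMemoryError': 14}
```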

Setting it up

Mobile crashes flow to /v1/logs/. The two properties you need are app_version (so the query can group) and user.id (so distinct-user counts work):

// Android global crash handler
// Capture the previous handler first so the system still sees the crash.
val defaultHandler = Thread.getDefaultUncaughtExceptionHandler()
Thread.setDefaultUncaughtExceptionHandler { thread, err ->
  val payload = JSONObject().apply {
    put("message", err.message ?: "")
    put("stack", err.stackTraceToString())
    put("error_type", err.javaClass.simpleName)
    put("environment", BuildConfig.BUILD_TYPE)
    put("app_version", BuildConfig.VERSION_NAME)
    put("user", JSONObject().put("id", currentUserId()))
  }
  // Fire-and-forget POST to /v1/logs/ (see agentry.md for the helper).
  // The process is about to die, so an async send may not complete;
  // consider persisting the payload and sending it on next launch.
  postToAgentry("/v1/logs/${PROJECT_ID}/", payload)
  defaultHandler?.uncaughtException(thread, err)
}

// iOS: pair with NSSetUncaughtExceptionHandler + a signal handler
func reportCrash(_ exception: NSException) {
  // Marketing version from Info.plist (CFBundleShortVersionString)
  let appVersion = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String ?? "unknown"
  guard let url = URL(string: "https://api.agentry.sh/v1/logs/\(projectId)/") else { return }
  var req = URLRequest(url: url)
  req.httpMethod = "POST"
  req.setValue("Bearer \(dsn)", forHTTPHeaderField: "Authorization")
  req.setValue("application/json", forHTTPHeaderField: "Content-Type")
  req.setValue("myapp-ios/\(appVersion)", forHTTPHeaderField: "User-Agent")  // REQUIRED
  let body: [String: Any] = [
    "message": exception.reason ?? "",
    "stack": exception.callStackSymbols.joined(separator: "\n"),
    "error_type": exception.name.rawValue,
    "app_version": appVersion,
    "user": ["id": currentUserId()],
  ]
  req.httpBody = try? JSONSerialization.data(withJSONObject: body)
  // As on Android, the process may exit before this async task completes.
  URLSession.shared.dataTask(with: req).resume()
}

Fire app_open (or session_start) on cold start with the same user.id so DAU computes correctly:

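// Application.onCreate
// postAnalytics is assumed to attach the current user's ID (the same value
// as currentUserId() in the crash handler) as the event's distinct_id.
postAnalytics("app_open", mapOf(
  "app_version" to BuildConfig.VERSION_NAME,
  "platform" to "android",
  "os_version" to Build.VERSION.RELEASE
))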
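
Why the shared user ID matters: DAU is a count of distinct users, so repeated opens by one user must collapse to a single entry, and crashes must carry the same ID to be matched against it. Here is a toy Python illustration with made-up events. The tuple layout and user IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: (event name, user id, app version).
events = [
    ("app_open", "u1", "4.2.0"),
    ("app_open", "u1", "4.2.0"),    # same user opening twice counts once
    ("app_open", "u2", "4.2.0"),
    ("app_open", "u3", "4.1.3"),
    ("$exception", "u1", "4.2.0"),
    ("$exception", "u1", "4.2.0"),  # repeat crash, still one crashed user
]

active = defaultdict(set)   # version -> distinct users who opened the app
crashed = defaultdict(set)  # version -> distinct users who crashed
for name, user_id, version in events:
    bucket = active if name == "app_open" else crashed
    bucket[version].add(user_id)

for version in sorted(active):
    dau = len(active[version])
    rate = 1 - len(crashed[version]) / dau
    print(version, dau, f"{rate:.0%}")
# 4.1.3 1 100%
# 4.2.0 2 50%
```

If the crash events used a different identifier than the app_open events, the two sets could not be compared and the rate would be meaningless.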

Variations

  • “Same metric but per OS version — is the crash hitting only Android 14, or all versions?”
  • “Crash-free SESSIONS instead of users — closer to Google Play’s metric definition.”
  • “Set up a Routine: every hour, recompute, page #mobile-oncall if any active-rollout version drops below 99%.”
  • “Compare crash-free rates for users who came from organic vs paid installs — sometimes ad networks send bot traffic that crashes immediately.”
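
For the crash-free sessions variation, the denominator changes from users to sessions. Here is a Python sketch with hypothetical numbers showing how the two metrics diverge when one user crashes repeatedly:

```python
# Hypothetical day for one version: sessions per user and how many crashed.
sessions = {
    "u1": {"total": 10, "crashed": 4},  # one user crashing repeatedly
    "u2": {"total": 5, "crashed": 0},
    "u3": {"total": 5, "crashed": 0},
    "u4": {"total": 5, "crashed": 0},
}

users = len(sessions)
crashed_users = sum(1 for s in sessions.values() if s["crashed"] > 0)
total_sessions = sum(s["total"] for s in sessions.values())
crashed_sessions = sum(s["crashed"] for s in sessions.values())

crash_free_users = 1 - crashed_users / users
crash_free_sessions = 1 - crashed_sessions / total_sessions

print(f"{crash_free_users:.0%} {crash_free_sessions:.0%}")  # 75% 84%
```

The user-based rate counts u1 once no matter how often the app crashes, while the session-based rate weights heavy users more, so the two can move in different directions.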

Try this recipe in your own agent.

Paste the prompt below into your agent. It will set up Agentry against your data, then run the recipe and show you the real output.

Install https://agentry.sh/agentry.md and set it up