Why this matters
Count-based triage is a trap. The loudest fingerprint in your error tracker is usually a cron loop hitting a stale row 5,000 times an hour — annoying, low-impact, fixable Tuesday.
Meanwhile the bug that three Pro accounts hit during checkout — twice each, total of 6 events — generated 80% of the revenue risk in that window. Sentry sorts it 80th on the list. You won’t see it until those three accounts churn.
The fix is to weight every error by the affected user’s plan value. Datadog won’t do this. Sentry won’t do this. The combination of error data + analytics data lives in one place in Agentry, so the agent can JOIN them in a single HogQL query and rank by dollars-at-risk instead of dumb count.
What you get
- Top 5 open cases ranked by estimated ARR-at-risk (users-affected × per-user ARR)
- For the #1 case: the specific accounts involved, with their plan tiers
- A suggestion for who at your company should call those customers today
Walk through it
Show me open cases ranked by dollars-at-risk. For each, multiply users_affected by their plan ARR. Top 5.
Let me pull open cases with their affected user lists, then join against the latest plan value per user.
agentry_list_cases {
  project_id: "default",
  status: "open",
  since: "14d",
  include_affected_distinct_ids: true,
  limit: 25
}
{
  "cases": [
    { "id": "c_f72a", "message": "TypeError on user.plan", "users_affected": 14,
      "event_count": 89, "distinct_ids": ["[email protected]", "[email protected]", "..."] },
    { "id": "c_1cd9", "message": "fetch failed: invoice.pdf", "users_affected": 9,
      "event_count": 11, "distinct_ids": ["..."] },
    { "id": "c_2d8e", "message": "checkout 500: stripe webhook timeout", "users_affected": 3,
      "event_count": 6, "distinct_ids": ["[email protected]", "[email protected]", "..."] }
  ]
}
Now let me join the affected distinct_ids against current plan tier.
agentry_analytics_query {
  project_id: "default",
  hogql: "SELECT distinct_id, argMax(properties.plan, timestamp) AS plan FROM events WHERE distinct_id IN ({ids}) GROUP BY distinct_id",
  params: { ids: ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "..."] }
}
{
  "rows": [
    { "distinct_id": "[email protected]", "plan": "free" },
    { "distinct_id": "[email protected]", "plan": "starter" },
    { "distinct_id": "[email protected]", "plan": "enterprise" },
    { "distinct_id": "[email protected]", "plan": "scale" }
  ]
}
Ranked by ARR-at-risk (assuming starter=$300, scale=$3,000, enterprise=$30,000):
1. c_2d8e: checkout 500 on Stripe webhook timeout. $63,000 ARR at risk. Only 6 events, but it’s hitting 2 enterprise + 1 scale account. This is your #1.
2. c_f72a: TypeError on user.plan. $4,200 ARR. 14 users, but 11 are on free tier (the bug literally fires on the free-tier path).
3. c_1cd9: invoice PDF fetch. $2,700 ARR. 9 users, mostly starter.
The numbers flip completely. c_2d8e is 15x more important than c_f72a by revenue, but raw count says the opposite. Want me to pull the case detail on c_2d8e and draft a Slack message to whoever owns Stripe?
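The ranking step above boils down to summing plan ARR over each case's affected users and sorting by that sum instead of by event count. Here is a minimal sketch of that arithmetic, assuming the plan values quoted above. `rankByArrAtRisk` is a hypothetical helper name, not an Agentry API, and the case and user ids are made up for illustration.

```javascript
// Assumed per-plan ARR, taken from the walkthrough; swap in your own pricing.
const PLAN_ARR = { free: 0, starter: 300, scale: 3000, enterprise: 30000 };

// Hypothetical helper: rank cases by the summed ARR of their affected users.
// `planByUser` maps distinct_id -> plan name (e.g. built from the query rows).
function rankByArrAtRisk(cases, planByUser, planArr = PLAN_ARR) {
  return cases
    .map((c) => {
      // Users with an unknown or missing plan contribute $0.
      const arrAtRisk = c.distinct_ids.reduce(
        (sum, id) => sum + (planArr[planByUser[id]] ?? 0),
        0
      );
      return { id: c.id, event_count: c.event_count, arrAtRisk };
    })
    .sort((a, b) => b.arrAtRisk - a.arrAtRisk);
}

// Made-up sample: a loud, low-value case and a quiet, high-value one.
const cases = [
  { id: "case_loud", event_count: 89, distinct_ids: ["u1", "u2", "u3"] },
  { id: "case_quiet", event_count: 6, distinct_ids: ["u4", "u5", "u6"] },
];
const planByUser = {
  u1: "free", u2: "free", u3: "starter",
  u4: "enterprise", u5: "scale", u6: "enterprise",
};

const ranked = rankByArrAtRisk(cases, planByUser);
console.log(ranked.map((r) => `${r.id}: $${r.arrAtRisk}`));
```

With this sample, the 6-event case lands first because two enterprise users and one scale user outweigh a single starter user, whatever the event counts say.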
The output
Top 5 cases by ARR-at-risk (14d, open)

$63,000  c_2d8e  checkout 500: stripe webhook timeout
  3 users · 6 events
  Affected: [email protected] (enterprise $30k),
            [email protected] (scale $3k),
            [email protected] (enterprise $30k)
  ↗ Owner: @stripe-oncall - investigate webhook latency

$4,200  c_f72a  TypeError on user.plan (free-tier path)
  14 users · 89 events · mostly free tier
  ↗ Low ARR, but fixable in 4 lines

$2,700  c_1cd9  invoice.pdf fetch failed
  9 users · 11 events · mixed plans
  ↗ Probably CORS regression on invoice-service

$600  c_a3e1  phone validation NANP-only
  7 users · 14 events · all starter
  ↗ Low priority unless we want international growth

$300  c_4f01  pagination edge case
  2 users · 3 events · starter
  ↗ Easy fix, ship next sprint

Net ARR at risk: $70,800
Setting it up
The recipe joins two things: cases (your error data) and events (your analytics data, where plan lives). For the join to work, both must use the same distinct_id.
// Error reporting: distinct_id (or user.id) MUST match analytics
await fetch(`https://api.agentry.sh/v1/logs/${PROJECT_ID}/`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.AGENTRY_DSN}`,
    "Content-Type": "application/json",
    "User-Agent": "myapp/1.0", // REQUIRED: Cloudflare 403s default UAs
  },
  body: JSON.stringify({
    message: err.message,
    stack: err.stack,
    user: { id: currentUser?.email }, // ← same id as analytics
  }),
});
// Analytics: every event carries `plan` so we can JOIN later
await fetch(`https://api.agentry.sh/v1/analytics/${PROJECT_ID}/`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.AGENTRY_DSN}`,
    "Content-Type": "application/json",
    "User-Agent": "myapp/1.0",
  },
  body: JSON.stringify({
    event: "feature_used",
    distinct_id: currentUser.email, // ← same id as errors
    properties: { plan: currentUser.plan, mrr: currentUser.mrr },
  }),
});
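If the two calls ever drift apart (say, errors report an internal user id while analytics reports an email), the join silently drops those users and they count as $0. One way to catch that is to compare the affected ids against the plan rows that came back. This is a rough sketch only: `findUnmatchedIds` is a hypothetical helper, and the input shapes are assumed to mirror the `cases` and `rows` responses shown in the walkthrough.

```javascript
// Hypothetical helper: list affected distinct_ids that have no plan row,
// i.e. users the join could not match against analytics data.
function findUnmatchedIds(cases, planRows) {
  const known = new Set(planRows.map((row) => row.distinct_id));
  const missing = new Set(); // Set de-duplicates ids shared across cases
  for (const c of cases) {
    for (const id of c.distinct_ids) {
      if (!known.has(id)) missing.add(id);
    }
  }
  return [...missing];
}

// Made-up sample: one error reporter sent an internal id instead of an email.
const sampleCases = [
  { id: "case_a", distinct_ids: ["ana@example.test", "user_4821"] },
  { id: "case_b", distinct_ids: ["bo@example.test", "user_4821"] },
];
const samplePlanRows = [
  { distinct_id: "ana@example.test", plan: "starter" },
  { distinct_id: "bo@example.test", plan: "scale" },
];

const unmatched = findUnmatchedIds(sampleCases, samplePlanRows);
console.log(unmatched);
```

A non-empty result usually means an id mismatch between the two reporters, or a user who has never sent an analytics event.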
If you don’t track plan on every event, an alternative is to fire a one-shot plan_changed event whenever it changes — the HogQL argMax(properties.plan, timestamp) pattern picks up the latest value either way.
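As a sketch of that alternative, the helper below builds (but does not send) a `plan_changed` request. The URL and headers are copied from the analytics call above. The payload shape is an assumption modeled on the `feature_used` example, and `buildPlanChangedRequest` is a hypothetical name rather than part of any Agentry SDK.

```javascript
// Hypothetical helper: build a plan_changed analytics request without sending it.
// Assumption: the event accepts the same fields as the feature_used example.
function buildPlanChangedRequest(projectId, dsn, user) {
  return {
    url: `https://api.agentry.sh/v1/analytics/${projectId}/`,
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${dsn}`,
        "Content-Type": "application/json",
        "User-Agent": "myapp/1.0",
      },
      body: JSON.stringify({
        event: "plan_changed",
        distinct_id: user.email, // same id the error reporter sends
        properties: { plan: user.plan },
      }),
    },
  };
}

// Placeholder values for illustration only.
const planChangedReq = buildPlanChangedRequest("default", "dsn_placeholder", {
  email: "ana@example.test",
  plan: "scale",
});
console.log(planChangedReq.url, planChangedReq.options.body);
// To send it: await fetch(planChangedReq.url, planChangedReq.options);
```

Keeping the builder separate from the `fetch` call makes it easy to invoke from wherever your billing code changes a plan, and to check the payload without touching the network.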
Variations
- “Same ranking, but only count enterprise customers (plan = ‘enterprise’).”
- “Show errors ranked by ARR-at-risk over the last 24 hours, not 14 days. Daily oncall summary.”
- “For the top case, draft a per-customer outreach email apologizing and offering credit.”
- “Run this every weekday at 9am and post the top-3 to #cs-oncall with the affected emails.” (uses a Routine)