Why this matters
Average latency hides pain. One route can be fine for free users and brutal for enterprise accounts with larger payloads. This recipe tells engineering where performance work maps to customer value.
What you get
- p50, p95, and p99 latency by route
- Error rate and request volume beside latency
- Segments by customer and plan
- A ranked fix list weighted toward paid usage
Walk through it
Show slow API routes for paid customers.
I’ll verify the api_request shape and rank routes by p95 latency for paid plans.
GET /v1/projects/:project_id/event-property-keys {
  project_id: "default",
  events: ["api_request"]
}
POST /v1/projects/:project_id/analytics/query {
  project_id: "default",
  hogql: "SELECT properties.route AS route, properties.plan AS plan, count() AS requests, quantile(0.5)(toFloat(properties.duration_ms)) AS p50_ms, quantile(0.95)(toFloat(properties.duration_ms)) AS p95_ms, quantile(0.99)(toFloat(properties.duration_ms)) AS p99_ms, avg(toInt(properties.status_code) >= 500) AS error_rate FROM events WHERE event = 'api_request' AND timestamp > now() - INTERVAL 24 HOUR GROUP BY route, plan ORDER BY p95_ms DESC LIMIT 30"
}
The output
The agent returns a table of routes with p95 and p99 latency, request volume, error rate, and the customers affected. It should not rank a route with 3 requests above one with 30,000 unless the affected customer is high value.
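One way to express that weighting is sketched below. It is an illustration, not the agent's actual logic: the plan weights, the "pro" plan name, the minimum-request floor, and the high_value flag are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class RouteStats:
    route: str
    plan: str
    requests: int
    p95_ms: float
    high_value: bool = False  # hypothetical flag, e.g. set from an account list


# Hypothetical weights: paid plans count for more than free ones.
PLAN_WEIGHT = {"free": 0.2, "pro": 1.0, "enterprise": 3.0}
MIN_REQUESTS = 100  # assumed floor below which percentiles are too noisy


def score(row: RouteStats) -> float:
    """Higher score means fix sooner."""
    if row.requests < MIN_REQUESTS and not row.high_value:
        return 0.0
    weight = PLAN_WEIGHT.get(row.plan, 1.0)
    # log1p dampens volume so one huge route does not drown out the rest.
    return row.p95_ms * math.log1p(row.requests) * weight


def rank(rows: list[RouteStats]) -> list[RouteStats]:
    return sorted(rows, key=score, reverse=True)


rows = [
    RouteStats("/v1/export", "free", 3, 9000.0),
    RouteStats("/v1/users/:id", "enterprise", 30000, 800.0),
    RouteStats("/v1/reports", "enterprise", 3, 5000.0, high_value=True),
]
ranked = rank(rows)
```

With these made-up rows, the 3-request free route drops to the bottom, while the 3-request route for a high-value customer stays in contention behind the high-volume enterprise route.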
Setting it up
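Emit one api_request event per request from middleware, carrying the properties the query reads: route, plan, duration_ms, and status_code. Record route templates rather than raw paths (/v1/users/:id, not /v1/users/123) so requests to the same route aggregate together.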
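A minimal sketch of such an emitter follows. It assumes a hypothetical capture(event_name, properties) callable that forwards events to your analytics backend, and a handler that takes a request dict and returns a status code; both shapes are placeholders for whatever your framework provides. The customer_id property is also an assumption, added to support per-customer segments.

```python
import time
from typing import Any, Callable


def timed_handler(
    route_template: str,
    handler: Callable[[dict], int],
    capture: Callable[[str, dict[str, Any]], None],
) -> Callable[[dict], int]:
    """Wrap a handler so each call emits exactly one api_request event."""

    def wrapped(request: dict) -> int:
        start = time.perf_counter()
        status = 500  # reported if the handler raises
        try:
            status = handler(request)
            return status
        finally:
            capture(
                "api_request",
                {
                    # Use the template, not request["path"], so that
                    # /v1/users/1 and /v1/users/2 aggregate together.
                    "route": route_template,
                    "plan": request.get("plan", "unknown"),
                    "customer_id": request.get("customer_id"),
                    "status_code": status,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                },
            )

    return wrapped
```

Emitting from a finally block means failed requests are still recorded, which keeps the error rate in the query honest.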
Variations
- “Only show enterprise customers.”
- “Compare p95 this week vs last week.”
- “Find routes where retries increased after the last deploy.”
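For the week-over-week variation, the comparison itself is simple once both result sets are in hand. The sketch below assumes each week's results have already been reduced to a route-to-p95 mapping; the function name and the 10% threshold are hypothetical.

```python
def p95_regressions(
    this_week: dict[str, float],
    last_week: dict[str, float],
    min_increase: float = 0.10,
) -> list[tuple[str, float]]:
    """Return (route, relative change) for routes whose p95 grew by at least min_increase."""
    changes = []
    for route, current in this_week.items():
        previous = last_week.get(route)
        if not previous:  # new route or zero baseline: no meaningful ratio
            continue
        change = (current - previous) / previous
        if change >= min_increase:
            changes.append((route, change))
    # Largest regression first.
    return sorted(changes, key=lambda item: item[1], reverse=True)


result = p95_regressions(
    {"/v1/users/:id": 900.0, "/v1/export": 400.0, "/v1/new": 50.0},
    {"/v1/users/:id": 600.0, "/v1/export": 410.0},
)
```

Routes with no baseline from the previous week are skipped rather than reported as infinite regressions.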