Find queue latency and dead-letter spikes

Why this matters

Queues fail quietly. The UI still loads, but emails arrive late, imports stall, and webhooks lag. This recipe makes queue health visible from the same agent loop as errors and deploys.

What you get

p95 job latency by queue and job type
Retry and dead-letter spikes
First deploy or cron window correlated with the spike
The job type to inspect first

Walk through it

You

Are background jobs backing up?

Agent

I’ll compare completed and failed job events by queue.

POST /v1/projects/:project_id/analytics/query {
  project_id: "default",
  hogql: "SELECT event, properties.queue AS queue, properties.job_type AS job_type, count() AS count, quantile(0.95)(toFloat(properties.latency_ms)) AS p95_latency_ms, max(toInt(properties.attempt)) AS max_attempt FROM events WHERE event IN ('job_completed','job_failed') AND timestamp > now() - INTERVAL 24 HOUR GROUP BY event, queue, job_type ORDER BY count DESC LIMIT 50"
}

The output

The agent returns a ranked table and identifies whether the issue is latency, retries, or terminal failures. If failures map to cases, it should fetch the top case before suggesting a fix.

Setting it up

Emit job lifecycle events from the queue worker. Use stable job type names and queue names.

Variations

“Only show jobs with p95 latency over 5 minutes.”
“Compare the queue after the last deploy.”
“Group dead letters by exception fingerprint.”

Try this recipe in your own agent.

Ask your agent to adapt the starter prompt to your saved signal map and live events, then run it against your data.

Install agentry.sh/install.md for meInstall https://agentry.sh/install.md for me

Agent will onboard itself and then your app

Find queue latency and dead-letter spikes

Why this matters

What you get

Walk through it

The output

Setting it up

Variations

Related recipes

Mobile background-sync failure monitoring

Find connection-pool / timeout patterns in errors

Find the cron that fails every Sunday at 3am

Try this recipe in your own agent.