Monitor webhook delivery reliability

Track failed, retried, and slow webhook deliveries by provider, event type, and customer before integrations silently break.

Published

difficulty intermediate · time to value 5 minutes

Start from this

Show webhook delivery health for the last 7 days: failures, retries, p95 delivery time, and customers with repeated delivery problems.

Why this matters

Webhook failures are support tickets with a delay. Your customer thinks your product stopped syncing, but the real signal is buried in retry logs. Agentry can rank unreliable integrations from the event stream.

What you get

  • Failure and retry rates by provider and event type
  • Customers with repeated delivery problems
  • Slow delivery outliers
  • A short remediation plan: customer endpoint issue, provider outage, or your retry worker

Walk through it

You

Which webhooks are failing this week?

Agent

I’ll group delivery events by provider, event type, and customer.

POST /v1/projects/:project_id/analytics/query {
  project_id: "default",
  hogql: "SELECT properties.provider AS provider, properties.event_type AS event_type, properties.customer_id AS customer, count() AS deliveries, countIf(properties.status = 'failed') AS failed, countIf(toInt(properties.attempt) > 1) AS retried, quantile(0.95)(toFloat(properties.duration_ms)) AS p95_ms FROM events WHERE event = 'webhook_delivery' AND timestamp > now() - INTERVAL 7 DAY GROUP BY provider, event_type, customer ORDER BY failed DESC, retried DESC LIMIT 50"
}

The output

The agent returns a list of customers and event types that need action. It should distinguish one-off retries from persistent failure and include whether failures started after a deploy.

Setting it up

Emit webhook_delivery from the worker that sends outbound webhooks. Include both terminal failures and successful retries.

Variations

  • “Alert if any enterprise customer has more than 10 failed deliveries in an hour.”
  • “Which event type is causing most retries?”
  • “Compare webhook health before and after the new retry policy.”

Try this recipe in your own agent.

Ask your agent to adapt the starter prompt to your saved signal map and live events, then run it against your data.

Install agentry.sh/install.md for me
Agent will onboard itself and then your app