Why this matters
Webhook failures are support tickets with a delay. Your customer thinks your product stopped syncing, but the real signal is buried in retry logs. Agentry can rank unreliable integrations from the event stream.
What you get
- Failure and retry rates by provider and event type
- Customers with repeated delivery problems
- Slow delivery outliers
- A short remediation plan that names the likely cause: customer endpoint issue, provider outage, or your retry worker
Walk through it
Which webhooks are failing this week?
I’ll group delivery events by provider, event type, and customer.
POST /v1/projects/:project_id/analytics/query {
  "project_id": "default",
  "hogql": "SELECT properties.provider AS provider, properties.event_type AS event_type, properties.customer_id AS customer, count() AS deliveries, countIf(properties.status = 'failed') AS failed, countIf(toInt(properties.attempt) > 1) AS retried, quantile(0.95)(toFloat(properties.duration_ms)) AS p95_ms FROM events WHERE event = 'webhook_delivery' AND timestamp > now() - INTERVAL 7 DAY GROUP BY provider, event_type, customer ORDER BY failed DESC, retried DESC LIMIT 50"
}
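If you want to run the same query from a script rather than through the agent, the request can be assembled roughly as follows. This is a minimal sketch: the path and body fields come from the call above, while the base URL, the Bearer-token header, and the helper name build_query_request are assumptions, so check them against your own deployment.

```python
import json
import urllib.request


def build_query_request(base_url, project_id, hogql, api_key):
    """Build (but do not send) the analytics query request.

    base_url and the Authorization scheme are assumptions for illustration.
    """
    url = f"{base_url.rstrip('/')}/v1/projects/{project_id}/analytics/query"
    body = json.dumps({"project_id": project_id, "hogql": hogql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the response shape is not shown here because it depends on the API.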
The output
The agent returns a list of customers and event types that need action. It should distinguish one-off retries from persistent failures and note whether the failures began after a deploy.
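To make the distinction between one-off retries and persistent failures concrete, here is one way a result row could be labeled. It is a sketch, not the agent's actual logic: the field names mirror the query's column aliases, and the thresholds (min_failed, persistent_failure_rate) and label names are illustrative assumptions you would tune for your own traffic.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Field names mirror the column aliases in the query.
    provider: str
    event_type: str
    customer: str
    deliveries: int
    failed: int
    retried: int


def classify(row, persistent_failure_rate=0.2, min_failed=5):
    """Label one result row. Both thresholds are illustrative assumptions."""
    if row.deliveries == 0:
        return "no_traffic"
    failure_rate = row.failed / row.deliveries
    if row.failed >= min_failed and failure_rate >= persistent_failure_rate:
        return "persistent_failure"
    if row.failed > 0:
        return "intermittent_failure"
    if row.retried > 0:
        # Nothing ended as failed, so every retry recovered.
        return "one_off_retry"
    return "healthy"
```

A row with 100 deliveries, 40 failed, and 60 retried is labeled persistent_failure, while a row with 100 deliveries, 0 failed, and 3 retried is labeled one_off_retry.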
Setting it up
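Emit webhook_delivery from the worker that sends outbound webhooks. Include both terminal failures and successful retries, and set the properties the query reads: provider, event_type, customer_id, status, attempt, and duration_ms.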
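A worker that does this could look roughly like the sketch below. It takes one reading of the guidance above: emit a single webhook_delivery event per delivery once its outcome is known, so a successful retry shows up with attempt greater than 1 and a terminal failure shows up with status 'failed'. The send and capture callables are hypothetical stand-ins for your HTTP call and your event client, 'delivered' is an assumed success value (the query only matches 'failed'), and reporting the last attempt's duration is also an assumption.

```python
import time
from typing import Any, Callable, Dict


def deliver_with_tracking(
    send: Callable[[], bool],
    capture: Callable[[str, Dict[str, Any]], None],
    *,
    provider: str,
    event_type: str,
    customer_id: str,
    max_attempts: int = 3,
) -> bool:
    """Try a delivery up to max_attempts times, then emit one event."""
    ok = False
    attempt = 0
    duration_ms = 0.0
    while attempt < max_attempts and not ok:
        attempt += 1
        started = time.monotonic()
        try:
            ok = bool(send())
        except Exception:
            ok = False
        duration_ms = (time.monotonic() - started) * 1000.0
        # A real worker would back off here before the next attempt.
    capture(
        "webhook_delivery",
        {
            "provider": provider,
            "event_type": event_type,
            "customer_id": customer_id,
            "status": "delivered" if ok else "failed",  # 'delivered' is assumed
            "attempt": attempt,
            "duration_ms": duration_ms,  # duration of the last attempt
        },
    )
    return ok
```

If you would rather emit one event per attempt, the failed column in the query will also count transient failures, so decide which meaning you want before comparing weeks.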
Variations
- “Alert if any enterprise customer has more than 10 failed deliveries in an hour.”
- “Which event type is causing most retries?”
- “Compare webhook health before and after the new retry policy.”
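The first variation, the hourly alert, reduces to counting failed deliveries per customer inside a one-hour window. The sketch below shows that check over events already loaded in memory; the event dictionaries reuse the property names from the query, while the enterprise_ids set, the function name, and the strict "more than 10" comparison are assumptions about how you would wire it up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def customers_to_alert(events, enterprise_ids, now, threshold=10,
                       window=timedelta(hours=1)):
    """Return enterprise customers with more than `threshold` failed
    deliveries in the window ending at `now`."""
    cutoff = now - window
    failed = Counter(
        e["customer_id"]
        for e in events
        if e["status"] == "failed"
        and e["customer_id"] in enterprise_ids
        and cutoff < e["timestamp"] <= now
    )
    return sorted(c for c, n in failed.items() if n > threshold)
```

The same filter can be pushed into the query itself once you know which property marks a customer as enterprise.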