Find the cron that fails every Sunday at 3am

Bucket errors by hour-of-day and day-of-week to surface periodic failures hiding inside normal noise.

Difficulty: intermediate · Time to value: 1 minute · Tools used: 2

Just say this

Are there error patterns that only fire at certain times? Bucket errors by hour-of-day and day-of-week and show me anomalies.

Why this matters

Some bugs fire 200 events a day and you see them. Some bugs fire 50 events but only between 02:50 and 03:10 UTC, every Sunday — these look like noise in a daily count but jump out as “weekly backup job is broken” the moment you bucket by hour.

Sentry’s UI shows you a time-series for one fingerprint at a time; spotting periodicity requires you to already suspect it. Datadog can do hour-of-day heatmaps if you build the dashboard. Nobody builds the dashboard.

The agent builds it on demand. One prompt, one heatmap, the cron-shaped bugs reveal themselves.

What you get

  • A per-fingerprint heatmap (hour-of-day × day-of-week) for errors trending above baseline
  • Top 3 fingerprints with the highest periodicity score (concentration in a narrow window)
  • A guess at what they map to (3am UTC Sunday = weekly backup, Monday 9am = digest email, etc.)
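The "periodicity score" can be read as the share of a fingerprint's events that land in its single busiest hour-of-week bucket. The sketch below is one possible way to compute it from bucketed rows, not necessarily how the agent scores things internally. The `BucketRow` shape mirrors the query result in the walkthrough; `periodicityScores` and the 0.8 cutoff are illustrative assumptions.

```typescript
// One row per (fingerprint, hour-of-day, day-of-week) bucket.
interface BucketRow {
  fp: string;
  hour: number; // 0-23, UTC
  dow: number;  // 1 = Monday ... 7 = Sunday (assumed numbering)
  cnt: number;
}

interface Periodicity {
  fp: string;
  total: number;
  peak: BucketRow;
  score: number; // share of events in the busiest bucket, 0..1
}

// Hypothetical scoring: peak bucket count divided by total count per fingerprint.
function periodicityScores(rows: BucketRow[]): Periodicity[] {
  const byFp = new Map<string, BucketRow[]>();
  for (const row of rows) {
    const list = byFp.get(row.fp) ?? [];
    list.push(row);
    byFp.set(row.fp, list);
  }

  const scores: Periodicity[] = [];
  byFp.forEach((buckets, fp) => {
    const total = buckets.reduce((sum, b) => sum + b.cnt, 0);
    if (total === 0) return; // nothing to score
    const peak = buckets.reduce((best, b) => (b.cnt > best.cnt ? b : best));
    scores.push({ fp, total, peak, score: peak.cnt / total });
  });

  // Most concentrated fingerprints first.
  return scores.sort((a, b) => b.score - a.score);
}

// A few rows taken from the walkthrough's sample result.
const sampleRows: BucketRow[] = [
  { fp: "f_a3e1", hour: 9, dow: 1, cnt: 124 },
  { fp: "f_a3e1", hour: 10, dow: 1, cnt: 18 },
  { fp: "f_b7c2", hour: 3, dow: 7, cnt: 47 },
  { fp: "f_72ac", hour: 14, dow: 1, cnt: 22 },
  { fp: "f_72ac", hour: 14, dow: 2, cnt: 24 },
  { fp: "f_72ac", hour: 14, dow: 3, cnt: 19 },
];

const ranked = periodicityScores(sampleRows);
const periodicFps = ranked.filter((p) => p.score > 0.8).map((p) => p.fp);
console.log(periodicFps);
```

On these rows, f_b7c2 scores 1.0 and f_a3e1 about 0.87, while f_72ac stays well under the cutoff. If the underlying query drops small buckets (for example with a `HAVING` filter), the score is an upper bound, because the dropped events are missing from the denominator.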

Walk through it

You

Are there error patterns that only fire at certain times? Bucket errors by hour-of-day and day-of-week and show me anomalies.

Agent

Let me pull error counts bucketed by (fingerprint, hour, day-of-week) for the last 28 days.

agentry_analytics_query {
  project_id: "default",
  hogql: "SELECT properties.fingerprint AS fp, toHour(timestamp) AS hour, toDayOfWeek(timestamp) AS dow, count() AS cnt FROM events WHERE event = '$exception' AND timestamp >= now() - INTERVAL 28 DAY GROUP BY fp, hour, dow HAVING cnt > 5 ORDER BY fp"
}
{
  "rows": [
    { "fp": "f_a3e1", "hour": 9,  "dow": 1, "cnt": 124 },
    { "fp": "f_a3e1", "hour": 10, "dow": 1, "cnt": 18 },
    { "fp": "f_b7c2", "hour": 3,  "dow": 7, "cnt": 47 },
    { "fp": "f_72ac", "hour": 14, "dow": 1, "cnt": 22 },
    { "fp": "f_72ac", "hour": 14, "dow": 2, "cnt": 24 },
    { "fp": "f_72ac", "hour": 14, "dow": 3, "cnt": 19 }
  ]
}
Agent

Two fingerprints have very high periodicity (>80% of events fall in one hour-of-week bucket):

  1. f_b7c2 — 47 events, all Sunday 03:00 UTC. Classic weekly cron failure. Let me pull the case.
  2. f_a3e1 — 124 events, Monday 09:00 UTC. Looks like a digest/welcome email batch.

f_72ac is spread evenly across weekdays (19 to 24 events per day at 14:00 UTC), so no single weekly bucket dominates. That’s regular user traffic, not a periodic job.

agentry_get_case { case_id: "f_b7c2" }
{
  "id": "f_b7c2",
  "error_type": "TimeoutError",
  "message": "Statement timeout: backup-rotation took >300s",
  "stack_top": "src/jobs/weekly-backup.ts:88",
  "first_seen_at": 1745020800
}
Agent

f_b7c2 is your weekly backup rotation timing out. It’s been firing every Sunday at 03:00 UTC for 4 weeks, ~12 errors per run. Whoever wrote this hasn’t seen the alert because they’re asleep and nobody triages Sundays. Fix: paginate the rotation or bump the statement timeout. Want me to draft the PR?

The output

Periodic-error heatmap (28d)

f_b7c2  TimeoutError: backup-rotation took >300s
         src/jobs/weekly-backup.ts:88
         Periodicity: 100% in Sunday 03:00 UTC bucket (4 occurrences)
         ↗ Weekly cron — your `runs every Sunday 3am` job

f_a3e1  SMTPError: rate-limit exceeded (Postmark)
         src/jobs/weekly-digest.ts:42
         Periodicity: 88% in Monday 09:00 UTC bucket (4 occurrences)
         ↗ Weekly digest batch — hitting Postmark cap

NON-PERIODIC (regular user traffic):
f_72ac  TypeError on auth.ts:42 — evenly distributed across weekdays
         (see deploy-regression or silent-bugs for triage)
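To make the "heatmap" concrete, here is a minimal sketch of pivoting bucketed rows into a 7 × 24 grid and naming the hottest cell. It assumes `dow` runs from 1 (Monday) to 7 (Sunday), which is the ClickHouse-style default for `toDayOfWeek`. The helper names (`buildHeatmap`, `hottestBucket`, `renderHeatmap`) are hypothetical and not part of Agentry.

```typescript
interface BucketRow {
  fp: string;
  hour: number; // 0-23, UTC
  dow: number;  // 1 = Monday ... 7 = Sunday (assumed numbering)
  cnt: number;
}

const DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"];

// Build a 7 x 24 grid (rows: Monday..Sunday, columns: hour 0..23) for one fingerprint.
function buildHeatmap(rows: BucketRow[], fp: string): number[][] {
  const grid: number[][] = [];
  for (let d = 0; d < 7; d++) {
    const dayRow: number[] = [];
    for (let h = 0; h < 24; h++) dayRow.push(0);
    grid.push(dayRow);
  }
  for (const row of rows) {
    if (row.fp !== fp) continue;
    const dayRow = grid[row.dow - 1];
    // Skip rows with an out-of-range day or hour instead of corrupting the grid.
    if (dayRow === undefined || row.hour < 0 || row.hour > 23) continue;
    dayRow[row.hour] = (dayRow[row.hour] ?? 0) + row.cnt;
  }
  return grid;
}

// Find the busiest cell and give it a human-readable label.
function hottestBucket(grid: number[][]): { label: string; cnt: number } {
  let best = { label: "none", cnt: 0 };
  grid.forEach((dayRow, d) => {
    dayRow.forEach((cnt, hour) => {
      if (cnt > best.cnt) {
        const hh = hour < 10 ? `0${hour}` : `${hour}`;
        best = { label: `${DAY_NAMES[d]} ${hh}:00 UTC`, cnt };
      }
    });
  });
  return best;
}

// Plain-text rendering: one line per day, '#' where any errors occurred.
function renderHeatmap(grid: number[][]): string {
  return grid
    .map((dayRow, d) => `${DAY_NAMES[d]} ` + dayRow.map((cnt) => (cnt > 0 ? "#" : ".")).join(""))
    .join("\n");
}

const backupRows: BucketRow[] = [
  { fp: "f_b7c2", hour: 3, dow: 7, cnt: 47 },
  { fp: "f_b7c2", hour: 3, dow: 14, cnt: 0 }, // invalid day, ignored by buildHeatmap
];
const backupGrid = buildHeatmap(backupRows, "f_b7c2");
console.log(renderHeatmap(backupGrid));
console.log(hottestBucket(backupGrid));
```

A weekly cron failure shows up as a single `#` in an otherwise empty grid, which is the shape the recipe looks for.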

Setting it up

This works out of the box — no extra instrumentation needed. The recipe just slices on the event timestamps you’re already sending.

A small upgrade: tag your scheduled jobs with tags.job_name so the agent can ground its guess:

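try {
  await runWeeklyBackup();
} catch (err) {
  // A thrown value is not guaranteed to be an Error, so normalize it first.
  const error = err instanceof Error ? err : new Error(String(err));
  try {
    // PROJECT_ID is your Agentry project ID, defined elsewhere in the job.
    await fetch(`https://api.agentry.sh/v1/logs/${PROJECT_ID}/`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.AGENTRY_DSN}`,
        "Content-Type": "application/json",
        "User-Agent": "myapp-jobs/1.0",  // REQUIRED: Cloudflare 403s default UAs
      },
      body: JSON.stringify({
        message: error.message,
        stack: error.stack,
        tags: {
          job_name: "weekly-backup-rotation",  // ← THIS LINE
          cron_expr: "0 3 * * 0",
        },
      }),
    });
  } catch (reportErr) {
    // Reporting is best-effort: a failed POST must not mask the job's own error.
    console.error("Failed to report job error:", reportErr);
  }
  throw err;
}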

With job_name set, the agent says “your weekly-backup-rotation job is failing every Sunday at 3am” instead of guessing from the stack trace.
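The `cron_expr` tag also makes it possible to check whether an observed peak lines up with the job's schedule. The sketch below is a hypothetical helper, not an Agentry feature. It handles only the simplest weekly form (numeric minute, hour and day-of-week, with `*` for day-of-month and month) and assumes the scheduler runs in UTC, like the buckets. Cron counts Sunday as 0 or 7, while the query's `dow` is assumed to be 1 (Monday) to 7 (Sunday).

```typescript
interface WeeklyBucket {
  hour: number; // 0-23
  dow: number;  // 1 = Monday ... 7 = Sunday
}

// Convert a simple weekly cron expression into the (hour, dow) bucket it fires in.
// Returns null for anything beyond plain numeric fields (ranges, lists, steps, names).
function cronToWeeklyBucket(expr: string): WeeklyBucket | null {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return null;

  const minute = fields[0] ?? "";
  const hour = fields[1] ?? "";
  const dayOfMonth = fields[2] ?? "";
  const month = fields[3] ?? "";
  const dayOfWeek = fields[4] ?? "";

  const isInt = (s: string): boolean => /^\d+$/.test(s);
  if (!isInt(minute) || !isInt(hour) || !isInt(dayOfWeek)) return null;
  if (dayOfMonth !== "*" || month !== "*") return null;

  const m = Number(minute);
  const h = Number(hour);
  const d = Number(dayOfWeek);
  if (m > 59 || h > 23 || d > 7) return null;

  // Cron: 0 or 7 = Sunday, 1 = Monday ... 6 = Saturday.
  return { hour: h, dow: d % 7 === 0 ? 7 : d };
}

// True when the observed peak bucket is the one the cron expression fires in.
function matchesSchedule(cronExpr: string, peak: WeeklyBucket): boolean {
  const scheduled = cronToWeeklyBucket(cronExpr);
  return scheduled !== null && scheduled.hour === peak.hour && scheduled.dow === peak.dow;
}

console.log(matchesSchedule("0 3 * * 0", { hour: 3, dow: 7 }));
```

A job that starts late in the hour can fail in the next hour bucket, so a stricter check might also accept the following hour.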

Variations

  • “Same analysis but for analytics events — are there usage patterns that spike on specific weekdays?”
  • “Find errors that only fire during business hours (9-5 UTC). Probably user-triggered.”
  • “Show errors that NEVER fire on weekends. They’re probably internal-tool bugs.”
  • “Daily cron: surface any new fingerprint with >70% periodicity score in last 14 days.” (uses a Routine)
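The business-hours and weekend variations reduce to simple predicates over the same bucketed rows. This is a rough sketch under stated assumptions: "9-5 UTC" is read as 09:00 up to but not including 17:00, `dow` runs from 1 (Monday) to 7 (Sunday), and the function names are illustrative only.

```typescript
interface BucketRow {
  fp: string;
  hour: number; // 0-23, UTC
  dow: number;  // 1 = Monday ... 7 = Sunday (assumed numbering)
  cnt: number;
}

// Rows with at least one event for the given fingerprint.
function activeRows(rows: BucketRow[], fp: string): BucketRow[] {
  return rows.filter((r) => r.fp === fp && r.cnt > 0);
}

// True when every observed bucket falls inside 09:00-16:59 UTC.
function onlyBusinessHours(rows: BucketRow[], fp: string): boolean {
  const active = activeRows(rows, fp);
  return active.length > 0 && active.every((r) => r.hour >= 9 && r.hour < 17);
}

// True when no observed bucket falls on Saturday (6) or Sunday (7).
function neverOnWeekends(rows: BucketRow[], fp: string): boolean {
  const active = activeRows(rows, fp);
  return active.length > 0 && active.every((r) => r.dow <= 5);
}

const variationRows: BucketRow[] = [
  { fp: "f_b7c2", hour: 3, dow: 7, cnt: 47 },
  { fp: "f_72ac", hour: 14, dow: 1, cnt: 22 },
  { fp: "f_72ac", hour: 14, dow: 2, cnt: 24 },
  { fp: "f_72ac", hour: 14, dow: 3, cnt: 19 },
];

console.log(onlyBusinessHours(variationRows, "f_72ac"), neverOnWeekends(variationRows, "f_72ac"));
console.log(onlyBusinessHours(variationRows, "f_b7c2"), neverOnWeekends(variationRows, "f_b7c2"));
```

If the query filters out low-count buckets, "never" really means "never above the threshold", so these predicates are best treated as hints rather than proof.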

Try this recipe in your own agent.

Paste the prompt above into your agent. It'll set up Agentry against your data, then run the recipe and show you the real output.

Install https://agentry.sh/agentry.md and set it up