Find the cron that fails every Sunday at 3am

Why this matters

Some bugs fire 200 events a day and you see them. Some bugs fire 50 events but only between 02:50 and 03:10 UTC, every Sunday — these look like noise in a daily count but jump out as “weekly backup job is broken” the moment you bucket by hour.

Sentry’s UI shows you a time-series for one fingerprint at a time; spotting periodicity requires you to already suspect it. Datadog can do hour-of-day heatmaps if you build the dashboard. Nobody builds the dashboard.

The agent builds it on demand. One prompt, one heatmap, the cron-shaped bugs reveal themselves.

What you get

A per-fingerprint heatmap (hour-of-day × day-of-week) for errors trending above baseline
Top 3 fingerprints with the highest periodicity score (concentration in a narrow window)
A guess at what they map to (3am UTC Sunday = weekly backup, Monday 9am = digest email, etc.)

Walk through it

You

Are there error patterns that only fire at certain times? Bucket errors by hour-of-day and day-of-week and show me anomalies.

Agent

Let me pull error counts bucketed by (fingerprint, hour, day-of-week) for the last 28 days.

agentry_analytics_query {
  project_id: "default",
  hogql: "SELECT properties.fingerprint AS fp, toHour(timestamp) AS hour, toDayOfWeek(timestamp) AS dow, count() AS cnt FROM events WHERE event = '$exception' AND timestamp >= now() - INTERVAL 28 DAY GROUP BY fp, hour, dow HAVING cnt > 5 ORDER BY fp"
}

{
  "rows": [
    { "fp": "f_a3e1", "hour": 9,  "dow": 1, "cnt": 124 },
    { "fp": "f_a3e1", "hour": 10, "dow": 1, "cnt": 18 },
    { "fp": "f_a3e1", "hour": 9,  "dow": 2, "cnt": 4 },
    { "fp": "f_b7c2", "hour": 3,  "dow": 7, "cnt": 47 },
    { "fp": "f_b7c2", "hour": 3,  "dow": 14, "cnt": 0 },
    { "fp": "f_72ac", "hour": 14, "dow": 1, "cnt": 22 },
    { "fp": "f_72ac", "hour": 14, "dow": 2, "cnt": 24 },
    { "fp": "f_72ac", "hour": 14, "dow": 3, "cnt": 19 }
  ]
}

Agent

Two fingerprints have very high periodicity (>80% of events fall in one hour-of-week bucket):

f_b7c2 — 47 events, all Sunday 03:00 UTC. Classic weekly cron failure. Let me pull the case.
f_a3e1 — 124 events, Monday 09:00 UTC. Looks like a digest/welcome email batch.

f_72ac is high-volume but evenly distributed across weekdays — that’s regular user traffic, not periodic.

agentry_get_case { case_id: "f_b7c2" }

{
  "id": "f_b7c2",
  "error_type": "TimeoutError",
  "message": "Statement timeout: backup-rotation took >300s",
  "stack_top": "src/jobs/weekly-backup.ts:88",
  "first_seen_at": 1745020800
}

Agent

f_b7c2 is your weekly backup rotation timing out. It’s been firing every Sunday at 03:00 UTC for 4 weeks, ~12 errors per run. Whoever wrote this hasn’t seen the alert because they’re asleep and nobody triages Sundays. Fix: paginate the rotation or bump the statement timeout. Want me to draft the PR?

The output

Periodic-error heatmap (28d)

f_b7c2  TimeoutError: backup-rotation took >300s
         src/jobs/weekly-backup.ts:88
         Periodicity: 100% in Sunday 03:00 UTC bucket (4 occurrences)
         ↗ Weekly cron — your `runs every Sunday 3am` job

f_a3e1  SMTPError: rate-limit exceeded (Postmark)
         src/jobs/weekly-digest.ts:42
         Periodicity: 88% in Monday 09:00 UTC bucket (4 occurrences)
         ↗ Weekly digest batch — hitting Postmark cap

NON-PERIODIC (regular user traffic):
f_72ac  TypeError on auth.ts:42 — evenly distributed across weekdays
         (see deploy-regression or silent-bugs for triage)

Setting it up

This works out of the box — no extra instrumentation needed. The recipe just slices on the event timestamps you’re already sending.

A small upgrade: tag your scheduled jobs with tags.job_name so the agent can ground its guess:

try {
  await runWeeklyBackup();
} catch (err) {
  await fetch(`https://api.agentry.sh/v1/logs/${PROJECT_ID}/`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.AGENTRY_DSN}`,
      "Content-Type": "application/json",
      "User-Agent": "myapp-jobs/1.0",  // REQUIRED — Cloudflare 403s default UAs
    },
    body: JSON.stringify({
      message: err.message,
      stack: err.stack,
      tags: {
        job_name: "weekly-backup-rotation",  // ← THIS LINE
        cron_expr: "0 3 * * 0",
      },
    }),
  });
  throw err;
}

With job_name set, the agent says “your weekly-backup-rotation job is failing every Sunday at 3am” instead of guessing from the stack trace.

Variations

“Same analysis but for analytics events — are there usage patterns that spike on specific weekdays?”
“Find errors that only fire during business hours (9-5 UTC). Probably user-triggered.”
“Show errors that NEVER fire on weekends. They’re probably internal-tool bugs.”
“Daily cron: surface any new fingerprint with >70% periodicity score in last 14 days.” (uses a Routine)

Find the cron that fails every Sunday at 3am

Why this matters

What you get

Walk through it

The output

Setting it up

Variations

Related recipes

Detect a 429-spike before customers tweet about it

Find connection-pool / timeout patterns in errors

Find what broke after the last npm update

Try this recipe in your own agent.