Nightly Routine: draft PRs for low-risk silent bugs

A Routine runs at 2am, picks the top silent bug, investigates it, and opens a draft PR if the fix is small and contained. Lights-out triage, ready for the next morning's review.

Difficulty: advanced · Time to value: 1 hour to set up · Tools used: 3

Just say this

Set up a Routine that runs every night at 2am. Pick the top silent bug (cases with users_affected ≥ 5 but no agent_summary). Investigate it. If the fix touches < 20 LOC and no schema/config, open a draft PR. Otherwise file a 1-paragraph triage note.

Why this matters

The silent-bugs recipe is the on-demand version: you sit at your desk, ask “what’s silently breaking right now,” and the agent finds the bugs. Useful but bounded: you have to be there.

This is the lights-out version. The Routine wakes at 2am, picks the top silent bug, investigates it, and either:

  • Opens a draft PR (if the fix is small, contained, and well-understood)
  • Files a one-paragraph triage note (if the fix needs human judgment)

You wake up to a Slack notification: “I drafted PR #420 for the silent null dereference in BillingForm: 14 LOC, no schema, no config change. Review and merge if it looks right.” Or: “I triaged the runaway image-decoder bug, but the fix needs a new env var; see the note.” Either way, the work moved forward overnight.

Most teams never get to nightly bug fixes because the labor cost is real: each fix needs an engineer attached. With a Routine, that cost shrinks to “5 minutes reviewing a PR while drinking coffee.”

What you get

  • A scheduled cloud agent that runs nightly
  • Per run: picks the top silent bug, investigates, drafts a PR or a note
  • Strict guardrails (PR only if the fix is under 20 LOC, touches no schema or config, the affected file already has tests, and typecheck and existing tests still pass)
  • A morning notification with the result
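
The “picks the top silent bug” step in the list above boils down to a filter and a max. Here is a minimal sketch, assuming cases arrive as plain records. Only `users_affected` and `agent_summary` come from the prompt; the `Case` class, the `pick_top_silent_bug` helper, and the user counts are hypothetical illustrations, not part of Agentry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    # Hypothetical record shape; only users_affected and agent_summary
    # are named in the recipe prompt.
    id: str
    users_affected: int
    agent_summary: Optional[str] = None
    status: str = "open"


def pick_top_silent_bug(cases: list, min_users: int = 5) -> Optional[Case]:
    """Return the open, unsummarized case affecting the most users, or None."""
    silent = [
        c for c in cases
        if c.status == "open"
        and c.users_affected >= min_users
        and not c.agent_summary
    ]
    # default=None avoids an exception when nothing qualifies.
    return max(silent, key=lambda c: c.users_affected, default=None)


# Made-up user counts, for illustration only.
cases = [
    Case("c_npe_1", users_affected=42),
    Case("c_oom_2", users_affected=17),
    Case("c_seen_3", users_affected=90, agent_summary="already triaged"),
    Case("c_small_4", users_affected=2),
]
target = pick_top_silent_bug(cases)
print(target.id if target else "no silent bugs to triage")
```

The case with the most users is skipped here because it already has a summary, which is the point of “silent”: loud enough to matter, but nobody has looked yet.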

Walk through it

You

Set up a Routine that runs every night at 2am. Pick the top silent bug (cases with users_affected ≥ 5 but no agent_summary). Investigate it. If the fix touches < 20 LOC and no schema/config, open a draft PR. Otherwise file a 1-paragraph triage note.

Agent

This Routine does a lot, but the strict guardrails keep it safe. Let me schedule it.

/schedule "0 2 * * *" "
1. agentry_list_cases { status: 'open', has_agent_summary: false, min_users_affected: 5,
                        order_by: 'users_affected desc', limit: 1 } → pick TARGET_CASE
   If empty: log 'no silent bugs to triage', exit silently.

2. agentry_get_case { case_id: TARGET_CASE.id } → get stack, message, env, fingerprint

3. Locate the failure locally: cd into the project, find the file:line from the stack,
   read 50 lines around the call site, then read related files.

4. Hypothesize the fix. Validate it against the guardrails:
   - LOC touched < 20?                   → if no, abort PR, file triage note
   - Any schema migrations needed?       → if yes, abort PR, file triage note
   - Any config / env / secret changes?  → if yes, abort PR, file triage note
   - Tests for the affected file exist?  → if no, abort PR, file triage note
   - Does the change pass typecheck and existing tests?  → if no, abort PR, file triage note

5. If all guards pass:
   a) Create a branch: fix/<fingerprint>
   b) Apply the fix
   c) Add a regression test
   d) gh pr create --draft --title 'fix(<area>): <one-line fix summary>' --body '<full
      summary including: case link, root cause, fix description, test added>'
   e) agentry_remember case_id=TARGET_CASE.id memo='Auto-PR #<num> opened nightly; review tomorrow'

6. Otherwise:
   a) agentry_remember case_id=TARGET_CASE.id memo='Auto-triage: <one-paragraph
      hypothesis + why the fix could not be automated, e.g. needs new env var X>'

7. Post to wh_morning Slack: a short summary of step 5 or step 6.
"
{ "routine_id": "rt_nightly", "next_run_at": "2026-05-17T02:00:00Z" }

Agent

Done. Routine rt_nightly fires every night at 02:00 UTC. The morning Slack message will take one of two forms:

🌙 Nightly triage — May 17

  ✅ Drafted PR #420
     Case: c_npe_1 (BillingForm.tsx:118 — null user.address)
     Fix: 14 LOC, added null-guard + regression test
     Tests pass: ✓     Typecheck: ✓
     Review: https://github.com/you/repo/pull/420

Or:

🌙 Nightly triage — May 17

  📝 Triage note (no PR — needs human)
     Case: c_oom_2 (ImageDecoder OOM at 64MB)
     Hypothesis: image dimensions exceeded ImageDecoder default budget
     Why no auto-PR: fix needs a new env var IMAGE_MAX_BUDGET_MB and
       a config change to deploy/staging.yaml — out of guardrail scope.
     Read the note: https://agentry.sh/case/c_oom_2#memory
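
If you want to render these two forms yourself, one way is to model the outcome as one of two record types and format each. This is a rough sketch only: the `DraftedPR` and `TriageNote` classes and the `render_morning_message` helper are hypothetical names, and the layout loosely mirrors the messages above rather than reproducing what the Routine actually emits.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DraftedPR:
    case: str
    pr_number: int
    loc: int
    url: str


@dataclass
class TriageNote:
    case: str
    hypothesis: str
    blocked_because: str
    url: str


def render_morning_message(result: Union[DraftedPR, TriageNote], date_label: str) -> str:
    """Return the Slack text for either outcome of a nightly run."""
    lines = [f"🌙 Nightly triage: {date_label}", ""]
    if isinstance(result, DraftedPR):
        lines += [
            f"  ✅ Drafted PR #{result.pr_number}",
            f"     Case: {result.case}",
            f"     Fix: {result.loc} LOC",
            f"     Review: {result.url}",
        ]
    else:
        lines += [
            "  📝 Triage note (no PR, needs human)",
            f"     Case: {result.case}",
            f"     Hypothesis: {result.hypothesis}",
            f"     Why no auto-PR: {result.blocked_because}",
            f"     Read the note: {result.url}",
        ]
    return "\n".join(lines)


print(render_morning_message(
    DraftedPR("c_npe_1", 420, 14, "https://github.com/you/repo/pull/420"),
    "May 17",
))
```

Keeping the two outcomes as separate types means a triage note can never be posted without its “why no auto-PR” reason.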

The output

auto-fix Routine (rt_nightly)

  ✓ Schedule:        nightly at 02:00 UTC
  ✓ Target:          top silent bug (≥5 users, no prior summary)
  ✓ PR guardrails:
      < 20 LOC, no schema, no config, tests must pass
  ✓ Slack:           wh_morning   → #eng-mornings
  ✓ First run:       2026-05-17T02:00:00Z

Expected morning notification (2 forms):

  ✅ Drafted PR #N (full fix)
       Case + diff link
       Test added, all checks green

  📝 Triage note only (humans needed)
       Hypothesis + why auto-fix was blocked
       Read note: agentry.sh/case/...

Setting it up

1. The Routine itself. The /schedule prompt above is the full Routine. The agent running it needs access to the MCP toolbox and a shell with git and gh configured.
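
A note on the schedule string: in standard cron syntax, "0 2 * * *" reads as minute 0, hour 2, every day of the month, every month, every day of the week. The sketch below computes the next such run in UTC. It assumes the scheduler interprets the expression in UTC, which the `Z` suffix on `next_run_at` suggests but the recipe does not state outright. The `next_daily_run` helper and the sample "now" are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 2, minute: int = 0) -> datetime:
    """Next occurrence of hour:minute UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Hypothetical "now": the afternoon before the first run shown above.
now = datetime(2026, 5, 16, 15, 30, tzinfo=timezone.utc)
print(next_daily_run(now).isoformat())  # 2026-05-17T02:00:00+00:00
```

If your team is not on UTC, “2am” in the prompt may land at a different local hour than you expect, so it is worth checking `next_run_at` after scheduling.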

2. GitHub CLI auth. The Routine needs gh authenticated to a bot account or a scoped PAT (personal access token). Set up the credentials in the Routine’s execution context; see your Routines docs for secret injection.

3. Tune the guardrails. Defaults are conservative for a reason. After a week of green PRs, you might relax LOC to 50 or allow simple package upgrades. Don’t relax “no schema / no config” — that’s where Routines get genuinely dangerous.
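
To make the tuning concrete, here is a minimal sketch of the guardrails as data plus one check function, so relaxing LOC is a one-field change while the schema/config rule stays put. Everything here is an assumption for illustration: the `Guardrails` and `ProposedFix` classes, the `blocked_reasons` helper, the `src/` paths, and the path prefixes used to approximate “schema / config” are not part of the Routine or of Agentry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    max_loc: int = 20  # a fix must touch fewer lines than this
    # Hypothetical path prefixes standing in for "schema / config".
    forbidden_prefixes: tuple = ("migrations/", "deploy/", ".env")


@dataclass
class ProposedFix:
    loc_touched: int
    files: list
    has_existing_tests: bool
    checks_pass: bool  # typecheck and existing tests


def blocked_reasons(fix: ProposedFix, rails: Guardrails = Guardrails()) -> list:
    """Return why a fix may not become a draft PR; an empty list means go ahead."""
    reasons = []
    if fix.loc_touched >= rails.max_loc:
        reasons.append(f"touches {fix.loc_touched} LOC (limit: under {rails.max_loc})")
    flagged = [f for f in fix.files if f.startswith(rails.forbidden_prefixes)]
    if flagged:
        reasons.append("touches schema/config: " + ", ".join(flagged))
    if not fix.has_existing_tests:
        reasons.append("no existing tests for the affected file")
    if not fix.checks_pass:
        reasons.append("typecheck or existing tests fail")
    return reasons


small = ProposedFix(14, ["src/BillingForm.tsx"], True, True)
risky = ProposedFix(9, ["src/ImageDecoder.ts", "deploy/staging.yaml"], True, True)
print(blocked_reasons(small))  # empty list: safe to open a draft PR
print(blocked_reasons(risky))  # one reason: the deploy/ file

# After a week of green PRs, relax only the LOC limit.
relaxed = Guardrails(max_loc=50)
print(blocked_reasons(ProposedFix(35, ["src/BillingForm.tsx"], True, True), relaxed))
```

Returning the reasons, rather than a bare yes/no, gives the triage note its “why no auto-PR” text for free.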

4. Pair with weekly-agent-digest. Friday’s digest will include “drafted 4 PRs, 2 merged, 2 closed” — a quick read on whether the Routine is actually shipping value or just generating PRs you ignore.

5. Add a kill switch. Set up a feature flag (disable_nightly_auto_fix) that the Routine checks before acting. If you ever need to pause it, flip the flag instead of editing the schedule:

// At the top of the Routine prompt:
//   if agentry_evaluate_feature_flag('disable_nightly_auto_fix', 'system') is true,
//   log and exit silently.
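
The same check, sketched as a wrapper around the nightly run. The flag name and the 'system' argument come from the comment above; the `run_nightly` function and the callables passed into it are hypothetical stand-ins, since the real flag lookup happens through the agent's tool call rather than Python. One design choice worth considering, shown here as an assumption: if the flag cannot be read at all, skip the run rather than act.

```python
from typing import Callable


def run_nightly(evaluate_flag: Callable[[str, str], bool],
                routine: Callable[[], str]) -> str:
    """Run `routine` unless the kill switch is on or cannot be read."""
    try:
        disabled = evaluate_flag("disable_nightly_auto_fix", "system")
    except Exception:
        # Fail closed: an unreadable flag means no PRs tonight.
        return "skipped: flag lookup failed"
    if disabled:
        return "skipped: kill switch is on"
    return routine()


def broken_lookup(name: str, scope: str) -> bool:
    raise TimeoutError("flag service unreachable")


print(run_nightly(lambda name, scope: False, lambda: "ran nightly triage"))
print(run_nightly(lambda name, scope: True, lambda: "ran nightly triage"))
print(run_nightly(broken_lookup, lambda: "ran nightly triage"))
```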

Variations

  • “Same Routine but scoped to my mobile app only — production-Android cases ≥ 100 users.”
  • “Hourly variant: don’t wait for 2am, run every hour. Each run picks the top untriaged case.”
  • “For cases the Routine drafts a PR on, auto-assign to the file’s code owner (CODEOWNERS lookup) or, failing that, the engineer who last modified the file.”
  • “Add a ‘maximum PRs per day’ cap (3?) so you don’t wake up to a flood when your error rate spikes.”
  • “In the Slack message, include a ‘snooze 24h’ button that adds the case fingerprint to a no-touch list.”
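
Two of the variations above, the daily PR cap and the 24h snooze, come down to one gate checked before opening a PR. A minimal sketch follows, assuming the Routine can keep a small piece of state between runs. The `may_open_pr` and `snooze` helpers, the in-memory dict, and the fingerprint value are hypothetical, and where that state would actually live is left open.

```python
from datetime import datetime, timedelta, timezone


def snooze(snoozed_until: dict, fingerprint: str, now: datetime, hours: int = 24) -> None:
    """Put a case fingerprint on the no-touch list for `hours`."""
    snoozed_until[fingerprint] = now + timedelta(hours=hours)


def may_open_pr(fingerprint: str, now: datetime, prs_opened_today: int,
                snoozed_until: dict, max_prs_per_day: int = 3) -> bool:
    """True if the case is not snoozed and today's PR cap is not reached."""
    until = snoozed_until.get(fingerprint)
    if until is not None and now < until:
        return False
    return prs_opened_today < max_prs_per_day


now = datetime(2026, 5, 17, 2, 0, tzinfo=timezone.utc)
no_touch = {}
snooze(no_touch, "fp_billing_npe", now)

print(may_open_pr("fp_billing_npe", now + timedelta(hours=1), 0, no_touch))   # False: snoozed
print(may_open_pr("fp_billing_npe", now + timedelta(hours=25), 0, no_touch))  # True: snooze expired
print(may_open_pr("fp_other", now, 3, no_touch))                              # False: cap reached
```

The cap matters most for the hourly variant, where an error-rate spike could otherwise produce a PR every run.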

Try this recipe in your own agent.

Paste the prompt above into your agent. It'll set up Agentry against your data, then run the recipe and show you the real output.

Install https://agentry.sh/agentry.md and set it up