Why this matters
The silent-bugs recipe is the on-demand version: you sit at your desk, ask “what’s silently breaking right now?”, and the agent finds the answers. That is useful but bounded, because you have to be there.
This is the lights-out version. The Routine wakes at 2am, picks the top silent bug, investigates it, and either:
- Opens a draft PR (if the fix is small, contained, and well-understood)
- Files a one-paragraph triage note (if the fix needs human judgment)
You wake up to a Slack notification: “I drafted PR #420 for the silent NullPointer in BillingForm — 14 LOC, no schema, no config change. Review and merge if it looks right.” Or: “I triaged the runaway image-decoder bug but the fix needs a new env var — see the note.” Either way, the work moved forward overnight.
Most teams never get to nightly bug fixes because the labor cost is real: each fix needs an engineer attached. With a Routine, that per-fix cost drops to “5 minutes reviewing a PR while drinking coffee.”
What you get
- A scheduled cloud agent that runs nightly
- Per run: picks the top silent bug, investigates, drafts a PR or a note
- Strict guardrails (PR only if <20 LOC + no schema/config touched + tests still pass)
- A morning notification with the result
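The guardrail decision is simple enough to sketch in code. The snippet below is an illustration, not the Routine's actual implementation: the `ProposedFix` fields and the `decide` function are hypothetical names, and the "existing tests" check comes from step 4 of the walk-through prompt rather than from the short list above.

```python
from dataclasses import dataclass

MAX_LOC = 20  # a PR is allowed only when fewer than this many lines change


@dataclass
class ProposedFix:
    """Hypothetical summary of a candidate fix, filled in by the agent."""
    loc_touched: int
    touches_schema: bool
    touches_config: bool      # config files, env vars, or secrets
    has_existing_tests: bool  # tests already exist for the affected file
    checks_pass: bool         # typecheck and existing tests both pass


def blocked_reasons(fix: ProposedFix) -> list[str]:
    """Return every guardrail the fix violates (empty list means safe)."""
    reasons = []
    if fix.loc_touched >= MAX_LOC:
        reasons.append(f"touches {fix.loc_touched} LOC (limit is under {MAX_LOC})")
    if fix.touches_schema:
        reasons.append("needs a schema migration")
    if fix.touches_config:
        reasons.append("needs a config, env, or secret change")
    if not fix.has_existing_tests:
        reasons.append("no existing tests for the affected file")
    if not fix.checks_pass:
        reasons.append("typecheck or tests fail")
    return reasons


def decide(fix: ProposedFix) -> str:
    """Draft a PR only when no guardrail is violated; otherwise file a note."""
    return "draft_pr" if not blocked_reasons(fix) else "triage_note"
```

Collecting every violated guardrail, rather than stopping at the first, gives the triage note a complete "why no auto-PR" explanation.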
Walk through it
Set up a Routine that runs every night at 2am. Pick the top silent bug (cases with users_affected ≥ 5 but no agent_summary). Investigate it. If the fix touches < 20 LOC and no schema/config, open a draft PR. Otherwise file a 1-paragraph triage note.
This is a big-but-safe Routine because of the strict guardrails. Let me schedule it.
/schedule "0 2 * * *" "
1. agentry_list_cases { status: 'open', has_agent_summary: false, min_users_affected: 5,
order_by: 'users_affected desc', limit: 1 } → pick TARGET_CASE
If empty: log 'no silent bugs to triage', exit silently.
2. agentry_get_case { case_id: TARGET_CASE.id } → get stack, message, env, fingerprint
3. Reproduce locally: cd into the project, search for the file:line in the stack,
read 50 lines around the call site. Read related files.
4. Hypothesize the fix. Validate against the audit:
- LOC touched < 20? → if no, abort PR, file triage note
- Any schema migrations needed? → if yes, abort PR, file triage note
- Any config / env / secret changes? → if yes, abort PR, file triage note
- Tests for the affected file exist? → if no, abort PR, file triage note
- Does the change pass typecheck and existing tests? → if no, abort PR, file triage note
5. If all guards pass:
a) Create a branch: fix/<fingerprint>
b) Apply the fix
c) Add a regression test
d) gh pr create --draft --title 'fix(<area>): <one-line fix summary>' --body '<full
summary including: case link, root cause, fix description, test added>'
e) agentry_remember case_id=TARGET_CASE.id memo='Auto-PR #<num> opened nightly; review tomorrow'
6. Otherwise:
a) agentry_remember case_id=TARGET_CASE.id memo='Auto-triage: <one-paragraph
hypothesis + why auto-fix was blocked (e.g. needs new env var X)>'
7. Post to wh_morning Slack: a short summary of the step 5 or step 6 outcome.
"
{ "routine_id": "rt_nightly", "next_run_at": "2026-05-17T02:00:00Z" }
Done. Routine rt_nightly fires every night at 02:00 UTC. The morning Slack message takes one of two forms:
🌙 Nightly triage — May 17
✅ Drafted PR #420
Case: c_npe_1 (BillingForm.tsx:118 — null user.address)
Fix: 14 LOC, added null-guard + regression test
Tests pass: ✓ Typecheck: ✓
Review: https://github.com/you/repo/pull/420
Or:
🌙 Nightly triage — May 17
📝 Triage note (no PR — needs human)
Case: c_oom_2 (ImageDecoder OOM at 64MB)
Hypothesis: image dimensions exceeded ImageDecoder default budget
Why no auto-PR: fix needs a new env var IMAGE_MAX_BUDGET_MB and
a config change to deploy/staging.yaml — out of guardrail scope.
Read the note: https://agentry.sh/case/c_oom_2#memory
The output
auto-fix Routine (rt_nightly)
✓ Schedule: nightly at 02:00 UTC
✓ Target: top silent bug (≥5 users, no prior summary)
✓ PR guardrails:
under 20 LOC, no schema, no config, tests must pass
✓ Slack: wh_morning → #eng-mornings
✓ First run: 2026-05-17T02:00:00Z
Expected morning notification (2 forms):
✅ Drafted PR #N (full fix)
Case + diff link
Test added, all checks green
📝 Triage note only (humans needed)
Hypothesis + why auto-fix was blocked
Read note: agentry.sh/case/...
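The "top silent bug" target is a filter plus a sort. The sketch below mirrors the step 1 query from the prompt on an in-memory list. It assumes each case is a plain dict with `status`, `agent_summary`, and `users_affected` keys, which is a guess at the shape rather than the documented response of agentry_list_cases; `pick_target_case` is a hypothetical helper.

```python
from typing import Optional


def pick_target_case(cases: list[dict], min_users: int = 5) -> Optional[dict]:
    """Return the open, unsummarized case with the most affected users.

    Returns None when nothing qualifies, which corresponds to the
    'no silent bugs to triage' early exit in the Routine prompt.
    """
    candidates = [
        c for c in cases
        if c.get("status") == "open"
        and not c.get("agent_summary")          # no prior agent summary
        and c.get("users_affected", 0) >= min_users
    ]
    if not candidates:
        return None
    # Equivalent to order_by 'users_affected desc', limit 1.
    return max(candidates, key=lambda c: c["users_affected"])
```

For example, given one open case with 12 affected users and no summary, and another with 40 users that already has a summary, the function returns the 12-user case.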
Setting it up
1. The Routine itself. The /schedule prompt above is the full Routine. The agent running it has access to the MCP toolbox + a shell with git and gh configured.
2. GitHub CLI auth. The Routine needs gh authenticated to a bot account or scoped PAT. Set up the credentials in the Routine’s execution context — see your Routines docs for secret injection.
3. Tune the guardrails. Defaults are conservative for a reason. After a week of green PRs, you might relax LOC to 50 or allow simple package upgrades. Don’t relax “no schema / no config” — that’s where Routines get genuinely dangerous.
4. Pair with weekly-agent-digest. Friday’s digest will include “drafted 4 PRs, 2 merged, 2 closed” — a quick read on whether the Routine is actually shipping value or just generating PRs you ignore.
5. Add a kill switch. Set up a feature flag (disable_nightly_auto_fix) that the Routine checks before acting. If you ever need to pause it, flip the flag instead of editing the schedule:
// At the top of the Routine prompt:
// if agentry_evaluate_feature_flag('disable_nightly_auto_fix', 'system') is true,
// log and exit silently.
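If part of the Routine runs as a script rather than a prompt, the same kill switch can wrap the entry point. This is a sketch under assumptions: `evaluate_flag` stands in for whatever flag client you use, with the same two arguments as the call in the comment above, and `run_if_enabled` is a hypothetical helper. Treating a failed flag lookup as "disabled" is a design choice of this sketch, not something the recipe prescribes.

```python
from typing import Callable

FLAG_NAME = "disable_nightly_auto_fix"


def run_if_enabled(
    evaluate_flag: Callable[[str, str], bool],
    run: Callable[[], str],
) -> str:
    """Run the nightly job unless the kill switch is on.

    Fails closed: if the flag cannot be evaluated, the job is skipped,
    because an unattended agent should not act when its off switch
    cannot be read.
    """
    try:
        disabled = evaluate_flag(FLAG_NAME, "system")
    except Exception:
        disabled = True
    if disabled:
        return "skipped"
    return run()
```

Failing closed costs you one missed night at worst, while failing open could mean PRs opened during an incident when you explicitly wanted quiet.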
Variations
- “Same Routine but scoped to my mobile app only — production-Android cases ≥ 100 users.”
- “Hourly variant: don’t wait for 2am. Every hour, pick the top untriaged case.”
- “For cases the Routine drafts a PR on, auto-assign to the engineer who last modified the file (CODEOWNERS lookup).”
- “Add a ‘maximum PRs per day’ cap (3?) so you don’t wake up to a flood when your error rate spikes.”
- “On Slack message, include a ‘snooze 24h’ button that adds the case fingerprint to a no-touch list.”
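The last two variations, the daily PR cap and the 24-hour snooze, come down to one check before the Routine opens a PR. A minimal sketch, assuming you keep the timestamps of recent PRs and a map of snoozed fingerprints somewhere the Routine can read; `may_open_pr`, `MAX_PRS_PER_DAY`, and the storage shape are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_PRS_PER_DAY = 3           # cap suggested in the variation above
SNOOZE = timedelta(hours=24)  # how long a 'snooze' keeps a case untouched


def may_open_pr(
    fingerprint: str,
    now: datetime,
    prs_opened_at: list[datetime],
    snoozed_at: dict[str, datetime],
) -> bool:
    """Return True when a new draft PR is allowed for this case right now."""
    # Respect an active snooze on this fingerprint.
    snoozed = snoozed_at.get(fingerprint)
    if snoozed is not None and now - snoozed < SNOOZE:
        return False
    # Count only PRs opened in the trailing 24 hours.
    recent = [t for t in prs_opened_at if now - t < timedelta(days=1)]
    return len(recent) < MAX_PRS_PER_DAY
```

When the check fails, the natural fallback is the triage-note path, so a spike in errors still produces notes without producing a flood of PRs.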