Why this matters
Unattended agents are powerful. That power needs guardrails — not “audit log later” but “alert now on the actions that, if you didn’t take them, mean something bad happened.”
The high-stakes mutations:
- feature_flag.delete: could roll back a stable production state for many users
- suppression.create with a broad pattern: could silence a real production alert
- cohort.delete: could undo a long-running analytics setup
- publication.create with surprising scope: could leak data through a public dashboard
- webhook.delete for the security webhook itself: a meta-attack
If any of these happen in the middle of the night, you want to know right then. The Routine is the cheap way to enforce that — no IAM rewriting, no fancy SIEM, just a scheduled prompt that compares the audit log to “what would I tolerate at 3am.”
What you get
- A scheduled check every 30 minutes (strict off-hours, relaxed to volume-only alerts during business hours)
- Webhook delivery to #security on any matching high-impact action
- A self-protection clause (the Routine alerts if anyone tries to delete its own webhook)
Walk through it
Set up a Routine that watches the audit log for high-impact actions: feature_flag deletes, suppression adds with broad patterns, cohort deletes, publication creates. If any happens outside business hours (9am-6pm Pacific Mon-Fri), ping #security at https://hooks.slack.com/services/T0000/B_sec/X.
Webhook first (and important: protect this specific webhook from deletion via the very same check we’re about to build).
agentry_register_webhook {
  project_id: "default",
  url: "https://hooks.slack.com/services/T0000/B_sec/X",
  description: "Audit log security alerts (DO NOT DELETE)",
  events: ["routine.security_alert"]
}
{ "webhook_id": "wh_sec", "signing_secret": "agws_…", "active": true }
Now the Routine. It runs every 30 minutes around the clock; the prompt does the time check inline and relaxes the alert threshold during business hours.
/schedule "*/30 * * * *" "
1. Compute now in America/Los_Angeles. If the hour is 9 through 17 inclusive (09:00-17:59) AND the weekday is Mon-Fri, this is business hours.
2. Call agentry_recent_changes { hours: 1 } to get last hour of audit.
3. Filter to high-impact actions:
   - feature_flag.delete
   - suppression.create where fingerprint_pattern is broad (length < 8 chars, OR contains '*')
   - cohort.delete
   - publication.create
   - webhook.delete where webhook_id = wh_sec ← SELF-PROTECTION
4. For each matching row, classify severity:
   - During business hours: low (only ping if > 5 in 1h)
   - Off hours: HIGH (ping for any single occurrence)
5. POST to webhook wh_sec with:
   - The action, actor, timestamp
   - Severity classification
   - Link to the audit row
   - Suggested action ('Acknowledge' / 'Rollback' / 'Investigate')
"
{ "routine_id": "rt_sec", "next_run_at": "2026-05-16T14:00:00Z" }
Done. Routine rt_sec fires every 30 minutes. Off-hours = strict; business hours = relaxed (only pings on volume).
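Steps 3 and 4 of the prompt amount to a filter plus a threshold. The following is a minimal sketch of that logic, not how the Routine executes it internally. The row field names (action, fingerprint_pattern, webhook_id) are assumptions about the audit row shape, and rowsToAlert is a hypothetical helper name.

```javascript
// Sketch of the prompt's steps 3-4. Row fields are assumed, not confirmed API.
const PROTECTED_WEBHOOK_ID = "wh_sec";

// "Broad" as the prompt defines it: under 8 characters, or contains a wildcard.
function isBroadPattern(pattern) {
  return typeof pattern === "string" && (pattern.length < 8 || pattern.includes("*"));
}

// Step 3: keep only the high-impact actions.
function isHighImpact(row) {
  switch (row.action) {
    case "feature_flag.delete":
    case "cohort.delete":
    case "publication.create":
      return true;
    case "suppression.create":
      return isBroadPattern(row.fingerprint_pattern);
    case "webhook.delete":
      return row.webhook_id === PROTECTED_WEBHOOK_ID; // self-protection
    default:
      return false;
  }
}

// Step 4: off-hours alerts on any match; business hours only on volume (> 5).
function rowsToAlert(rows, businessHours) {
  const matches = rows.filter(isHighImpact);
  if (!businessHours) {
    return matches.map((row) => ({ ...row, severity: "HIGH" }));
  }
  return matches.length > 5
    ? matches.map((row) => ({ ...row, severity: "low" }))
    : [];
}
```

Passing businessHours in as a boolean keeps the time check separate from the filtering, so each part can be tested on its own.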
Critical safety note about the self-protection clause: if anyone (or any compromised key) deletes wh_sec itself, the next Routine run will detect it, but an alert posted to a deleted webhook goes nowhere. The deletion has to surface through a backup channel, which I should also configure. Let me set that up too.
agentry_register_webhook {
  project_id: "default",
  url: "https://api.pagerduty.com/integration/.../enqueue",
  description: "Audit security backup (PagerDuty)",
  events: ["routine.security_alert_backup"]
}
{ "webhook_id": "wh_sec_backup", "signing_secret": "agws_…", "active": true }
Now a deletion of wh_sec can also fire wh_sec_backup, once step 5 of the Routine posts there for that case. Even if Slack is sabotaged, PagerDuty rings.
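The routing between the two webhooks can be sketched as a small function. This is an illustration under assumptions: targetsFor and deliver are hypothetical names, the alert fields are assumed, and post(webhookId, alert) stands in for whatever actually delivers to a registered webhook.

```javascript
// Sketch: choose delivery targets so tampering with the primary still pages.
const PRIMARY = "wh_sec";
const BACKUP = "wh_sec_backup";

// An alert about the primary webhook being deleted also goes to the backup.
function targetsFor(alert) {
  const primaryDeleted =
    alert.action === "webhook.delete" && alert.webhook_id === PRIMARY;
  return primaryDeleted ? [PRIMARY, BACKUP] : [PRIMARY];
}

// Attempt every target independently; a failing primary must not block the backup.
// `post` is a caller-supplied function (hypothetical) that sends to one webhook.
async function deliver(alert, post) {
  const targets = targetsFor(alert);
  const results = await Promise.allSettled(
    targets.map(async (id) => post(id, alert)),
  );
  // Return the ids that accepted the alert.
  return targets.filter((_, i) => results[i].status === "fulfilled");
}
```

Promise.allSettled is used rather than Promise.all so that a rejected post to the deleted Slack webhook does not prevent the PagerDuty attempt.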
Want me to extend the Routine to log a positive-acknowledgment ping every Monday morning (“security audit is live, 0 issues last 168h”) so you notice if it stops firing?
The output
security_alert Routine (rt_sec)
✓ Schedule: every 30 minutes
✓ Primary webhook: wh_sec → #security
✓ Backup webhook: wh_sec_backup → PagerDuty
✓ Self-protection: monitors webhook.delete on wh_sec itself
✓ Business hours: Mon-Fri 09:00-18:00 America/Los_Angeles
✓ High-impact actions monitored:
- feature_flag.delete
- suppression.create (broad patterns)
- cohort.delete
- publication.create
- webhook.delete (especially wh_sec)
✓ First run: 2026-05-16T14:00:00Z
A Slack alert (off-hours):
🚨 SECURITY: high-impact mutation off-hours
Action: feature_flag.delete
Resource: flag_main_login_v2 ← production auth flag
Actor: api_key_prefix=agk_x7…
Occurred: 2026-05-17T02:14:00Z (19:14 PT, Saturday)
Severity: HIGH (off-hours + production flag)
Did you do this? If NO, take the following actions IMMEDIATELY:
1. agentry_rotate_key (revoke the key that did it)
2. Restore flag from audit log snapshot
3. Pull last 24h of audit log for full IR
Audit detail: https://agentry.sh/audit/c_aud_42
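For the Slack side, an incoming webhook accepts a JSON body with a text field, so an alert like the one above can be assembled from a matched audit row. This is a sketch only: buildSlackPayload is a hypothetical helper, and every field name on the alert object is an assumption rather than the real audit row schema.

```javascript
// Sketch: turn one matched audit row into a Slack incoming-webhook payload.
// All `alert` field names are hypothetical; map them from the real audit row.
function buildSlackPayload(alert) {
  const when = alert.businessHours ? "during business hours" : "off-hours";
  const lines = [
    `🚨 SECURITY: high-impact mutation ${when}`,
    `Action: ${alert.action}`,
    `Resource: ${alert.resource}`,
    `Actor: api_key_prefix=${alert.actorKeyPrefix}`,
    `Occurred: ${alert.occurredAt}`,
    `Severity: ${alert.severity}`,
    `Audit detail: ${alert.auditUrl}`,
  ];
  // Slack incoming webhooks take a JSON object with a `text` string.
  return { text: lines.join("\n") };
}
```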
Setting it up
The key pieces:
1. Define your high-impact action list. The defaults above are conservative — adapt to your environment. Some teams add member.add (new teammate joins), recipe.run for specific destructive recipes, etc.
2. Define “business hours” precisely. Time-of-day alerts get noisy when the off-hours window overlaps someone's normal working day, for example an EU team's morning under a Pacific-only definition. Use the timezone where most engineering activity happens, and count any hours when someone is on call as business hours.
3. Set up the backup channel. Single-channel monitoring is fragile (the channel itself can be compromised). A second target on a different vendor (PagerDuty, email, SMS) makes the whole thing resistant to sabotage.
4. Test fire it. Once registered, manually do one of the watched actions (delete a test flag) outside business hours and confirm both channels light up. This is the only way to know your wiring actually works.
// Test fire: create then delete a test flag.
// Uses top-level await, so run it as an ES module (e.g. a .mjs file).
const PROJECT_ID = "default"; // same project_id the webhooks were registered under

const flag = await fetch(
  `https://api.agentry.sh/v1/projects/${PROJECT_ID}/feature-flags`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.AGENTRY_API_KEY}`,
      "Content-Type": "application/json",
      "User-Agent": "audit-test/1.0", // REQUIRED: Cloudflare 403s default UAs
    },
    body: JSON.stringify({ key: "test_audit_canary", active: false }),
  },
).then(r => r.json());

// Delete the flag we just created; this is the watched action.
await fetch(
  `https://api.agentry.sh/v1/projects/${PROJECT_ID}/feature-flags/${flag.flag.id}`,
  {
    method: "DELETE",
    headers: {
      "Authorization": `Bearer ${process.env.AGENTRY_API_KEY}`,
      "User-Agent": "audit-test/1.0",
    },
  },
);
// next Routine run should fire wh_sec
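Step 2 above is easier to get right when the definition is written down as code. The sketch below follows the Routine prompt's rule (hour 9 through 17, Mon-Fri, America/Los_Angeles), which covers 09:00 through 17:59 and so matches the 9am-6pm window in the original request. isBusinessHours is a hypothetical helper; it uses the standard Intl.DateTimeFormat API so that daylight saving time is handled by the runtime rather than a fixed UTC offset.

```javascript
// Sketch: is this instant inside Mon-Fri 09:00-17:59 in the given time zone?
function isBusinessHours(date, timeZone = "America/Los_Angeles") {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "numeric",
    hourCycle: "h23", // 0-23, so midnight is hour 0
  }).formatToParts(date);

  const weekday = parts.find((p) => p.type === "weekday").value;
  const hour = Number(parts.find((p) => p.type === "hour").value);

  const isWeekday = ["Mon", "Tue", "Wed", "Thu", "Fri"].includes(weekday);
  return isWeekday && hour >= 9 && hour <= 17;
}
```

When planning the test fire, calling isBusinessHours(new Date()) first tells you whether a single deletion should alert, or whether the volume threshold applies.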
Variations
- “Add a ‘change-management window’ override — during a planned release, suppress alerts for known mutations the on-call engineer is making.”
- “Per-resource severity: deleting a flag that’s at 100% rollout is worse than deleting one at 0%. Weight by impact, page only on real risk.”
- “Integrate with your IR runbook: on a HIGH alert, also fetch the last 5 actions by that api_key_prefix and include them in the alert for context.”
- “Geographic check — alert if any action originates from an IP not in your usual office/VPN ranges.”
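As an illustration of the first variation, a change-management window can be modeled as a list of pre-approved (actor, actions, time range) entries that are checked before alerting. This is a sketch under assumptions: inChangeWindow is a hypothetical helper, and the window and row field names are invented for the example.

```javascript
// Sketch: suppress alerts for mutations that fall inside an approved window.
// `windows` entries and `row` fields are hypothetical shapes for illustration.
function inChangeWindow(row, windows) {
  const at = Date.parse(row.occurred_at);
  return windows.some(
    (w) =>
      w.actor === row.actor &&
      w.actions.includes(row.action) &&
      at >= Date.parse(w.start) &&
      at < Date.parse(w.end), // end is exclusive
  );
}

// Example: the on-call engineer's key may delete flags during a two-hour release.
const releaseWindows = [
  {
    actor: "agk_oncall",
    actions: ["feature_flag.delete"],
    start: "2026-05-16T20:00:00Z",
    end: "2026-05-16T22:00:00Z",
  },
];
```

Matching on both actor and action keeps the override narrow: a different key, or the approved key doing something unplanned, still alerts.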