Alert on unexpected agent mutations

Why this matters

Unattended agents are powerful, and that power needs guardrails: not “audit the log later” but “alert now” on actions that, if you didn’t take them yourself, mean something bad happened.

The high-stakes mutations:

feature_flag.delete — could roll back a stable production state for many users
suppression.create with broad pattern — could silence a real production alert
cohort.delete — could undo a long-running analytics setup
publication.create with surprising scope — could leak data through a public dashboard
webhook.delete for the security webhook itself — meta-attack

If any of these happen in the middle of the night, you want to know right then. The Routine is the cheap way to enforce that — no IAM rewriting, no fancy SIEM, just a scheduled prompt that compares the audit log to “what would I tolerate at 3am.”

What you get

A scheduled check every 30 minutes (strict off-hours, relaxed during business hours)
Webhook delivery to #security on any matching high-impact action
A self-protection clause (the Routine alerts if anyone tries to delete its own webhook)

Walk through it

You

Set up a Routine that watches the audit log for high-impact actions: feature_flag deletes, suppression adds with broad patterns, cohort deletes, publication creates. If any happens outside business hours (9am-6pm Pacific Mon-Fri), ping #security at https://hooks.slack.com/services/T0000/B_sec/X.

Agent

Webhook first (and important: the same check we’re about to build will also watch for deletion of this specific webhook).

agentry_register_webhook {
  project_id: "default",
  url: "https://hooks.slack.com/services/T0000/B_sec/X",
  description: "Audit log security alerts (DO NOT DELETE)",
  events: ["routine.security_alert"]
}

{ "webhook_id": "wh_sec", "signing_secret": "agws_…", "active": true }

Agent

Now the Routine. It runs every 30 minutes around the clock. The prompt does the time check inline and relaxes alerting during business hours.

/schedule "*/30 * * * *" "
1. Compute now in America/Los_Angeles. If hour is 9-17 (09:00-17:59) AND weekday Mon-Fri, this is business hours.
2. Call agentry_recent_changes { hours: 1 } to get last hour of audit.
3. Filter to high-impact actions:
   - feature_flag.delete
   - suppression.create where fingerprint_pattern is broad (length < 8 chars, OR contains '*')
   - cohort.delete
   - publication.create
   - webhook.delete where webhook_id = wh_sec  ← SELF-PROTECTION
4. For each matching row, classify severity:
   - During business hours: low (only ping if > 5 in 1h)
   - Off hours: HIGH (ping for any single occurrence)
5. POST to webhook wh_sec with:
   - The action, actor, timestamp
   - Severity classification
   - Link to the audit row
   - Suggested action ('Acknowledge' / 'Rollback' / 'Investigate')
"

{ "routine_id": "rt_sec", "next_run_at": "2026-05-16T14:00:00Z" }
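Step 1 of the prompt is the easiest part to get subtly wrong, so here is a minimal sketch of the time check in plain JavaScript. It assumes business hours mean Mon-Fri with local hours 9 through 17 inclusive (09:00-17:59 Pacific); `isBusinessHours` is an illustrative name, not part of any Agentry API.

```javascript
// Sketch of the prompt's step 1: is this instant inside business hours?
// Assumption: Mon-Fri, hours 9..17 inclusive, evaluated in America/Los_Angeles.
function isBusinessHours(date, timeZone = "America/Los_Angeles") {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "2-digit",
    hourCycle: "h23", // 0-23, so midnight is 0 rather than 24
  }).formatToParts(date);

  const weekday = parts.find((p) => p.type === "weekday").value;
  const hour = Number(parts.find((p) => p.type === "hour").value);

  const isWeekday = ["Mon", "Tue", "Wed", "Thu", "Fri"].includes(weekday);
  return isWeekday && hour >= 9 && hour <= 17;
}

// Saturday evening Pacific time.
console.log(isBusinessHours(new Date("2026-05-17T02:14:00Z"))); // false
// Monday 10:00 Pacific time.
console.log(isBusinessHours(new Date("2026-05-18T17:00:00Z"))); // true
```

Converting through `Intl.DateTimeFormat` rather than subtracting a fixed offset keeps the check correct across daylight-saving changes.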

Agent

Done. Routine rt_sec fires every 30 minutes. Off-hours = strict; business hours = relaxed (only pings on volume).
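To make the “strict vs. relaxed” rule concrete, here is a rough sketch of steps 3 and 4 as ordinary code. The audit row shape (`action`, `fingerprint_pattern`, `webhook_id`) is an assumption for illustration; the real fields returned by agentry_recent_changes may differ.

```javascript
// Hypothetical audit row: { action, fingerprint_pattern?, webhook_id? }.
const PROTECTED_WEBHOOK_ID = "wh_sec";

// "Broad" as defined in the prompt: shorter than 8 chars, or contains '*'.
function isBroadPattern(pattern) {
  return pattern.length < 8 || pattern.includes("*");
}

function isHighImpact(row) {
  switch (row.action) {
    case "feature_flag.delete":
    case "cohort.delete":
    case "publication.create":
      return true;
    case "suppression.create":
      // A missing pattern is treated as broad (assumption: fail loud).
      return isBroadPattern(row.fingerprint_pattern ?? "");
    case "webhook.delete":
      return row.webhook_id === PROTECTED_WEBHOOK_ID; // self-protection
    default:
      return false;
  }
}

// Off-hours: every match is HIGH. Business hours: only alert on volume (> 5).
function classify(rows, businessHours) {
  const matches = rows.filter(isHighImpact);
  if (!businessHours) {
    return matches.map((row) => ({ ...row, severity: "HIGH" }));
  }
  return matches.length > 5
    ? matches.map((row) => ({ ...row, severity: "low" }))
    : [];
}

const sampleRows = [
  { action: "feature_flag.delete" },
  { action: "suppression.create", fingerprint_pattern: "Err*" },
  { action: "member.add" },
];
console.log(classify(sampleRows, false).length); // 2
console.log(classify(sampleRows, true).length); // 0
```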

Critical safety note about the self-protection clause: if anyone (or any compromised key) tries to delete wh_sec itself, the next Routine run will detect that and surface the deletion via the backup channel I should also configure. Let me set that up too.

agentry_register_webhook {
  project_id: "default",
  url: "https://events.pagerduty.com/integration/.../enqueue",
  description: "Audit security backup (PagerDuty)",
  events: ["routine.security_alert_backup"]
}

{ "webhook_id": "wh_sec_backup", "signing_secret": "agws_…", "active": true }

Agent

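The Routine’s self-protection alert needs to go to wh_sec_backup as well as wh_sec, since a deleted wh_sec cannot deliver news of its own deletion. With that in place, even if Slack is sabotaged, PagerDuty rings.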

Want me to extend the Routine to log a positive-acknowledgment ping every Monday morning (“security audit is live, 0 issues last 168h”) so you notice if it stops firing?

The output

security_alert Routine (rt_sec)

  ✓ Schedule:              every 30 minutes
  ✓ Primary webhook:       wh_sec       → #security
  ✓ Backup webhook:        wh_sec_backup → PagerDuty
  ✓ Self-protection:       monitors webhook.delete on wh_sec itself
  ✓ Business hours:        Mon-Fri 09:00-18:00 America/Los_Angeles
  ✓ High-impact actions monitored:
      - feature_flag.delete
      - suppression.create (broad patterns)
      - cohort.delete
      - publication.create
      - webhook.delete (especially wh_sec)
  ✓ First run:             2026-05-16T14:00:00Z

A Slack alert (off-hours):

🚨 SECURITY: high-impact mutation off-hours

  Action:       feature_flag.delete
  Resource:     flag_main_login_v2   ← production auth flag
  Actor:        api_key_prefix=agk_x7…
  Occurred:     2026-05-17T02:14:00Z (19:14 PT, Saturday)
  Severity:     HIGH (off-hours + production flag)

  Did you do this? If NO, take the following actions IMMEDIATELY:
    1. agentry_rotate_key  (revoke the key that did it)
    2. Restore flag from audit log snapshot
    3. Pull last 24h of audit log for full IR

  Audit detail: https://agentry.sh/audit/c_aud_42
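If you ever relay these alerts yourself instead of letting the Routine post them, the message body is simple to build. The sketch below assumes a hypothetical alert object (`severity`, `action`, `resource`, `actor`, `occurredAt`, `auditUrl`) and relies only on the `text` field that Slack incoming webhooks accept.

```javascript
// Build a Slack incoming-webhook payload from a hypothetical alert object.
function buildSlackPayload(alert) {
  const lines = [
    `SECURITY: high-impact mutation (${alert.severity})`,
    `Action: ${alert.action}`,
    `Resource: ${alert.resource}`,
    `Actor: ${alert.actor}`,
    `Occurred: ${alert.occurredAt}`,
    `Audit detail: ${alert.auditUrl}`,
  ];
  return { text: lines.join("\n") };
}

const examplePayload = buildSlackPayload({
  severity: "HIGH",
  action: "feature_flag.delete",
  resource: "flag_main_login_v2",
  actor: "api_key_prefix=agk_x7…",
  occurredAt: "2026-05-17T02:14:00Z",
  auditUrl: "https://agentry.sh/audit/c_aud_42",
});
console.log(examplePayload.text);
```

The payload would be sent as the JSON body of a POST to the webhook URL with a `Content-Type: application/json` header.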

Setting it up

The key pieces:

1. Define your high-impact action list. The defaults above are conservative — adapt to your environment. Some teams add member.add (new teammate joins), recipe.run for specific destructive recipes, etc.

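2. Define “business hours” precisely. Time-of-day alerts are noisy if your “after hours” window covers times when people are still working: 5pm Friday in the EU is 8am Pacific, which counts as off-hours under a 9am-6pm Pacific definition. Use the timezone where most engineering activity happens, and treat “anyone on call” hours as business hours.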

3. Set up the backup channel. Single-channel monitoring is fragile (the channel itself can be compromised). A second target on a different vendor (PagerDuty, email, SMS) makes the whole thing resistant to sabotage.

4. Test fire it. Once registered, manually do one of the watched actions (delete a test flag) outside business hours and confirm both channels light up. This is the only way to know your wiring actually works.

// Test fire: create a canary flag, then delete it.
// PROJECT_ID is assumed to be defined earlier with your project's ID.
const headers = {
  "Authorization": `Bearer ${process.env.AGENTRY_API_KEY}`,
  "User-Agent": "audit-test/1.0",  // REQUIRED: Cloudflare 403s default UAs
};

const createRes = await fetch(
  `https://api.agentry.sh/v1/projects/${PROJECT_ID}/feature-flags`,
  {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify({ key: "test_audit_canary", active: false }),
  },
);
if (!createRes.ok) throw new Error(`Create failed: HTTP ${createRes.status}`);
const { flag } = await createRes.json();

const deleteRes = await fetch(
  `https://api.agentry.sh/v1/projects/${PROJECT_ID}/feature-flags/${flag.id}`,
  { method: "DELETE", headers },
);
if (!deleteRes.ok) throw new Error(`Delete failed: HTTP ${deleteRes.status}`);
// The next Routine run should fire wh_sec.
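The reasoning behind the backup channel in step 3 can be sketched as a fan-out: send to every target and treat the alert as delivered if at least one accepts it. This is only an illustration of the idea, not how Agentry delivers webhooks. `deliverToAll` and its `fetchImpl` parameter are hypothetical, and a real PagerDuty target would need its own payload format.

```javascript
// Fan an alert out to several targets; one failing channel must not block the rest.
// fetchImpl is injectable so the function can be exercised without network access.
async function deliverToAll(payload, urls, fetchImpl = fetch) {
  const attempts = await Promise.allSettled(
    urls.map((url) =>
      fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      }),
    ),
  );

  const results = attempts.map((attempt, i) => ({
    url: urls[i],
    ok: attempt.status === "fulfilled" && attempt.value.ok === true,
  }));

  if (!results.some((r) => r.ok)) {
    throw new Error("All alert channels failed");
  }
  return results;
}
```

Sending to both channels every time, rather than falling back only on failure, matches the test-fire advice in step 4: you can confirm that both light up.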

Variations

“Add a ‘change-management window’ override — during a planned release, suppress alerts for known mutations the on-call engineer is making.”
“Per-resource severity: deleting a flag that’s at 100% rollout is worse than deleting one at 0%. Weight by impact, page only on real risk.”
“Integrate with your IR runbook: on a HIGH alert, also fetch the last 5 actions by that api_key_prefix and include them in the alert for context.”
“Geographic check — alert if any action originates from an IP not in your usual office/VPN ranges.”
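As one example, the change-management override from the first variation boils down to a small predicate. The window shape here (`start`, `end`, `actor`, `allowedActions`) and the row fields are assumptions for illustration only.

```javascript
// Hypothetical change window: { start, end, actor, allowedActions }.
// A row is covered only if the time, the actor, and the action all match.
function isCoveredByChangeWindow(row, windows) {
  const occurred = Date.parse(row.occurred_at);
  return windows.some(
    (w) =>
      occurred >= Date.parse(w.start) &&
      occurred < Date.parse(w.end) &&
      w.actor === row.actor &&
      w.allowedActions.includes(row.action),
  );
}

const releaseWindows = [
  {
    start: "2026-05-17T02:00:00Z",
    end: "2026-05-17T04:00:00Z",
    actor: "oncall@example.com",
    allowedActions: ["feature_flag.delete"],
  },
];

const plannedDelete = {
  action: "feature_flag.delete",
  actor: "oncall@example.com",
  occurred_at: "2026-05-17T02:14:00Z",
};
console.log(isCoveredByChangeWindow(plannedDelete, releaseWindows)); // true
```

Matching on the actor as well as the action keeps a planned release from masking the same mutation made by a different key.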

Related recipes

Get a weekly digest of what your agent did

Nightly Routine: draft PRs for low-risk silent bugs

Watch staging — page only on genuinely new errors
