Why this matters
Most status pages are theater. The vendor SaaS (Statuspage.io, BetterUptime, etc.) gives you a green-bar template that someone on call has to manually flip to yellow when an incident hits. They almost never do, because:
- Mid-incident is the worst time to also update a status page
- The status page becomes a lie (“all systems operational” while customers are getting 500s)
- Engineers stop trusting it; customers stop trusting it; it’s purely decorative
The honest version: a data-driven status page that reads directly from your real metrics. No human flips a switch — the page shows actual ingest rates, actual incident counts, actual MTTR. When it lies, you have a real bug to fix; when it shows yellow, customers can see why.
Agentry gives you live rows for each widget. Serve those rows from your own backend so the browser never receives AGENTRY_API_KEY.
What you get
- A public status page with three sections:
  - Now: ingest health (events received in last 5 min vs baseline)
  - Recent incidents: open cases in last 24h with ≥ 10 users affected
  - MTTR (7d): median time from case.created to case.resolved
- A server-side JSON/view pattern for status.yourdomain.com
- No vendor SaaS, no monthly fee, no green-by-default lie
Walk through it
Compose a public status dashboard: current ingest health (logs / analytics / deploys), recent incidents (cases in last 24h with users_affected ≥ 10), and MTTR for the last 7 days. Return rows I can serve from status.mydomain.com.
Three server-side widget reads. Let me build each, then hand back the status-page data contract.
POST /v1/projects/:project_id/analytics/query {
  project_id: "default",
  query: "
    with recent as (
      select count() as n_5min
      from events
      where timestamp > now() - interval 5 minute
    ),
    baseline as (
      select count() / (7 * 24 * 12) as baseline_per_5min
      from events
      where timestamp > now() - interval 7 day
    )
    select
      r.n_5min,
      b.baseline_per_5min,
      r.n_5min / b.baseline_per_5min as ratio,
      case
        when r.n_5min / b.baseline_per_5min between 0.5 and 2.0 then 'operational'
        when r.n_5min / b.baseline_per_5min < 0.5 then 'degraded'
        else 'elevated_volume'
      end as status
    from recent r cross join baseline b
  "
}
{ "rows": [{ "n_5min": 812, "baseline_per_5min": 790, "ratio": 1.03, "status": "operational" }] }
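The CASE expression divides by the baseline, which is zero for a project that received no events in the last 7 days. Here is a minimal sketch of the same thresholds applied in your server route, with that guard added. The function name classifyIngest and the "unknown" label are assumptions of this sketch, not part of the Agentry API.

```javascript
// Same thresholds as the SQL CASE: 0.5x to 2.0x of baseline is "operational".
// "unknown" is a label invented for this sketch to cover a zero or missing baseline.
function classifyIngest(n5min, baselinePer5min) {
  if (!Number.isFinite(baselinePer5min) || baselinePer5min <= 0) {
    return { ratio: null, status: "unknown" };
  }
  const ratio = n5min / baselinePer5min;
  if (ratio < 0.5) return { ratio, status: "degraded" };
  if (ratio > 2.0) return { ratio, status: "elevated_volume" };
  return { ratio, status: "operational" };
}
```

Computing the label in your own route also lets you tune the 0.5 and 2.0 thresholds without editing the query.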
GET /v1/projects/:project_id/cases?status=open&limit=10
{ "cases": [/* filter server-side for recent high-impact incidents */] }
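The server-side filter hinted at in the response comment could look like the sketch below. It assumes each case carries a `created` ISO timestamp and a numeric `users_affected` field; those names are guesses based on the prose here, so check the actual response shape first.

```javascript
// Keeps cases created within the last `windowHours` that affected at least
// `minUsers` users, newest first. Field names are assumptions (see above).
function recentHighImpact(cases, now = Date.now(), windowHours = 24, minUsers = 10) {
  const cutoff = now - windowHours * 60 * 60 * 1000;
  return cases
    .filter((c) => {
      const createdMs = Date.parse(c.created);
      return (
        Number.isFinite(createdMs) &&      // skip cases with unparseable timestamps
        createdMs >= cutoff &&
        (c.users_affected ?? 0) >= minUsers
      );
    })
    .sort((a, b) => Date.parse(b.created) - Date.parse(a.created));
}
```

Passing `now` as a parameter keeps the function deterministic, which makes it easy to test against fixed timestamps.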
GET /v1/projects/:project_id/cases?status=resolved&limit=100
{ "cases": [/* compute MTTR in your server route from resolved case timestamps */] }
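One way to compute MTTR from the resolved cases, as a rough sketch. It assumes `created` and `resolved` are ISO timestamps on each case (names inferred from case.created and case.resolved) and uses linear interpolation for percentiles, which is one common convention among several.

```javascript
// Percentile of an ascending-sorted array using linear interpolation.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const idx = (sortedValues.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sortedValues[lo] + (sortedValues[hi] - sortedValues[lo]) * (idx - lo);
}

// Median and p90 time-to-resolve, in minutes, for cases resolved in the window.
function computeMttr(cases, now = Date.now(), windowDays = 7) {
  const cutoff = now - windowDays * 24 * 60 * 60 * 1000;
  const minutes = cases
    .map((c) => ({ created: Date.parse(c.created), resolved: Date.parse(c.resolved) }))
    .filter((t) =>
      Number.isFinite(t.created) &&
      Number.isFinite(t.resolved) &&
      t.resolved >= t.created &&   // drop inverted timestamps
      t.resolved >= cutoff         // only cases resolved inside the window
    )
    .map((t) => (t.resolved - t.created) / 60000)
    .sort((a, b) => a - b);
  return {
    resolvedCount: minutes.length,
    medianMinutes: percentile(minutes, 0.5),
    p90Minutes: percentile(minutes, 0.9),
  };
}
```

With `limit=100` on the request, a busy week may be truncated, so treat `resolvedCount` as a lower bound unless you paginate.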
Three data sources mapped. Here’s the server-rendered page shape you can drop on status.yourdomain.com:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>System status — yourdomain</title>
<style>
body { font-family: ui-sans-serif, system-ui, sans-serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; color: #1c1c1c; }
section { border: 1px solid #e5e5e5; border-radius: 8px; padding: 1rem; margin-bottom: 1rem; }
h1, h2 { letter-spacing: -0.01em; }
.grid-3 { display: grid; gap: 0.75rem; grid-template-columns: repeat(3, 1fr); margin-bottom: 1.5rem; }
.status-good { color: #15803d; } .status-bad { color: #b91c1c; }
</style>
</head>
<body>
<h1>System status</h1>
<section id="ingest-health">
<h2>Right now</h2>
<!-- Render rows from your server-side /status-data route. -->
</section>
<section id="active-incidents">
<h2>Active incidents (24h)</h2>
</section>
<section id="mttr">
<h2>7-day MTTR</h2>
</section>
<p style="color: #6b7280; font-size: 0.85rem;">
Driven by real production telemetry, refreshed every minute.
No human flips this dashboard between green and red — if it shows yellow,
something is actually misbehaving.
</p>
</body>
</html>
One detail: cache your /status-data route for 60 seconds so a traffic spike to the status page does not re-run every query per visitor.
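A minimal in-memory version of that cache, assuming a single long-lived server process (on serverless platforms you would more likely rely on a CDN or the platform's response cache). The helper name `cached` and its `clock` parameter are hypothetical and exist only for this sketch.

```javascript
// Wraps a loader so it runs at most once per TTL window.
// Concurrent callers during a refresh share the same in-flight promise.
function cached(loader, ttlMs = 60_000, clock = Date.now) {
  let value;
  let expiresAt = 0;
  let inFlight = null;
  return async function get() {
    if (clock() < expiresAt) return value;   // still fresh: no upstream call
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(loader)
        .then((fresh) => {
          value = fresh;
          expiresAt = clock() + ttlMs;
          return fresh;
        })
        .finally(() => {
          inFlight = null;                    // allow the next refresh
        });
    }
    return inFlight;
  };
}
```

Usage would look like `const getStatusData = cached(loadStatusData)`, where `loadStatusData` is whatever function runs the three widget reads. If the loader throws, the error propagates to callers and the next request retries.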
The output
Server-rendered status page components
ingest-health     "Now"        events in last 5min vs 7d baseline
active-incidents  "Incidents"  24h cases with ≥10 users
mttr              "MTTR"       median + p90 over 7d
Sample rendered output:
System status — yourdomain
─────────────────────────────────────
Right now
Events/5min: 12,400 (1.04× baseline) ✓ operational
Active incidents (24h)
[none]
7-day MTTR
median: 42 minutes
p90: 2h 14m
resolved this week: 14
Driven by real telemetry, refreshed every minute.
Setting it up
1. Subdomain. Point status.yourdomain.com to an app route or serverless function that can read AGENTRY_API_KEY server-side.
2. Caching. Cache the server-side response for 60 seconds — protects you from someone hitting the page 1000 times during an incident.
3. Key-safe. Never put AGENTRY_API_KEY in the browser. Pull JSON server-side and render HTML:
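// Astro / Next / your framework (server-side code only)
const res = await fetch(
  `https://api.agentry.sh/v1/projects/${projectId}/analytics/query`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.AGENTRY_API_KEY}`,
      "Content-Type": "application/json",
      "User-Agent": "status-page/1.0",
    },
    body: JSON.stringify({ query: ingestHealthQuery }),
  }
);
// Fail loudly on a non-2xx response instead of parsing an error body as rows.
if (!res.ok) {
  throw new Error(`Agentry query failed: HTTP ${res.status}`);
}
const { rows } = await res.json();
// An empty result has no rows[0]; fall back to a local "unknown" label instead of crashing.
const status = rows?.[0]?.status ?? "unknown"; // 'operational' | 'degraded' | ...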
4. Branding. Render the rows into your own HTML and CSS.
5. Honest copy. Resist the urge to add manual “we’re investigating” banners. The whole point of this page is that it never lies, and a human-curated banner re-introduces the failure mode of vendor status pages.
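For step 4, rendering a row into your own markup can be as small as the sketch below. It reuses the status-good and status-bad class names from the page template; the function names are hypothetical. Values are HTML-escaped because anything that comes back from an API should be treated as untrusted text.

```javascript
// Escapes the five characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Renders one ingest-health row ({ n_5min, ratio, status }) as an HTML fragment.
function renderIngestHealth(row) {
  const cssClass = row.status === "operational" ? "status-good" : "status-bad";
  const count = Number(row.n_5min).toLocaleString("en-US");
  const ratio = Number(row.ratio).toFixed(2);
  return [
    `<p>Events/5min: ${escapeHtml(count)} (${escapeHtml(ratio)}× baseline)</p>`,
    `<p class="${cssClass}">${escapeHtml(row.status)}</p>`,
  ].join("\n");
}
```

The returned fragment goes inside the section with id "ingest-health"; the other two sections follow the same pattern.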
Variations
- “Add a 4th component: per-service breakdown. ingest/auth/checkout/api as separate rows.”
- “Make the incident list show only the agent_summary (not the raw error message) — friendlier for customers.”
- “Per-region status: filter ingest health by properties.region so EU/US/APAC are separately displayed.”
- “Add a ‘historical uptime %’ computed as (1 - hours_with_active_incidents / 720) over the last 30 days.”
- “Embed in a Notion page using their /embed block instead of a standalone subdomain.”
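As an illustration of the historical-uptime variation, here is a sketch of the (1 - hours_with_active_incidents / 720) formula. It assumes incidents carry `created` and `resolved` ISO timestamps (an unresolved incident counts as active until now) and counts each clock hour at most once, even when incidents overlap. The function name is hypothetical.

```javascript
// Uptime % over the last `windowDays`, counting any hour touched by an incident as down.
function historicalUptimePercent(incidents, now = Date.now(), windowDays = 30) {
  const HOUR_MS = 60 * 60 * 1000;
  const totalHours = windowDays * 24;              // 720 for 30 days
  const windowStart = now - totalHours * HOUR_MS;
  const affectedHours = new Set();                 // hour offsets from windowStart
  for (const incident of incidents) {
    const start = Math.max(Date.parse(incident.created), windowStart);
    const rawEnd = incident.resolved ? Date.parse(incident.resolved) : now;
    const end = Math.min(rawEnd, now);
    if (!Number.isFinite(start) || !Number.isFinite(end) || end <= start) continue;
    const first = Math.floor((start - windowStart) / HOUR_MS);
    const last = Math.min(Math.floor((end - 1 - windowStart) / HOUR_MS), totalHours - 1);
    for (let h = first; h <= last; h++) affectedHours.add(h);
  }
  return (1 - affectedHours.size / totalHours) * 100;
}
```

Hour granularity is deliberately pessimistic: a five-minute incident costs a full hour of uptime, which fits a page whose goal is to never overstate health.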