Why this matters
A/B testing pricing is the highest-leverage experiment most SaaS companies run. A 15% lift in checkout conversion is worth more than 6 months of “growth hacks.”
The reason teams don’t do it: infrastructure tax. Setting up Statsig / LaunchDarkly / Optimizely is a 1-week procurement + integration cycle. Doing it in-house means flag-eval code + experiment-config table + conversion-tracking pipeline + stats calculation. Founders pick “ship the new price and pray” because the alternative takes a week.
Agentry has feature flags + analytics + cohorts in one product. The agent wires them all together in one prompt: create the flag, mint the cohorts, define the conversion metric, publish a live results dashboard.
What you get
- A feature flag your app checks to decide which price to show
- Two cohorts (“control” and “variant”) auto-built from flag evaluations
- A live dashboard tracking conversion rate per cohort with statistical significance
- A “stop the test early if X” rule
Walk through it
Create a feature flag pricing_v2_test, 50/50 split. Variant A = current pricing ($29/$99). Variant B = new pricing ($49/$149). Set up conversion tracking comparing subscription_started events between the two cohorts. Run 14 days. Publish a results dashboard.
Setting up. Three pieces: flag, cohorts, dashboard.
agentry_create_feature_flag {
project_id: "default",
key: "pricing_v2_test",
name: "Pricing V2 — $49/$149 vs $29/$99",
filters: {
multivariate: {
variants: [
{ key: "control", rollout_percentage: 50 },
{ key: "variant", rollout_percentage: 50 }
]
}
}
}
{ "flag_id": "ff_8c12", "key": "pricing_v2_test", "active": true }
agentry_create_cohort {
project_id: "default",
name: "pricing_v2_control",
filter: { event: "$feature_flag_called",
properties: { feature_flag: "pricing_v2_test", value: "control" } }
}
{ "cohort_id": "co_pcv1c" }
agentry_create_cohort {
project_id: "default",
name: "pricing_v2_variant",
filter: { event: "$feature_flag_called",
properties: { feature_flag: "pricing_v2_test", value: "variant" } }
}
{ "cohort_id": "co_pcv1v" }
agentry_publish_query {
recipe_id: "ab_test_conversion",
description: "Pricing V2 test — conversion comparison",
params: {
control_cohort: "co_pcv1c",
variant_cohort: "co_pcv1v",
conversion_event: "subscription_started",
window_days: 14
}
}
{
"publication_id": "pub_pcv1",
"public_url": "https://agentry.sh/q/pub_pcv1?key=agp_xxx"
}
Done. Three things to wire into your app:
- Eval the flag when rendering the pricing page — see the code example below.
- Capture
subscription_startedwhen the user clicks the upgrade CTA (you likely already do). - Wait. Stats need ~500 conversions per cohort for confidence at α=0.05 with realistic lift sizes.
Live dashboard: https://agentry.sh/q/pub_pcv1?key=agp_xxx
Want me to also set up a Routine that checks daily and DMs you when statistical significance is reached (p < 0.05)?
The output
A live dashboard at agentry.sh/q/pub_pcv1?key=… that updates as data flows in:
📊 Pricing V2 Test — Day 7 of 14
Control ($29/$99) Variant ($49/$149)
Exposures 2,140 2,108
Subscriptions 97 84
Conversion 4.53% 3.98%
Lift baseline -12.1%
95% CI [3.69%, 5.49%] [3.20%, 4.91%]
p-value — 0.346 (not significant)
VERDICT (day 7): Variant is trending worse but not significantly.
Continue test to day 14 unless lift becomes definitive (>30%).
Effective ARPU:
Control: $29 × 4.53% = $1.31 expected revenue per visitor
Variant: $49 × 3.98% = $1.95 expected revenue per visitor (+48%)
Even with lower conversion, variant has higher revenue per visitor.
This is the metric that matters.
Setting it up
Your app needs to check the flag and behave accordingly. Agentry has no SDK — the flag-eval call is a raw POST to /v1/projects/<project_id>/feature-flags/evaluate. Note this endpoint takes an API key (agk_…), not a DSN — flag config is owner-side, so it lives behind the same auth as cases / cohorts / surveys.
// Server-side flag eval (recommended — fast, no race with page render,
// and the API key never reaches the browser)
async function evaluateFlag(key: string, distinctId: string) {
const res = await fetch(
`https://api.agentry.sh/v1/projects/${process.env.AGENTRY_PROJECT_ID}/feature-flags/evaluate`,
{
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.AGENTRY_API_KEY}`, // agk_… (NOT the DSN)
"Content-Type": "application/json",
"User-Agent": "myapp/1.0", // REQUIRED — Cloudflare 403s default UAs
},
body: JSON.stringify({ key, distinct_id: distinctId }),
},
);
const { value } = await res.json() as { value: string };
return value; // "control" | "variant" — deterministic per user
}
export async function loader({ request }) {
const userId = await getUserIdFromRequest(request);
const variant = await evaluateFlag("pricing_v2_test", userId);
return { variant };
}
// In your component
function PricingPage({ variant }) {
const prices = variant === "variant"
? { pro: 49, scale: 149 }
: { pro: 29, scale: 99 };
return <PricingCards prices={prices} />;
}
// Capture the conversion event when user upgrades — analytics ingest uses
// the DSN (different token from the flag eval above)
async function handleUpgrade(plan: string, userEmail: string, variant: string) {
await fetch(`https://api.agentry.sh/v1/analytics/${process.env.AGENTRY_PROJECT_ID}/`, {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.AGENTRY_DSN}`, // agnt_… (NOT the api key)
"Content-Type": "application/json",
"User-Agent": "myapp/1.0",
},
body: JSON.stringify({
event: "subscription_started",
distinct_id: userEmail,
properties: {
plan,
amount_cents: prices[plan] * 100,
pricing_variant: variant, // optional but useful for ad-hoc queries later
},
}),
});
}
If you’d rather evaluate client-side, proxy the flag call through your own backend — the agk_ key must never reach the browser.
Variations
- “3-way test: $29, $39, $49. Adjust the cohort logic accordingly.”
- “Only show variant B to users who came from organic search (cohort-targeted experiment).”
- “Add a guardrail: if variant conversion drops below 50% of control, kill the variant automatically.”
- “Run this experiment but on enterprise tier specifically — different signal than self-serve.”
- “After the test ends, draft a writeup post for the company blog summarizing what we learned.”