Key takeaways
- Hourly check. Not real-time. An hourly cadence is faster than a human team can act on and slow enough to avoid noise.
- Per-step baselines. Each funnel step gets its own rolling baseline with hour-of-day and day-of-week effects.
- Three hypotheses per alert. Recent deploys, recent campaigns, breakdowns. Not a guess at the root cause.
- Minimum sample size. Two hundred events per step per hour. Below that, no alert.
- False positive feedback loop. One click on the alert card records a false positive, and those marks tune the thresholds over the following month.
What this agent does
A growth team watches three funnels. Onboarding, activation, payment. Each funnel has five to twelve steps. Mixpanel exposes those steps in the dashboard. The team checks the dashboard once a week, sometimes once a day. A step that drops on a Tuesday afternoon and recovers Wednesday morning never gets noticed because nobody was looking. By the end of the week the team is debugging a forty-eight-hour-old issue and the deployment that caused it is hard to identify.
The agent eliminates the looking. Every hour it queries Mixpanel's Funnel API, computes the per-step conversion rate, compares each step to its baseline, and posts an alert to Slack if any step drops by more than the configured threshold. The alert is not a number flashing red. It is a small narrative that says "Step 3 of the Payment funnel dropped 22 percent at 14:00 UTC. The drop coincides with deploy a4f2c at 13:48. The drop is concentrated on iOS in the US, with no change on web or Android. Three suspected causes follow." The on-call engineer reads, decides, and acts.
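The "computes the per-step conversion rate" part of that loop can be sketched as follows. This is a minimal illustration, not the agent's actual code: it assumes the funnel query result has already been reduced to an ordered list of `(step_name, user_count)` pairs, and the `StepRate` type and `step_conversion_rates` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRate:
    step: int        # 1-based index of the step users converted into
    name: str
    entered: int     # users who reached the previous step
    converted: int   # users who reached this step
    rate: float      # converted / entered


def step_conversion_rates(steps: list[tuple[str, int]]) -> list[StepRate]:
    """Turn ordered (name, count) pairs into step-to-step conversion rates.

    The input shape is an assumption for this sketch, not Mixpanel's
    response format.
    """
    rates = []
    for i in range(1, len(steps)):
        entered = steps[i - 1][1]
        name, converted = steps[i]
        if entered == 0:
            # No one reached the previous step, so the rate is undefined.
            continue
        rates.append(StepRate(i + 1, name, entered, converted, converted / entered))
    return rates


funnel = [("View plan", 1000), ("Enter card", 400), ("Confirm payment", 100)]
for r in step_conversion_rates(funnel):
    print(f"step {r.step} ({r.name}): {r.rate:.0%} of {r.entered}")
```

Keeping `entered` next to the rate makes it available for a later sample-size check without a second query.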
What the agent does not do: roll back deploys, page the team on its own, pause ad campaigns, write to Mixpanel, or modify Mixpanel funnels. It alerts. Humans act. The same principle that holds for weekly Google Analytics summary agents applies here: the agent surfaces, the team decides.
Sources of truth
Mixpanel for the funnels. Two or three adjacent sources for the hypotheses.
- Mixpanel Funnel API. Funnel definition and step conversion rates. The agent does not invent funnels; it uses the ones in your Mixpanel project.
- Mixpanel Insights API. Breakdowns by platform, country, app version, marketing source. Used in the hypothesis section of the alert.
- Your deploy tool. GitHub Actions, Vercel, Netlify, AWS CodeDeploy, or a custom webhook. Used to correlate the drop window with recent deploys.
- Ad platforms. Meta Ads, Google Ads, TikTok Ads. Used to correlate the drop with campaign changes (pause, budget change, creative refresh).
- Optional: status page. Statuspage or BetterStack. Used to filter out drops that coincide with known incidents the team already responded to.
The agent does not read user-level event streams. The drop is detected at the funnel level. Drilling into the events for a specific user is a separate workflow that lives in Mixpanel itself, accessible from a link in the alert card. Keeping the agent at funnel level is a deliberate choice that simplifies permissions and audit. The same scoping idea appears in how to give an agent access to email safely.
How the agent detects anomalies
Per-step baselines, hourly buckets, two-standard-deviation threshold.
For each step in each whitelisted funnel, the agent maintains a baseline conversion rate per hour-of-day and day-of-week. The baseline is computed from the past four weeks of data, recomputed daily, and includes a confidence interval. When the current hour's measured conversion falls below the baseline by more than two standard deviations, the step is flagged. Threshold and lookback window are configurable per funnel; payment funnels tend to use a tighter threshold than onboarding funnels.
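A minimal sketch of that baseline-and-threshold check, under stated assumptions: history arrives as `(timestamp, conversion_rate)` pairs for one funnel step, the baseline is the sample mean and sample standard deviation of each `(weekday, hour)` bucket, and the names `build_baselines` and `is_drop` are hypothetical. The article does not say how the agent computes its standard deviation, so treat this as one plausible reading.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def build_baselines(history):
    """Group past rates by (weekday, hour) and return {bucket: (mean, std)}.

    Buckets with fewer than two observations are dropped because a sample
    standard deviation is undefined for them.
    """
    buckets = defaultdict(list)
    for ts, rate in history:
        buckets[(ts.weekday(), ts.hour)].append(rate)
    return {k: (mean(v), stdev(v)) for k, v in buckets.items() if len(v) >= 2}


def is_drop(ts, rate, baselines, k=2.0):
    """True when `rate` is more than k standard deviations below baseline."""
    baseline = baselines.get((ts.weekday(), ts.hour))
    if baseline is None:
        return False  # no baseline for this bucket, so stay quiet
    mu, sigma = baseline
    return rate < mu - k * sigma


# Four past Tuesdays at 14:00, then the current Tuesday.
history = [
    (datetime(2024, 1, 2, 14), 0.50),
    (datetime(2024, 1, 9, 14), 0.48),
    (datetime(2024, 1, 16, 14), 0.52),
    (datetime(2024, 1, 23, 14), 0.50),
]
baselines = build_baselines(history)
print(is_drop(datetime(2024, 1, 30, 14), 0.31, baselines))
```

A four-week lookback gives each bucket only four observations, so the standard deviation estimate is itself rough. That is one reason a sample-size floor matters alongside the threshold.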
Minimum sample size matters more than threshold. A step with fifty events per hour will produce noisy alerts no matter the threshold. The agent skips any step where the current hour has fewer than two hundred events. The minimum is configurable but rarely changed. The corollary is that low-traffic funnels (early-stage product, niche markets) get fewer alerts. That trade-off is intentional. False alerts erode trust faster than missed alerts.
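To see why fifty events is too few, a rough worked example helps. It assumes each event is an independent convert-or-not trial, so the standard error of a measured rate is sqrt(p(1-p)/n). That binomial model is an assumption of this sketch, not something the agent is stated to use, and `standard_error` and `enough_sample` are hypothetical names.

```python
from math import sqrt

MIN_EVENTS = 200  # the per-step, per-hour floor described above


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a conversion rate p measured on n events."""
    return sqrt(p * (1 - p) / n)


def enough_sample(n: int, minimum: int = MIN_EVENTS) -> bool:
    """Skip the step when the current hour has fewer than `minimum` events."""
    return n >= minimum


for n in (50, 200):
    se = standard_error(0.5, n)
    print(f"n={n}: standard error {se:.3f}, alert eligible: {enough_sample(n)}")
```

At a 50 percent conversion rate, the standard error is about 7 percentage points on fifty events and about 3.5 on two hundred. On the small sample, random variation alone regularly produces swings that look like real drops.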
Output: the alert card
Each alert is a single Slack message with a fixed structure. No charts, no live data, no follow-up prompts. Because the structure never changes, the reader's eye is trained within about three weeks.
- Headline. "Funnel X, step N dropped Y percent at HH:MM."
- Magnitude. Baseline conversion rate vs current. Confidence interval shown in plain language ("expected 47-52 percent, observed 31 percent").
- Recent deploys. Up to three deploys in the past four hours, each with a deploy id, time, and a one-line description from the commit message.
- Recent campaign changes. Pauses, budget changes, creative changes within the past twelve hours.
- Breakdown. Top one or two segments where the drop is concentrated. Examples: "iOS users in the US (-38 percent)", "Web traffic from organic search (-15 percent)".
- Three hypotheses. The agent's short narrative. Each hypothesis names a likely cause and ranks confidence in plain words.
- Buttons. Mark false positive, ack, link to Mixpanel, link to the deploy.
The alert card is the entire customer-facing surface of the agent. Everything else is logging. The principle is the one described in how to monitor agent activity: the human-readable output is the contract, the rest is implementation.
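The fixed structure lends itself to a plain data type. The sketch below is illustrative only: `AlertCard`, its field names, and the plain-text rendering are assumptions. It leaves out the buttons and the Slack delivery itself, since the article does not describe how the message is sent.

```python
from dataclasses import dataclass, field


@dataclass
class AlertCard:
    funnel: str
    step: int
    drop_pct: float
    at: str                      # e.g. "14:00 UTC"
    expected_low: float          # baseline interval, in percent
    expected_high: float
    observed: float              # current conversion, in percent
    deploys: list[str] = field(default_factory=list)
    campaign_changes: list[str] = field(default_factory=list)
    breakdown: list[str] = field(default_factory=list)
    hypotheses: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the card as plain text, always in the same section order."""
        lines = [
            f"{self.funnel}, step {self.step} dropped "
            f"{self.drop_pct:.0f} percent at {self.at}.",
            f"Expected {self.expected_low:.0f}-{self.expected_high:.0f} percent, "
            f"observed {self.observed:.0f} percent.",
        ]
        sections = [
            ("Recent deploys", self.deploys[:3]),        # up to three deploys
            ("Recent campaign changes", self.campaign_changes),
            ("Breakdown", self.breakdown[:2]),           # top one or two segments
            ("Hypotheses", self.hypotheses[:3]),         # exactly three at most
        ]
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items or ["none found"])
        return "\n".join(lines)
```

Printing empty sections as "none found" rather than omitting them keeps every card the same shape, which is the property the fixed structure is meant to protect.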
Guardrails
Three guardrails are mandatory.
- Read-only on every system. Mixpanel, the deploy tool, ad platforms, status page. The agent's tokens have no write scopes anywhere.
- No paging. The alert goes to Slack. It does not call PagerDuty or send SMS. Funnel anomalies are usually not on-call events; the people who triage them are the growth team, not on-call SREs. If a particular alert deserves a page, an on-call engineer escalates it to PagerDuty manually from the alert card.
- Quiet hours. Alerts are deferred outside the team's working hours unless explicitly opted in. A 3 a.m. alert wakes someone who cannot do anything until morning anyway.
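The quiet-hours guardrail can be sketched as a single function that decides when an alert may be delivered. The 09:00 to 18:00 working window, the `delivery_time` name, and the choice to ignore weekends are all assumptions for illustration. The article only says that alerts are deferred outside working hours unless the team opts in.

```python
from datetime import datetime, time, timedelta


def delivery_time(now: datetime,
                  work_start: time = time(9, 0),
                  work_end: time = time(18, 0),
                  opted_in: bool = False) -> datetime:
    """Return when an alert raised at `now` should be posted.

    `now` is expected in the team's local time. Weekends are not handled
    in this sketch.
    """
    if opted_in or work_start <= now.time() < work_end:
        return now  # inside working hours, or the team asked for everything
    start_today = datetime.combine(now.date(), work_start, tzinfo=now.tzinfo)
    if now.time() < work_start:
        return start_today              # early morning: hold until today's start
    return start_today + timedelta(days=1)  # evening: hold until tomorrow


print(delivery_time(datetime(2024, 1, 2, 3, 0)))
```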
The agent rate-limits its Mixpanel API calls. Mixpanel quotas are project-level and a runaway agent can affect every dashboard the team uses. Conservative pacing matters.
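One simple way to pace calls is to enforce a minimum gap between consecutive requests. The sketch below assumes nothing about Mixpanel's actual quota numbers. `Pacer` and its parameters are hypothetical, and the interval should be set from the limits documented for your project.

```python
import time


class Pacer:
    """Block so that successive calls are at least `min_interval_s` apart."""

    def __init__(self, min_interval_s: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the pacer can be tested
        self._sleep = sleep
        self._last = None        # time of the previous permitted call

    def wait(self) -> None:
        """Call immediately before each API request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


pacer = Pacer(min_interval_s=0.01)
for funnel_id in ("onboarding", "activation", "payment"):
    pacer.wait()
    # The funnel query for `funnel_id` would be issued here.
```

A fixed gap is deliberately conservative. It wastes some headroom, but it cannot burst, and bursts are the failure mode that affects the team's dashboards.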
Common mistakes
Alerting too sensitively. A 1.5 standard-deviation threshold sounds careful and produces five alerts a day, most of them noise. Two standard deviations is the practical floor. Going lower causes alert fatigue inside a week.
Skipping the deploy correlation. The single most useful field on the alert is the recent deploy list. Without it the team spends an hour deciding whether to act. With it, the decision is fast and reversible. Same principle as the test discipline in how we test AI agents with 80 tests per capability.
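The deploy list is cheap to produce once deploy events are normalized. A minimal sketch, assuming each deploy has been reduced to an id, a timestamp, and a one-line summary regardless of which tool produced it. `Deploy` and `recent_deploys` are hypothetical names. The four-hour window and the limit of three come from the alert card description.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Deploy(NamedTuple):
    deploy_id: str
    at: datetime
    summary: str   # one line taken from the commit message


def recent_deploys(deploys, drop_at: datetime,
                   window: timedelta = timedelta(hours=4),
                   limit: int = 3):
    """Return up to `limit` deploys in the window before the drop, newest first."""
    candidates = [d for d in deploys if drop_at - window <= d.at <= drop_at]
    candidates.sort(key=lambda d: d.at, reverse=True)
    return candidates[:limit]


drop_at = datetime(2024, 1, 30, 14, 0)
deploys = [
    Deploy("a4f2c", datetime(2024, 1, 30, 13, 48), "Update payment form"),
    Deploy("91be0", datetime(2024, 1, 30, 9, 0), "Bump dependencies"),
]
for d in recent_deploys(deploys, drop_at):
    print(d.deploy_id, d.at.strftime("%H:%M"), d.summary)
```

Sorting newest first puts the deploy closest to the drop at the top of the card, which is usually the first one worth checking.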
Alerting on every step of every funnel. The whitelisting matters. Five to twenty steps across two or three funnels is the right footprint. Trying to alert on twenty funnels at once produces a wall of alerts the team cannot act on.
Treating alerts as incident records. The alert says "something dropped". It does not say "the system is broken". Some drops are real, some are seasonal, some are upstream marketing changes. Pretending every alert is an incident wastes engineering time and erodes trust in the agent. The mark-false-positive button exists for this reason and using it is healthy.
Reacting to alerts at minute one. The agent runs hourly. By the time the alert lands, the drop has already been measured against a meaningful sample. Waiting for the next hourly check to confirm the drop is usually right. Acting immediately on a single alert is the wrong instinct for funnel anomalies.
Frequently asked questions
Can an AI agent detect Mixpanel funnel anomalies?
Yes. Every hour the agent queries Mixpanel for the funnels you whitelist, compares the latest step conversions to a rolling baseline, and posts an alert when any step drops below the configured threshold. The alert lists the funnel, the step, the deviation, and the three hypotheses the agent has narrowed the cause down to.
How is this different from Mixpanel's built-in anomaly detection?
Mixpanel's anomaly detection alerts on raw metric drops. The agent additionally correlates the drop with recent deploys, marketing campaigns, and breakdowns by platform or country, and writes a short narrative naming the three most likely causes. The agent does not replace Mixpanel's detection. It sits on top of it and adds the why.
How does the agent decide what is anomalous?
Each step has a rolling baseline computed from the past four weeks at the same hour-of-day and day-of-week. The agent flags any drop more than two standard deviations below the baseline. The threshold is configurable per funnel. The agent never alerts on small-sample windows; the minimum is two hundred events per step per hour.
Does the agent investigate root causes itself?
It generates three hypotheses by correlating the drop window with recent deploys (from your deployment tool), recent marketing campaigns (from the ad platforms you wire up), and breakdowns by platform, country, and app version (from Mixpanel itself). It does not investigate code, log files, or backend metrics. Those are for humans or other agents.
Can the alert reduce false positives over time?
Yes. When the on-call human marks an alert as a false positive in the alert card, the agent records the time, funnel, and the apparent cause. After about twenty marks the agent recognises the recurring patterns and tightens thresholds for those windows. False positive rate typically drops below five percent inside the first month.
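One way such a feedback loop could work is sketched below. It is speculative: the article does not describe the mechanism, so the per-window counter, the `repeat` cutoff, the size of the adjustment, and the `FalsePositiveLog` name are all assumptions. It reads "tightens" as requiring a larger deviation before alerting in windows that keep producing false positives.

```python
from collections import Counter


class FalsePositiveLog:
    """Count false-positive marks per (funnel, weekday, hour) window."""

    def __init__(self, base_k: float = 2.0, step: float = 0.25,
                 min_marks: int = 20, repeat: int = 3):
        self.base_k = base_k        # default threshold in standard deviations
        self.step = step            # extra deviation required in noisy windows
        self.min_marks = min_marks  # do nothing until this many marks exist
        self.repeat = repeat        # marks needed to call a window recurring
        self.marks = Counter()

    def mark(self, funnel: str, weekday: int, hour: int) -> None:
        """Record one click on the false-positive button."""
        self.marks[(funnel, weekday, hour)] += 1

    def threshold(self, funnel: str, weekday: int, hour: int) -> float:
        """Threshold to use for this window, in standard deviations."""
        if sum(self.marks.values()) < self.min_marks:
            return self.base_k
        if self.marks[(funnel, weekday, hour)] < self.repeat:
            return self.base_k
        return self.base_k + self.step
```

Holding all adjustments back until about twenty marks exist keeps a single mistaken click from changing detection behaviour.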
Three takeaways before you close this tab
- Hourly is the right cadence. Faster is noise. Slower is useless.
- Three hypotheses is the right output. Single guesses overcommit; long reports get skimmed.
- Read-only on every system. The agent is a narrator, not an operator.
Sources
- Mixpanel. Funnels Query API documentation. Tier 1.
- Mixpanel. Insights API for breakdowns. Tier 1.
- Mixpanel. Anomaly detection and alerts feature. Tier 1.
- Mixpanel. API rate limits per project. Tier 1.