Vibe AI did not get to product-market fit, but it got to enough users to be useful. A few hundred actives across late 2025 and early 2026 generated about 1,800 support tickets, 47 cancellation surveys, and roughly 300 NPS-style follow-up conversations. That is signal, not vibes. When I shut Vibe AI down and started Gravity, I read all of it end to end before writing a single line of new product code. This post is what came out of that pass.

The Vibe AI postmortem at /blog/vibe-ai-postmortem covers why the product itself failed. The abstraction-level argument lives in "Describe Outcome, Not Workflow". This post is the lineage layer: which specific user signals became which specific Gravity design decisions. The point is that Gravity is not a clean-sheet product. It is the second derivative of Vibe AI plus the framework from "Three Startups, Three Shutdowns".

What Vibe AI was

Vibe AI was an attempt to make AI workflow automation accessible through chat. Users typed a description, the system generated a workflow with triggers and actions, and the workflow would run when the trigger fired. Think Zapier with an LLM in front of it. The product launched in mid-2025 and got to a few hundred active users before I shut it down in early 2026 for unit-economics and abstraction reasons covered in the postmortem.

The interesting thing for this post is not why it failed. The interesting thing is the dataset it produced. Roughly 1,800 support tickets, 47 cancellation surveys with at least two sentences of text, and 300 informal follow-up calls and emails. That is a small sample by Series A standards and an enormous sample by pre-PMF founder standards. Most pre-PMF founders are pattern-matching on twenty-user feedback. Three hundred is the threshold where patterns start being real.

Signal one: outcome over trigger

The single most repeated piece of feedback across Vibe AI tickets was a variant of "I don't want to configure a trigger; I just want it to do the thing". Users would describe the outcome they wanted, the system would translate it into a workflow with triggers and actions, and users would respond with frustration when they had to review and approve the trigger configuration. Roughly forty percent of week-one drop-off was at the trigger-configuration step.

That signal is not subtle. It is the same signal Zapier has been ignoring for a decade. The product-design implication is structural: do not make the user think about triggers at all. Gravity's design choice is that the user describes the outcome, the system runs continuously, and trigger logic is internal. The full argument lives in "Describe Outcome, Not Workflow"; the signal that generated the argument lives in those Vibe AI tickets.

Signal two: persistence over single-shot

The second-most-repeated signal was that users wanted the agent to keep running. Vibe AI's mental model was single-shot: trigger fires, action runs, done. Users wanted "agent runs forever, watching for the conditions, doing the thing". The mismatch showed up in cancellation surveys: "I thought it would keep running" was the literal phrase in eleven of forty-seven surveys, roughly twenty-three percent.

Single-shot is the inheritance from Zapier. It is also wrong for autonomous agents. The user model for an agent is "a worker who does this job, ongoing". The trigger-action model is "a script that runs once". Gravity is built around the persistent-runner model: the agent deploys, runs continuously, and is supervised rather than triggered. That decision came directly from the eleven users who wrote some version of "I thought it would keep running".
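
To make the two mental models concrete, here is a minimal sketch in Python. None of this is Gravity's actual runtime, and every function name is hypothetical. The point is the shape: a trigger-action handler exits after one run, while a persistent runner loops and keeps its trigger logic internal.

```python
import time

def observe_world(spec):
    """Hypothetical: gather whatever state the outcome depends on."""
    return {}

def outcome_holds(spec, state):
    """Hypothetical: is the user's described outcome currently true?"""
    return False

def act(spec, state):
    """Hypothetical: take one step toward the outcome."""
    print(f"acting on {spec!r}")

# Single-shot, trigger-action model (the Zapier inheritance):
# a trigger fires once, the action runs, and the run is over.
def on_trigger(spec, event):
    act(spec, event)

# Persistent-runner model: the agent deploys once, loops indefinitely,
# and the trigger logic stays internal. The user supervises a worker
# instead of configuring a script.
def run_agent(spec, poll_interval=60):
    while True:
        state = observe_world(spec)
        if not outcome_holds(spec, state):
            act(spec, state)
        time.sleep(poll_interval)
```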

Signal three: reliability over breadth

The third signal was about trust. Users with more than three weeks of usage on Vibe AI consistently asked the same question: "How do I know it will not break next week?" The product had a lot of integrations and a lot of features. What it did not have was a credible reliability signal. The 80-test methodology in "How We Test AI Agents" exists in part because Vibe AI users explicitly asked for it.

The product lesson is that breadth is a poor substitute for reliability. A founder facing the breadth-vs-reliability trade-off should pick reliability every time. The Gravity launch deliberately ships with fewer integrations than Vibe AI had, and each integration carries the full 80-test coverage. Users who care about trust will trade integration count for reliability evidence; users who do not care will not stay anyway.
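
The methodology itself lives in "How We Test AI Agents", so I will not restate it here. As a rough illustration of the shape, per-integration edge-case coverage can look like a parametrized test matrix. The case names and harness below are invented for this sketch, not the real suite.

```python
import pytest
from types import SimpleNamespace

# Invented edge cases for illustration; the real matrix is larger.
EDGE_CASES = [
    "empty_payload",
    "rate_limited_upstream",
    "expired_credentials",
    "unicode_in_every_field",
]

def run_case(case):
    """Hypothetical harness: exercise one integration against one edge case."""
    return SimpleNamespace(succeeded=True, failed_safely=False)

@pytest.mark.parametrize("case", EDGE_CASES)
def test_integration_survives(case):
    # The bar: either the task completes, or it fails in a way the
    # supervisor can see and recover from. Silent breakage is the bug.
    result = run_case(case)
    assert result.succeeded or result.failed_safely
```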

Vibe AI cancellation reasons (n=47, free-text coded):

Configuration was too hard                    ~38% (18)
Expected persistent runner, got single-shot   ~23% (11)
Did not trust reliability after a failure     ~17% (8)
Price, other, no specific reason              ~22% (10)

Three signals account for 78% of named cancellations. All three are addressed structurally in Gravity.

Vibe AI cancellation survey free-text responses, coded into four categories. Three of the four map directly onto core Gravity design decisions.
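
If you want to run the same exercise on your own surveys, the mechanical part is small. Here is a minimal sketch of keyword-assisted coding; the keyword lists are illustrative, and a first pass like this is only a starting point. Free text rarely matches keywords cleanly, so every ambiguous response still needs a human read.

```python
from collections import Counter

# Illustrative keyword buckets; tune these to your own survey language.
CATEGORIES = {
    "configuration_too_hard": ["configure", "setup", "too hard", "complicated"],
    "expected_persistent_runner": ["keep running", "ran once", "stopped running"],
    "reliability_distrust": ["broke", "failed", "unreliable", "trust"],
}

def code_response(text):
    text = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other_or_unspecified"

def report(responses):
    counts = Counter(code_response(r) for r in responses)
    for category, n in counts.most_common():
        print(f"{category}: {n} ({n / len(responses):.0%})")

report([
    "I thought it would keep running",
    "setup was too complicated for me",
    "it broke once and I stopped trusting it",
])
```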

What the churn data said

Vibe AI churn clustered at two distinct points: week two and week six. Week-two churn was the trigger-configuration friction; users hit it, did not get past it, and left. Week-six churn was the persistent-runner mismatch; users got past configuration, the workflow ran once, and they realised the agent was not the long-running worker they thought they were buying.
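
Finding clusters like these takes nothing fancier than bucketing cancellations by week of tenure. A minimal sketch, assuming you have a signup date and an optional cancellation date per user; the sample data here is made up, not the Vibe AI dataset.

```python
from collections import Counter
from datetime import date

# (signup, cancelled) pairs; illustrative data only.
users = [
    (date(2025, 9, 1), date(2025, 9, 12)),   # week 2: config friction
    (date(2025, 9, 3), date(2025, 10, 11)),  # week 6: single-shot mismatch
    (date(2025, 9, 5), None),                # still active
]

weeks = Counter(
    (cancelled - signup).days // 7 + 1
    for signup, cancelled in users
    if cancelled is not None
)

for week in sorted(weeks):
    print(f"week {week}: {'#' * weeks[week]} ({weeks[week]})")
```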

Both clusters are structural, not feature-level. You cannot fix week-two churn by improving the trigger-configuration UI; the entire trigger paradigm has to go. You cannot fix week-six churn by adding more triggers; the underlying mental model has to be persistent rather than single-shot. Gravity addresses both at the architecture level. The economics post, "Economics of Bootstrapped AI Agents", walks through the per-task math that makes the persistent-runner model viable.

From signal to decision

Three signals, three Gravity decisions. Outcome over trigger became the core interaction model. Persistence over single-shot became the runtime model. Reliability over breadth became the launch sequencing principle. Each decision is testable: if a future Gravity user writes a cancellation survey with the same language as the Vibe AI surveys, the decision was wrong and we go back to the drawing board.

The meta-lesson is that the cheapest form of product research is reading your last failure end to end. Three or four hundred users is enough to find real patterns. Twenty users is mostly noise. The discipline is to read every support ticket, every cancellation survey, every NPS comment before designing the next thing. Most founders skip this because it is unglamorous. The signal compounds across attempts only if you read it.

If you are between products and have enough users on the prior one to extract signal, my email is at the top of /contact. I will compare notes on coding cancellation surveys with anyone who wants to.

Frequently asked questions

What was Vibe AI?

Vibe AI was an AI workflow product I built in 2025 that let users connect tools and trigger automations through a chat interface. It got to a few hundred active users before I shut it down in early 2026 because the unit economics did not work and the underlying abstraction was wrong. The full postmortem lives at /blog/vibe-ai-postmortem.

Which Vibe AI signals shaped Gravity directly?

Three signals shaped Gravity. First, users hated configuring triggers; they wanted to describe outcomes. Second, users wanted persistence; they wanted the agent to keep running, not run once. Third, users wanted to trust the agent before delegating; reliability across edge cases mattered more than feature breadth. Each became a core Gravity design decision.

Why is outcome description better than workflow configuration?

Workflow configuration requires the user to think like a programmer: trigger, step, condition, branch. Outcome description lets the user describe what should be true at the end. Once the underlying model is good enough to figure out the steps, asking the user to design the path is overhead, not value. The full thesis is in the describe-outcome-not-workflow post.

What did churn data tell you?

Vibe AI churn clustered at week two and week six. Week-two churn was users who hit the configuration friction and gave up. Week-six churn was users who got past configuration and then realised the agent was running in single-shot mode, not as a persistent runner. Both signals are addressed structurally in Gravity, not as features.

Should every founder do this with prior products?

Yes, if there are enough real users to give signal. Three or four hundred users is enough to find real patterns; twenty users is mostly noise. The discipline is to read the support tickets and the cancellation surveys end to end before designing the next product, even if the new product is in a different category. The signal compounds across attempts.

Three takeaways before you close this tab

Outcome over trigger: let the user describe what should be true, and keep the trigger logic internal. Persistence over single-shot: an agent is a worker that keeps running, not a script that runs once. Reliability over breadth: fewer integrations with credible reliability evidence beat many integrations without it.
