Super AI ran from March 2024 to March 2025. Twelve months. The pitch fit on a sticker: describe what you want, we'll pick the model. An all-in-one AI app that auto-routed each prompt to the best foundation model: GPT for reasoning, Claude for long context, Gemini for speed, an open-source model for cheap throughput. In early 2024 that thesis felt obviously correct. By late 2024 it was obviously wrong. By March 2025 I had shut it down.
This is what happened, why it failed, and the structural lesson that survived. Super AI is failure number two of the three that preceded Gravity; the full synthesis is in the "three startups, three shutdowns" post, and this one is the focused case study.
What Super AI was
Super AI was a single chat surface that you typed into, and the system decided which model to send each request to. Reasoning-heavy prompt → GPT. Long-document summarisation → Claude with the long-context window. Code → whichever was strongest that month. Cheap throughput → an open model. The user got "the answer", not a model selection menu.
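At heart, that routing logic is a set of heuristics over the incoming prompt. A minimal sketch of the idea (the model names, keywords, and thresholds here are all hypothetical illustrations, not Super AI's production rules):

```python
# Hypothetical heuristic router in the spirit of Super AI.
# All model labels and cutoffs below are illustrative assumptions.

def route(prompt: str) -> str:
    """Pick a backend model for a prompt using crude task heuristics."""
    text = prompt.lower()
    if len(prompt) > 20_000:
        # Long documents go to the long-context model.
        return "claude-long-context"
    if "```" in prompt or "def " in text or "function " in text:
        # Code prompts go to whichever model is strongest that month.
        return "strongest-code-model"
    if any(w in text for w in ("prove", "step by step", "why")):
        # Reasoning-heavy prompts go to GPT.
        return "gpt-reasoning"
    # Everything else defaults to a cheap open model.
    return "open-cheap"
```

The point of the sketch is the failure mode described below: once every branch returns a usable answer, the function's return value stops mattering to the user.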
The product shipped. The router worked. The latency was fine. People used it. I did the GrowthX Capstone in October 2024 partly to pressure-test exactly this thesis: that aggregating models for end users was real value. The pressure test surfaced cracks that I could not unsee for the rest of the year.
The 2024 thesis (and why it sounded right)
In Q1 2024 the foundation-model landscape looked like a long-tail bet. GPT-4 was the leader; Claude 3 Opus had real strengths in long context; Gemini 1.5 had its own profile; open models were cheap but uneven. The premise was that the spread between models would persist or widen, and the average user did not want to learn which model was best for which task.
That premise had a name in the analyst world: the AI-router thesis. It was not crazy. The data points that supported it included: model benchmarks visibly differed by task; pricing differed by an order of magnitude; long-context windows were unevenly available. If you held that picture as the steady state, "let the router pick" was a useful product.
Why the thesis aged badly inside the same year
By Q4 2024 the spread between models had collapsed for most consumer tasks. GPT-4-class quality became the floor. The strongest models stayed strong, but the difference between "use Claude" and "use GPT" stopped being the difference between a usable answer and an unusable one. It became the difference between two usable answers.
That sounds like a small change. It was not. It killed the routing premise. If both answers are usable, the user does not need a router; they need any model that is usable. Aggregating choice has zero value when the choice no longer matters.
Then ChatGPT Workspace Agents launched in April 2026 (OpenAI, ChatGPT agents documentation, accessed 2026-05-05). That made the all-in-one category structurally obsolete: the incumbent shipped both the model and the surface, with native tool use, on a brand consumers already trusted. Super AI did not survive that long; I shut it down a year before that launch. But the launch confirmed that the category was not just hard; it was over.
The Builder.ai parallel
I was not alone. Builder.ai, the Microsoft-backed all-in-one AI startup once valued at $1.2 billion, declared bankruptcy in May 2025 (TechStartups, December 2025). Different scale entirely. Different kind of "all-in-one": Builder.ai was app development as a service, not a model router. But the structural failure was the same: an aggregator product cannot defend margin when the thing it aggregates becomes commoditised.
The 2025-2026 wave caught a long list of similar plays. The CB Insights startup-failure corpus shows 70% of failures cite "ran out of capital" as the surface cause, but the underlying drivers were poor product-market fit (43%), bad timing (29%), and unsustainable unit economics (19%) (CB Insights, "Why Startups Fail", 2026). Super AI's underlying cause was a mix of all three; the surface cause was the same as everyone else's.
The shutdown decision
The decision moment was a feature meeting in February 2025. We were discussing whether to add a "model selector", a manual override for users who wanted to pick the model themselves. I caught myself arguing for it on the grounds that "users sometimes know better than the router". That sentence is a confession that the router is no longer the product.
If the user is doing the routing, the router is overhead. If the router is overhead, the wrap has no reason to exist. Once I'd said the sentence out loud, I could not un-think it. We shipped the manual override anyway as a courtesy to existing users, then I started writing the shutdown plan. By March 2025 it was done.
The honest hindsight read: the right shutdown moment was probably November 2024, not February 2025. The GrowthX Capstone in October had already surfaced the routing flatness. I rationalised through it because the product was working: engagement was real, churn was tolerable. "Working" and "structurally viable" are different states. I conflated them.
The lesson, and how Gravity uses it
The lesson tag for Super AI's failure: scaling-potential test failed. Each new Super AI user just routed to one of the same models. The kind of growth that opens new jobs, the kind aggregator-style products almost never get, was not there. Specific products that solve one user's problem 10x better beat horizontal aggregators in commoditised markets.
For Gravity, this lesson is the structural argument for being a deployer, not an aggregator. How Gravity works: describe a recurring task, and the platform deploys an autonomous agent that runs 24/7. Each new vertical (sales follow-up, KPI reports, content scheduling, watch lists) opens a new job class. The TAM expands as new task categories get covered, not as the same task gets used by more users. That is the difference between a product and an aggregator.
If you're building an aggregator right now and want to compare notes, or you want to read the synthesis across all three failures, the three-startups synthesis has the full framework, plus how MindWave failed the 10x test (in the mental health postmortem) and how Vibe AI failed the margin test (in the Vibe AI postmortem).
Frequently asked questions
What was Super AI?
Super AI was an all-in-one AI application I built from March 2024 to March 2025. The thesis was that picking the right foundation model for each task was confusing for users, and a router that did it automatically would deliver real value. We shipped a working product. The thesis aged badly as foundation models converged in quality, and I shut it down in March 2025.
Why do all-in-one AI apps fail?
All-in-one AI apps wrap commoditising infrastructure without a defensible margin or a specific user the product wins big with. When the underlying intelligence flattens, as it did when GPT-4-class quality became the floor in late 2024, the wrap loses its reason to exist. Builder.ai, once valued at $1.2 billion, declared bankruptcy in May 2025 for structurally identical reasons.
What killed Builder.ai?
Builder.ai, the Microsoft-backed all-in-one AI startup once valued at $1.2 billion, declared bankruptcy in May 2025. The structural failure: an aggregator product cannot defend margin when the thing it aggregates becomes commoditised. Different scale than Super AI; identical failure mode.
What is the lesson from Super AI's shutdown?
Aggregator products in commoditising markets do not scale. Specific products that solve one user's problem 10x better beat horizontal aggregators in commoditised model markets. Super AI failed the scaling-potential test: each new user routed to one of the same models. The kind of growth that opens new jobs was not there.
How does Gravity differ from Super AI?
Gravity does not aggregate models. It deploys autonomous agents in 60 seconds for specific recurring tasks: inbox triage, lead follow-up, KPI reports. Each new vertical opens a new job class, not a deeper version of the same one. That is the structural difference between a product and an aggregator.
Three takeaways before you close this tab
- If your moat is "we choose the right thing", your moat dies when choice flattens. Aggregators on commoditising infrastructure are short-lived by design.
- Add the manual override only if you want to formalise your obsolescence. Every "user can override the router" feature is the router admitting it lost.
- Specific beats horizontal in a commoditised model market. One user winning big > many users winning a little.
Sources
- TechStartups, "Top AI Startups That Shut Down in 2025: What Founders Can Learn", December 2025, retrieved 2026-05-05, techstartups.com
- CB Insights, "Why Startups Fail: Top 9 Reasons", 2026 analysis, retrieved 2026-05-05, cbinsights.com
- OpenAI, "Agents on ChatGPT", documentation, accessed 2026-05-05, help.openai.com
- Volt Equity, summary of Peter Thiel's 10x rule from Zero to One, retrieved 2026-05-05, voltequity.com