Autonomous AI and assistive AI are usually discussed as if they are different products. They are not. They are different points on the same spectrum, and the spectrum has five measurable axes: decision-making, tool-use, multi-step planning, error recovery, and time horizon. A copilot can score high on one axis and zero on the next. An agent platform marketed as fully autonomous can score lower than expected once the axes are scored honestly. This post defines the spectrum, scores common products, and places Gravity on it.
The framing matters because most procurement conversations confuse the categories. A buyer asks for an AI agent and gets a copilot. A buyer asks for a copilot and gets an autonomous system that fails open in dangerous ways. The five-axis framework gives a shared vocabulary that survives the marketing. Public benchmarks confirm the spread: GAIA reports human pass rates above 90 percent and the strongest agent systems below 50 percent on harder multi-step levels (Mialon et al., 2023). The gap is real, and it is concentrated on the high-autonomy axes.
What separates autonomous from assistive AI?
Assistive AI suggests, drafts, or completes actions a human takes. The human stays in the loop on every step. Autonomous AI selects goals, plans steps, calls tools, and finishes a task without per-step human approval. The split is not binary; it is a spectrum across five axes. A system can be highly autonomous on tool-use and barely autonomous on planning. Treating it as one number hides the real shape.
The clearest definitional anchor is decision frequency. Assistive systems are interrupted dozens of times per task: each suggestion is a checkpoint. Autonomous systems are interrupted once at start and once at end, with optional escalations in between. The interrupt count is a proxy for where on the spectrum a system sits. Anthropic's published guidance on agentic systems uses a similar framing: agentic behaviour is measured by the depth of independent action between checkpoints (Anthropic engineering, Building Effective Agents, retrieved 2026-05-07).
The five axes of agency
Five axes, each scored zero to five. The total runs from zero (a static dropdown) to twenty-five (a fully autonomous agent that runs for days without supervision). The axes are not equally weighted in practice; planning and error recovery dominate real-world reliability, which is why the stop-after-one-task failure mode is the modal complaint.
- Decision-making. Zero: human chooses every step. Five: system chooses goal, plan, and steps.
- Tool-use. Zero: text in, text out. Five: system selects from a tool catalogue and chains calls. See how AI agents use tools.
- Multi-step planning. Zero: one step. Five: ten or more dependent steps with branching. Covered in detail in orchestration.
- Error recovery. Zero: fails on first error. Five: retries, replans, escalates only when it cannot recover.
- Time horizon. Zero: seconds. Five: days, with persistent state across runs (see memory).
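The rubric above is concrete enough to sketch in code. This is a minimal scorecard, assuming nothing beyond the five axes and the zero-to-five scale; the class name and field names are illustrative, not part of any published spec:

```python
from dataclasses import dataclass

AXES = ("decision_making", "tool_use", "planning", "error_recovery", "time_horizon")

@dataclass
class AgencyScore:
    decision_making: int
    tool_use: int
    planning: int
    error_recovery: int
    time_horizon: int

    def __post_init__(self) -> None:
        # Each axis is scored zero to five; reject anything outside that range.
        for axis in AXES:
            value = getattr(self, axis)
            if not 0 <= value <= 5:
                raise ValueError(f"{axis} must be between 0 and 5, got {value}")

    def total(self) -> int:
        # 0 (a static dropdown) to 25 (a fully autonomous, days-long agent).
        return sum(getattr(self, axis) for axis in AXES)
```

Scoring a copilot-like profile (strong in-editor tool-use, little planning, no recovery) with this class yields a total of around eight, consistent with the placements in the next section.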
Where common products sit
A coding copilot like the ones embedded in IDEs scores around eight: high on tool-use within the IDE, near zero on planning past one suggestion, near zero on error recovery. A workflow automation platform that triggers on events scores around ten: structured planning, weak decision-making (the human authored every branch), no error recovery beyond retry. A retrieval-augmented chatbot scores around twelve: tool-use exists, planning is shallow, decisions are framed as questions to the user. None of these are agents in the strict sense.
A single-task agent (one tool, one goal, one execution) sits around sixteen. A platform built for outcome-described tasks (the user describes the desired end-state and the system plans backward) sits around nineteen. Long-horizon autonomous systems that maintain state for days and recover from upstream failures sit at twenty-two and above. The benchmarks confirm the gap: AgentBench and SWE-bench results show steep drops in pass rate as task length passes four steps (SWE-bench leaderboard, retrieved 2026-05-07).
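The placements above can be tabulated. In this sketch, the per-axis splits are assumptions chosen only so that each row sums to the total quoted in this section; the totals are the claim, the splits are illustration:

```python
# Axis order: (decision-making, tool-use, planning, error recovery, time horizon).
# The per-axis splits below are assumed; only the row totals come from the text.
PRODUCTS = {
    "coding_copilot":      (2, 4, 1, 0, 1),  # sums to 8
    "workflow_automation": (1, 3, 3, 1, 2),  # sums to 10
    "rag_chatbot":         (3, 3, 2, 2, 2),  # sums to 12
    "single_task_agent":   (4, 4, 3, 2, 3),  # sums to 16
    "outcome_described":   (4, 4, 4, 3, 4),  # sums to 19
    "long_horizon_agent":  (5, 4, 4, 4, 5),  # sums to 22
}

totals = {name: sum(axes) for name, axes in PRODUCTS.items()}

# Rank the archetypes from least to most autonomous.
ranking = sorted(totals, key=totals.get)
```

Ranking by total reproduces the ordering in this section, from coding copilot at the assistive end to the long-horizon agent at the autonomous end.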
Where Gravity sits on the spectrum
Gravity targets the high end. The product brief is "describe a task once; an autonomous agent runs it 24/7." That places it on the outcome-described axis, with planning depth past ten hops and error recovery that retries, replans, and escalates only when it cannot recover. The reliability claim is enforced by the 80-test methodology: a capability does not ship until it passes weighted reliability above 95 percent across eight failure categories.
The choice to target the high end is not aesthetic. It is the consequence of three prior shutdowns, documented in three startups, three shutdowns. The product framing rule that survived those shutdowns: a feature ships only when it is at least three times better than the alternative, not slightly better. Autonomy is the axis where Gravity is three times better than copilots. On suggestion quality, Gravity is not three times better than a coding copilot; that is fine, because that is not the job.
Choosing the right autonomy level for a task
Higher autonomy is not always correct. Tasks with high stakes per step (legal contract redlines, medical orders, irreversible payments) benefit from assistive AI; the human checkpoint is the safety property. Tasks with low stakes per step but high volume (lead follow-ups, status reports, research aggregation) benefit from autonomous AI; the human checkpoint is the bottleneck. The five-axis framework is a tool for matching task to autonomy level, not a rank-ordering of products.
The pragmatic rule: pick the lowest autonomy level that still finishes the task within the time and cost budget. If a copilot suffices, the copilot wins. If the copilot creates a queue of accepted suggestions that the human cannot keep up with, the autonomous agent wins. The decision is documented in build vs buy and grounded in the outcome-not-workflow framing in describe outcome, not workflow.
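The pragmatic rule reduces to a few lines of code. This is a sketch, assuming each candidate system comes with an autonomy total from the five-axis scorecard plus rough time and cost estimates for the task; the numbers in the example are hypothetical:

```python
def pick_autonomy_level(candidates, time_budget_s, cost_budget_usd):
    """Return the lowest-autonomy candidate that fits both budgets.

    `candidates` is a list of (autonomy_total, est_time_s, est_cost_usd)
    tuples, e.g. from scoring each system on the five axes.
    """
    feasible = [c for c in candidates
                if c[1] <= time_budget_s and c[2] <= cost_budget_usd]
    if not feasible:
        return None  # nothing fits; revisit the budget or split the task
    # Lowest autonomy that still finishes the task wins.
    return min(feasible, key=lambda c: c[0])

# Hypothetical estimates for one recurring task.
options = [
    (8, 3600, 2.0),   # copilot: cheap, but ties up a human for an hour
    (16, 600, 5.0),   # single-task agent
    (22, 300, 12.0),  # long-horizon autonomous system
]
choice = pick_autonomy_level(options, time_budget_s=900, cost_budget_usd=10.0)
```

With a fifteen-minute, ten-dollar budget, the copilot misses the time budget and the long-horizon system misses the cost budget, so the single-task agent wins; with looser budgets the copilot would win instead, which is the rule's point.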
Frequently asked questions
What is the difference between autonomous and assistive AI?
Assistive AI suggests, drafts, or completes actions a human takes. Autonomous AI selects goals, plans steps, calls tools, and finishes a task without per-step human approval. The split is not binary; it is a spectrum across five axes: decision-making, tool-use, multi-step planning, error recovery, and time horizon.
Is a copilot autonomous?
No. A copilot is assistive by design. It proposes code, text, or actions inside a tool the human is using. The human accepts, edits, or rejects each suggestion. Copilots score low on tool-use, planning, and error recovery; they score high on relevance and speed of suggestion.
What is the five-axis framework for AI agency?
Five axes: decision-making (who chooses the next step), tool-use (whether the system can call external APIs), multi-step planning (depth before a checkpoint), error recovery (whether the system retries or escalates), and time horizon (seconds, hours, or days). Score each axis zero to five for a system; high totals indicate higher autonomy.
Where do most AI products sit on the autonomy spectrum in 2026?
Most score around 8 to 12 out of 25 on the five-axis framework. Strong on tool-use and decision-making for single steps; weak on multi-step planning past three or four hops, and weaker still on error recovery without human escalation. Public benchmarks like GAIA confirm the multi-step gap.
Where does Gravity sit on the autonomy spectrum?
Gravity targets the high end of the spectrum: outcome-described tasks, autonomous tool selection, multi-step planning past 10 hops, and error recovery without per-step human approval. The reliability discipline behind that target is the 80-test methodology run before a capability ships.
Three takeaways before you close this tab
- Autonomy is five axes, not one switch. Score products on each axis honestly.
- The reliability gap lives between 12 and 22 on the spectrum. That is where most platforms break.
- Pick the lowest autonomy level that finishes the task. Copilots and agents do different jobs; both are correct sometimes.
Sources
- Mialon et al., "GAIA: A Benchmark for General AI Assistants", arXiv:2311.12983, 2023, retrieved 2026-05-07, arxiv.org/abs/2311.12983
- SWE-bench, "Leaderboard for software engineering benchmark", retrieved 2026-05-07, swebench.com
- Anthropic, "Building Effective Agents", retrieved 2026-05-07, anthropic.com/engineering/building-effective-agents
- NIST, "AI Risk Management Framework", retrieved 2026-05-07, nist.gov/itl/ai-risk-management-framework
- Aryan Agarwal, "Gravity five-axis framework v1", internal spec, May 2026