Key takeaways

What this agent does

Every Confluence instance over two years old has pages nobody reads. A workspace with two thousand pages typically has five hundred that haven't been edited in a year and three hundred that haven't been viewed in six months. The mess slows down search, confuses new hires, and surfaces wrong information in AI overviews and internal search.

The agent walks the spaces you whitelist on a two-week cadence. For each page it checks five hygiene signals. If a page trips at least two of the five, it goes on a proposal list. A space admin opens the proposal, scans it, ticks the boxes they want to archive, and the agent moves those pages to the Archive space. The agent never decides on its own. The reason is that low-signal pages occasionally hold the most important institutional knowledge, and a person who has been at the company long enough to remember those pages has to make the final call. The same principle shows up in how to add a human approval step to an agent.

What the agent does not do: it does not delete, edit page content, change permissions, change ownership, or move pages outside the workflow above. It does not file Jira tickets to chase owners. The single write action is moving an accepted page to the Archive space. Everything else is a read.

Sources of truth

Confluence only, plus an optional read of the directory service to detect orphan owners.

The agent ignores prose content. It does not read pages to decide whether they are stale. Reading content to decide on staleness sounds smart and produces ridiculous false positives: a comprehensive style guide may go years without an edit and still not be stale. The metadata is enough. The same lesson is in how to give an agent multiple tools: pick the smallest read scope that gets the job done.

The five hygiene signals

Each page is evaluated against five signals. Each signal is binary. A page that trips two or more is proposed. A page that trips exactly one is added to a Watch list, which is shown to admins but not proposed.
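The counting rule can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual code: the signal identifiers and the classify function are hypothetical names chosen for the example. It applies the plain count rule only; whether a Deprecated label on its own should short-circuit to a proposal is left out.

```python
# Hypothetical identifiers for the five signals; the names are assumptions.
SIGNALS = (
    "no_edits",
    "no_views",
    "orphan_owner",
    "no_inbound_links",
    "deprecated_label",
)


def classify(tripped: set) -> str:
    """Map the set of tripped signals to 'propose', 'watch', or 'ignore'."""
    unknown = tripped - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    if len(tripped) >= 2:
        return "propose"  # two or more signals: goes on the proposal list
    if len(tripped) == 1:
        return "watch"    # exactly one signal: shown to admins, not proposed
    return "ignore"       # no signals: the page is left alone
```

Because each signal is binary, the whole decision reduces to the size of a set, which keeps the rule easy to explain to the admin reading the proposal.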

Signal 1: No edits in twelve months

The page was last edited more than twelve months ago. This is the most common signal and on its own it is the weakest because some pages are deliberately not edited.

Signal 2: No views in six months

The page has zero recorded pageviews over the past six months across all users. This signal is the strongest. A page that no human has opened in six months is rarely critical.

Signal 3: Orphan owner

The page owner is no longer an active employee per the directory service. The agent does not look at who edited last, only who is named as the owner in Confluence's page metadata.

Signal 4: Broken inbound links

The page is not referenced by any live page in any whitelisted space. Pages in the Archive space do not count toward references. Pages with the Draft label do not count toward references.
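A rough sketch of that reference check, assuming the agent has already collected each page's space, labels, and outbound links into memory. The Page type, the field names, and the "ARCHIVE" space key are all hypothetical. Skipping a page's links to itself is an added assumption, not something the rule above states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical in-memory record; field names are assumptions.
    page_id: str
    space: str
    labels: frozenset = frozenset()
    links_to: frozenset = frozenset()  # ids of pages this page links to


def has_live_inbound_link(target_id, pages, whitelisted_spaces,
                          archive_space="ARCHIVE"):
    """Return True if any live page in a whitelisted space links to target_id."""
    for page in pages:
        if page.page_id == target_id:
            continue  # assumption: a self-link is not evidence of life
        if page.space == archive_space:
            continue  # Archive pages do not count toward references
        if page.space not in whitelisted_spaces:
            continue  # only whitelisted spaces are considered
        if "Draft" in page.labels:
            continue  # Draft-labelled pages do not count toward references
        if target_id in page.links_to:
            return True
    return False
```

Signal 4 trips when has_live_inbound_link returns False for the page under review.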

Signal 5: Deprecated label

The page explicitly carries the Deprecated label. This signal alone is enough to propose, but it usually arrives with one of the other four signals anyway.

An admin can adjust the thresholds: twelve months becomes eighteen for an HR-policy space, and six becomes twelve for a long-tail engineering reference space. Thresholds are stored per space. The two-of-five rule itself is not adjustable. Two-of-five is the rule because every team that tried one-of-five flooded admins with false positives and every team that tried three-of-five had pages with two damning signals that never got proposed.
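Per-space thresholds for the two time-based signals might look like the following sketch. The space keys "HR" and "ENGREF", the Thresholds type, and the function names are hypothetical, and treating a missing last-view date as "no views" is an assumption about how view data is recorded.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    edit_months: int = 12  # Signal 1 default
    view_months: int = 6   # Signal 2 default


DEFAULT = Thresholds()
# Hypothetical per-space overrides, keyed by space key.
PER_SPACE = {
    "HR": Thresholds(edit_months=18),
    "ENGREF": Thresholds(view_months=12),
}


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`, clamped to month end."""
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def trips_no_edits(space: str, last_edited: date, today: date) -> bool:
    limit = PER_SPACE.get(space, DEFAULT).edit_months
    return last_edited < months_before(today, limit)


def trips_no_views(space: str, last_viewed: Optional[date], today: date) -> bool:
    limit = PER_SPACE.get(space, DEFAULT).view_months
    # Assumption: None means no view was ever recorded.
    return last_viewed is None or last_viewed < months_before(today, limit)
```

With these settings, a page last edited seventeen months ago trips Signal 1 in a default space but not in the hypothetical "HR" space.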

Output: the bi-weekly proposal

Every other Friday at a time you set, the agent posts the proposal to a Confluence page in a space called Workspace Hygiene. The page is identical in structure each run.

  1. Headline. Pages proposed for archive this run. Pages on Watch this run.
  2. Proposed for archive. Table with one row per page. Columns: title, space, signals tripped, last edit date, last view date, owner, inbound links, link to the page, Accept checkbox.
  3. Watch. Table with the same columns. Used for admins who want to see almost-stale pages.
  4. Run summary. Spaces scanned, total pages reviewed, proposals this run, proposals accepted in the previous run.

The Accept checkbox is the only interactive control. When an admin saves the page after ticking checkboxes, the agent reads the saved diff, identifies which pages were ticked, and moves those pages to the Archive space. Untouched pages stay where they are. Pages that were proposed but not accepted carry over to the next proposal automatically. The same principle is described under how to roll back an agent action: easy to accept, easy to undo.
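The accept-and-carry-over step can be sketched independently of how the checkboxes are stored. The example below assumes the agent has already parsed the two saved versions of the proposal page into mappings of page id to checkbox state; that parsing step and both function names are hypothetical. Ignoring ticked rows that were not in the posted proposal is an added safety assumption.

```python
def newly_accepted(before: dict, after: dict) -> list:
    """Page ids ticked in the saved version that were unticked when posted.

    `before` and `after` map page id -> bool (checkbox state).
    Ids absent from `before` are ignored, so a hand-added row
    can never trigger an archive move (an assumption of this sketch).
    """
    return sorted(
        page_id
        for page_id, ticked in after.items()
        if ticked and page_id in before and not before[page_id]
    )


def carry_over(proposed: list, accepted: list) -> list:
    """Proposed pages that were not accepted, in their original order."""
    accepted_set = set(accepted)
    return [page_id for page_id in proposed if page_id not in accepted_set]
```

Only the ids returned by newly_accepted would be passed to the move-to-Archive call; everything returned by carry_over reappears in the next proposal.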

Guardrails

Three guardrails are non-negotiable.

Common mistakes

Auto-archiving on signal count alone. The first version did this. After three runs it had archived two pages that admins wanted back, and admins started ignoring the proposal entirely. Now the agent never archives without an explicit accept. The cost of false positives is high; the cost of one extra click is low.

Counting Archive pages in inbound link checks. Pages in Archive linking to a target page should not count as evidence that the target is alive. The agent excludes Archive references. Forgetting to exclude them once caused half the proposal list to look healthy when it was not.

Using prose content to decide. Reading the body to make a staleness call is wrong twice. It is slow, and it produces false positives on pages that are deliberately static. Metadata only.

Running too often. Daily and weekly cadences cause proposal fatigue. Two weeks is the practical cadence because admins can clear a proposal in fifteen minutes and pages do not become stale fast enough to matter at higher resolution. For the broader argument on cadence, see how to write the prompt for a recurring agent.

Not whitelisting spaces. Running across every space in the workspace catches the founder's scratch space, the HR personal-leave drafts, and the customer success private notes. Whitelist explicitly. Three to eight spaces is normal. New spaces have to be added by name.
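One way to make the whitelist hard to bypass is to refuse to run without one. The sketch below is illustrative only; the function name and the space keys in the usage example are hypothetical.

```python
def spaces_to_scan(all_spaces: list, whitelist: list) -> list:
    """Return only the explicitly whitelisted spaces, in workspace order.

    An empty whitelist raises instead of falling back to "scan everything".
    """
    if not whitelist:
        raise ValueError("whitelist is empty; add spaces by name before running")
    allowed = set(whitelist)
    return [space for space in all_spaces if space in allowed]
```

For example, spaces_to_scan(["ENG", "HR", "FOUNDER-SCRATCH"], ["ENG", "HR"]) keeps "ENG" and "HR" and drops the scratch space, and a newly created space is skipped until someone adds it to the whitelist.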

Frequently asked questions

Can an AI agent clean up stale pages in Confluence?

Yes. The agent walks the spaces you whitelist, checks every page against five hygiene signals, and produces a proposed archive list every two weeks. It does not auto-delete or auto-archive. A space admin reviews the list and accepts pages in bulk.

What does the agent count as stale?

Five signals: no edits in twelve months, no page views in six months, owner has left the company, broken inbound links, and an explicit Status label of Deprecated. A page must trip at least two signals to be proposed. Tripping only one signal logs it as Watch but does not propose archive.

Why not let the agent archive automatically?

Because pages with low edit and view counts often hold institutional knowledge that nobody touches but everyone needs once a year. Audit notes, incident postmortems, vendor contract notes, the org chart at acquisition. The agent does not know which low-traffic pages are critical. A space admin does. Always.

Does this work for Confluence Cloud, Data Center, or both?

Both. On Cloud the agent uses the REST API v2 endpoints. On Data Center it uses REST API v1 with token authentication. The five-signal logic is identical. Only the API version and the auth setup differ. The setup wizard detects the host type from the URL.
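Host detection from the URL could work along these lines. This is a heuristic sketch, not the wizard's actual logic: it assumes Cloud sites are served from an atlassian.net hostname, which misses Cloud sites on custom domains, and the function names are hypothetical. The base paths shown are the usual ones for the Cloud v2 API and the Data Center REST API, but check them against your instance.

```python
from urllib.parse import urlparse


def detect_host_type(base_url: str) -> str:
    """Guess 'cloud' or 'data_center' from the hostname (heuristic)."""
    host = (urlparse(base_url).hostname or "").lower()
    return "cloud" if host.endswith(".atlassian.net") else "data_center"


def api_base(base_url: str) -> str:
    """Build the REST base URL for the detected host type."""
    parsed = urlparse(base_url)
    if detect_host_type(base_url) == "cloud":
        # Cloud: v2 API lives under /wiki/api/v2 on the site host.
        return f"{parsed.scheme}://{parsed.hostname}/wiki/api/v2"
    # Data Center: keep any context path (e.g. /confluence) and append /rest/api.
    return base_url.rstrip("/") + "/rest/api"
```

A manual override in the setup step is a sensible complement, since a hostname check alone cannot identify every deployment.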

Can the agent move pages to an archive space instead of deleting?

Yes, and that is the recommended action. After an admin accepts the proposal the agent moves accepted pages to an Archive space you nominate, preserving the page id, version history, and inbound links. Search filters can exclude the Archive space so it disappears from results but stays accessible.

Three takeaways before you close this tab

  1. Two-of-five signals. Anything less and false positives ruin admin trust within three runs.
  2. Propose, never archive. The agent's job is the list. The human's job is the click.
  3. Archive space, not delete. Reversibility is the whole point.

Sources