Key takeaways
- Propose, never archive. The agent surfaces a list every two weeks. An admin clicks accept.
- Two of five signals required. One signal puts a page on Watch, not on the archive list.
- Five signals. No edits, no views, orphan owner, broken inbound links, Deprecated label.
- Archive, not delete. Accepted pages move to an Archive space. Page id, history, inbound links preserved.
- Whitelisted spaces only. Engineering, Product, HR public, Customer Success. Not the founder's scratch space.
What this agent does
Every Confluence instance over two years old has pages nobody reads. A workspace with two thousand pages typically has five hundred that haven't been edited in a year and three hundred that haven't been viewed in six months. The mess slows down search, confuses new hires, and surfaces wrong information in AI overviews and internal search.
The agent walks the spaces you whitelist on a two-week cadence. For each page it checks five hygiene signals. If a page trips at least two of the five, it goes on a proposal list. A space admin opens the proposal, scans it, ticks the boxes for the pages they want to archive, and the agent moves those pages to the Archive space. The agent never decides on its own, because low-traffic pages occasionally hold the most important institutional knowledge, and someone who has been at the company long enough to remember those pages has to make the final call. The same principle shows up in how to add a human approval step to an agent.
What the agent does not do: it does not delete, edit page content, change permissions, change ownership, or move pages outside the workflow above. It does not file Jira tickets to chase owners. The single write action is moving an accepted page to the Archive space. Everything else is a read.
Sources of truth
Confluence only, plus an optional read of the directory service to detect orphan owners.
- Page metadata. Last edit timestamp, last edit author, version count, labels, space, parent page.
- Page analytics. Pageview count over the past six months. Confluence Cloud exposes this through the Atlassian Analytics API; Data Center uses an audit table.
- Inbound links. The Confluence search API returns pages that reference a target page. The agent counts those references, filters out references from the Archive space, and flags pages that no live page links to.
- Page labels. Specifically the Deprecated, Draft, and Do Not Archive labels. The last is a manual override admins can apply to keep a page out of any future proposal.
- Directory service (optional). Used only to determine whether the page owner is still an active employee. The agent reads only the active flag, not the full user record.
The agent ignores prose content. It does not read pages to decide whether they are stale. Reading content to decide on staleness sounds smart and produces ridiculous false positives: a comprehensive style guide needs no edits, yet it is not stale. The metadata is enough. The same lesson is in how to give an agent multiple tools: pick the smallest read scope that gets the job done.
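To make the metadata-only read scope concrete, here is a minimal sketch of the record the agent might hold per page. The class name `PageSnapshot` and every field name are assumptions made for illustration, not Confluence API names. The point is what is absent: there is no body field.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PageSnapshot:
    """Metadata-only view of a page (hypothetical shape). No prose content."""
    page_id: str
    title: str
    space_key: str
    parent_id: Optional[str]
    last_edited_at: datetime
    last_edit_author: str
    version_count: int
    labels: frozenset[str]
    views_last_six_months: int
    live_inbound_links: int
    # None when the optional directory read is switched off.
    owner_is_active: Optional[bool] = None


page = PageSnapshot(
    page_id="12345",
    title="Release checklist",
    space_key="ENG",
    parent_id=None,
    last_edited_at=datetime(2023, 1, 10, tzinfo=timezone.utc),
    last_edit_author="user-1",
    version_count=7,
    labels=frozenset({"deprecated"}),
    views_last_six_months=0,
    live_inbound_links=0,
)
```

Freezing the dataclass is a design choice in this sketch: a snapshot is evidence for one run and should not be mutated after it is read.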
The five hygiene signals
Each page is evaluated against five signals. Each signal is binary. A page that trips two or more is proposed. A page that trips exactly one is added to a Watch list, which is shown to admins but not proposed.
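The rule above fits in a few lines. This is a sketch, not the agent's actual code: the signal keys and the bucket names `proposed`, `watch`, and `ignore` are hypothetical labels chosen for the example.

```python
SIGNALS = (
    "no_edits",
    "no_views",
    "orphan_owner",
    "broken_inbound_links",
    "deprecated_label",
)


def classify(tripped: dict[str, bool]) -> str:
    """Map binary signal results to a bucket using the two-of-five rule."""
    count = sum(1 for name in SIGNALS if tripped.get(name, False))
    if count >= 2:
        return "proposed"
    if count == 1:
        return "watch"
    return "ignore"


print(classify({"no_edits": True, "no_views": True}))  # proposed
print(classify({"no_edits": True}))                    # watch
print(classify({}))                                    # ignore
```

Signals missing from the input are treated as not tripped, so an unavailable data source can never push a page onto the proposal list by accident.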
Signal 1: No edits in twelve months
The page was last edited more than twelve months ago. This is the most common signal and, on its own, the weakest, because some pages are deliberately left unedited.
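One way to evaluate this check is to compute a calendar cutoff and compare timestamps. The sketch below assumes timezone-aware datetimes and calendar months; the helper names `months_ago` and `no_edits_signal` are hypothetical.

```python
import calendar
from datetime import datetime, timezone


def months_ago(now: datetime, months: int) -> datetime:
    """Return the moment `months` calendar months before `now`.

    The day is clamped to the last day of the target month, so a cutoff
    computed from 29 February lands on 28 February in a non-leap year.
    """
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return now.replace(year=year, month=month_index + 1, day=min(now.day, last_day))


def no_edits_signal(last_edited: datetime, now: datetime, threshold_months: int = 12) -> bool:
    """Signal 1: the last edit is older than the threshold."""
    return last_edited < months_ago(now, threshold_months)


now = datetime(2024, 3, 15, tzinfo=timezone.utc)
print(no_edits_signal(datetime(2023, 3, 14, tzinfo=timezone.utc), now))  # True
print(no_edits_signal(datetime(2023, 3, 16, tzinfo=timezone.utc), now))  # False
```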
Signal 2: No views in six months
The page has zero recorded pageviews over the past six months across all users. This signal is the strongest. A page that no human has opened in six months is rarely critical.
Signal 3: Orphan owner
The page owner is no longer an active employee per the directory service. The agent does not look at who edited last, only who is named as the owner in Confluence's page metadata.
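A sketch of the owner check follows. It assumes the directory read has already been reduced to a mapping of owner id to active flag, which matches the narrow read scope described earlier. How to treat an owner the directory does not know is not specified here; this example assumes the conservative choice of not tripping the signal.

```python
from typing import Mapping, Optional


def orphan_owner_signal(
    owner_id: Optional[str],
    active_flags: Optional[Mapping[str, bool]],
) -> bool:
    """Signal 3: the named page owner is no longer an active employee."""
    if active_flags is None:
        # The directory read is optional; without it the signal cannot trip.
        return False
    if owner_id is None or owner_id not in active_flags:
        # Assumption: an unknown owner is not treated as an orphan.
        return False
    return not active_flags[owner_id]


directory = {"alice": True, "bob": False}
print(orphan_owner_signal("bob", directory))    # True
print(orphan_owner_signal("alice", directory))  # False
```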
Signal 4: Broken inbound links
No live page in any whitelisted space references the page. References from pages in the Archive space do not count, and neither do references from pages carrying the Draft label.
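The filtering can be sketched as follows. The `Reference` shape, the `ARCHIVE` space key, and the case-insensitive label comparison are assumptions for illustration; the real search response will look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """One inbound reference to the target page (hypothetical shape)."""
    source_page_id: str
    source_space: str
    source_labels: frozenset


def live_inbound_count(refs, whitelisted_spaces, archive_space="ARCHIVE"):
    """Count distinct live pages that reference the target page."""
    live_sources = {
        ref.source_page_id
        for ref in refs
        if ref.source_space in whitelisted_spaces
        and ref.source_space != archive_space
        and "draft" not in {label.lower() for label in ref.source_labels}
    }
    return len(live_sources)


def broken_inbound_links_signal(refs, whitelisted_spaces, archive_space="ARCHIVE"):
    """Signal 4: no live page in a whitelisted space references the target."""
    return live_inbound_count(refs, whitelisted_spaces, archive_space) == 0


refs = [
    Reference("1", "ENG", frozenset()),
    Reference("2", "ARCHIVE", frozenset()),
    Reference("3", "ENG", frozenset({"Draft"})),
    Reference("1", "ENG", frozenset()),  # same source page linking twice
]
print(live_inbound_count(refs, {"ENG", "ARCHIVE"}))  # 1
```

Counting distinct source pages rather than raw references keeps one page that links to the target several times from looking like broad usage.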
Signal 5: Deprecated label
The page explicitly carries the Deprecated label. Like any other signal, it counts as one of the required two, so on its own it puts the page on Watch. In practice it usually arrives with at least one of the other four signals anyway.
An admin can adjust the thresholds: twelve months becomes eighteen for an HR-policy space, and six becomes twelve for a long-tail engineering reference space. Thresholds are stored per space. The two-of-five rule itself is not adjustable. Two-of-five is the rule because every team that tried one-of-five flooded admins with false positives, and every team that tried three-of-five had pages with two damning signals that never got proposed.
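Per-space thresholds with a fixed proposal rule might be stored like this. The space keys `HR` and `ENGREF`, the class name, and the storage as an in-memory dict are all assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    no_edit_months: int = 12
    no_view_months: int = 6


# Deliberately a constant, not a per-space setting: the rule is not adjustable.
MIN_SIGNALS_TO_PROPOSE = 2

# Hypothetical overrides keyed by space key.
SPACE_OVERRIDES = {
    "HR": Thresholds(no_edit_months=18),
    "ENGREF": Thresholds(no_view_months=12),
}


def thresholds_for(space_key: str) -> Thresholds:
    """Return the space's thresholds, falling back to the defaults."""
    return SPACE_OVERRIDES.get(space_key, Thresholds())


print(thresholds_for("HR"))   # Thresholds(no_edit_months=18, no_view_months=6)
print(thresholds_for("ENG"))  # Thresholds(no_edit_months=12, no_view_months=6)
```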
Output: the bi-weekly proposal
Every other Friday at a time you set, the agent posts the proposal to a Confluence page in a space called Workspace Hygiene. The page is identical in structure each run.
- Headline. Pages proposed for archive this run. Pages on Watch this run.
- Proposed for archive. Table with one row per page. Columns: title, space, signals tripped, last edit date, last view date, owner, inbound links, link to the page, Accept checkbox.
- Watch. Table with the same columns. Used for admins who want to see almost-stale pages.
- Run summary. Spaces scanned, total pages reviewed, proposals this run, proposals accepted in the previous run.
The Accept checkbox is the only interactive control. When an admin saves the page after ticking checkboxes, the agent reads the saved diff, identifies which pages were ticked, and moves those pages to the Archive space. Untouched pages stay where they are. Pages that were proposed but not accepted carry over to the next proposal automatically. The same principle is described under how to roll back an agent action: easy to accept, easy to undo.
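Reading the saved diff reduces to comparing checkbox state before and after the save. The sketch below assumes the checkbox state has already been extracted into a mapping of page id to ticked flag; how that extraction works against the page's storage format is outside the example, and both function names are hypothetical.

```python
def newly_accepted(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Page ids ticked in the saved version that were unticked as proposed.

    Ids that appear only in `after` are ignored, so a hand-added row
    cannot cause a page to be archived.
    """
    return sorted(
        page_id
        for page_id, ticked in after.items()
        if ticked and page_id in before and not before[page_id]
    )


def carried_over(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Proposed pages left unticked; these return in the next proposal."""
    return sorted(page_id for page_id in before if not after.get(page_id, False))


proposed = {"101": False, "102": False, "103": False}
saved = {"101": True, "102": False, "103": True, "999": True}
print(newly_accepted(proposed, saved))  # ['101', '103']
print(carried_over(proposed, saved))    # ['102']
```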
Guardrails
Three guardrails are non-negotiable.
- Archive, not delete. Accepted pages move to the Archive space. The Archive space retains the page id and version history. Inbound links remain functional. If a page was archived in error, an admin moves it back. Confluence's own undo gives the same outcome with one click, which is exactly the point of preferring archive over delete.
- Do Not Archive label is sacred. Any page carrying the Do Not Archive label is excluded from evaluation. It does not appear on Watch. It does not appear on Proposed. The agent does not check whether the label was applied recently. The label is the human's instruction and the agent obeys without quibbling.
- No writes during business hours. The agent batches the archive moves into a single window outside business hours of the workspace's primary region. This keeps a sudden flurry of archive moves from confusing on-call engineers who happen to be searching at the time.
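Two of these guardrails are simple predicates. The sketch below makes several assumptions: the label is spelled `do-not-archive`, business hours run 08:00 to 19:00 on weekdays, and the region's time zone is passed in as a `tzinfo` object. A fixed offset is used here to keep the example dependency-free; real use would want a zone that tracks daylight saving.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

DO_NOT_ARCHIVE = "do-not-archive"  # hypothetical spelling of the label


def is_excluded(labels) -> bool:
    """A page with the override label is skipped before any signal is checked."""
    return any(label.strip().lower() == DO_NOT_ARCHIVE for label in labels)


def in_write_window(
    now_utc: datetime,
    region_tz: tzinfo,
    business_start: time = time(8, 0),
    business_end: time = time(19, 0),
) -> bool:
    """True when archive moves may run: weekends and outside business hours."""
    local = now_utc.astimezone(region_tz)
    if local.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (business_start <= local.time() < business_end)


region = timezone(timedelta(hours=-5))  # fixed offset, for illustration only
print(in_write_window(datetime(2024, 3, 13, 15, 0, tzinfo=timezone.utc), region))  # False
print(in_write_window(datetime(2024, 3, 14, 3, 0, tzinfo=timezone.utc), region))   # True
print(is_excluded({"Do-Not-Archive", "policy"}))                                   # True
```

Checking the label first, before any signal is computed, keeps an excluded page off both the Proposed and Watch tables.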
Common mistakes
Auto-archiving on signal count alone. The first version did this. After three runs admins had archived two pages they wanted back and started ignoring the proposal entirely. Now the agent never archives without an explicit accept. The cost of false positives is high; the cost of one extra click is low.
Counting Archive pages in inbound link checks. Pages in Archive linking to a target page should not count as evidence that the target is alive. The agent excludes Archive references. Forgetting to exclude them once caused half the proposal list to look healthy when it was not.
Using prose content to decide. Reading the body to make a staleness call is wrong twice. It is slow, and it produces false positives on pages that are deliberately static. Metadata only.
Running too often. Daily and weekly cadences cause proposal fatigue. Two weeks is the practical cadence because admins can clear a proposal in fifteen minutes and pages do not become stale fast enough to matter at higher resolution. For the broader argument on cadence, see how to write the prompt for a recurring agent.
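An every-other-Friday schedule can be checked against a fixed anchor date. This is a sketch; the anchor value and the function name are assumptions, and a scheduler with native bi-weekly support would serve just as well.

```python
from datetime import date


def is_run_day(today: date, anchor_friday: date) -> bool:
    """True on the anchor Friday and every fourteenth day after it."""
    if anchor_friday.weekday() != 4:  # Monday is 0, so Friday is 4
        raise ValueError("anchor must be a Friday")
    days_since = (today - anchor_friday).days
    return days_since >= 0 and days_since % 14 == 0


anchor = date(2024, 3, 1)  # a Friday, chosen for the example
print(is_run_day(date(2024, 3, 15), anchor))  # True
print(is_run_day(date(2024, 3, 8), anchor))   # False
```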
Not whitelisting spaces. Running across every space in the workspace catches the founder's scratch space, the HR personal-leave drafts, and the customer success private notes. Whitelist explicitly. Three to eight spaces is normal. New spaces have to be added by name.
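An explicit whitelist amounts to a set intersection with no wildcards. The space keys below are invented for the example, and failing loudly on a whitelist entry that no longer exists is a choice made in this sketch, not something the workflow above requires.

```python
def spaces_to_scan(all_spaces: set[str], whitelist: set[str]) -> list[str]:
    """Return only the spaces named in the whitelist.

    Spaces that exist but are not whitelisted are skipped silently.
    A whitelist entry that matches no space raises, on the assumption
    that a renamed or deleted space should be noticed.
    """
    missing = whitelist - all_spaces
    if missing:
        raise ValueError(f"whitelisted spaces not found: {sorted(missing)}")
    return sorted(all_spaces & whitelist)


workspace = {"ENG", "PROD", "HRPUB", "CS", "FOUNDER", "NEWTEAM"}
print(spaces_to_scan(workspace, {"ENG", "PROD", "HRPUB", "CS"}))
# ['CS', 'ENG', 'HRPUB', 'PROD']
```

In this example the newly created `NEWTEAM` space is not scanned until someone adds it by name.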
Frequently asked questions
Can an AI agent clean up stale pages in Confluence?
Yes. The agent walks the spaces you whitelist, checks every page against five hygiene signals, and produces a proposed archive list every two weeks. It does not auto-delete or auto-archive. A space admin reviews the list and accepts pages in bulk.
What does the agent count as stale?
Five signals: no edits in twelve months, no page views in six months, owner has left the company, broken inbound links, and an explicit Deprecated label. A page must trip at least two signals to be proposed. A page that trips only one is listed on Watch but is not proposed for archive.
Why not let the agent archive automatically?
Because pages with low edit and view counts often hold institutional knowledge that nobody touches but everyone needs once a year. Audit notes, incident postmortems, vendor contract notes, the org chart at acquisition. The agent does not know which low-traffic pages are critical. A space admin does. Always.
Does this work for Confluence Cloud, Data Center, or both?
Both. On Cloud the agent uses the REST API v2 endpoints. On Data Center it uses REST API v1 with token authentication. The five-signal logic is identical. What differs is the API version, the auth setup, and the pageview source. The setup wizard detects the host type from the URL.
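How the wizard detects the host type is not spelled out here. One plausible heuristic, sketched below as an assumption, is that Cloud sites are served under `atlassian.net` and anything else is treated as Data Center. A Cloud site behind a custom domain would be misclassified by this check.

```python
from urllib.parse import urlparse


def detect_host_type(base_url: str) -> str:
    """Guess the deployment type from the base URL (heuristic, not authoritative)."""
    host = (urlparse(base_url).hostname or "").lower()
    if not host:
        raise ValueError(f"no hostname in URL: {base_url!r}")
    if host.endswith(".atlassian.net"):
        return "cloud"
    return "data_center"


print(detect_host_type("https://example.atlassian.net/wiki"))   # cloud
print(detect_host_type("https://confluence.example.internal"))  # data_center
```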
Can the agent move pages to an archive space instead of deleting?
Yes, and that is the recommended action. After an admin accepts the proposal the agent moves accepted pages to an Archive space you nominate, preserving the page id, version history, and inbound links. Search filters can exclude the Archive space so it disappears from results but stays accessible.
Three takeaways before you close this tab
- Two-of-five signals. Anything less and false positives ruin admin trust within three runs.
- Propose, never archive. The agent's job is the list. The human's job is the click.
- Archive space, not delete. Reversibility is the whole point.
Sources
- Atlassian. Confluence Cloud REST API v2 reference. Tier 1.
- Atlassian. Confluence Data Center REST API reference. Tier 1.
- Atlassian. Atlassian Analytics, Confluence pageview data. Tier 1.
- Atlassian. Confluence space permissions and labels. Tier 1.