Auto-index visited pages
Auto-index expands an agent's knowledge base automatically as visitors browse. When someone lands on a page the agent has never seen, that URL enters a crawl queue — without delaying the visitor's request.
Activation
Open the agent settings at /app/agents/{id}/settings and toggle Auto-index visited pages. The change applies on the next widget init — no republish required.
How it works
Every call to /v1/widget/init runs the candidate URL through seven validation checks:
- Auto-index must be enabled on the agent.
- URL must use http or https.
- Host must not be private (RFC 1918, loopback, link-local ranges).
- Path must not match the private-path blocklist.
- URL must not already be indexed.
- Per-agent rate limit not exceeded.
- Origin header must match the agent's allowed_origins.
When every check passes the runtime creates an auto-type source and queues a crawl job. The visitor's init response is unaffected.
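The checks above can be sketched as a single function that returns the first failure. This is a minimal illustration, not the runtime's actual code: the Agent class, its fields, the rejection_reason helper, the check order, and the segment-based path matching are all assumptions made for the example, and hostnames are not resolved through DNS.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical names throughout; the real runtime's types are not documented here.
BLOCKED_PREFIXES = ("/admin", "/login", "/checkout", "/profile",
                    "/account", "/settings", "/cart", "/api")


@dataclass
class Agent:
    auto_index: bool
    allowed_origins: set[str]
    indexed_urls: set[str] = field(default_factory=set)
    crawls_left: int = 30  # stand-in for the per-agent rate limit


def is_private_host(host: str) -> bool:
    """True for 'localhost' and for private, loopback, or link-local IP literals."""
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; this sketch does not resolve DNS
    return ip.is_private or ip.is_loopback or ip.is_link_local


def rejection_reason(agent: Agent, url: str, origin: str) -> str | None:
    """Return the first failed check, or None if the URL may be queued."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    if not agent.auto_index:
        return "auto-index disabled"
    if parts.scheme not in ("http", "https"):
        return "unsupported scheme"
    if not host or is_private_host(host):
        return "private host"
    # Assumed matching rule: whole path segments, so /administrators is allowed.
    if any(path == p or path.startswith(p + "/") for p in BLOCKED_PREFIXES):
        return "blocked path"
    if url in agent.indexed_urls:
        return "already indexed"
    if agent.crawls_left <= 0:
        return "rate limited"
    if origin not in agent.allowed_origins:
        return "origin not allowed"
    return None
```

A caller would queue the crawl only when rejection_reason returns None, and would return the normal init response either way.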
Protection
The path blocklist prevents indexing sensitive areas of your site:
/admin, /login, /checkout, /profile, /account, /settings, /cart, /api
This stops authenticated-user content from leaking into the knowledge base. For apps with non-standard auth paths (for example /portal), consider disabling auto-index and adding the public URLs manually.
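Before relying on the blocklist, it can help to audit your own URLs against it. The sketch below lists URLs you consider sensitive that the built-in blocklist would not catch. It assumes the blocklist is eight separate prefixes matched on whole path segments, which this page does not confirm, and the function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Blocklist paths from this page, read as eight separate prefixes (an assumption).
BLOCKLIST = ("/admin", "/login", "/checkout", "/profile",
             "/account", "/settings", "/cart", "/api")


def matches_prefix(path: str, prefixes: tuple[str, ...]) -> bool:
    """Match whole path segments: /cart/1 matches /cart, /cartoons does not."""
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def uncovered_sensitive_urls(urls, sensitive_prefixes):
    """Return URLs under your sensitive prefixes that the blocklist misses."""
    sensitive = tuple(sensitive_prefixes)
    leaks = []
    for url in urls:
        path = urlsplit(url).path or "/"
        if matches_prefix(path, sensitive) and not matches_prefix(path, BLOCKLIST):
            leaks.append(url)
    return leaks
```

If the result is non-empty, for example because of a /portal area, that is a signal to disable auto-index and add the public URLs manually.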
Rate limiting
The system enforces a ceiling of 30 crawls per agent per hour using a Redis token bucket. Popular pages can't drain crawl budget — once the bucket is empty, additional URLs are silently dropped until the next refill.
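To illustrate the token-bucket behavior, here is an in-memory sketch. It is not the service's Redis implementation: the TokenBucket class and its parameters are hypothetical, and continuous refill is an assumption, since this page only says the bucket refills.

```python
import time


class TokenBucket:
    """In-memory token bucket; the real limiter is described as Redis-backed."""

    def __init__(self, capacity=30, period_s=3600.0, clock=time.monotonic):
        self.capacity = capacity
        self.period_s = period_s
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_consume(self) -> bool:
        """Take one token if available; otherwise report that the URL is dropped."""
        now = self.clock()
        elapsed = now - self.updated
        # Assumed refill model: tokens accrue continuously, capped at capacity.
        refill = elapsed * self.capacity / self.period_s
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the defaults, 30 URLs can be queued in a burst, after which try_consume returns False until enough time has passed to earn another token.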
Management
Auto-indexed sources show up in the Sources list with an auto badge. They behave like any other source: previewable, re-indexable, deletable. Turning the toggle off stops new crawls but preserves everything already indexed.
Heads up. Auto-index respects allowed_origins, so a misconfigured origin list will silently stop new pages from being queued. If your knowledge base stops growing, check the origin list first.
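One way to check the origin list is to derive the origin of each page you expect to be indexed and compare it with the configured entries. The sketch below assumes allowed_origins holds exact origins such as https://example.com with no wildcard support, which this page does not specify. The helper names are hypothetical, and IPv6 hosts are not handled.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Build scheme://host[:port], omitting the scheme's default port."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    host = parts.hostname or ""
    if parts.port and parts.port != default_port:
        return f"{parts.scheme}://{host}:{parts.port}"
    return f"{parts.scheme}://{host}"


def missing_origins(page_urls, allowed_origins):
    """Origins used by your pages that are absent from the allowed list."""
    allowed = {o.rstrip("/").lower() for o in allowed_origins}
    return sorted({origin_of(u) for u in page_urls} - allowed)
```

Common causes of a mismatch are a www subdomain, http versus https, or a non-default port.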