Auto-index visited pages

Auto-index expands an agent's knowledge base automatically as visitors browse. When someone lands on a page the agent has never seen, that URL enters a crawl queue — without delaying the visitor's request.
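
The non-blocking hand-off can be pictured with an in-process asyncio queue. This is a minimal sketch, not the runtime's implementation: handle_widget_init, crawl_worker, and the response body are hypothetical, and a production system would more likely use a durable job queue than an in-memory one.

```python
import asyncio


async def handle_widget_init(url: str, queue: asyncio.Queue) -> dict:
    # Hypothetical init handler: enqueue the URL and return immediately.
    # The crawl itself is never awaited on the request path.
    try:
        queue.put_nowait(url)
    except asyncio.QueueFull:
        pass  # in this sketch, a full queue drops the URL rather than slowing the response
    return {"status": "ok"}  # hypothetical response body


async def crawl_worker(queue: asyncio.Queue, crawled: list) -> None:
    # Background consumer that drains the crawl queue.
    while True:
        url = await queue.get()
        await asyncio.sleep(0.01)  # stand-in for fetching and indexing the page
        crawled.append(url)
        queue.task_done()


async def main() -> tuple[dict, list, list]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    crawled: list = []
    worker = asyncio.create_task(crawl_worker(queue, crawled))
    response = await handle_widget_init("https://example.com/pricing", queue)
    before = list(crawled)  # snapshot taken as soon as the response is ready
    await queue.join()      # wait for the background crawl (demo only)
    worker.cancel()
    return response, before, crawled


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The snapshot taken right after the handler returns is still empty: the response is ready before the worker has touched the URL.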

Activation

Open the agent settings at /app/agents/{id}/settings and toggle Auto-index visited pages. The change applies on the next widget init — no republish required.

How it works

Every call to /v1/widget/init runs the candidate URL through seven validation checks:

Auto-index must be enabled on the agent.
URL must use http or https.
Host must not be private (RFC1918, loopback, link-local ranges).
Path must not match the private-path blocklist.
URL must not already be indexed.
Per-agent rate limit must not be exceeded.
Origin header must match the agent's allowed_origins.

When every check passes, the runtime creates an auto-type source and queues a crawl job. The visitor's init response is unaffected.
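
The seven checks can be read as an ordered pipeline that stops at the first failure. Below is a minimal sketch under stated assumptions: Agent, should_queue, is_private_host, and the reason strings are hypothetical names; the host test only handles IP literals and localhost (a real one would also resolve hostnames); and the source does not say in what order the runtime evaluates the checks.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address
from typing import Callable, Optional
from urllib.parse import urlparse

BLOCKED_PREFIXES = ("/admin", "/login", "/checkout", "/profile",
                    "/account", "/settings", "/cart", "/api")


@dataclass
class Agent:
    # Hypothetical agent record; field names are assumptions.
    auto_index: bool
    allowed_origins: set[str]
    indexed_urls: set[str] = field(default_factory=set)


def is_private_host(host: str) -> bool:
    """Loopback, RFC 1918, or link-local. Only IP literals and 'localhost'
    are handled; a real check would also resolve hostnames via DNS."""
    if host == "localhost":
        return True
    try:
        ip = ip_address(host)
    except ValueError:
        return False  # not an IP literal
    return ip.is_private or ip.is_loopback or ip.is_link_local


def should_queue(agent: Agent, url: str, origin: Optional[str],
                 take_token: Callable[[], bool]) -> tuple[bool, str]:
    """Return (queue_it, reason), stopping at the first failed check."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    checks = [
        ("disabled", lambda: agent.auto_index),
        ("scheme", lambda: parsed.scheme in ("http", "https")),
        ("private_host", lambda: bool(parsed.hostname)
                                 and not is_private_host(parsed.hostname)),
        ("blocked_path", lambda: not any(path == p or path.startswith(p + "/")
                                         for p in BLOCKED_PREFIXES)),
        ("already_indexed", lambda: url not in agent.indexed_urls),
        ("origin", lambda: origin in agent.allowed_origins),
        # The rate limit spends a token, so it runs last: a URL rejected
        # for any other reason never consumes crawl budget.
        ("rate_limited", take_token),
    ]
    for reason, passes in checks:
        if not passes():
            return False, reason
    return True, "queued"
```

Running the token-consuming check last is a design choice of this sketch; evaluating it earlier would let URLs that later fail the origin check use up crawl budget. Returning the name of the failing check also makes skipped pages easy to log.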

Protection

The path blocklist prevents indexing sensitive areas of your site:

/admin
/login
/checkout
/profile
/account
/settings
/cart
/api

This stops authenticated-user content from leaking into the knowledge base. For apps with non-standard auth paths (for example /portal), consider disabling auto-index and adding the public URLs manually.
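
A minimal sketch of the blocklist test follows. The source does not specify the matching rules, so this sketch assumes segment-aware, case-insensitive prefix matching; the function name is_blocked is hypothetical.

```python
from urllib.parse import urlparse

# The eight blocked prefixes listed above.
BLOCKLIST = ("/admin", "/login", "/checkout", "/profile",
             "/account", "/settings", "/cart", "/api")


def is_blocked(url: str) -> bool:
    # Assumption: matching is case-insensitive and segment-aware, so "/api"
    # blocks "/api" and "/api/v2/users" but not "/apiary".
    path = urlparse(url).path.lower().rstrip("/") or "/"
    return any(path == p or path.startswith(p + "/") for p in BLOCKLIST)


for u in ("https://shop.example/checkout/step-2",
          "https://shop.example/Admin/",
          "https://shop.example/apiary",
          "https://shop.example/portal/invoices"):
    print(u, "blocked" if is_blocked(u) else "allowed")
```

Because /portal is not a listed prefix, pages under it pass this check, which is why apps with custom auth paths are better served by adding public URLs manually.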

Rate limiting

Each agent is limited to 30 crawls per hour, enforced with a Redis token bucket, so a sudden burst of never-before-seen URLs can't exhaust its crawl budget. Once the bucket is empty, additional URLs are silently dropped until the bucket refills.
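
The refill arithmetic is easy to check with a small model. This is an in-memory sketch, not the Redis implementation: CrawlBudget and try_take are hypothetical names, the bucket capacity of 30 is an assumption, and the clock is injectable so the behavior can be tested. With that capacity and a refill of one token every 3600 / 30 = 120 seconds, an agent with a full bucket can spend 30 crawls in a burst and then regains one crawl every two minutes.

```python
import time
from typing import Callable, Dict, Tuple

CAPACITY = 30                          # assumed bucket size (one hour's worth)
SECONDS_PER_TOKEN = 3600.0 / CAPACITY  # 120 s per token, i.e. 30 tokens/hour


class CrawlBudget:
    """In-memory stand-in for a per-agent token bucket (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # agent_id -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def try_take(self, agent_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(agent_id, (float(CAPACITY), now))
        # Refill for the time elapsed since the last update, capped at capacity.
        tokens = min(float(CAPACITY), tokens + (now - last) / SECONDS_PER_TOKEN)
        if tokens < 1.0:
            self._buckets[agent_id] = (tokens, now)
            return False  # empty bucket: the caller drops the URL
        self._buckets[agent_id] = (tokens - 1.0, now)
        return True
```

In Redis, the read, refill, and decrement have to happen atomically, for example in a Lua script run with EVAL, so that two concurrent init calls can't both spend the last token.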

Management

Auto-indexed sources show up in the Sources list with an auto badge. They behave like any other source: previewable, re-indexable, deletable. Turning the toggle off stops new crawls but preserves everything already indexed.

Heads up. Auto-index respects allowed_origins, so a misconfigured origin list will silently stop new pages from being queued. If your knowledge base stops growing, check the origin list first.
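
When new pages stop appearing, comparing the browser's Origin header against the list is a quick first check. The helper below is a hypothetical diagnostic, not part of the product, and it assumes allowed_origins is matched exactly against the Origin header (scheme://host[:port], no path). It flags entries that differ only by scheme, letter case, a trailing slash, or a leading www., which are common causes of a silent mismatch.

```python
from typing import List
from urllib.parse import urlsplit


def origin_matches(origin: str, allowed_origins: List[str]) -> bool:
    # Assumption: an exact string comparison, with no normalization.
    return origin in allowed_origins


def _host_key(value: str) -> str:
    # Reduce an origin-like string to host[:port], lowercased, minus "www.".
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # let urlsplit treat a bare host as a netloc
    return urlsplit(value).netloc.removeprefix("www.")


def near_misses(origin: str, allowed_origins: List[str]) -> List[str]:
    """Entries that don't match exactly but point at the same host."""
    key = _host_key(origin)
    return [a for a in allowed_origins if a != origin and _host_key(a) == key]


allowed = ["https://example.com/", "https://www.example.com", "https://docs.example.org"]
origin = "https://example.com"  # Origin header as sent by the browser
print(origin_matches(origin, allowed))  # False
print(near_misses(origin, allowed))     # ['https://example.com/', 'https://www.example.com']
```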