Build your agent
Knowledge sources
Everything the agent knows about your business comes from sources you add to its knowledge base. Sources flow through a single ingestion pipeline — extraction, chunking, embedding — and then become retrievable context for every visitor message.
Source types
Glozr supports a wide range of source formats:
- URL / Sitemap / Feed — web content crawled and extracted.
- Text — direct paste of FAQs, snippets, or scripts.
- Notion / Google Docs / Google Sheets — API-based document ingestion.
- SQL — read-only SELECT queries run directly over PDO against MySQL or PostgreSQL databases.
- Files — PDF, DOCX, XLSX uploads parsed via Cloudflare Workers AI.
- WooCommerce — synced via the WordPress companion plugin.
- Auto — pages visitors land on, queued by auto-index.
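Many source types can share one pipeline because each type only has to produce the same thing: clean text plus metadata. The minimal Python sketch below illustrates that idea. It is not Glozr's code. The Extracted record, the EXTRACTORS registry, and both extractors are hypothetical names. Real extractors would crawl URLs, call the Notion or Google APIs, or parse uploads.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Extracted:
    """Common output of every extractor: clean text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)

# Hypothetical extractors. Real ones would crawl a URL, call the Notion or
# Google APIs, parse an uploaded file, and so on.
def extract_text(payload: dict) -> Extracted:
    return Extracted(payload["body"].strip(), {"title": payload.get("title", "")})

def extract_url(payload: dict) -> Extracted:
    # Stand-in: assumes the page was already fetched and reduced to plain text.
    return Extracted(payload["page_text"].strip(), {"url": payload["url"]})

EXTRACTORS: dict[str, Callable[[dict], Extracted]] = {
    "text": extract_text,
    "url": extract_url,
}

def ingest(source_type: str, payload: dict,
           chunk: Callable[[str], list[str]],
           embed: Callable[[str], list[float]]) -> list[tuple[str, list[float], dict]]:
    """Type-specific extraction, then the chunking and embedding stages every type shares."""
    if source_type not in EXTRACTORS:
        raise ValueError(f"unsupported source type: {source_type!r}")
    doc = EXTRACTORS[source_type](payload)
    return [(piece, embed(piece), doc.metadata) for piece in chunk(doc.text)]
```

In this design, adding a new source type only means registering one more extractor. The chunking and embedding stages stay unchanged.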
SQL database sources
SQL sources are powerful, so the runtime enforces strict guardrails:
- Hosts must pass SSRF validation — internal IPs (RFC1918, loopback, link-local) are blocked.
- Queries are limited to SELECT statements. Anything else is rejected before execution.
- Connections run inside read-only transactions.
- A 5,000-row cap per sync prevents resource exhaustion on accidental wide queries.
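The guardrails above can be approximated in a few dozen lines. This Python sketch is not Glozr's implementation, which runs on Laravel and PDO. It uses the standard ipaddress, socket, and sqlite3 modules, with SQLite's PRAGMA query_only standing in for a read-only transaction. The helpers is_blocked_ip, validate_host, is_select_only, and run_capped are hypothetical names. A production check should also connect to the exact address it validated, so a DNS answer cannot change between the check and the connection.

```python
import ipaddress
import re
import socket
import sqlite3

# RFC 1918 private IPv4 ranges, listed explicitly.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
MAX_ROWS = 5000  # per-sync row cap
SELECT_RE = re.compile(r"select\b", re.IGNORECASE)

def is_blocked_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:127.0.0.1.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip.is_loopback or ip.is_link_local:
        return True
    return ip.version == 4 and any(ip in net for net in RFC1918)

def validate_host(host: str) -> None:
    """Hypothetical SSRF check: resolve the host and reject any internal address."""
    for *_, sockaddr in socket.getaddrinfo(host, None):
        if is_blocked_ip(sockaddr[0]):
            raise ValueError(f"host {host!r} resolves to a blocked address")

def is_select_only(sql: str) -> bool:
    body = sql.strip().rstrip(";")
    # Reject stacked statements ("SELECT 1; DROP ...") and any non-SELECT verb.
    return ";" not in body and SELECT_RE.match(body) is not None

def run_capped(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    if not is_select_only(sql):
        raise ValueError("only SELECT statements are allowed")
    # SQLite stand-in for a read-only transaction: writes now fail.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(sql).fetchmany(MAX_ROWS)  # at most MAX_ROWS rows
```

The SELECT check is deliberately conservative. It also rejects valid queries that start with WITH or contain a semicolon inside a string literal. The read-only transaction remains the real backstop.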
The host, port, database name, username, and password are stored encrypted via Laravel's encrypted:array cast (AES-256-GCM). Credentials are never returned to the dashboard after creation — you re-enter them to rotate.
Processing pipeline
Every source goes through the same three stages:
- Extraction — fetch / parse the raw content into clean text plus metadata (title, URL, locale).
- Chunking — a recursive splitter that prefers semantic boundaries: it splits on headings and blank lines first, then packs paragraphs up to ~2000 characters per chunk.
- Embedding — chunks are vectorized and upserted into the configured vector store (Cloudflare Vectorize or OpenAI / Postgres pgvector depending on the provider).
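The chunking stage can be sketched as a recursive splitter. This Python version is an approximation, not Glozr's splitter, and it makes several assumptions. Headings in the extracted text are taken to be markdown-style # lines. The ~2,000-character target is treated as a hard maximum. When a single block is still too long, the splitter falls back to line breaks, then sentence ends, then whitespace, and finally a hard cut. SEPARATORS, split_recursive, and chunk are hypothetical names.

```python
import re

# (split pattern, joiner) pairs, tried from coarsest to finest boundary.
# The heading pattern assumes markdown-style "#" headings in the extracted text.
SEPARATORS = [
    (r"\n(?=#{1,6} )", "\n\n"),   # before a heading line
    (r"\n\s*\n", "\n\n"),         # blank line between paragraphs
    (r"\n", "\n"),                # single line break
    (r"(?<=[.!?])\s+", " "),      # sentence end
    (r"\s+", " "),                # any whitespace
]

def split_recursive(text: str, limit: int = 2000, seps=SEPARATORS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring coarse boundaries."""
    if len(text) <= limit:
        return [text] if text else []
    if not seps:
        # No boundary left: fall back to a hard cut every `limit` characters.
        return [text[i:i + limit] for i in range(0, len(text), limit)]
    pattern, joiner = seps[0]
    parts = [p.strip() for p in re.split(pattern, text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for part in parts:
        if len(part) > limit:
            # Too big on its own: flush the pending chunk, recurse with finer boundaries.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(part, limit, seps[1:]))
            continue
        candidate = f"{current}{joiner}{part}" if current else part
        if len(candidate) <= limit:
            current = candidate  # pack neighbouring pieces into one chunk
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks

def chunk(text: str, limit: int = 2000) -> list[str]:
    return split_recursive(text.strip(), limit)
```

Trying boundaries from coarsest to finest keeps a section whole whenever it fits. A paragraph is only broken mid-sentence when nothing coarser works.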
Limitations
- Cloudflare Vectorize has eventual consistency — expect a 30–60 second propagation delay after a sync completes before chunks are retrievable.
- Spreadsheet (XLSX) uploads require Cloudflare Workers AI. There is no local fallback parser today.
- Re-syncing is currently manual. URL and database sources don't poll for changes — you trigger a resync from the Sources index when content changes.
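Because of the Vectorize propagation delay, a script that syncs a source and immediately queries the agent may still get stale answers. A minimal polling sketch follows. It assumes you supply a hypothetical probe function that returns True once a test query finds the new content. The sleep and clock parameters are injectable so the helper can be tested without real waiting.

```python
import time
from typing import Callable

def wait_until_retrievable(probe: Callable[[], bool], timeout: float = 90.0,
                           interval: float = 5.0,
                           sleep: Callable[[float], None] = time.sleep,
                           clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` (hypothetical, user-supplied) until it succeeds or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

The 90-second default leaves headroom above the expected 30 to 60 second delay. Tune it to your own observations.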
Note: Sources can be previewed, reindexed, or deleted at any time from /app/agents/{id}/sources. Deleting a source removes its chunks from the vector store on the next sync sweep.
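Deferred deletion can be pictured as a sweep that drops every chunk whose source no longer exists. This is an illustrative in-memory sketch, not Glozr's code. VectorStore and sweep are hypothetical names, and a real sweep would call the vector store's delete API instead of editing a dict.

```python
from dataclasses import dataclass, field

@dataclass
class VectorStore:
    """In-memory stand-in for a vector store, mapping chunk id to source id."""
    chunks: dict[str, str] = field(default_factory=dict)

    def delete(self, ids: list[str]) -> None:
        for chunk_id in ids:
            self.chunks.pop(chunk_id, None)

def sweep(store: VectorStore, live_source_ids: set[str]) -> list[str]:
    """Remove chunks whose source was deleted; return the removed chunk ids."""
    orphaned = [cid for cid, sid in store.chunks.items() if sid not in live_source_ids]
    store.delete(orphaned)
    return orphaned
```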