Observability
Production telemetry runs on three legs: Sentry for errors, OpenTelemetry traces for the hot path, and Horizon for queue health. The platform admin surfaces a Site Health pill that rolls all three up to a single colour.
Overview
Each leg answers a different question. Sentry tells you what broke. OpenTelemetry tells you where the time went. Horizon tells you what's piling up. All three are optional — the app runs without any of them — but production deployments should enable at least Sentry and Horizon.
Error tracking — Sentry
When SENTRY_LARAVEL_DSN is set, unhandled exceptions are forwarded with full stack traces. Workspace id and agent id are attached as Sentry tags so errors can be scoped to a specific tenant. PII is scrubbed via Sentry's data-scrubbing settings — configure those server-side.
Performance tracing — OpenTelemetry
When OTEL_EXPORTER_OTLP_ENDPOINT is set, the request pipeline is instrumented end-to-end. The headline metric is p95 of rag.llm.first_token — the time from request receipt to the first SSE token. The target is under 1 second.
Other named spans worth alerting on:
- rag.retrieve: vector search + rerank.
- rag.prompt.build: system-prompt assembly.
- rag.llm.stream: full streaming duration.
- rag.persist: the post-stream persistence batch.
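To make the headline metric concrete, here is a minimal sketch of how a p95 per named span could be computed from exported span durations and compared with the 1-second first-token target. It assumes the spans have already been pulled from the tracing backend as (name, duration in milliseconds) pairs, and it uses the nearest-rank percentile method. Tracing backends often interpolate instead, so their numbers may differ slightly. The function names are illustrative and are not part of Glozr.

```python
from collections import defaultdict

FIRST_TOKEN_SPAN = "rag.llm.first_token"
FIRST_TOKEN_TARGET_MS = 1000.0  # "under 1 second", expressed in milliseconds


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_by_span(spans):
    """Group (name, duration_ms) pairs by span name and return {name: p95}."""
    grouped = defaultdict(list)
    for name, duration_ms in spans:
        grouped[name].append(duration_ms)
    return {name: p95(durations) for name, durations in grouped.items()}


def first_token_within_target(spans):
    """True when the p95 of the first-token span is under the target."""
    return p95_by_span(spans)[FIRST_TOKEN_SPAN] < FIRST_TOKEN_TARGET_MS
```

The same `p95_by_span` result can feed alerts on the other spans, with thresholds chosen per deployment.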
Queue health — Horizon
Horizon's dashboard lives at /horizon and is restricted to super-admins. Three queues carry the workload:
- default: miscellaneous jobs (notifications, webhook dispatch).
- crawl: source crawling.
- index: chunking, embedding, vector upsert.
Priority metrics
| Metric | Target |
|---|---|
| First-token latency (p95) | < 1 s |
| Full response latency (p95) | < 5 s |
| Queue depth (all queues) | Drains within minutes |
| Failed jobs | 0 in steady state |
| Vector store query latency (p95) | < 100 ms |
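The numeric targets in the table can be turned into a simple check. This is a sketch only: the `Snapshot` type and its field names are hypothetical, and the values are assumed to come from whatever monitoring backend is in use. Queue depth is left out because its target ("drains within minutes") is qualitative and would need a deployment-specific threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical container for one reading of the priority metrics."""
    first_token_p95_s: float
    full_response_p95_s: float
    failed_jobs: int
    vector_query_p95_ms: float


def breaches(snapshot):
    """Return the names of metrics that miss the targets in the table."""
    checks = [
        ("First-token latency (p95)", snapshot.first_token_p95_s < 1.0),
        ("Full response latency (p95)", snapshot.full_response_p95_s < 5.0),
        ("Failed jobs", snapshot.failed_jobs == 0),
        ("Vector store query latency (p95)", snapshot.vector_query_p95_ms < 100.0),
    ]
    return [name for name, ok in checks if not ok]
```

An empty list means every numeric target is met. Anything else names the metrics to look at first.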
Site Health pill
The platform admin header shows a Site Health pill that aggregates the configured signals into a single colour:
- Green — everything operational, no configuration warnings.
- Amber — configuration warnings (missing keys, no observability target, no mail credentials). Clicking the pill links straight to the remediation page.
- Red — the app is up but a critical subsystem (LLM provider, vector store, queue worker) is failing health checks.
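The roll-up described above can be sketched as a small precedence rule. This is an illustration, not the actual implementation: the function and enum names are hypothetical, and it assumes that a failing critical subsystem outranks configuration warnings, which the text implies but does not state.

```python
from enum import Enum


class Pill(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


def site_health(failing_subsystems, config_warnings):
    """Collapse health-check and configuration findings into one colour.

    failing_subsystems: names of critical subsystems failing health checks,
        e.g. "llm_provider", "vector_store", "queue_worker".
    config_warnings: configuration warnings, e.g. "missing mail credentials".
    """
    if failing_subsystems:  # assumed precedence: red beats amber
        return Pill.RED
    if config_warnings:
        return Pill.AMBER
    return Pill.GREEN
```

For example, `site_health([], ["no observability target"])` returns `Pill.AMBER`, while adding a failing `"vector_store"` would turn the same input red.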
Note. If you only set up one observability tool, pick Sentry. The error stream is the most actionable signal you'll have in the first weeks of running Glozr in production.