Platform admin
Health & failed jobs
Glozr's platform admin runs seven automated health checks on every page load. The status pill in the header summarises platform configuration and operational health — green when everything works, red when something needs immediate attention.
The seven checks
- Failed jobs (critical) — fires when the failed_jobs table exceeds the configured threshold.
- Stripe configuration — validates that the API key works and the webhook secret is present.
- LLM provider — confirms at least one provider (OpenAI or Cloudflare) is configured and reachable.
- Vector store — ensures the embedding store is accessible from the app servers.
- Mail service — checks that outbound transactional mail can dispatch.
- Reverb — verifies the websocket service powering the live inbox.
- Cache — tests Redis connectivity for response-time-sensitive paths.
Health score
The overall score has four tiers based on the percentage of checks passing:
| Tier | Threshold | Color |
|---|---|---|
| Strong | ≥ 90% passing | green |
| Stable | 70–89% | blue |
| Watchlist | 50–69% | amber |
| Critical | < 50% or any critical-severity check failing | red |
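The tier rules in the table above can be sketched as a small function. This is a minimal illustration, not Glozr's implementation: the `CheckResult` type and `health_tier` function are hypothetical names, and treating values between 89% and 90% as Stable is an assumption, since the table does not say how they are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical shape for one health check outcome."""
    name: str
    passed: bool
    critical: bool = False  # assumption: set for critical-severity checks


def health_tier(results: list[CheckResult]) -> str:
    """Map check results to a tier name following the table above."""
    if not results:
        raise ValueError("at least one check result is required")
    # Any failing critical-severity check forces the Critical tier,
    # regardless of the percentage of checks passing.
    if any(r.critical and not r.passed for r in results):
        return "Critical"
    pct = 100 * sum(r.passed for r in results) / len(results)
    if pct >= 90:
        return "Strong"
    if pct >= 70:  # assumption: everything from 70% up to 90% is Stable
        return "Stable"
    if pct >= 50:
        return "Watchlist"
    return "Critical"
```

With seven checks, one non-critical failure leaves 6 of 7 passing (about 86%), which lands in Stable. Three non-critical failures drop the score to about 57% (Watchlist), and four drop it below 50% (Critical).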
Failed jobs
The /admin/jobs/failed page lists every failed background job with its name, exception, and a retry button. Three queue types account for the vast majority of failures:
- Crawl jobs. Usually upstream — the customer's source site returned a 5xx or timed out.
- Index jobs. Usually rate-limited by the embedding provider. These typically succeed on retry once the provider's rate-limit bucket refills.
- Default queue. Everything else — investigate the exception column.
Retry individual jobs from each row, or use the header buttons to bulk-retry every failure of a given type.
Webhook reliability
Webhook delivery follows one of two semantics, depending on the source:
- Lead-capture webhooks are fire-and-forget. A 5xx response from your endpoint is logged but not retried.
- Workflow webhooks retry up to three times with exponential backoff before landing in /admin/jobs/failed.
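The workflow-webhook behaviour can be pictured with a short sketch of retry with exponential backoff. Everything here is illustrative: `deliver_with_retries` is a hypothetical helper, the one-second base delay and doubling factor are assumptions (the actual delays are not documented here), and the sketch assumes one initial attempt followed by up to three retries on any non-2xx response.

```python
import time
from typing import Callable


def deliver_with_retries(
    send: Callable[[], int],          # performs one delivery, returns HTTP status
    max_retries: int = 3,             # "up to three times" per the text above
    base_delay: float = 1.0,          # assumption: not stated in the source
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on a 2xx response, False once retries are exhausted."""
    for attempt in range(max_retries + 1):
        status = send()
        if 200 <= status < 300:
            return True
        if attempt < max_retries:
            # Exponential backoff: base_delay, then 2x, then 4x.
            sleep(base_delay * 2 ** attempt)
    # At this point a real system would record the job as failed.
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A fire-and-forget webhook, by contrast, corresponds to calling `send()` once and only logging the result.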
Note. Bulk-retrying thousands of failed jobs at once can spike CPU and queue pressure. Prefer batching: retry by job type, watch Horizon, then move on.
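The batching advice in the note can be sketched as a generator that groups failed jobs by queue type and hands them out in fixed-size chunks. This is a hedged illustration only: `batches_by_type`, the `(job_id, queue_type)` input shape, and the default batch size of 100 are all assumptions, not part of the admin's interface.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def batches_by_type(
    jobs: Iterable[tuple[str, str]],  # hypothetical (job_id, queue_type) pairs
    batch_size: int = 100,            # assumption: tune to your queue capacity
) -> Iterator[tuple[str, list[str]]]:
    """Yield (queue_type, job_ids) batches, finishing one type before the next."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    grouped: dict[str, list[str]] = defaultdict(list)
    for job_id, queue_type in jobs:
        grouped[queue_type].append(job_id)
    # Queue types are emitted in the order they were first seen.
    for queue_type, ids in grouped.items():
        for start in range(0, len(ids), batch_size):
            yield queue_type, ids[start:start + batch_size]
```

A caller would retry one batch, check queue pressure in Horizon, and only then request the next batch, rather than submitting every failure at once.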