Platform admin

Health & failed jobs

Glozr's platform admin runs seven automated health checks on every page load. The status pill in the header summarises platform configuration and operational health — green when everything works, red when something needs immediate attention.

The seven checks

Failed jobs (critical) — fires when the failed_jobs table exceeds the configured threshold.
Stripe configuration — validates that the API key works and the webhook secret is present.
LLM provider — confirms at least one provider (OpenAI or Cloudflare) is configured and reachable.
Vector store — ensures the embedding store is accessible from the app servers.
Mail service — checks that outbound transactional mail can dispatch.
Reverb — verifies the websocket service powering the live inbox.
Cache — tests Redis connectivity for response-time-sensitive paths.

Health score

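The overall score falls into one of four tiers based on the percentage of checks passing. With seven checks, each failure costs about 14 percentage points, so only 7 of 7 reaches Strong and 6 of 7 (about 86%) is already Stable. The tiers: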

Tier	Threshold	Color
Strong	≥ 90% passing	green
Stable	70–89%	blue
Watchlist	50–69%	amber
Critical	< 50% or any critical-severity check failing	red
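
The tier rules above can be sketched as a small function. This is an illustrative sketch, not Glozr's actual implementation: the `CheckResult` type, the `critical` flag, and the `health_tier` name are hypothetical, and it assumes every check carries equal weight and that a score between 89% and 90% counts as Stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    critical: bool = False  # assumption: severity is a per-check flag


def health_tier(results: list[CheckResult]) -> str:
    """Map a set of check results to one of the four tiers in the table."""
    if not results:
        raise ValueError("at least one check result is required")
    # A failing critical-severity check forces Critical regardless of the score.
    if any(r.critical and not r.passed for r in results):
        return "Critical"
    pct = 100 * sum(r.passed for r in results) / len(results)
    if pct >= 90:
        return "Strong"
    if pct >= 70:  # assumption: anything from 70% up to (not including) 90%
        return "Stable"
    if pct >= 50:
        return "Watchlist"
    return "Critical"
```

For example, six of seven checks passing (about 86%) maps to Stable, but the same count maps to Critical if the one failing check is the critical-severity failed-jobs check.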

Failed jobs

The /admin/jobs/failed page lists every failed background job with its name, exception, and a retry button. Three queue types account for the vast majority of failures:

Crawl jobs. Usually upstream — the customer's source site returned a 5xx or timed out.
Index jobs. Usually rate-limited by the embedding provider. These succeed on retry once the provider's rate-limit bucket refills.
Default queue. Everything else — investigate the exception column.

Retry individual jobs from each row, or use the header buttons to bulk-retry every failure of a given type.

Webhook reliability

Webhook delivery follows one of two policies, depending on the source:

Lead-capture webhooks are fire-and-forget. A 5xx response from your endpoint is logged but not retried.
Workflow webhooks retry up to three times with exponential backoff before landing in /admin/jobs/failed.
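
The workflow-webhook policy can be sketched as a retry loop with exponential backoff. This is a minimal sketch under stated assumptions, not the platform's code: `deliver_with_retries`, `send`, and `base_delay` are hypothetical names, the actual delays are not documented here, and "up to three times" is read as one initial attempt plus at most three retries, with only 5xx responses triggering a retry.

```python
import time
from typing import Callable


def deliver_with_retries(
    send: Callable[[], int],          # hypothetical: performs one delivery, returns the HTTP status
    max_retries: int = 3,
    base_delay: float = 1.0,          # assumption: the real delays are not documented
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once a non-5xx status is seen, False when retries run out."""
    for attempt in range(max_retries + 1):
        status = send()
        if status < 500:
            return True
        if attempt < max_retries:
            # Delay doubles each time: 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
    return False  # the caller would record the job as failed
```

Under the same reading, a fire-and-forget lead-capture delivery corresponds to `max_retries=0`: one attempt, with the result logged and no retry.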

Note. Bulk-retrying thousands of failed jobs at once can spike CPU and queue pressure. Prefer batching: retry one job type, watch Horizon until the queue settles, then move on to the next type.
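
If retries are ever scripted rather than clicked, the batching advice might look like the sketch below. Everything here is hypothetical: `retry` stands in for whatever re-queues a list of job ids, `wait_until_drained` for whatever check you use to confirm the queue has settled (for instance, watching Horizon), and the batch size of 100 is an arbitrary placeholder.

```python
from collections import defaultdict
from itertools import islice
from typing import Callable, Iterable, Iterator


def batches(ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while chunk := list(islice(it, size)):
        yield chunk


def retry_by_type(
    failed: Iterable[tuple[str, str]],       # (job_id, job_type) pairs
    retry: Callable[[list[str]], None],      # hypothetical: re-queues the given job ids
    wait_until_drained: Callable[[], None],  # hypothetical: blocks until the queue settles
    batch_size: int = 100,                   # arbitrary placeholder value
) -> None:
    """Retry failures one job type at a time, in bounded batches."""
    by_type: dict[str, list[str]] = defaultdict(list)
    for job_id, job_type in failed:
        by_type[job_type].append(job_id)
    for ids in by_type.values():
        for chunk in batches(ids, batch_size):
            retry(chunk)
            wait_until_drained()  # let the workers catch up before the next batch
```

Grouping by type first keeps each batch homogeneous, so a batch that fails again points at a single cause (for example, an embedding provider that is still rate-limiting).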