Health & failed jobs

Glozr's platform admin runs seven automated health checks on every page load. The status pill in the header summarises platform configuration and operational health: green when everything works, red when something needs immediate attention, and blue or amber for the tiers in between (see Health score).

The seven checks

  1. Failed jobs (critical): fires when the number of rows in the failed_jobs table exceeds the configured threshold.
  2. Stripe configuration: validates that the API key works and the webhook secret is present.
  3. LLM provider: confirms at least one provider (OpenAI or Cloudflare) is configured and reachable.
  4. Vector store: ensures the embedding store is accessible from the app servers.
  5. Mail service: checks that outbound transactional mail can be dispatched.
  6. Reverb: verifies the websocket service that powers the live inbox.
  7. Cache: tests Redis connectivity for response-time-sensitive paths.
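
The check list above can be modelled as a set of probes that each report pass or fail, with the failed-jobs check flagged as critical. The following is a minimal Python sketch, not Glozr's implementation: `HealthCheck`, `run_checks`, and `failed_jobs_probe` are hypothetical names, a probe that raises an exception is assumed to count as failing, and the failed-jobs threshold is a placeholder parameter because its configured value isn't given here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HealthCheck:
    name: str
    probe: Callable[[], bool]  # returns True when the check passes
    critical: bool = False     # a failing critical check forces the Critical tier


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    critical: bool


def run_checks(checks: list[HealthCheck]) -> list[CheckResult]:
    """Run every probe once and collect pass/fail results."""
    results = []
    for check in checks:
        try:
            passed = bool(check.probe())
        except Exception:
            # Assumption: a probe that raises (timeout, bad credentials, ...) counts as failing.
            passed = False
        results.append(CheckResult(check.name, passed, check.critical))
    return results


def failed_jobs_probe(failed_job_count: int, threshold: int) -> bool:
    # Passes while the failed_jobs row count is at or below the threshold;
    # the check fires once the count exceeds it.
    return failed_job_count <= threshold


def redis_unreachable() -> bool:
    # Hypothetical stand-in for a cache probe whose connection attempt fails.
    raise ConnectionError("could not connect to Redis")


results = run_checks([
    HealthCheck("failed_jobs", lambda: failed_jobs_probe(12, threshold=50), critical=True),
    HealthCheck("cache", redis_unreachable),
])
for r in results:
    print(r.name, r.passed, r.critical)
# failed_jobs True True
# cache False False
```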

Health score

The overall score has four tiers based on the percentage of checks passing:

Tier       Threshold                                              Color
Strong     ≥ 90% passing                                          green
Stable     70–89% passing                                         blue
Watchlist  50–69% passing                                         amber
Critical   < 50% passing, or any critical-severity check failing  red
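
The tier rules above can be expressed as a small function. This is a sketch under stated assumptions: `health_tier` is a hypothetical name, and the bands are read as contiguous, so a share such as 89.5% counts as Stable rather than falling between tiers. `Fraction` keeps the boundary comparisons exact instead of relying on floating-point percentages.

```python
from fractions import Fraction

# (tier, minimum share of checks passing, color), checked from best to worst.
TIERS = [
    ("Strong", Fraction(90, 100), "green"),
    ("Stable", Fraction(70, 100), "blue"),
    ("Watchlist", Fraction(50, 100), "amber"),
]


def health_tier(passed: int, total: int, critical_failing: bool) -> tuple[str, str]:
    """Map check results to a (tier, color) pair."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    if critical_failing:
        # Any failing critical-severity check overrides the percentage.
        return ("Critical", "red")
    share = Fraction(passed, total)
    for tier, minimum, color in TIERS:
        if share >= minimum:
            return (tier, color)
    return ("Critical", "red")  # below 50% passing


for passed in range(7, 2, -1):
    print(passed, health_tier(passed, 7, critical_failing=False))
# 7 ('Strong', 'green')
# 6 ('Stable', 'blue')
# 5 ('Stable', 'blue')
# 4 ('Watchlist', 'amber')
# 3 ('Critical', 'red')
```

With seven checks, 7/7 passing is Strong, 6/7 (≈ 85.7%) and 5/7 (≈ 71.4%) are Stable, 4/7 (≈ 57.1%) is Watchlist, and 3/7 (≈ 42.9%) is Critical. A single failing non-critical check is therefore enough to move the pill from green to blue, and a failing failed-jobs check turns it red regardless of the others.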

Failed jobs

The /admin/jobs/failed page lists every failed background job with its name, exception, and a retry button. Three queue types account for the vast majority of failures:

  • Crawl jobs. Usually upstream — the customer's source site returned a 5xx or timed out.
  • Index jobs. Usually rate-limited by the embedding provider; these typically succeed on retry once the provider's rate-limit bucket refills.
  • Default queue. Everything else — investigate the exception column.

Retry individual jobs from each row, or use the header buttons to bulk-retry every failure of a given type.
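
A first pass at the triage above is to count failures per queue and attach the matching hint. The sketch below assumes each failed job exposes its queue name and that the queues are named `crawl`, `index`, and `default`; `FailedJob`, `triage`, `TRIAGE_HINTS`, and the job names in the example are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedJob:
    job_name: str
    queue: str      # assumed queue names: "crawl", "index", "default"
    exception: str


# First-look guidance per queue, paraphrasing the docs' triage notes.
TRIAGE_HINTS = {
    "crawl": "Usually upstream: the source site returned a 5xx or timed out.",
    "index": "Usually an embedding-provider rate limit: retry once it refills.",
    "default": "Investigate the exception column.",
}


def triage(jobs: list[FailedJob]) -> list[tuple[str, int, str]]:
    """Count failures per queue, most common first, with a triage hint."""
    counts = Counter(job.queue for job in jobs)
    return [
        # Unknown queues fall back to the default-queue advice.
        (queue, count, TRIAGE_HINTS.get(queue, TRIAGE_HINTS["default"]))
        for queue, count in counts.most_common()
    ]


# Hypothetical example data.
jobs = [
    FailedJob("CrawlPage", "crawl", "ConnectionException: timed out"),
    FailedJob("CrawlPage", "crawl", "HTTP 503 from source site"),
    FailedJob("EmbedChunk", "index", "HTTP 429 Too Many Requests"),
    FailedJob("SendDigest", "default", "TypeError"),
]
for queue, count, hint in triage(jobs):
    print(f"{queue:<8}{count:>3}  {hint}")
```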

Webhook reliability

Webhook delivery has two semantics depending on the source:

  • Lead-capture webhooks are fire-and-forget. A 5xx response from your endpoint is logged but not retried.
  • Workflow webhooks retry up to three times with exponential backoff before landing in /admin/jobs/failed.
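
The two delivery semantics can be contrasted in a short simulation. This is a sketch, not Glozr's delivery code: `deliver_lead_capture`, `deliver_workflow`, and the `failed_jobs` list (standing in for the real table) are hypothetical. The 1-second base delay is an assumption, since the backoff parameters aren't documented here; "retry up to three times" is read as three retries after the initial attempt; and only 5xx responses are treated as failures. `sleep` is injectable so the example runs instantly.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("webhooks")


def backoff_delays(retries: int = 3, base_seconds: float = 1.0) -> list[float]:
    # Exponential schedule (assumed base): base, 2 * base, 4 * base, ...
    return [base_seconds * 2 ** n for n in range(retries)]


def deliver_lead_capture(send: Callable[[], int]) -> bool:
    """Fire-and-forget: one attempt; a 5xx is logged, never retried."""
    status = send()
    if status >= 500:
        log.warning("lead-capture webhook got HTTP %s; not retried", status)
        return False
    return True


def deliver_workflow(
    send: Callable[[], int],
    failed_jobs: list[str],
    retries: int = 3,
    base_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Initial attempt plus up to `retries` retries with exponential backoff."""
    status = None
    for delay in [0.0] + backoff_delays(retries, base_seconds):
        if delay:
            sleep(delay)  # wait before each retry, not before the first attempt
        status = send()
        if status < 500:
            return True
    # Out of retries: record the failure, standing in for a failed_jobs row.
    failed_jobs.append(f"workflow webhook failed with HTTP {status}")
    return False


responses = iter([503, 502, 200])
slept: list[float] = []
failed: list[str] = []
ok = deliver_workflow(lambda: next(responses), failed, sleep=slept.append)
print(ok, slept, failed)  # True [1.0, 2.0] []
```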

Note. Bulk-retrying thousands of failed jobs at once can spike CPU and queue pressure. Prefer batching: retry by job type, watch Horizon, then move on.
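
The batching advice in the note above can be sketched as follows: walk one job type at a time, retry in fixed-size batches, and check queue health before each batch, stopping early if the queue is under pressure. `retry_in_batches`, `chunks`, the `retry` callback, and the `queue_is_healthy` callback (a stand-in for whatever signal you read off Horizon) are hypothetical, as is the default batch size of 100.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunks(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def retry_in_batches(
    failed_by_type: dict[str, list[str]],
    retry: Callable[[str], None],
    queue_is_healthy: Callable[[], bool],
    batch_size: int = 100,
) -> dict[str, int]:
    """Retry one job type at a time, stopping early when the queue is under pressure.

    Returns how many jobs of each type were retried before finishing or stopping.
    """
    retried: dict[str, int] = {}
    for job_type, job_ids in failed_by_type.items():
        retried[job_type] = 0
        for batch in chunks(job_ids, batch_size):
            if not queue_is_healthy():
                return retried  # stop here; rerun once the backlog drains
            for job_id in batch:
                retry(job_id)
            retried[job_type] += len(batch)
    return retried


# Hypothetical failure IDs, grouped by job type.
failures = {
    "crawl": [f"crawl-{i}" for i in range(250)],
    "index": [f"index-{i}" for i in range(30)],
}
done: list[str] = []
print(retry_in_batches(failures, done.append, lambda: True))
# {'crawl': 250, 'index': 30}
```

Stopping instead of sleeping keeps the sketch simple: the caller decides when the queue has drained enough to run it again.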