AI Usage & Quotas

Where to Look

Open Settings and scroll to the AI Usage card. It shows your active workspace's AI consumption over the last 30 days and refreshes automatically when you switch workspaces. Viewing usage requires permission on the workspace — members without it see a permission notice instead of the numbers.

The card contains four sections:

  • Headline numbers — total tokens, total requests, alert enrichments (with how many succeeded), and two cache-hit rates (see below).
  • Daily tokens — a sparkline of token consumption per day, for spotting spikes at a glance.
  • Usage by type — a table breaking tokens and requests down per usage type (for example alert_enrichment vs. session-driven analysis), with each type's share of the total.
  • Channel reliability — the top alert channels by test volume over the last 30 days, with success rate and average latency. Populated by testing channels from the Alerts → Channels page.

What Counts as a Query

A query is one natural-language question sent to the AI for analysis — a live-session question, a Smart Reports request, a multi-agent correlation question, an automated alert enrichment, and so on. Each query consumes one unit from your organization's monthly pool. Tokens are the underlying cost unit your provider bills in; the AI Usage card reports both so you can see how token-heavy your average query is.

Usage is recorded per call at routing time and rolled up per organization, per day, and per usage type. The two streams that feed the card are alert enrichments and AI session usage (live sessions, reports, and other operator-driven analysis).
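
The roll-up described above can be pictured as a simple aggregation. The sketch below is illustrative only: the UsageRecord shape, the field names, and the "session" type label are assumptions rather than the product's actual schema (only alert_enrichment is named in this article). It sums tokens and requests per organization, day, and usage type, then derives each type's share of total tokens, as in the Usage by type table.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape; the real schema is not documented here.
    org_id: str
    day: date
    usage_type: str  # e.g. "alert_enrichment"
    tokens: int


def roll_up(records):
    """Sum tokens and count requests per (org, day, usage type)."""
    totals = defaultdict(lambda: {"tokens": 0, "requests": 0})
    for r in records:
        bucket = totals[(r.org_id, r.day, r.usage_type)]
        bucket["tokens"] += r.tokens
        bucket["requests"] += 1  # one record per AI call
    return dict(totals)


def share_by_type(records):
    """Return each usage type's fraction of total tokens."""
    per_type = defaultdict(int)
    for r in records:
        per_type[r.usage_type] += r.tokens
    total = sum(per_type.values())
    return {t: (n / total if total else 0.0) for t, n in per_type.items()}
```

Keying the roll-up on all three dimensions at once means the daily sparkline and the per-type table can both be derived from the same totals.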

The Per-Organization Pool

Query quotas are pooled at the organization level. Every subscribed user's tier contributes its monthly query allotment to the shared pool, whether the subscription was self-purchased or admin-purchased. For example, an organization with two Business subscribers has a 10,000-query pool, which any member's AI activity draws down.
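
As a rough sketch of the pooling arithmetic: the per-subscriber Business figure below is inferred from the example above (two Business subscribers giving 10,000 queries implies 5,000 each) and should be checked against Pricing & Billing. The function and table names are hypothetical.

```python
# Monthly allotment per subscriber, by tier. Only the Business value can be
# inferred from this article; other tiers are deliberately left out.
TIER_ALLOTMENT = {"business": 5_000}


def pool_size(subscriber_tiers):
    """Sum the monthly allotment of every subscribed user in the organization."""
    return sum(TIER_ALLOTMENT[tier] for tier in subscriber_tiers)


def remaining(pool, queries_used):
    """Queries left in the pool after this month's usage (simple subtraction)."""
    return pool - queries_used


pool = pool_size(["business", "business"])
print(pool)                    # 10000
print(remaining(pool, 1_250))  # 8750
```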

See Pricing & Billing for tier sizes, agent seats, and how admin-purchased subscriptions pool.

BYOK vs. Platform Quota

If your workspace has configured Bring Your Own AI Key (BYOK), AI calls run against your own provider account instead of the platform pool:

  • BYOK calls do not consume queries from the platform pool. Cost moves to your provider's per-token invoice.
  • Fallback calls do consume queries. When a BYOK call fails (revoked key, provider outage, rate limit), the router retries that single call on the platform key so the operator's session survives — and that retry draws from the platform pool. If your pool is draining while BYOK is configured, check the key's validation status on the Settings page.
  • The AI Usage card records tokens and requests either way, so observability is unchanged by BYOK.
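
The fallback behavior above can be sketched as a small routing function. This is not the platform's router: ProviderError, CallResult, route_call, and queries_charged are hypothetical names used only to show the decision logic (try the BYOK key first, retry that single call on the platform key if it fails, and charge the pool only for calls that ran on the platform key).

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical error for a failed BYOK call (revoked key, outage, rate limit)."""


@dataclass
class CallResult:
    text: str
    used_platform_key: bool  # True when the call drew from the platform pool


def route_call(
    prompt: str,
    byok_call: Optional[Callable[[str], str]],
    platform_call: Callable[[str], str],
) -> CallResult:
    """Prefer the workspace's own key; fall back to the platform key on failure."""
    if byok_call is not None:
        try:
            return CallResult(byok_call(prompt), used_platform_key=False)
        except ProviderError:
            pass  # retry this single call on the platform key
    return CallResult(platform_call(prompt), used_platform_key=True)


def queries_charged(results) -> int:
    """Count the calls that consumed a query from the platform pool."""
    return sum(1 for r in results if r.used_platform_key)
```

Because the fallback applies per call, one failed BYOK call costs one pooled query; a persistently invalid key would push every call onto the pool, which is why a draining pool is a signal to check the key.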

Cache Savings

The two cache metrics in the headline row measure different things:

  • Result cache — reuse of a previously computed analysis or enrichment result. A hit means no AI call was made at all.
  • Prompt cache — provider-side prompt caching (cached prefix tokens reused across calls). A high prompt-cache rate means repeated context — system prompts, schemas, fleet metadata — is being served from cache at a steep token discount instead of being re-processed on every call. The card also shows how many cached tokens were read in the window.
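
To make the difference concrete, here is one plausible way to compute the two rates. The formulas are assumptions: this article does not state the exact denominators the card uses, so treat the sketch as an illustration of the idea rather than the card's definition.

```python
def result_cache_hit_rate(cache_hits: int, total_requests: int) -> float:
    """Fraction of requests answered from a stored result, with no AI call.

    Assumed definition: hits divided by all requests in the window.
    """
    return cache_hits / total_requests if total_requests else 0.0


def prompt_cache_rate(cached_tokens_read: int, total_input_tokens: int) -> float:
    """Fraction of input tokens served from the provider's prompt cache.

    Assumed definition: cached prefix tokens divided by all input tokens.
    """
    return cached_tokens_read / total_input_tokens if total_input_tokens else 0.0


print(result_cache_hit_rate(25, 100))  # 0.25
print(prompt_cache_rate(750, 1_000))   # 0.75
```

Under these assumed definitions, the first rate is measured in requests and the second in tokens, so the two numbers are not directly comparable.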

Tip: A falling prompt-cache rate after a configuration change (for example, a custom model override) can quietly raise token spend. Watch this number when you change AI settings.