AI Usage & Quotas

Where to Look

Open Settings and scroll to the AI Usage card. It shows your active workspace's AI consumption over the last 30 days and refreshes automatically when you switch workspaces. Viewing usage requires permission on the workspace — members without it see a permission notice instead of the numbers.

The card contains four sections:

Headline numbers — total tokens, total requests, alert enrichments (with how many succeeded), and two cache-hit rates (see Cache Savings below).
Daily tokens — a sparkline of token consumption per day, for spotting spikes at a glance.
Usage by type — a table breaking tokens and requests down per usage type (for example alert_enrichment vs. session-driven analysis), with each type's share of the total.
Channel reliability — the top alert channels by test volume over the last 30 days, with each channel's success rate and average latency. This section is populated by the channel tests you run from the Alerts → Channels page.
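Here is a minimal Python sketch of how the Channel reliability rows could be derived from raw test results. The `ChannelTest` record and its fields, the `top_n` default of 5, and averaging latency over all tests (failed ones included) are assumptions for illustration. They are not ET Ducky's actual schema or logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ChannelTest:
    channel: str        # hypothetical channel identifier
    succeeded: bool
    latency_ms: float
    at: datetime


def channel_reliability(tests, now, top_n=5, window_days=30):
    """Top channels by test volume in the window, with success rate and mean latency."""
    cutoff = now - timedelta(days=window_days)
    stats = {}
    for t in tests:
        if t.at < cutoff:
            continue  # outside the 30-day window
        s = stats.setdefault(t.channel, {"tests": 0, "ok": 0, "latency_sum": 0.0})
        s["tests"] += 1
        s["ok"] += int(t.succeeded)
        s["latency_sum"] += t.latency_ms  # assumption: failed tests count toward latency
    rows = [
        {
            "channel": ch,
            "tests": s["tests"],
            "success_rate": s["ok"] / s["tests"],
            "avg_latency_ms": s["latency_sum"] / s["tests"],
        }
        for ch, s in stats.items()
    ]
    rows.sort(key=lambda r: r["tests"], reverse=True)  # busiest channels first
    return rows[:top_n]
```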

What Counts as a Query

A query is the unit your organization's monthly pool is measured in: a live-session question, a Smart Reports request, a multi-agent correlation question, an automated alert enrichment, and so on. Lightweight interactive features draw one query per call. For the more token-heavy features, though, a single call can draw more than one query: rather than charging a flat one query per call, ET Ducky converts each call's token usage into a query count and draws that many from the pool.

The conversion is weighted, because not all tokens cost the same. Input, output, and cached tokens are combined into a single weighted total (output tokens count for more than input, and cache-read tokens count for far less than either), and that total is divided by a fixed per-query budget of approximately 10,000 weighted tokens. The rounded result is the number of queries the call consumes, with a minimum of one query per metered call, so no metered call is ever free. Each AI-driven web search a call performs adds roughly one more query on top. The exact weighting is an implementation detail and can change; treat the 10,000-token figure as approximate, and use the AI Usage card's token numbers as the source of truth for what a given feature actually cost.
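The arithmetic above can be sketched in Python as follows. The weights (`INPUT_WEIGHT`, `OUTPUT_WEIGHT`, `CACHE_READ_WEIGHT`), the round-half-up rule, and the name `queries_for_call` are hypothetical. Only the approximate 10,000-token budget, the one-query minimum, and the roughly one-query-per-web-search surcharge come from the text.

```python
import math

# Hypothetical weights: the real weighting is an implementation detail and may change.
INPUT_WEIGHT = 1.0
OUTPUT_WEIGHT = 4.0        # output counts for more than input
CACHE_READ_WEIGHT = 0.1    # cache reads count for far less than either
TOKENS_PER_QUERY = 10_000  # approximate per-query budget from the text


def queries_for_call(input_tokens, output_tokens, cache_read_tokens=0, web_searches=0):
    """Convert one metered call's token usage into a query count (illustrative only)."""
    weighted = (
        input_tokens * INPUT_WEIGHT
        + output_tokens * OUTPUT_WEIGHT
        + cache_read_tokens * CACHE_READ_WEIGHT
    )
    # Round to the nearest whole query (half-up assumed), never below one.
    base = max(1, math.floor(weighted / TOKENS_PER_QUERY + 0.5))
    # Each AI-driven web search adds roughly one query on top.
    return base + web_searches
```

With these made-up weights, a short question (1,500 input tokens, 300 output) rounds to zero and is lifted to the one-query minimum. A long analysis with 20,000 input, 5,000 output, and 40,000 cache-read tokens plus two web searches comes to 4 + 2 = 6 queries.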

In practice, a short interactive question is one query; a long, context-heavy analysis or a call that runs several web searches can be several. Tokens remain the underlying unit your provider bills in, and the AI Usage card reports both tokens and requests so you can see how token-heavy — and therefore how query-heavy — your average call is.

Usage is recorded per call at routing time and rolled up per organization, per day, and per usage type. The two streams that feed the card are alert enrichments and AI session usage (live sessions, reports, and other operator-driven analysis).
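A per-organization, per-day, per-usage-type rollup like the one described above could look like this minimal sketch. It also derives each type's share of total tokens, as shown in the Usage by type table. `UsageRecord`, its fields, and both function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    org_id: str
    at: datetime
    usage_type: str  # e.g. "alert_enrichment"
    tokens: int


def roll_up(records):
    """Sum tokens and count requests per (organization, day, usage type)."""
    totals = defaultdict(lambda: {"tokens": 0, "requests": 0})
    for r in records:
        key = (r.org_id, r.at.date(), r.usage_type)
        totals[key]["tokens"] += r.tokens
        totals[key]["requests"] += 1  # one recorded call = one request
    return dict(totals)


def share_by_type(rollup, org_id):
    """Each usage type's share of an organization's tokens across all days in the rollup."""
    by_type = defaultdict(int)
    for (org, _day, usage_type), row in rollup.items():
        if org == org_id:
            by_type[usage_type] += row["tokens"]
    total = sum(by_type.values())
    return {t: n / total for t, n in by_type.items()} if total else {}
```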

Features That Now Consume Queries

Several AI features used to run “for free” — they called the AI but did not draw down the query pool. They now consume queries through the same token→query conversion described above. If you built usage habits around these being free, expect them to show up in your pool now:

Install-script drafting — drafting a vendor install/update script for a Software Catalog app.
AI script review — the “Review with AI” pass that checks and corrects a drafted script.
AI resolve / discover — resolving an app's source (version, download URL, package ID) with AI, including the “Resolve remaining with AI” bulk pass and “Suggest with AI” autofill. Web-search-backed resolution adds roughly one query per search on top of the token-based count, per the conversion above.
Automated remediation — AI-driven remediation actions.

Lightweight, flat one-query features (live-session questions, Smart Reports, alert enrichment) are unchanged — they still count as one query each and are not double-charged by the conversion.

The Per-Organization Pool

Query quotas are pooled at the organization level. Every subscribed user's tier contributes its monthly query allotment to the shared pool, whether the subscription was self-purchased or admin-purchased. For example, a workspace with two Business subscribers (5,000 queries each) has a 10,000-query pool, which any member's AI activity draws down.
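Pooling amounts to a plain sum over subscriptions, regardless of who purchased them. In this sketch, only the Business allotment (5,000, implied by the example above) is filled in. Other tier sizes are covered in Pricing & Billing and are deliberately not guessed. `TIER_ALLOTMENTS`, `org_pool`, and `remaining` are hypothetical names.

```python
# Only the Business figure is implied by the text; add other tiers from Pricing & Billing.
TIER_ALLOTMENTS = {"Business": 5_000}


def org_pool(subscriptions):
    """Sum monthly allotments; subscriptions is an iterable of (tier, purchased_by) pairs."""
    # purchased_by ("self" or "admin") intentionally has no effect on the pool size.
    return sum(TIER_ALLOTMENTS[tier] for tier, _purchased_by in subscriptions)


def remaining(pool, usage_by_member):
    """Any member's AI activity draws down the same shared pool."""
    return max(0, pool - sum(usage_by_member.values()))
```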

See Pricing & Billing for tier sizes, agent seats, and how admin-purchased subscriptions pool.

BYOK vs. Platform Quota

If your workspace has configured Bring Your Own AI Key (BYOK), AI calls run against your own provider account instead of the platform pool:

BYOK calls do not consume workspace queries. Cost moves to your provider's per-token invoice. This exemption applies to every AI feature, including the token→query-metered ones above — a call served by a healthy BYOK key is never converted into queries or drawn from the pool.
Fallback calls do consume queries. When a BYOK call fails (revoked key, provider outage, rate limit), the router retries that single call on the platform key so the operator's session survives — and that retry draws from the platform pool, metered by the same token→query conversion. If your pool is draining while BYOK is configured, check the key's validation status on the Settings page.
The AI Usage card records tokens and requests either way, so observability is unchanged by BYOK.
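The routing rule can be sketched as follows. A healthy BYOK call draws nothing from the pool. A failed BYOK call is retried once on the platform key and metered. Usage is recorded on both paths. `Router`, `ProviderError`, `Response`, and the injected `queries_for` function are hypothetical stand-ins, not ET Ducky's router API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProviderError(Exception):
    """Stands in for a revoked key, a provider outage, or a rate limit."""


@dataclass
class Response:
    input_tokens: int
    output_tokens: int


@dataclass
class Router:
    platform: Callable[[str], Response]     # platform-key client
    queries_for: Callable[[Response], int]  # token->query conversion
    pool_remaining: int
    byok: Optional[Callable[[str], Response]] = None  # customer-key client, if configured
    usage_log: list = field(default_factory=list)

    def call(self, prompt: str) -> Response:
        if self.byok is not None:
            try:
                resp = self.byok(prompt)
                self._record(resp, "byok", 0)  # healthy BYOK: nothing drawn from the pool
                return resp
            except ProviderError:
                pass  # retry this single call on the platform key
        resp = self.platform(prompt)
        cost = self.queries_for(resp)
        self.pool_remaining -= cost
        self._record(resp, "platform", cost)
        return resp

    def _record(self, resp: Response, source: str, queries: int) -> None:
        # Tokens and requests are logged on both paths, so observability is unchanged.
        self.usage_log.append(
            {"source": source, "tokens": resp.input_tokens + resp.output_tokens, "queries": queries}
        )
```

Injecting `queries_for` keeps metering separate from routing, so the same token→query conversion applies to fallback calls and to workspaces without BYOK.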

Cache Savings

The two cache metrics in the headline row measure different things:

Result cache — reuse of a previously computed analysis or enrichment result. A hit means no AI call was made at all.
Prompt cache — provider-side prompt caching (cached prefix tokens reused across calls). A high prompt-cache rate means repeated context — system prompts, schemas, fleet metadata — is being served from cache at a steep token discount instead of being re-processed on every call. The card also shows how many cached tokens were read in the window.
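The two rates could be computed roughly as shown below. Two details are assumptions: defining the prompt-cache rate as cached prompt tokens divided by all prompt tokens, and returning 0.0 for an empty window. The card's exact formulas are not documented here.

```python
def result_cache_hit_rate(hits: int, misses: int) -> float:
    """Share of lookups answered from a stored result, with no AI call made."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def prompt_cache_rate(cache_read_tokens: int, uncached_input_tokens: int) -> float:
    """Share of prompt tokens the provider served from its prompt cache (assumed definition)."""
    prompt_tokens = cache_read_tokens + uncached_input_tokens
    return cache_read_tokens / prompt_tokens if prompt_tokens else 0.0
```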

Tip: A falling prompt-cache rate after a configuration change (for example, a custom model override) can quietly raise token spend. Watch this number when you change AI settings.