Health Monitoring

Heartbeat & Metrics

Agents send a heartbeat every 60 seconds. Each heartbeat carries the full metric snapshot below. Metrics feed the alert rule engine, the Data Explorer, and the Smart Reports generator.

CPU

  • System-wide utilization %, per-core utilization % (array), CPU model name
  • Context switches/sec, interrupts/sec, current frequency (MHz)

Memory

  • Physical RAM: total, used, available, usage %
  • Committed bytes, page file total & in-use, non-paged pool, system cache

Disk I/O

  • Read bytes/sec, write bytes/sec, queue length, disk busy %
  • Per-drive: letter, label, file system, total/used/free GB, usage %

Network

  • Aggregate receive/send bytes/sec, cumulative totals, active TCP connections (IPv4 & IPv6)
  • Per-adapter breakdown: name, MAC, IP addresses, DNS servers, link speed (Mbps), adapter type, receive/send rates

System Information

  • OS edition, build number, feature update version (e.g. 23H2), last boot time, uptime (seconds)
  • System manufacturer, model, BIOS version, total installed RAM
  • Time zone, domain, domain-joined status, pending reboot indicator

Security Posture

  • Antivirus: product name, real-time protection enabled/disabled, definitions up to date, definition age (days)
  • Firewall: enabled/disabled
  • Windows Update: last check timestamp, automatic updates enabled
  • Encryption: BitLocker protection status on the system drive
  • UAC: enabled/disabled
  • Secure Boot: enabled/disabled

User Sessions

  • Currently logged-on interactive user(s), session type (Console / RDP)

Event Log Summary (optional, configurable)

  • Error and warning counts from the Application and System logs over the configured lookback window (default 1 hour)
  • Most recent critical error source and timestamp
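
As an illustration of how such a summary could be computed, here is a minimal Python sketch. The LogEvent record, its field names, and the summarize function are hypothetical and not part of the agent. The sketch also assumes that critical events are tracked separately from the error count and that the "most recent critical error" is restricted to the same lookback window; the source does not specify either point.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class LogEvent:
    """Hypothetical, simplified event-log record (not the agent's real type)."""
    log: str             # e.g. "Application" or "System"
    level: str           # "Critical", "Error", "Warning", "Information"
    source: str          # event source name
    timestamp: datetime  # when the event was written


def summarize(events: Iterable[LogEvent], now: datetime,
              lookback: timedelta = timedelta(hours=1)) -> dict:
    """Count errors/warnings in the lookback window and find the latest critical event."""
    cutoff = now - lookback
    errors = 0
    warnings = 0
    latest_critical: Optional[LogEvent] = None
    for ev in events:
        # Only the Application and System logs are summarized.
        if ev.log not in ("Application", "System") or ev.timestamp < cutoff:
            continue
        if ev.level == "Error":
            errors += 1
        elif ev.level == "Warning":
            warnings += 1
        elif ev.level == "Critical":
            # Assumption: the latest critical event is taken from the same window.
            if latest_critical is None or ev.timestamp > latest_critical.timestamp:
                latest_critical = ev
    return {
        "error_count": errors,
        "warning_count": warnings,
        "last_critical_source": latest_critical.source if latest_critical else None,
        "last_critical_time": latest_critical.timestamp if latest_critical else None,
    }
```

Passing a different lookback value mirrors the configurable window described above.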

Anomaly-Triggered Diagnostics

After each heartbeat, the agent runs an inflection detector that compares the current metric snapshot to the previous one. When it detects a significant change, the agent immediately flushes its buffered ETW (Event Tracing for Windows) event window to the cloud alongside the heartbeat, so the raw kernel events that were occurring at the time of the anomaly are captured with it.
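
The buffer-and-flush behavior described above can be pictured with a small Python sketch. It is only an illustration: the EventWindow class, the build_heartbeat function, the payload keys, and the 60-second window length are all assumptions, and plain strings stand in for real ETW events.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple


class EventWindow:
    """Rolling buffer of recent events (hypothetical stand-in for the ETW buffer)."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        # Assumption: the window length is not stated in the source.
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()

    def add(self, event: str, now: Optional[float] = None) -> None:
        """Append an event and drop anything older than the window."""
        now = time.time() if now is None else now
        self._events.append((now, event))
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def flush(self) -> List[str]:
        """Return the buffered events in arrival order and empty the buffer."""
        events = [event for _, event in self._events]
        self._events.clear()
        return events


def build_heartbeat(metrics: dict, inflections: List[str],
                    window: EventWindow) -> dict:
    """Attach the buffered events only when an inflection was detected."""
    payload = {"metrics": metrics}
    if inflections:
        payload["inflections"] = inflections
        payload["etw_events"] = window.flush()
    return payload
```

In this sketch a heartbeat with no inflections leaves the buffer untouched, so the events remain available if the next heartbeat does detect a change.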

Inflection points reported to the cloud include the following (pp = percentage points; changes are measured against the previous heartbeat):

  • CPU spike (≥25 pp jump), CPU drop (≥30 pp), sustained high CPU (≥90% for 3+ heartbeats), single-core saturation
  • Memory freed/consumed ≥500 MB, usage jump ≥15 pp, sustained low available (<500 MB)
  • Disk queue spike (<2 → >10), disk utilization ≥90%, write throughput >5×
  • Network receive/send spike >10×, TCP connection surge >500
  • Drive crossing 80% or 90% full, rapid fill >5% per heartbeat
  • CPU temperature spike ≥15°C or threshold ≥95°C, rapid battery drain ≥10%/heartbeat
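
To make a few of these thresholds concrete, here is a minimal Python sketch of a snapshot comparison. It is not the agent's detector: the Snapshot fields, the function names, and the inflection labels are hypothetical, and only a subset of the rules listed above is covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical subset of one heartbeat's metrics."""
    cpu_pct: float        # system-wide CPU utilization %
    mem_used_mb: float    # physical RAM in use, MB
    mem_usage_pct: float  # physical RAM usage %
    disk_queue: float     # disk queue length


def detect_inflections(prev: Snapshot, curr: Snapshot) -> List[str]:
    """Compare two consecutive snapshots against some of the listed thresholds."""
    found: List[str] = []

    cpu_delta = curr.cpu_pct - prev.cpu_pct
    if cpu_delta >= 25:        # CPU spike: jump of at least 25 pp
        found.append("cpu_spike")
    elif cpu_delta <= -30:     # CPU drop: fall of at least 30 pp
        found.append("cpu_drop")

    mem_delta = curr.mem_used_mb - prev.mem_used_mb
    if mem_delta >= 500:       # at least 500 MB consumed
        found.append("memory_consumed")
    elif mem_delta <= -500:    # at least 500 MB freed
        found.append("memory_freed")

    if curr.mem_usage_pct - prev.mem_usage_pct >= 15:
        found.append("memory_usage_jump")

    if prev.disk_queue < 2 and curr.disk_queue > 10:
        found.append("disk_queue_spike")

    return found


def sustained_high_cpu(history: List[float], threshold: float = 90.0,
                       beats: int = 3) -> bool:
    """True when the last `beats` CPU readings are all at or above `threshold`."""
    return len(history) >= beats and all(v >= threshold for v in history[-beats:])
```

The sustained rule needs more than two snapshots, which is why it takes a short history of readings rather than a single previous/current pair.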

Inflection data is stored in the AgentHealthMetrics table and is queryable in the Data Explorer via the Metric Inflections (JSON) column.

Retention

Health metric rows follow the org's configured data retention tier (by default, the free 14-day tier). See Data Retention for tier details.