Watchdog
Watchdog continuously verifies that your agents are not just alive but actually making progress. It monitors gateway and agent liveness, probes for stall signatures, and escalates before silent throughput collapse becomes an outage.
The Gap Between “Running” and “Working”
Process monitors and uptime checkers answer one question: is the process alive? That’s necessary but insufficient for agents. An agent can be:
- Running but stalled: the process is up, but no work is happening
- Running but looping: executing continuously without advancing
- Running but silent: the heartbeat looks fine, but outputs have stopped
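The difference between these states can be made concrete with a small sketch. The following is not Watchdog's implementation: the `AgentSnapshot` fields, the `classify` function, and the default thresholds are all hypothetical, chosen only to show why a process-alive check cannot tell these cases apart. Mapping "stalled" to a stale heartbeat is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class AgentSnapshot:
    """Hypothetical view of an agent at one instant (all fields are illustrative)."""
    process_alive: bool               # the only thing a process monitor sees
    seconds_since_heartbeat: float    # age of the last heartbeat
    seconds_since_last_output: float  # age of the last completed output
    distinct_recent_outputs: int      # unique values among the last few outputs


def classify(snap: AgentSnapshot,
             heartbeat_timeout: float = 30.0,
             progress_window: float = 300.0) -> str:
    """Return a coarse state label; both thresholds are assumed defaults."""
    if not snap.process_alive:
        return "down"
    if snap.seconds_since_heartbeat > heartbeat_timeout:
        return "stalled"   # process is up, but nothing is happening
    if snap.seconds_since_last_output > progress_window:
        return "silent"    # heartbeat looks fine, outputs have stopped
    if snap.distinct_recent_outputs <= 1:
        return "looping"   # still emitting, but not advancing
    return "working"
```

In this sketch, every state except "down" has `process_alive=True`, so an uptime checker would report all of them as healthy.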
What Watchdog Does
Gateway Probes
Actively probes gateway health at regular intervals. Not passive — verified.
Stall Signature Detection
Recognizes patterns consistent with execution stalls before they cause failures.
Throughput Monitoring
Tracks whether work is actually completing, not just whether the agent is active.
Escalation Pipeline
Routes detections to Sentinel, Agent911, and configured alert channels.
How Watchdog Works
Watchdog runs as a persistent background monitor with two probe types:
Liveness Probes
Active checks that the gateway and agent runtime are responsive.
Progress Probes
Passive observation that meaningful work is completing.
Detection Scope
| Detection | Trigger |
|---|---|
| Missed heartbeat | No liveness response for N consecutive intervals |
| Stall | No meaningful progress for threshold window |
| Gateway unresponsive | Probe timeout with no response |
| Silent throughput collapse | Outputs dropped to zero without process failure |
| Loop detected | Repeated identical outputs N times in succession |
| Abnormal latency | Response times exceed established baseline by threshold |
| Probe SLA breach | Gateway responded but took longer than configured SLA |
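Several of the triggers in the table reduce to simple counters and time windows. The sketch below illustrates that logic for five of them (gateway unresponsive, missed heartbeat, probe SLA breach, loop, and stall). It is an assumption-laden illustration rather than Watchdog's actual code: the class name `DetectionSketch`, its method names, the detection labels, and every default value are hypothetical. Latency baselines and throughput collapse are omitted for brevity.

```python
from collections import deque
from typing import Deque, List, Optional


class DetectionSketch:
    """Illustrative trigger logic; names and defaults are assumptions."""

    def __init__(self, missed_limit: int = 3, loop_limit: int = 3,
                 stall_window: float = 300.0, probe_sla: float = 2.0) -> None:
        self.missed_limit = missed_limit    # the "N" for missed heartbeats
        self.loop_limit = loop_limit        # the "N" for repeated outputs
        self.stall_window = stall_window    # seconds without progress
        self.probe_sla = probe_sla          # max acceptable probe latency
        self.missed = 0
        self.recent_outputs: Deque[str] = deque(maxlen=loop_limit)
        self.last_progress_at: Optional[float] = None

    def on_probe(self, response_seconds: Optional[float]) -> List[str]:
        """Record one liveness probe; None means the probe timed out."""
        if response_seconds is None:
            self.missed += 1
            found = ["gateway_unresponsive"]
            if self.missed >= self.missed_limit:
                found.append("missed_heartbeat")
            return found
        self.missed = 0  # any response resets the consecutive-miss counter
        if response_seconds > self.probe_sla:
            return ["probe_sla_breach"]
        return []

    def on_output(self, output: str, now: float) -> List[str]:
        """Record one completed output and check for a loop."""
        is_repeat = bool(self.recent_outputs) and self.recent_outputs[-1] == output
        self.recent_outputs.append(output)
        if not is_repeat:
            self.last_progress_at = now  # only a new output counts as progress
        window_full = len(self.recent_outputs) == self.loop_limit
        if window_full and len(set(self.recent_outputs)) == 1:
            return ["loop_detected"]
        return []

    def check_stall(self, now: float) -> List[str]:
        """Report a stall if no new output arrived within the window."""
        if self.last_progress_at is None:
            return []
        if now - self.last_progress_at > self.stall_window:
            return ["stall"]
        return []
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch deterministic and easy to exercise in tests. Note that repeated identical outputs do not refresh the progress timestamp, so a looping agent can eventually trip the stall trigger as well.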
Quick Start
Sample Output
Configuration
CLI Reference
Watchdog vs Sentinel
Watchdog and Sentinel are complementary:
| Tool | What It Watches | How |
|---|---|---|
| Watchdog | Gateway liveness, heartbeat, throughput | Active probes + passive monitoring |
| Sentinel | Runtime anomalies, stalls, behavioral patterns | Continuous passive observation |