Welcome to ACME Docs

ACME builds the reliability layer for AI agent systems. We help operators answer the question every team running agents eventually asks:
“My agents are running. But are they reliable?”

The Problem

Standard monitoring catches crashes. But agents also fail in ways that never crash the process:
  • Silent failures — the agent looks healthy but stopped making progress
  • Stalls — waiting on a response that will never come
  • Drift — behavior changes session-to-session without obvious cause
These failures don’t trigger alerts. They just… happen. You find out only when a user complains or a workflow silently stops.
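
To make the distinction concrete, here is a minimal sketch of how a crash, a stall, and a silent failure differ from a monitoring point of view. It is an illustration only, not how any ACME tool works: the `AgentStatus` fields, the `classify` function, and all timeout values are hypothetical names and numbers chosen for this example. Drift is left out because it requires comparing behavior across sessions rather than checking timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentStatus:
    """Hypothetical snapshot of one agent (not an ACME data structure)."""
    last_heartbeat: float                  # when the process last reported it was alive
    last_progress: float                   # when the agent last completed a unit of work
    pending_since: Optional[float] = None  # when it began waiting on a response, if waiting


def classify(
    status: AgentStatus,
    now: float,
    heartbeat_timeout: float = 30.0,   # assumed thresholds, in seconds
    wait_timeout: float = 120.0,
    progress_timeout: float = 300.0,
) -> str:
    """Label an agent as crashed, stalled, silently failing, or healthy."""
    # A missing heartbeat is the case standard monitoring already catches.
    if now - status.last_heartbeat > heartbeat_timeout:
        return "crashed"
    # Alive, but waiting too long on a response that may never come.
    if status.pending_since is not None and now - status.pending_since > wait_timeout:
        return "stalled"
    # Alive and not waiting, yet no work has completed for a long time.
    if now - status.last_progress > progress_timeout:
        return "silent_failure"
    return "healthy"


if __name__ == "__main__":
    now = 1000.0
    print(classify(AgentStatus(last_heartbeat=995.0, last_progress=600.0), now))
```

The point of the sketch is that a heartbeat alone only answers "is the process alive?". Catching the other two cases requires tracking progress and outstanding waits as separate signals.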

The ACME Stack

We provide tools across the reliability lifecycle:
Phase    Tool      What It Does
Detect   RadCheck  Read-only scan, 0-100 reliability score
Triage   OCTriage  Entry-point terminal for incident assessment
Protect  Sentinel  Runtime guardrails for stalls and silent failures
Control  Agent911  Unified control plane with recovery guidance

Getting Started

The fastest path to understanding your agent reliability:

Free vs Paid

ACME follows a trust-first model:
  • Free: RadCheck (scan), OCTriage (triage), Lazarus Lite (backup check)
  • Paid: Sentinel (runtime protection), Agent911 (control plane)
Free tools build trust. Paid tools solve the problem at scale.