AgentOps — Four-Pillar Operating Checklist

One page covering the four AgentOps pillars for running a video AI agent in production. Observability: trace every step with OpenTelemetry GenAI spans; one interaction produces roughly 40-75 spans. Evaluation: maintain a golden set; evaluate at the end-to-end, trajectory, and component levels; score with an LLM-as-judge on a 0-5 scale; track pass^k reliability (a 90% per-step success rate compounds to about 43% over 8 steps). Cost: measure cost per successful task, not cost per token; expect 3-8 calls and 50k-200k tokens per task; reduce spend by routing, caching, and batching. Security: the OWASP ASI Top 10, indirect prompt injection, least agency, input guardrails, and a human-in-the-loop gate. The page also includes a scoping checklist of questions to ask before shipping.
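The two pieces of arithmetic in the summary above can be checked with a short Python sketch. It assumes each step succeeds independently with the same probability, which real agent steps may not. The function names and the spend figures in the usage lines are hypothetical, chosen only for illustration.

```python
def chained_success(per_step: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps
    that each succeed with probability `per_step` (an assumption)."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


def cost_per_successful_task(total_cost: float, successful_tasks: int) -> float:
    """Total spend divided by the number of tasks that actually succeeded."""
    if successful_tasks <= 0:
        raise ValueError("at least one successful task is required")
    return total_cost / successful_tasks


# 90% per-step success over an 8-step task.
print(f"{chained_success(0.90, 8):.1%}")  # 43.0%

# Hypothetical figures: $12.00 spent on 100 attempted tasks, 80 of which succeeded.
print(f"${cost_per_successful_task(12.00, 80):.2f} per successful task")  # $0.15
```

Dividing by successful tasks instead of attempted tasks (or tokens) makes failed runs show up as cost: in the hypothetical figures, the per-attempt cost is $0.12 but the per-success cost is $0.15.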
