Error Rate Metrics

Benched.ai Editorial Team

Error rate metrics quantify how often an AI system fails to deliver an acceptable response. They are vital for SLOs, capacity planning, and regression alerts.

  Common Metrics

| Metric | Definition | Typical Target |
| --- | --- | --- |
| HTTP 5xx rate | 5xx responses ÷ total requests | < 0.1 % |
| Model error rate | Responses with the `error` field set in the JSON body ÷ total requests | < 1 % |
| Timeout rate | Requests exceeding the latency SLA ÷ total requests | < 0.5 % |
| Safety block rate | Responses filtered by the safety policy ÷ total requests | — (monitor) |
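All four rates can be computed from a batch of response records in one pass. A minimal sketch, assuming each record is a dict with hypothetical `status`, `body`, `latency_ms`, and `safety_blocked` fields and an assumed 30 s latency SLA:

```python
from collections import Counter

# Assumed latency SLA in milliseconds; tune per endpoint.
SLA_MS = 30_000

def error_rates(responses):
    """Return each error-class rate as errors ÷ total requests.

    `responses` is a list of dicts with hypothetical keys:
    status (int HTTP code), body (parsed JSON dict or None),
    latency_ms (float), safety_blocked (bool).
    """
    total = len(responses)
    counts = Counter()
    for r in responses:
        if r["status"] >= 500:
            counts["http_5xx"] += 1
        if r.get("body") and "error" in r["body"]:
            counts["model_error"] += 1
        if r["latency_ms"] > SLA_MS:
            counts["timeout"] += 1
        if r.get("safety_blocked"):
            counts["safety_block"] += 1
    return {cls: n / total for cls, n in counts.items()}
```

Note that a single response can count toward several classes (e.g. a 5xx that also exceeded the SLA), which is usually what you want for per-class dashboards.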

  Aggregation Windows

| Window | Usage |
| --- | --- |
| 1 min | Auto-scaling, circuit breaking |
| 5 min | Pager alerts |
| 1 h | Dashboard trends |
| 1 day | SLA reporting |
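Fixed windows like these can be produced by bucketing request events on timestamp. A minimal sketch (the `(unix_ts, is_error)` event shape is an assumption, not an established API):

```python
from collections import defaultdict

def windowed_error_rate(events, window_s=60):
    """Bucket (unix_ts, is_error) events into fixed windows of
    `window_s` seconds and return {window_start: error_rate}."""
    buckets = defaultdict(lambda: [0, 0])  # window_start -> [errors, total]
    for ts, is_error in events:
        start = int(ts // window_s) * window_s
        buckets[start][1] += 1
        if is_error:
            buckets[start][0] += 1
    return {start: errs / total
            for start, (errs, total) in sorted(buckets.items())}
```

The same event stream can feed every row of the table; only `window_s` changes (60 for auto-scaling, 300 for pager alerts, and so on).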

  Design Trade-offs

  • Short windows detect spikes quickly but are noisy.
  • Aggregating multiple error classes can mask high-severity categories like safety blocks.
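The second trade-off argues for alerting per error class rather than on a combined rate. A sketch with hypothetical per-class thresholds, showing how a healthy-looking aggregate can hide a safety-block breach:

```python
def breached_classes(per_class_rates, thresholds):
    """Return the error classes whose rate exceeds their own threshold.
    Classes without an explicit threshold are never flagged here."""
    return [cls for cls, rate in per_class_rates.items()
            if rate > thresholds.get(cls, float("inf"))]

# Hypothetical 5-min snapshot: the aggregate rate is 0.06 %, well under
# a 0.5 % combined threshold, yet safety alone breaches its tighter limit.
rates = {"timeout": 0.0001, "invalid_request": 0.0001, "safety": 0.0004}
breached = breached_classes(rates, {"safety": 0.0001})  # -> ["safety"]
```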

  Current Trends (2025)

  • Separating "model errors" (bad generations) from "infrastructure errors" clarifies ownership.
  • Client libraries honor `Retry-After` headers and retry automatically, so transient 429 spikes are absorbed rather than surfaced as user-visible errors.

  Implementation Tips

  1. Tag errors with error_type (timeout, invalid_request, safety) for root-cause analysis.
  2. Sample user context for high-volume endpoints to reduce log cost.
  3. Alert on error budget burn rate, not raw counts.
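Tip 3 can be made concrete: burn rate is the observed error rate divided by the allowed error rate (1 − SLO), so a value of 1.0 spends the budget exactly over the SLO window. A minimal sketch of a multi-window burn-rate alert in the style popularized by the Google SRE Workbook (the 14.4× threshold there corresponds to spending 2 % of a 30-day budget in 1 h):

```python
def burn_rate(observed_error_rate, slo=0.999):
    """Observed error rate divided by the error budget rate (1 - SLO)."""
    budget_rate = 1.0 - slo
    return observed_error_rate / budget_rate

def should_page(rate_1h, rate_5m, slo=0.999, threshold=14.4):
    """Page only when both a long and a short window burn fast, so a
    spike that has already recovered does not wake anyone up."""
    return (burn_rate(rate_1h, slo) >= threshold
            and burn_rate(rate_5m, slo) >= threshold)
```

The advantage over raw counts is that the same threshold stays meaningful as traffic grows: doubling request volume doubles the error count but leaves the burn rate unchanged.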