Error Rate Metrics

Benched.ai Editorial Team

Error rate metrics quantify how often an AI system fails to deliver an acceptable response. They are vital for SLOs, capacity planning, and regression alerts.

Common Metrics

Metric	Definition	Typical Target
HTTP 5xx rate	5xx responses ÷ total requests	<0.1 %
Model error rate	`error` field set in JSON	<1 %
Timeout rate	Requests exceeding SLA	<0.5 %
Safety block rate	Responses filtered by policy	— (monitor)

Aggregation Windows

Window	Usage
1 min	Auto-scaling, circuit breaker
5 min	Pager alerts
1 h	Dashboard trends
1 day	SLA reporting

Design Trade-offs

Short windows detect spikes quickly but are noisy.
Aggregating multiple error classes can mask high-severity categories like safety blocks.

Current Trends (2025)

Separating "model errors" (bad generations) from "infrastructure errors" clarifies ownership.
Client libraries expose retry-after headers to hide transient 429 spikes.

Implementation Tips

Tag errors with error_type (timeout, invalid_request, safety) for root-cause analysis.
Sample user context for high-volume endpoints to reduce log cost.
Alert on error budget burn rate, not raw counts.