Model Intelligence Score

Benched.ai Editorial Team

A model intelligence score is a composite metric designed to summarize overall model capability across multiple benchmarks (MMLU, GSM8K, MBPP, etc.).

  Example Weighting Scheme

Benchmark     Weight
MMLU          0.4
GSM8K         0.3
HumanEval     0.2
Winogrande    0.1

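Overall score = Σ (weight × normalized benchmark score), summed over all benchmarks. The weights in this example sum to 1.0, so the overall score stays on the same scale as the normalized benchmark scores.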
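The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the weights come from the example table, while the function name `composite_score` and the per-benchmark scores are hypothetical values chosen only to show the arithmetic.

```python
# Weights from the example weighting scheme above.
WEIGHTS = {"MMLU": 0.4, "GSM8K": 0.3, "HumanEval": 0.2, "Winogrande": 0.1}


def composite_score(normalized_scores, weights=WEIGHTS):
    """Return the weighted sum of normalized (0-100) benchmark scores."""
    missing = set(weights) - set(normalized_scores)
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Allow for floating-point rounding when checking that weights sum to 1.
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total_weight}")
    return sum(weights[name] * normalized_scores[name] for name in weights)


# Hypothetical normalized scores for one model (illustration only).
example_scores = {"MMLU": 80.0, "GSM8K": 70.0, "HumanEval": 60.0, "Winogrande": 90.0}
overall = composite_score(example_scores)
print(round(overall, 2))
```

With these made-up inputs the terms are 32 + 21 + 12 + 9, giving an overall score of about 74.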

  Design Trade-offs

  • A simple weighted average is easy to communicate but hides domain-specific weaknesses.
  • Including too many benchmarks dilutes the signal.
  • Proprietary weights reduce transparency.

  Current Trends (2025)

  • Adaptive weights are adjusted to the user's domain (for example, code vs. chat).
  • Leaderboards publish raw scores alongside the composite.
  • Bootstrapped confidence intervals (CIs) are reported to show score noise, with intervals of roughly ±2 %.
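As an illustration of the bootstrapped intervals mentioned in the list above, here is a minimal percentile-bootstrap sketch for a single benchmark. It assumes per-item 0/1 correctness outcomes are available; the function name `bootstrap_ci`, the resample count, and the example data (720 correct out of 1000 items) are hypothetical, and a real leaderboard may use a different resampling scheme.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy (in percent) from 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(outcomes)
    means = []
    for _ in range(n_resamples):
        # Resample the items with replacement and recompute accuracy.
        sample = rng.choices(outcomes, k=n)
        means.append(100.0 * sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical benchmark run: 1000 items, 720 answered correctly.
outcomes = [1] * 720 + [0] * 280
low, high = bootstrap_ci(outcomes)
print(f"accuracy 72.0 %, 95 % CI approx. [{low:.1f}, {high:.1f}]")
```

The width of the interval depends on the number of items and the accuracy itself, so the figure a leaderboard reports will vary by benchmark.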

  Implementation Tips

  1. Normalize each benchmark score to a 0–100 scale before weighting.
  2. Publish the weight configuration as a YAML file for reproducibility.
  3. Update the weights annually to reflect shifts in task importance.
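The first two tips can be sketched as follows. This is one possible approach under stated assumptions: normalization here is a min-max rescale between a floor and a ceiling that you choose (for instance, chance-level accuracy as the floor for a four-option multiple-choice benchmark), and the YAML is rendered by hand for a flat mapping rather than with a YAML library. The names `normalize` and `weights_to_yaml`, the `version` field, and the example values are all hypothetical.

```python
def normalize(raw, floor, ceiling):
    """Min-max rescale a raw score to the 0-100 range, clamped at both ends."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = 100.0 * (raw - floor) / (ceiling - floor)
    return max(0.0, min(100.0, scaled))


def weights_to_yaml(weights, version):
    """Render a flat weight mapping as YAML text (hand-rolled, no library)."""
    lines = [f'version: "{version}"', "weights:"]
    lines += [f"  {name}: {weight}" for name, weight in weights.items()]
    return "\n".join(lines) + "\n"


# Assumed example: raw accuracy 0.70 on a 4-option multiple-choice benchmark,
# using chance level (0.25) as the floor and a perfect score (1.0) as the ceiling.
mmlu_normalized = normalize(0.70, floor=0.25, ceiling=1.0)

weights = {"MMLU": 0.4, "GSM8K": 0.3, "HumanEval": 0.2, "Winogrande": 0.1}
yaml_text = weights_to_yaml(weights, version="2025")
print(round(mmlu_normalized, 1))
print(yaml_text)
```

Publishing the floor and ceiling used for each benchmark alongside the weights makes the composite reproducible, since the normalization choice changes the result as much as the weights do.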