Command Palette

Search for a command to run...

Instruction Following

Benched.ai Editorial Team

Instruction following is a model's ability to comply with directives given in natural language while respecting system and safety constraints.

  Evaluation Benchmarks

BenchmarkDomainMetricGPT-4o Score
MT-BenchChat tasksPairwise win rate88 %
AlpacaEval 2GeneralPass@192 %
HELM InstructionMulti-domainExact match74 %

  Prompt Anatomy

SectionPurposeExample
SystemGlobal role & rules"You are a legal assistant..."
UserTask description"Summarize contract §3-§5."
Few-shotDemonstrationsQ/A pairs
Tool call schemaStructured output channelJSON spec

  Design Trade-offs

  • Adding more few-shot examples boosts accuracy but consumes tokens.
  • High temperature encourages creativity but risks deviating from instructions.
  • Overly strict system prompts can override user intent.

  Current Trends (2025)

  • Hierarchical prompting reserves first 200 tokens for safety and style then appends dynamic context.
  • Instruction-fine-tuned open models (Hermes-2 Pro) reach GPT-3.5 compliance at 1/10th cost.
  • LLMs self-generate synthetic instruction datasets for continual tuning.

  Implementation Tips

  1. Test with adversarial paraphrases to ensure robustness.
  2. Keep safety rules earlier than task to maintain precedence.
  3. Log refused requests separately to improve policy coverage.