Context compression reduces the token length of retrieved or user-supplied documents so they fit into a model's context window without sacrificing answer quality.
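As a minimal sketch of the underlying budget problem, the snippet below checks whether retrieved chunks fit a token budget and compresses them only when they do not. The whitespace token count and the sentence-truncating compress_chunk are illustrative stand-ins, not a real tokenizer or compressor.

```python
# Minimal sketch: fit retrieved chunks under a token budget.
# Token counting is approximated by whitespace splitting; a real system
# would use the model's own tokenizer. compress_chunk() is a placeholder
# for any of the techniques discussed below.

def count_tokens(text: str) -> int:
    return len(text.split())

def compress_chunk(text: str, ratio: float) -> str:
    # Placeholder: keep the first `ratio` fraction of sentences verbatim.
    sentences = text.split(". ")
    keep = max(1, int(len(sentences) * ratio))
    return ". ".join(sentences[:keep])

def fit_to_budget(chunks: list[str], budget: int, ratio: float = 0.5) -> list[str]:
    total = sum(count_tokens(c) for c in chunks)
    if total <= budget:
        return chunks  # already fits, no compression needed
    return [compress_chunk(c, ratio) for c in chunks]
```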
Compression Techniques
- Extractive: keep salient sentences or spans verbatim from the source.
- Abstractive: rewrite the source into a shorter paraphrase with a generation model.
- Hierarchical: prune first at the chunk level, then at the sentence level.
[Figure: sample quality vs. compression-ratio curve]
Design Trade-offs
- Higher compression ratios save cost but may omit the reasoning chains needed to support citations.
- Abstractive approaches risk hallucinating details not present in the source.
- Extractive summaries preserve grounding but may miss implicit context.
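To make the extractive option concrete, here is a rough sketch that scores sentences by term overlap with the query and keeps the top-scoring ones verbatim, in document order, so retained spans remain citable. The overlap scorer is a deliberately crude assumption; production systems typically use embedding similarity or a trained reranker.

```python
# Sketch of an extractive compressor: score each source sentence by term
# overlap with the query, keep the top-scoring sentences verbatim, and
# restore their original order so retained spans can still be cited.
import re

def extractive_compress(document: str, query: str, keep: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    query_terms = set(query.lower().split())

    def score(sentence: str) -> int:
        return len(query_terms & set(sentence.lower().split()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    kept_indices = sorted(ranked[:keep])          # back to document order
    return " ".join(sentences[i] for i in kept_indices)
```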
Current Trends (2025)
- Hierarchical compressors first prune at the chunk level, then at the sentence level (see the first sketch after this list).
- Semantic hashing enables near-constant-time deduplication before compression (see the second sketch after this list).
- Benchmarks like HARDC reduce evaluation to a single "preserve answerability" metric.
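The hierarchical idea can be illustrated with a two-stage pruner: drop whole chunks with little relation to the query first, then prune sentences inside the survivors. The term-overlap scoring here is an assumption made for brevity; real compressors use learned relevance models.

```python
# Sketch of a two-stage hierarchical compressor: chunk-level pruning first,
# then sentence-level pruning within the surviving chunks. Plain term
# overlap stands in for a learned relevance score.
import re

def overlap(text: str, query: str) -> int:
    return len(set(text.lower().split()) & set(query.lower().split()))

def hierarchical_compress(chunks: list[str], query: str,
                          keep_chunks: int = 4, keep_sentences: int = 2) -> list[str]:
    # Stage 1: keep only the chunks most related to the query.
    top_chunks = sorted(chunks, key=lambda c: overlap(c, query), reverse=True)[:keep_chunks]
    # Stage 2: within each surviving chunk, keep the most related sentences.
    compressed = []
    for chunk in top_chunks:
        sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
        best = sorted(sentences, key=lambda s: overlap(s, query), reverse=True)[:keep_sentences]
        compressed.append(" ".join(best))
    return compressed
```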
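For the deduplication step, true semantic hashing relies on learned binary codes; the sketch below substitutes a crude lexical fingerprint (a hash of the sorted, lowercased token set) purely to show the constant-time set-lookup pattern applied before compression.

```python
# Sketch of hash-based deduplication before compression. The fingerprint
# here is lexical, not learned; it only demonstrates the O(1) average-case
# membership check that makes dedup cheap.
import hashlib

def fingerprint(text: str) -> str:
    tokens = sorted(set(text.lower().split()))
    return hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()

def deduplicate(chunks: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        key = fingerprint(chunk)
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```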
Implementation Tips
- Measure downstream task accuracy at each compression ratio to set safe limits (see the sweep sketch after this list).
- Cache compressed chunks alongside their embeddings to avoid recomputation (see the cache sketch after this list).
- Keep citation spans unchanged when regulatory compliance demands verbatim text.
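One way to set those limits is a simple ratio sweep: evaluate answer accuracy at each retention ratio (1.0 meaning no compression here) and keep the most aggressive setting whose accuracy drop stays within a tolerance. The compress, answer, and eval_set arguments are assumptions standing in for your own compressor, QA pipeline, and labeled examples.

```python
# Sketch of a compression-ratio sweep: measure downstream accuracy at each
# retention ratio and return the most aggressive ratio whose accuracy stays
# within `max_drop` of the uncompressed baseline.

def sweep_ratios(eval_set, compress, answer,
                 ratios=(0.2, 0.4, 0.6, 0.8), max_drop: float = 0.02) -> float:
    def accuracy(ratio: float) -> float:
        correct = 0
        for context, question, gold in eval_set:
            compressed = compress(context, ratio)
            correct += int(answer(compressed, question) == gold)
        return correct / len(eval_set)

    baseline = accuracy(1.0)                 # no compression
    for ratio in sorted(ratios):             # most aggressive first
        if baseline - accuracy(ratio) <= max_drop:
            return ratio
    return 1.0                               # nothing aggressive was safe
```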
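Caching can be as simple as keying on the content hash plus the compression ratio and storing the compressed text next to its embedding, so neither gets recomputed. The compress and embed callables are again placeholders for your own functions.

```python
# Sketch of a cache keyed by (content hash, ratio) that stores compressed
# text together with its embedding so neither is recomputed on repeat hits.
import hashlib

class CompressionCache:
    def __init__(self, compress, embed):
        self._compress = compress
        self._embed = embed
        self._store: dict[tuple[str, float], tuple[str, list[float]]] = {}

    def get(self, text: str, ratio: float) -> tuple[str, list[float]]:
        key = (hashlib.sha1(text.encode("utf-8")).hexdigest(), ratio)
        if key not in self._store:
            compressed = self._compress(text, ratio)
            self._store[key] = (compressed, self._embed(compressed))
        return self._store[key]
```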