Output post-processing transforms raw model generations into the final form delivered to users or downstream systems.
Common Steps
Design Trade-offs
- Strict validators reduce errors but may over-reject valid replies.
- Aggressive filtering can censor creative language.
- Extra transforms add latency; batch where possible.
Current Trends (2025)
- Streaming post-processors handle partial JSON shards as tokens arrive.
- Explainability tags (
source_sentence
) added for citation tracking. - Inline code formatters auto-fix Python before execution.
Implementation Tips
- Keep raw model output for audit before mutations.
- Run post-processing in isolated sandbox to avoid code injection.
- Log distinct error types (schema, safety) to guide prompt fixes.