Interleaved content mixes different media types—text, images, audio, code—within the same prompt or response, as supported by modern multimodal models.
Use Cases
Formatting Guidelines (Markdown)
- Use fenced code blocks with a language tag such as `python` for syntax highlighting.
- Embed images via markdown when not using URLs.
- Reserve alt text for accessibility; models may read it.
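
The formatting guidelines above can be sketched in Python. This is a minimal illustration, assuming that embedding an image without a URL means using a base64 data URI inside markdown image syntax; the source does not spell that out. The helper names `image_markdown` and `code_markdown` are hypothetical.

```python
import base64

# Three backticks, built at runtime so this example does not nest fences.
FENCE = "`" * 3


def image_markdown(data: bytes, mime: str, alt: str) -> str:
    """Return a markdown image whose target is a base64 data URI (assumed format)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"![{alt}](data:{mime};base64,{encoded})"


def code_markdown(source: str, lang: str = "python") -> str:
    """Wrap source text in a fenced block tagged with its language."""
    return f"{FENCE}{lang}\n{source.rstrip()}\n{FENCE}"


# The four bytes below are only the start of a PNG signature, used as a stand-in.
snippet = image_markdown(b"\x89PNG", "image/png", "tiny placeholder")
block = code_markdown("x = 1\n")
```

The alt text is passed explicitly so that it stays a deliberate accessibility description rather than a filename.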
Design Trade-offs
- More modalities improve expressiveness but risk exhausting the context window.
- Mixed prompts require tokenizer alignment; use base64-encoded images if binary input is unsupported.
- Safety filters must inspect each modality separately.
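
One way to picture the trade-offs above is a message made of typed parts, where images travel as base64 text and each modality goes through its own check. The sketch below is illustrative only: the `Part` type, the `CHECKS` table, and both check rules are hypothetical placeholders, not a real safety filter.

```python
import base64
from dataclasses import dataclass


@dataclass
class Part:
    kind: str     # "text" or "image"
    payload: str  # raw text, or base64 text for an image


def text_part(text: str) -> Part:
    return Part("text", text)


def image_part(data: bytes) -> Part:
    # Base64 keeps binary data safe in a text-only channel.
    return Part("image", base64.b64encode(data).decode("ascii"))


def check_text(payload: str) -> bool:
    # Placeholder rule: reject one literal word.
    return "forbidden" not in payload.lower()


def check_image(payload: str) -> bool:
    # Placeholder rule: payload must decode and stay under a size cap.
    raw = base64.b64decode(payload, validate=True)
    return len(raw) <= 1_000_000


# One checker per modality, so each kind is inspected separately.
CHECKS = {"text": check_text, "image": check_image}


def inspect(parts: list) -> list:
    """Return one pass/fail flag per part, in message order."""
    return [CHECKS[part.kind](part.payload) for part in parts]


message = [
    text_part("Describe this chart:"),
    image_part(b"\x89PNG"),
    text_part("Then summarize it."),
]
results = inspect(message)
```

Keeping the checkers in a lookup table means a new modality needs a new entry, and an unknown `kind` fails loudly with a `KeyError` instead of passing unchecked.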
Current Trends (2025)
- Unified tokenizers encode image patches and text in one stream.
- Chunked upload APIs accept up to 10 images interleaved with text.
- Content policy models evaluate multimodal messages holistically.
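
The idea of a single stream holding both image patches and text can be shown with a toy example. This is a sketch under loud assumptions: whitespace splitting stands in for a real text tokenizer, a fixed 16-pixel square patch stands in for a real image encoder, and `unified_stream` is a hypothetical name.

```python
import math

PATCH = 16  # assumed patch edge length in pixels


def patch_count(width: int, height: int, patch: int = PATCH) -> int:
    """Number of patches needed to tile an image, rounding partial patches up."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def unified_stream(parts: list) -> list:
    """Flatten interleaved parts into one ordered list of (kind, value) tokens."""
    stream = []
    for kind, value in parts:
        if kind == "text":
            stream.extend(("text", word) for word in value.split())
        elif kind == "image":
            width, height = value
            stream.extend(("patch", i) for i in range(patch_count(width, height)))
        else:
            raise ValueError(f"unsupported part kind: {kind}")
    return stream


stream = unified_stream([
    ("text", "Compare these"),
    ("image", (32, 48)),   # 2 x 3 grid of patches
    ("text", "carefully"),
])
```

The order of the output mirrors the order of the input, which is the property that lets text refer to the image that precedes or follows it.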
Implementation Tips
- Limit total image pixels so that images stay within 20% of the token budget.
- Store images in a blob store and reference them by CID (content identifier) to avoid bloat.
- Provide a media-type list in the system prompt so the model can plan its output.
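
The three tips above can be combined in a short sketch. Everything here is an assumption made for illustration: the one-token-per-16-pixel-patch cost model, the in-memory `BlobStore`, the SHA-256 hex digest used as a stand-in for a real CID, and the wording of the system prompt.

```python
import hashlib
import math

PATCH = 16          # assumed patch edge length in pixels
IMAGE_SHARE = 0.20  # share of the token budget reserved for images


def image_tokens(width: int, height: int) -> int:
    # Assumed cost model: one token per patch.
    return math.ceil(width / PATCH) * math.ceil(height / PATCH)


def within_image_budget(sizes: list, token_budget: int) -> bool:
    """True if all images together cost at most 20% of the token budget."""
    used = sum(image_tokens(w, h) for w, h in sizes)
    return used <= IMAGE_SHARE * token_budget


def content_id(data: bytes) -> str:
    # Stand-in for a CID: a plain SHA-256 digest of the content.
    return "sha256-" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """In-memory store keyed by content hash, so duplicates are stored once."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        return self._blobs[cid]

    def __len__(self) -> int:
        return len(self._blobs)


def system_prompt(media_types: list) -> str:
    return "Expected media types: " + ", ".join(media_types) + "."


store = BlobStore()
cid = store.put(b"abc")
prompt = system_prompt(["text", "image"])
```

Because the key is derived from the bytes themselves, uploading the same image twice yields the same reference and no extra storage.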