Mistral Small 3 is a 24B-parameter language model built for speed: ~150 tokens/s, a 32k-token context window, and 81% on MMLU, all released under the permissive Apache 2.0 license in both base and instruct variants.
For developers, it's an open, low-latency alternative to mid-size proprietary LLMs. Run it locally on a single RTX 4090 or a MacBook, fine-tune it into a domain expert, or use the cloud API at $0.10/$0.30 per million input/output tokens to power chatbots, function-calling agents, or edge workloads without vendor lock-in.
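To make the API pricing concrete, here is a minimal sketch of a per-call cost estimate based on the $0.10/$0.30 per-million-token figures quoted above. The function and example token counts are illustrative assumptions, not an official calculator.

```python
# Illustrative cost estimate for Mistral Small 3's cloud API, using the
# $0.10 / $0.30 per million input/output token prices quoted above.
# (Sketch only; check current pricing before relying on these numbers.)

INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.30  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a chatbot turn with a 2,000-token prompt and a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.000350
```

At these rates, even a million such chatbot turns stays in the hundreds of dollars, which is the practical argument for using a small model in high-volume workloads.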