Mistral Small 3.1 is a 24-billion-parameter, Apache-2.0-licensed model that natively handles both text and images, supports 20+ languages, and offers a 128k-token context window. It outperforms Gemma 3 and GPT-4o Mini on core text, vision, and multilingual benchmarks while decoding at roughly 150 tokens/s.
A single RTX 4090 or a Mac with 32 GB of RAM can host it, so you avoid cloud lock-in and per-token fees. Strong instruction following, low-latency function calling, and easy fine-tuning make it a solid open-source base for chatbots, on-device assistants, and multimodal agent workflows.
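To make the function-calling workflow concrete, here is a minimal sketch of a tool-use request, assuming the model is served behind an OpenAI-compatible endpoint (for example via vLLM); the model identifier and the `get_weather` tool are illustrative assumptions, not details from the text above.

```python
import json

# Sketch of a function-calling request to a locally served Mistral Small 3.1.
# Assumes an OpenAI-compatible server (e.g. vLLM) listening on localhost:8000;
# the model name and the get_weather tool below are hypothetical examples.
payload = {
    "model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",  # assumed repo id
    "messages": [
        {"role": "user", "content": "What's the weather in Lisbon right now?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# POST this body to http://localhost:8000/v1/chat/completions with any HTTP
# client; here we only serialize it to show the wire format.
body = json.dumps(payload, indent=2)
print(body)
```

When the model elects to call the tool, the response carries a `tool_calls` entry with JSON arguments that your code executes before feeding the result back as a `tool` role message.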