Mistral Small 4: One Model to Rule Them All (And It's Open Source)
Remember when you needed three different models for three different tasks? Mistral is done with that. Their new Mistral Small 4 unifies the capabilities of Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding) into a single, versatile model — and it's fully open source under Apache 2.0.
The Architecture
Small 4 is deceptively named. With 119 billion total parameters, it's hardly "small" in absolute terms. But thanks to a Mixture of Experts (MoE) architecture with 128 experts, of which 4 are active per token, only about 6-8 billion parameters fire for any given input. This makes it efficient enough to run on a couple of NVIDIA H200 nodes while delivering performance that competes with much larger models.
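The active-parameter figure follows from simple arithmetic. Here's a back-of-envelope sketch; the 119B / 128 / 4 figures come from the announcement, while the shared (non-expert) parameter count is not published, so the value below is a hypothetical assumption chosen purely to illustrate the calculation:

```python
# Back-of-envelope estimate of active parameters per token in an MoE model.
# Announced figures: 119B total parameters, 128 experts, 4 active per token.
TOTAL_PARAMS = 119e9
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 4

# Assumption: ~4B parameters are shared (attention, embeddings, router) and
# always active. This split is NOT published; it's a hypothetical placeholder.
SHARED_PARAMS = 4e9

expert_params = TOTAL_PARAMS - SHARED_PARAMS            # params held in experts
active_expert_params = expert_params * ACTIVE_EXPERTS / NUM_EXPERTS
active_total = SHARED_PARAMS + active_expert_params

print(f"Active params per token: ~{active_total / 1e9:.1f}B")
```

With that assumed split, roughly 7.6B parameters fire per token, which lands inside the 6-8B range Mistral cites; a different shared/expert split shifts the number within that band.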
The 256K context window supports long-form interactions and document analysis, and the model natively handles both text and image inputs. No adapters, no separate vision model — it's all baked in.
Reasoning on Demand
Perhaps the most elegant feature is the configurable reasoning effort. Set `reasoning_effort="none"` for fast, lightweight responses like the old Mistral Small 3. Set it to `"high"` for deep, step-by-step reasoning equivalent to Magistral. You get to choose the tradeoff between speed and depth for every single request.
This is a genuinely useful design pattern. Most API calls don't need deep reasoning — they need fast, accurate responses. But when you do need the model to think carefully, you shouldn't have to switch to a different model. Small 4 makes this a runtime choice.
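In practice, that runtime choice might look like the sketch below. The `reasoning_effort` parameter and its `"none"`/`"high"` values come from the announcement; everything else — the payload shape, the model identifier, the request-building helper — is an assumption modeled on OpenAI-style chat-completion APIs, not confirmed Mistral documentation:

```python
def build_chat_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completion payload with a per-request reasoning level.

    The `reasoning_effort` parameter and the values "none" and "high" are
    the ones mentioned in the announcement; the payload shape and model
    name below are hypothetical, modeled on OpenAI-style chat APIs.
    """
    if reasoning_effort not in {"none", "high"}:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

# Routine classification: fast path, no deep reasoning needed.
fast = build_chat_request("Label this ticket: 'refund not received'")

# Multi-step planning: same model, but ask it to think.
deep = build_chat_request(
    "Plan a migration from REST to gRPC for our billing service",
    reasoning_effort="high",
)
```

The point of the pattern is that both requests hit the same deployed model; the caller, not the deployment, decides how much thinking each request is worth.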
Performance
Mistral claims a 40% reduction in end-to-end completion time in latency-optimized setups, and 3x more requests per second in throughput-optimized configurations compared to Mistral Small 3. On benchmarks, it matches or surpasses models with significantly more active parameters while generating shorter outputs — meaning you pay less per useful response.
Shorter outputs with equivalent accuracy isn't just a benchmark trick — it directly translates to lower API costs in production.
Key Takeaways
- 119B parameters total, 6-8B active per token via MoE architecture
- Unifies reasoning, multimodal, and coding in one model
- Configurable reasoning effort from fast-chat to deep-thinking
- Apache 2.0 open source — fully free for commercial use
- 256K context window with native image support
Our Take
Mistral Small 4 is the kind of release that makes the open-source AI community genuinely excited. A single model that handles chat, reasoning, vision, and coding with configurable depth — all under Apache 2.0 — is remarkably generous. The Nemotron Coalition membership and Forge launch happening simultaneously suggest Mistral is betting big on becoming the go-to open-source foundation for enterprise AI. If you're building AI applications and haven't looked at Mistral lately, now's the time.