Mistral Small 4: One Model to Rule Them All (And It's Open Source)
Remember when you needed three different models for three different tasks? Mistral is done with that. Their new Mistral Small 4 unifies the capabilities of Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding) into a single, versatile model — and it's fully open source under Apache 2.0.
The Architecture
Small 4 is deceptively named. With 119 billion total parameters, it's hardly "small" in absolute terms. But thanks to a Mixture of Experts (MoE) architecture with 128 experts, of which 4 are active per token, only about 6-8 billion parameters fire for any given input. This makes it efficient enough to run on a couple of NVIDIA H200 nodes while delivering performance that competes with much larger models.
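The active-parameter figure follows from simple arithmetic. Here's a back-of-envelope sketch; the 119B / 128 / 4 figures come from the announcement, while the shared (non-expert) parameter count is not published, so the value below is a hypothetical assumption chosen purely to illustrate the calculation:

```python
# Back-of-envelope estimate of active parameters per token in an MoE model.
# Announced figures: 119B total parameters, 128 experts, 4 active per token.
TOTAL_PARAMS = 119e9
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 4

# Assumption: ~4B parameters are shared (attention, embeddings, router) and
# always active. This split is NOT published; it's a hypothetical placeholder.
SHARED_PARAMS = 4e9

expert_params = TOTAL_PARAMS - SHARED_PARAMS            # params held in experts
active_expert_params = expert_params * ACTIVE_EXPERTS / NUM_EXPERTS
active_total = SHARED_PARAMS + active_expert_params

print(f"Active params per token: ~{active_total / 1e9:.1f}B")
```

With that assumed split, roughly 7.6B parameters fire per token, which lands inside the 6-8B range Mistral cites; a different shared/expert split shifts the number within that band.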
The 256K context window supports long-form interactions and document analysis, and the model natively handles both text and image inputs. No adapters, no separate vision model — it's all baked in.
Reasoning on Demand
Perhaps the most elegant feature is the configurable reasoning effort. Set `reasoning_effort="none"` for fast, lightweight responses like the old Mistral Small 3. Set it to `"high"` for deep, step-by-step reasoning equivalent to Magistral. You get to choose the tradeoff between speed and depth for every single request.
This is a genuinely useful design pattern. Most API calls don't need deep reasoning — they need fast, accurate responses. But when you do need the model to think carefully, you shouldn't have to switch to a different model. Small 4 makes this a runtime choice.
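In practice, that runtime choice might look like the sketch below. The `reasoning_effort` parameter and its `"none"`/`"high"` values come from the announcement; everything else — the payload shape, the model identifier, the request-building helper — is an assumption modeled on OpenAI-style chat-completion APIs, not confirmed Mistral documentation:

```python
def build_chat_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-completion payload with a per-request reasoning level.

    The `reasoning_effort` parameter and the values "none" and "high" are
    the ones mentioned in the announcement; the payload shape and model
    name below are hypothetical, modeled on OpenAI-style chat APIs.
    """
    if reasoning_effort not in {"none", "high"}:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

# Routine classification: fast path, no deep reasoning needed.
fast = build_chat_request("Label this ticket: 'refund not received'")

# Multi-step planning: same model, but ask it to think.
deep = build_chat_request(
    "Plan a migration from REST to gRPC for our billing service",
    reasoning_effort="high",
)
```

The point of the pattern is that both requests hit the same deployed model; the caller, not the deployment, decides how much thinking each request is worth.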
Performance
Mistral claims a 40% reduction in end-to-end completion time in latency-optimized setups, and 3x more requests per second in throughput-optimized configurations compared to Mistral Small 3. On benchmarks, it matches or surpasses models with significantly more active parameters while generating shorter outputs — meaning you pay less per useful response.
Shorter outputs with equivalent accuracy isn't just a benchmark trick — it directly translates to lower API costs in production.
Key Takeaways
- 119B parameters total, 6-8B active per token via MoE architecture
- Unifies reasoning, multimodal, and coding in one model
- Configurable reasoning effort from fast-chat to deep-thinking
- Apache 2.0 open source — fully free for commercial use
- 256K context window with native image support
Our Take
Mistral Small 4 is the kind of release that makes the open-source AI community genuinely excited. A single model that handles chat, reasoning, vision, and coding with configurable depth — all under Apache 2.0 — is remarkably generous. The Nemotron Coalition membership and Forge launch happening simultaneously suggest Mistral is betting big on becoming the go-to open-source foundation for enterprise AI. If you're building AI applications and haven't looked at Mistral lately, now's the time.