SmolLM3: A Powerful 3B Multilingual Model with Long-Context Reasoning


Introducing SmolLM3: Small, Efficient, and Highly Capable

The AI community continues to push the boundaries of small language models (SLMs), proving that bigger isn’t always better. Today, we’re excited to introduce SmolLM3, a 3B-parameter model that outperforms competitors like Llama-3.2-3B and Qwen2.5-3B while rivaling larger 4B models (Qwen3-4B and Gemma3-4B).

What makes SmolLM3 special?
✅ Multilingual (English, French, Spanish, German, Italian, Portuguese)
✅ 128K long-context support (via NoPE + YaRN)
✅ Dual-mode reasoning (think/no_think for explicit vs. direct answers)
✅ Fully open weights & training recipe

Base Model | Instruct Model


Why SmolLM3 Stands Out

Performance Highlights

  • Beats Llama-3.2-3B & Qwen2.5-3B across reasoning, math, and coding tasks

  • Competitive with 4B models (Qwen3-4B, Gemma3-4B) at lower compute cost

  • Strong multilingual ability (tested on Global MMLU, MLMM HellaSwag, Belebele)

  • 128K context handling (via NoPE + YaRN extrapolation)

Key Architectural Improvements

  1. Grouped Query Attention (GQA) – Reduces KV cache size without performance loss (see the back-of-the-envelope sketch after this list)

  2. NoPE (No Positional Embeddings in select layers) – Better long-context handling

  3. Intra-Document Masking – Improves training stability for long sequences

  4. Embedding Layer Optimization – Removes weight decay for smoother training
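To make the GQA saving concrete, here is a rough back-of-the-envelope sketch. The layer count, head count, head dimension, and group count below are illustrative assumptions, not SmolLM3's published configuration; the point is only that the KV cache shrinks in proportion to the number of KV heads.

```python
# Back-of-the-envelope KV-cache size: multi-head attention vs. grouped query attention.
# All model dimensions below are illustrative assumptions, not SmolLM3's exact config.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the K and V caches for one sequence (2 tensors per layer, bf16 by default)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

layers, heads, head_dim, seq_len = 36, 16, 128, 65_536   # assumed values

mha = kv_cache_bytes(layers, num_kv_heads=heads, head_dim=head_dim, seq_len=seq_len)
gqa = kv_cache_bytes(layers, num_kv_heads=4, head_dim=head_dim, seq_len=seq_len)  # 4 KV groups

print(f"MHA cache: {mha / 2**30:.1f} GiB, GQA cache: {gqa / 2**30:.1f} GiB "
      f"({heads // 4}x smaller)")
```

With these assumed numbers, a 64K-token sequence drops from roughly 18 GiB of KV cache to about 4.5 GiB, which is what makes long-context inference practical on a 3B model.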


The Full Training Blueprint

Unlike proprietary models, we’re releasing everything:
✅ Data mixtures (11.2T tokens across web, math, and code)
✅ Three-stage pretraining (progressive domain specialization)
✅ Mid-training for reasoning & long-context adaptation
✅ Post-training with SFT & Anchored Preference Optimization (APO)

Three-Stage Pretraining

Stage              | Web | Code | Math | Focus
Stage 1 (0-8T)     | 85% | 12%  | 3%   | General capabilities
Stage 2 (8-10T)    | 75% | 15%  | 10%  | Higher-quality math/code
Stage 3 (10-11.1T) | 63% | 24%  | 13%  | Reasoning & long-context

Dual-Mode Reasoning: Think vs. No-Think

SmolLM3 supports two response modes:

  1. /think – Explicit reasoning traces (like Chain-of-Thought)

  2. /no_think – Direct answers (faster inference)

Example:
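A minimal sketch of how the two modes can be selected through a system-prompt flag, matching the /think and /no_think commands named above. The model id and flag handling are taken from the public release; verify the exact interface against the released chat template.

```python
# Sketch: choosing the reasoning mode via a system-prompt flag, as described above.
# Model id and flag handling assumed from the public release; check the chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

def build_prompt(question: str, thinking: bool) -> str:
    flag = "/think" if thinking else "/no_think"   # explicit reasoning vs. direct answer
    messages = [
        {"role": "system", "content": flag},
        {"role": "user", "content": question},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

print(build_prompt("Solve 23 * 17 step by step.", thinking=True))      # reasoning trace
print(build_prompt("What is the capital of France?", thinking=False))  # direct answer
```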

Tool Calling Support
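SmolLM3 also supports tool calling. Below is a hedged sketch using the generic transformers tools= argument of apply_chat_template to inject a tool schema into the prompt; the get_weather function is invented for illustration, and the argument names and output format that SmolLM3's own template expects should be confirmed in the model card.

```python
# Sketch: tool calling through the generic transformers chat-template API.
# get_weather is a made-up example tool; SmolLM3's exact tool-call format should be
# checked against the model card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]

# The tool's signature and docstring are converted to a JSON schema and injected
# into the prompt so the model can emit a structured call to it.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```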

⚡ How to Use SmolLM3

Install & Run

Recommended sampling parameters: temperature=0.6, top_p=0.95
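A minimal install-and-generate sketch using those sampling settings. The HuggingFaceTB/SmolLM3-3B repository id is assumed from the public release, and a recent transformers version is required.

```python
# Minimal sketch: load the instruct model and generate with the recommended sampling
# settings (temperature=0.6, top_p=0.95). Assumes `pip install -U transformers` and
# the HuggingFaceTB/SmolLM3-3B repository id from the public release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain why grouped query attention saves memory."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True,
                         temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```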


Why This Matters

SmolLM3 proves that small models can be highly capable when optimized correctly. Key takeaways:
✅ Efficiency matters – 3B models can rival 4B with the right training
✅ Long-context is achievable – NoPE + YaRN enables 128K support (see the sketch after this list)
✅ Open weights & recipes accelerate research – No more black-box models!
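As a rough illustration of the YaRN extrapolation mentioned above, the snippet below overrides the RoPE scaling configuration when loading the model. The factor and original_max_position_embeddings values are assumptions for doubling a 64K training context to 128K; consult the model card for the officially recommended settings.

```python
# Rough sketch: enable YaRN RoPE scaling to extend the context window beyond the
# training length. The scaling values are illustrative assumptions (2x extrapolation
# from an assumed 64K training context); check the model card for recommended values.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 2.0,                               # 64K -> 128K (assumed)
        "original_max_position_embeddings": 65536,   # assumed training context
    },
)
```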


Resources

  • GitHub Repo (Training configs & eval code)
  • Model Collection (Quantized checkpoints)
  • Training Logs

What will YOU build with SmolLM3? Let us know in the comments!
