Introducing SmolLM3: Small, Efficient, and Highly Capable
The AI community continues to push the boundaries of small language models (SLMs), proving that bigger isn’t always better. Today, we’re excited to introduce SmolLM3, a 3B-parameter model that outperforms competitors like Llama-3.2-3B and Qwen2.5-3B while rivaling larger 4B models (Qwen3 & Gemma3).
What makes SmolLM3 special?
✅ Multilingual (English, French, Spanish, German, Italian, Portuguese)
✅ 128K long-context support (via NoPE + YaRN)
✅ Dual-mode reasoning (think/no_think for explicit vs. direct answers)
✅ Fully open weights & training recipe
Why SmolLM3 Stands Out
Performance Highlights
- Beats Llama-3.2-3B & Qwen2.5-3B across reasoning, math, and coding tasks
- Competitive with 4B models (Qwen3-4B, Gemma3-4B) at lower compute cost
- Strong multilingual ability (tested on Global MMLU, MLMM HellaSwag, Belebele)
- 128K context handling (via NoPE + YaRN extrapolation; see the sketch below)
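SmolLM3 reaches 128K tokens by extrapolating beyond its trained context window with YaRN. Below is a minimal sketch of what a YaRN override could look like when loading the model with transformers; the scaling factor is illustrative, and whether SmolLM3's config accepts exactly these keys is an assumption to verify against the model card.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"

# Illustrative YaRN override: factor 2.0 would double the usable context window.
# The key names and the factor are assumptions, not documented SmolLM3 settings.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"rope_type": "yarn", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```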
Key Architectural Improvements
- Grouped Query Attention (GQA) – Reduces KV cache size without performance loss (see the sketch after this list)
- NoPE (No Positional Embeddings in select layers) – Better long-context handling
- Intra-Document Masking – Improves training stability for long sequences
- Embedding Layer Optimization – Removes weight decay for smoother training
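To make the GQA saving concrete, here is a back-of-the-envelope sketch comparing KV-cache memory for standard multi-head attention against grouped-query attention. The layer counts and head dimensions are illustrative placeholders, not SmolLM3's actual configuration.

```python
def kv_cache_bytes(num_layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Rough KV-cache size: keys + values for every layer, stored in bf16/fp16."""
    return 2 * num_layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative placeholder dimensions (not SmolLM3's real config).
layers, query_heads, head_dim, seq_len = 36, 16, 128, 65_536

mha = kv_cache_bytes(layers, kv_heads=query_heads, head_dim=head_dim, seq_len=seq_len)
gqa = kv_cache_bytes(layers, kv_heads=4, head_dim=head_dim, seq_len=seq_len)  # 4 KV groups

print(f"MHA cache: {mha / 2**30:.1f} GiB vs. GQA cache: {gqa / 2**30:.1f} GiB")
# GQA stores one K/V pair per group instead of per query head, a 4x cache reduction here.
```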
The Full Training Blueprint
Unlike proprietary models, we’re releasing everything:
- Data mixtures (11.2T tokens across web, math, and code)
- Three-stage pretraining (progressive domain specialization)
- Mid-training for reasoning & long-context adaptation
- Post-training with SFT & Anchored Preference Optimization (APO); a rough APO sketch follows this list
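For the post-training step, here is a rough illustration of how anchored preference optimization can be run with off-the-shelf tooling: TRL exposes APO as a loss variant of its DPO trainer. The dataset, hyperparameters, and output path below are placeholders, and this is a sketch, not the exact recipe used for SmolLM3.

```python
# Hedged sketch: preference tuning with TRL's APO loss variant.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM3-3B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with prompt/chosen/rejected columns works; this one is a placeholder.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(
    output_dir="smollm3-apo",   # placeholder path
    loss_type="apo_zero",       # anchored preference optimization variant
    beta=0.1,                   # illustrative value
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```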
Three-Stage Pretraining
| Stage | Web | Code | Math | Focus |
|---|---|---|---|---|
| Stage 1 (0-8T) | 85% | 12% | 3% | General capabilities |
| Stage 2 (8-10T) | 75% | 15% | 10% | Higher-quality math/code |
| Stage 3 (10-11.1T) | 63% | 24% | 13% | Reasoning & long-context |
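As a toy illustration of how stage mixtures like these are typically applied, the sketch below draws a training domain per document according to the Stage 3 weights from the table. It is illustrative only, not SmolLM3's actual data pipeline.

```python
import random

# Stage 3 mixture from the table above, used as sampling weights.
stage3_mixture = {"web": 0.63, "code": 0.24, "math": 0.13}

def sample_domain(mixture, rng=random):
    """Pick a data domain with probability proportional to its mixture weight."""
    domains, weights = zip(*mixture.items())
    return rng.choices(domains, weights=weights, k=1)[0]

# Each training document would then be drawn from the sampled domain's corpus.
counts = {domain: 0 for domain in stage3_mixture}
for _ in range(10_000):
    counts[sample_domain(stage3_mixture)] += 1
print(counts)  # roughly 6300 web, 2400 code, 1300 math
```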
Dual-Mode Reasoning: Think vs. No-Think
SmolLM3 supports two response modes:
- /think – Explicit reasoning traces (like Chain-of-Thought)
- /no_think – Direct answers (faster inference)
Example:
```python
messages = [
    {"role": "system", "content": "/think"},  # or "/no_think"
    {"role": "user", "content": "Explain quantum entanglement."},
]
```
Tool Calling Support
```python
tools = [{
    "name": "get_weather",
    "description": "Fetch weather data",
    "parameters": {"city": {"type": "string"}}
}]
```
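A minimal sketch of wiring that tool list into generation follows. The xml_tools keyword mirrors the SmolLM3 model card's tool-calling example, but treat the keyword, the parsing step, and the message flow as assumptions to verify against the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    xml_tools=tools,              # tool schemas defined above (keyword is an assumption)
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=256)

# The model should emit a structured call to get_weather; your application parses it,
# runs the tool, and appends the result as a new message before generating again.
print(tokenizer.decode(outputs[0][inputs.shape[1]:]))
```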
⚡ How to Use SmolLM3
Install & Run
```bash
pip install -U transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Generate with reasoning
messages = [{"role": "user", "content": "Solve 3x + 5 = 20"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
Recommended Sampling: temperature=0.6, top_p=0.95
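These settings plug directly into generate, continuing the snippet above (max_new_tokens here is just an illustrative value):

```python
# Apply the recommended sampling settings to the earlier generation call.
outputs = model.generate(
    inputs,
    max_new_tokens=512,   # illustrative budget
    do_sample=True,       # enable sampling so temperature/top_p take effect
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```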
Why This Matters
SmolLM3 proves that small models can be highly capable when optimized correctly. Key takeaways:
✅ Efficiency matters – 3B models can rival 4B with the right training
✅ Long-context is achievable – NoPE + YaRN enables 128K support
✅ Open weights & recipes accelerate research – No more black-box models!
Resources
- GitHub Repo (Training configs & eval code)
- Model Collection (Quantized checkpoints)
- Training Logs
What will YOU build with SmolLM3? Let us know in the comments!