Mistral Inference - Run Mistral AI Models Locally on Your Hardware
Last updated May 4, 2026
Key capabilities that make Mistral Inference stand out.
Download and run any Mistral AI open-weight model locally
CLI demo tool for quick model testing (mistral-demo)
Python API for programmatic inference
Multi-GPU support via torchrun for large models
Function calling support across all models
Hugging Face Hub integration for model weights
Optimized with xformers for efficient attention computation
Support for instruction-tuned and base model variants
Safetensors format for fast and safe model loading
Extended 32K+ token vocabulary on newer model versions
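The CLI and multi-GPU items above can be sketched as a small helper that builds the command line for `mistral-demo`, wrapping it in `torchrun` when more than one GPU is requested. This is a minimal sketch, not the project's own tooling: the function name `demo_command` and the model directories are hypothetical, and the `torchrun` flags (`--nproc-per-node`, `--no-python`) are assumed to match the invocation documented for the project at the time of writing, so check them against the installed versions.

```python
import shlex


def demo_command(model_dir: str, num_gpus: int = 1) -> list[str]:
    """Build an argv list for mistral-demo (hypothetical helper).

    With one GPU the demo tool is called directly; with several it is
    launched through torchrun, one process per GPU.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base = ["mistral-demo", model_dir]
    if num_gpus == 1:
        return base
    # Assumed flags: --no-python tells torchrun to run the console
    # script as-is instead of passing it to the Python interpreter.
    return ["torchrun", "--nproc-per-node", str(num_gpus), "--no-python", *base]


if __name__ == "__main__":
    # The model directories below are placeholders, not real paths.
    print(shlex.join(demo_command("/models/mistral-7b-instruct")))
    print(shlex.join(demo_command("/models/mixtral-8x7b", num_gpus=2)))
```

The helper only assembles the command. Passing the resulting list to `subprocess.run` would launch it, provided the package is installed and the model weights are already downloaded.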
Who benefits most from this tool.
Run Mistral models locally for prototyping, testing, and building LLM-powered applications without API dependency.
Evaluate Mistral model performance, fine-tune open-weight variants, and conduct AI safety research on local hardware.
Deploy Mistral models on-premise for applications requiring data sovereignty and zero data leakage to external APIs.
Large language models from the same organization.
| Model | Context Window | Price (input / output, USD per 1M tokens) |
|---|---|---|
| Mistral Small 4 (Current) | 262K | $0.15 / $0.60 |
| Mistral Small Creative (Current) | 33K | $0.10 / $0.30 |
| Devstral 2 2512 (Current) | 262K | $0.40 / $2.00 |
| Ministral 3 14B 2512 (Current) | 262K | $0.20 / $0.20 |
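To make the prices in the table above concrete, the arithmetic can be sketched as follows. This assumes "per M" means per one million tokens and that input and output tokens are billed separately at the two listed rates. The function name `api_cost` is hypothetical, and the prices are copied from the table as listed here, not fetched from any API.

```python
from decimal import Decimal

# (input price, output price) in USD per 1M tokens, copied from the table.
PRICES = {
    "Mistral Small 4": (Decimal("0.15"), Decimal("0.60")),
    "Mistral Small Creative": (Decimal("0.10"), Decimal("0.30")),
    "Devstral 2 2512": (Decimal("0.40"), Decimal("2.00")),
    "Ministral 3 14B 2512": (Decimal("0.20"), Decimal("0.20")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)  # assumption: prices are per 1M tokens


def api_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the hosted-API cost in USD for a given token volume."""
    price_in, price_out = PRICES[model]
    total = price_in * input_tokens + price_out * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Example workload: 1M input tokens and 500K output tokens.
    print(api_cost("Mistral Small 4", 1_000_000, 500_000))
```

An estimate like this gives a baseline for comparing hosted usage against the hardware cost of running the open-weight models locally.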