Mistral Inference
By Mistral AI
Run Mistral AI Models Locally on Your Hardware
Last updated May 4, 2026
What is Mistral Inference?
Mistral Inference (mistral-inference) is Mistral AI's official open-source Python library for downloading and running the company's open-weight language models on your own hardware, either through a command-line interface or programmatically via a Python API.
Mistral Inference's Top Features
Key capabilities that make Mistral Inference stand out.
Download and run any Mistral AI open-weight model locally
CLI tools for quick model testing and interactive chat (mistral-demo, mistral-chat)
Python API for programmatic inference
Multi-GPU support via torchrun for large models
Function calling support on newer instruct models
Hugging Face Hub integration for model weights
Optimized with xformers for efficient attention computation
Support for instruction-tuned and base model variants
Safetensors format for fast and safe model loading
Extended 32K+ token vocabulary on newer model versions
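The Python API listed above follows a load-tokenize-generate flow. The sketch below is based on the library's documented usage and assumes you have already downloaded a model's weights and v3 tokenizer file to a local directory (`model_path` is a placeholder you must set yourself); it is illustrative, not a drop-in script.

```python
# Minimal sketch of programmatic inference with mistral-inference.
# Assumes: `pip install mistral-inference` and model weights + tokenizer
# already downloaded to `model_path` (placeholder, set this yourself).
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

model_path = "/path/to/mistral-model"  # hypothetical local directory

# Load the tokenizer and the model weights (safetensors) from disk.
tokenizer = MistralTokenizer.from_file(f"{model_path}/tokenizer.model.v3")
model = Transformer.from_folder(model_path)

# Build a chat-style request and encode it to token IDs.
request = ChatCompletionRequest(
    messages=[UserMessage(content="Explain attention in one sentence.")]
)
tokens = tokenizer.encode_chat_completion(request).tokens

# Generate a completion and decode it back to text.
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=128,
    temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))
```

For models too large for a single GPU, the same kind of script is typically launched with `torchrun` (e.g. `torchrun --nproc-per-node 2 script.py`) to shard the model across devices.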
Use Cases
Who benefits most from this tool.
AI Developers
Run Mistral models locally for prototyping, testing, and building LLM-powered applications without API dependency.
Researchers
Evaluate Mistral model performance, fine-tune open-weight variants, and conduct AI safety research on local hardware.
Privacy-Conscious Teams
Deploy Mistral models on-premise for applications requiring data sovereignty and zero data leakage to external APIs.
Mistral Inference's Pricing
Mistral Inference itself is free, open-source software; you run the models on your own hardware, so there are no per-token charges.
AI Models by Mistral AI
Large language models from the same organization.
| Model | Context Window | Price per M tokens (input / output) |
|---|---|---|
| Mistral Small 4 | 262K | $0.15 / $0.60 |
| Mistral Small Creative | 33K | $0.10 / $0.30 |
| Devstral 2 2512 | 262K | $0.40 / $2.00 |
| Ministral 3 14B 2512 | 262K | $0.20 / $0.20 |