
MiniCPM

MiniCPM v4 / 4.1 (Current)
by OpenBMB (open source)
Context: 128K tokens
Price (In / Out): Free / Free
Training Cutoff: Not publicly specified
Category: Large Language Model
Max Output: 32.8K tokens

About MiniCPM

MiniCPM is OpenBMB’s ultra-efficient open language-model family for edge and end-device deployment. The MiniCPM4 and MiniCPM4.1 lines focus on fast local reasoning, while MiniCPM-SALA extends the family toward sparse/linear attention and million-token context research.

Capabilities

text · code · reasoning · local inference

Input Modalities

text

Output Modalities

text

Technical Details

API Identifier
OpenBMB/MiniCPM
Category
Large Language Model
Context Window
128,000 tokens
Max Output Tokens
32,768 tokens

Tags

open-weight · edge-ai · local-llm · reasoning · openbmb

Benchmarks

Performance scores for MiniCPM across standard benchmarks.

MiniCPM-SALA standard benchmark average (official-github-readme · May 2026)
76.53%
MiniCPM-SALA long-context average (official-github-readme · May 2026)
38.97%
MiniCPM-SALA 2048K extrapolation score (official-github-readme · May 2026)
81.6%
MiniCPM4.1 reasoning decoding speedup (official-github-readme · May 2026)
3×
MiniCPM4 Jetson AGX Orin decoding speedup vs Qwen3-8B (official-github-readme · May 2026)
7×

Pricing

Token pricing for MiniCPM API usage.

Input Tokens

Free

per million tokens

Output Tokens

Free

per million tokens

Pricing Calculator

Input cost: $0.00
Output cost: $0.00
Estimated monthly cost: $0.00

Open-weight GitHub and Hugging Face model family. There is no fixed vendor API price; runtime cost depends on the host, hardware, or inference provider.
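The per-million-token arithmetic behind a calculator like the one above can be sketched as follows. This is a minimal illustration, not the site's actual implementation; the non-zero prices in the usage note are hypothetical placeholders, since MiniCPM's listed vendor price is $0 and real cost depends on the host or hardware.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float = 0.0,
                  price_out_per_m: float = 0.0) -> float:
    """Estimate USD cost from token volumes and per-million-token prices."""
    # Convert raw token counts to millions, then scale by the listed rates.
    input_cost = input_tokens / 1_000_000 * price_in_per_m
    output_cost = output_tokens / 1_000_000 * price_out_per_m
    return input_cost + output_cost

# MiniCPM is listed as Free / Free, so any volume estimates to $0.00:
print(estimate_cost(5_000_000, 1_000_000))  # 0.0

# With hypothetical paid rates ($0.50 in / $1.50 out per million tokens):
print(estimate_cost(1_000_000, 1_000_000, 0.5, 1.5))  # 2.0
```
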

Competing Models

Same pricing tier — direct alternatives to MiniCPM

Efficiency
OpenBMB (open source)

Open-source efficient AI models and agent infrastructure

1 model · Founded 2022 · China