LLM Comparison
Llama 4 Scout vs GPT-5.5
Side-by-side specs, pricing & capabilities · Updated May 2026
| Spec | Llama 4 Scout | GPT-5.5 |
|---|---|---|
| Organization | Meta | OpenAI |
| OpenTools Score | 17 87.4 | 90 5.1 |
| Family | Llama | GPT |
| Status | Current | Current |
| Release Date | Apr 2025 | Apr 2026 |
| Context Window | 328K tokens | 1.1M tokens |
| Input Price | $0.08/M tokens | $5.00/M tokens |
| Output Price | $0.30/M tokens | $30.00/M tokens |
| Pricing Notes | — | Cached input: $0.50/M tokens. Long context (>272K tokens): 2x input, 1.5x output. Batch API: 50% discount. Priority: 2.5x standard. |
| Capabilities | text, vision, code | text, vision, code, tool-use, extended-thinking, computer-use, web-search |
| Training Cutoff | — | December 2025 |
| Max Output | 16K tokens | 128K tokens |
| API Identifier | meta-llama/llama-4-scout | openai/gpt-5.5 |
| Benchmarks (reporting source in parentheses) | | |
| MMLU | 79.6 (Meta) | 92.4 (OpenAI) |
| MMLU Pro | 74.3 (Meta) | — |
| GPQA | 57.2 (Meta) | — |
| MATH | 50.3 (Meta) | — |
| LiveCodeBench | 32.8 (Meta) | — |
| MMMU | 69.4 (Meta) | — |
| MGSM | 91 (Meta) | — |
| GPQA Diamond | — | 93.6 (OpenAI) |
| ARC-AGI-2 | — | 85 (OpenAI) |
| Terminal-Bench 2.0 | — | 82.7 (OpenAI) |
| SWE-bench Pro | — | 58.6 (OpenAI) |
| OSWorld-Verified | — | 78.7 (OpenAI) |
| BrowseComp | — | 84.4 (OpenAI) |
| MMMU-Pro | — | 81.2 (OpenAI) |
| FrontierMath Tier 4 | — | 35.4 (OpenAI) |
| HLE (with tools) | — | 52.2 (OpenAI) |
| GDPval | — | 84.9 (OpenAI) |
| Toolathlon | — | 55.6 (OpenAI) |
| CyberGym | — | 81.8 (OpenAI) |
| MRCR v2 512K-1M | — | 74 (OpenAI) |
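The GPT-5.5 pricing notes list several modifiers (cached input, long context, Batch API, Priority) without saying how they combine. The sketch below is one possible reading, not OpenAI's billing logic. It assumes the long-context multipliers apply to the whole request once input exceeds 272K tokens, that cached tokens are always billed at the cached rate, and that the batch or priority multiplier scales the final total. The function name `estimate_request_cost` and the `tier` parameter are hypothetical.

```python
# Rates taken from the GPT-5.5 column of the spec table (USD per 1M tokens).
INPUT_PER_M = 5.00
OUTPUT_PER_M = 30.00
CACHED_INPUT_PER_M = 0.50
LONG_CONTEXT_THRESHOLD = 272_000  # tokens; above this, 2x input and 1.5x output

# Batch API: 50% discount. Priority: 2.5x standard.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "priority": 2.5}


def estimate_request_cost(input_tokens, output_tokens, cached_tokens=0, tier="standard"):
    """Estimate the USD cost of one request under the assumptions stated above."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown tier: {tier!r}")

    # Assumption: the long-context surcharge covers the entire request,
    # not only the tokens past the threshold.
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate = INPUT_PER_M * (2.0 if long_context else 1.0)
    output_rate = OUTPUT_PER_M * (1.5 if long_context else 1.0)

    uncached_tokens = input_tokens - cached_tokens
    base = (
        uncached_tokens * input_rate
        # Assumption: cached tokens keep the flat cached rate in long context.
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * output_rate
    ) / 1_000_000

    # Assumption: the tier multiplier scales the whole request total.
    return base * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    print(f"standard, short context: ${estimate_request_cost(100_000, 10_000):.2f}")
    print(f"standard, long context:  ${estimate_request_cost(300_000, 10_000):.2f}")
    print(f"batch, short context:    ${estimate_request_cost(100_000, 10_000, tier='batch'):.2f}")
```

Check the provider's pricing page before relying on these numbers, since any of the three assumptions could differ from actual billing.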
Cost Calculator
Estimated monthly cost at the listed per-token prices. The figures below correspond to 1M input tokens and 0.5M output tokens per month.
| Model | Input | Output | Total / mo | vs Best |
|---|---|---|---|---|
| Llama 4 Scout (cheapest) | $0.08 | $0.15 | $0.23 | — |
| GPT-5.5 | $5.00 | $15.00 | $20.00 | +8596% |
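The arithmetic behind the cost table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard per-token prices from the spec table. The usage figures (1M input tokens, 0.5M output tokens) are inferred from the table's totals rather than stated on the page, and the names `Pricing`, `monthly_cost`, and `percent_over_cheapest` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


def monthly_cost(pricing, input_tokens, output_tokens):
    """Total USD cost for the given monthly token volumes at standard rates."""
    return (
        input_tokens * pricing.input_per_m
        + output_tokens * pricing.output_per_m
    ) / 1_000_000


def percent_over_cheapest(costs):
    """Map each model name to its percentage premium over the cheapest model."""
    best = min(costs.values())
    return {name: (cost - best) / best * 100 for name, cost in costs.items()}


MODELS = [
    Pricing("Llama 4 Scout", 0.08, 0.30),
    Pricing("GPT-5.5", 5.00, 30.00),
]

# Assumed usage: inferred from the table's totals, not given on the page.
INPUT_TOKENS = 1_000_000
OUTPUT_TOKENS = 500_000

costs = {m.name: monthly_cost(m, INPUT_TOKENS, OUTPUT_TOKENS) for m in MODELS}
deltas = percent_over_cheapest(costs)

if __name__ == "__main__":
    for name, cost in costs.items():
        print(f"{name}: ${cost:.2f}/mo (+{deltas[name]:.0f}% vs best)")
```

The "vs Best" column is the relative premium over the cheapest total, (20.00 - 0.23) / 0.23, which rounds to 8596%. This sketch ignores caching, long-context, batch, and priority modifiers.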
Meta
Llama 4 Scout
Llama 4 Scout is a multimodal LLM from Meta. It supports a context window of up to 327,680 tokens and scores 79.6% on MMLU, as reported by Meta in the benchmark table above. Available from $0.08/M input tokens.
OpenAI
GPT-5.5
GPT-5.5 is OpenAI's smartest and most intuitive model, built for agentic work like coding, research, and data analysis. It matches GPT-5.4 per-token latency while delivering higher intelligence with significantly fewer tokens. Supports a 1,050,000 token context window and five reasoning effort levels (none through xhigh).