LLM Comparison
Grok 4.20 vs GPT-5.5
Side-by-side specs, pricing & capabilities · Updated May 2026
| Spec | Grok 4.20 | GPT-5.5 |
|---|---|---|
| Organization | xAI | OpenAI |
| OpenTools Score | 77 19.1 | 90 5.1 |
| Family | Grok | GPT |
| Status | Current | Current |
| Release Date | Mar 2026 | Apr 2026 |
| Context Window | 2.0M tokens | 1.1M tokens |
| Input Price | $2.00/M tokens | $5.00/M tokens |
| Output Price | $6.00/M tokens | $30.00/M tokens |
| Pricing Notes | Cache read: $0.2000/M tokens | Cached input: $0.50/M tokens. Long context (>272K tokens): 2x input, 1.5x output. Batch API: 50% discount. Priority: 2.5x standard. |
| Capabilities | text, vision, code | text, vision, code, tool-use, extended-thinking, computer-use, web-search |
| Training Cutoff | — | December 2025 |
| Max Output | — | 128K tokens |
| API Identifier | x-ai/grok-4.20 | openai/gpt-5.5 |
| Benchmarks | | |
| MMLU | — | 92.4 (OpenAI) |
| GPQA Diamond | — | 93.6 (OpenAI) |
| ARC-AGI-2 | — | 85 (OpenAI) |
| Terminal-Bench 2.0 | — | 82.7 (OpenAI) |
| SWE-bench Pro | — | 58.6 (OpenAI) |
| OSWorld-Verified | — | 78.7 (OpenAI) |
| BrowseComp | — | 84.4 (OpenAI) |
| MMMU-Pro | — | 81.2 (OpenAI) |
| FrontierMath Tier 4 | — | 35.4 (OpenAI) |
| HLE (with tools) | — | 52.2 (OpenAI) |
| GDPval | — | 84.9 (OpenAI) |
| Toolathlon | — | 55.6 (OpenAI) |
| CyberGym | — | 81.8 (OpenAI) |
| MRCR v2 512K-1M | — | 74 (OpenAI) |
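
The GPT-5.5 pricing notes above can be turned into a rough per-request estimate. The sketch below is only one possible reading of those notes: it assumes the long-context multipliers apply to the whole request once the input exceeds 272K tokens, that they also scale the cached-input rate, and that the batch discount applies to the final total. None of these combination rules are stated in the table, and the function name `gpt55_request_cost` is hypothetical. Priority pricing is left out.

```python
# Rates from the comparison table, in USD per million tokens.
GPT55_INPUT = 5.00
GPT55_CACHED_INPUT = 0.50
GPT55_OUTPUT = 30.00
LONG_CONTEXT_THRESHOLD = 272_000  # tokens; the ">272K" from the pricing notes


def gpt55_request_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one GPT-5.5 request.

    Assumptions (not confirmed by the source): long-context multipliers
    cover the whole request, including cached input, and the batch
    discount is applied last.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_mult = 2.0 if long_context else 1.0    # "2x input"
    output_mult = 1.5 if long_context else 1.0   # "1.5x output"

    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * GPT55_INPUT * input_mult
        + cached_tokens * GPT55_CACHED_INPUT * input_mult
        + output_tokens * GPT55_OUTPUT * output_mult
    ) / 1_000_000

    if batch:
        cost *= 0.5  # "Batch API: 50% discount"
    return cost


if __name__ == "__main__":
    print(f"standard:     ${gpt55_request_cost(100_000, 10_000):.3f}")
    print(f"long context: ${gpt55_request_cost(300_000, 10_000):.3f}")
    print(f"long + batch: ${gpt55_request_cost(300_000, 10_000, batch=True):.3f}")
```

Under these assumptions, crossing the 272K threshold matters more than the headline rates suggest, because it doubles the input rate for the entire request.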
Cost Calculator
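Monthly cost depends on your expected token usage. The figures below are consistent with 1M input tokens and 0.5M output tokens per month at the listed prices.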
| Model | Input | Output | Total / mo | vs Best |
|---|---|---|---|---|
| Grok 4.20 (cheapest) | $2.00 | $3.00 | $5.00 | — |
| GPT-5.5 | $5.00 | $15.00 | $20.00 | +300% |
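
The arithmetic behind this table can be reproduced in a few lines. The sketch below uses the list prices from the spec table and ignores caching, long-context, batch, and priority adjustments. The usage of 1M input and 0.5M output tokens is inferred from the table's figures rather than stated on the page, and the names `Pricing`, `monthly_cost`, and `compare` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    name: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# List prices taken from the comparison table.
MODELS = [
    Pricing("Grok 4.20", 2.00, 6.00),
    Pricing("GPT-5.5", 5.00, 30.00),
]


def monthly_cost(pricing, input_tokens, output_tokens):
    """Return (input_cost, output_cost, total) in USD at list prices."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_m
    output_cost = output_tokens / 1_000_000 * pricing.output_per_m
    return input_cost, output_cost, input_cost + output_cost


def compare(models, input_tokens, output_tokens):
    """Build one row per model, with the premium over the cheapest total."""
    rows = [(m.name, *monthly_cost(m, input_tokens, output_tokens)) for m in models]
    best = min(total for _, _, _, total in rows)
    return [
        {
            "model": name,
            "input": input_cost,
            "output": output_cost,
            "total": total,
            # Percentage above the cheapest model; 0 for the cheapest itself.
            "vs_best_pct": (total - best) / best * 100 if best else 0.0,
        }
        for name, input_cost, output_cost, total in rows
    ]


if __name__ == "__main__":
    # Assumed usage: 1M input tokens and 0.5M output tokens per month.
    for row in compare(MODELS, 1_000_000, 500_000):
        print(f"{row['model']}: ${row['total']:.2f} ({row['vs_best_pct']:+.0f}% vs best)")
```

Because the output price gap (5x) is larger than the input price gap (2.5x), the premium for GPT-5.5 grows as the share of output tokens in the workload increases.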
xAI
Grok 4.20
Grok 4.20 is a multimodal LLM from xAI. It supports a context window of up to 2,000,000 tokens and is available from $2.00/M input tokens.
OpenAI
GPT-5.5
GPT-5.5 is OpenAI's smartest and most intuitive model, built for agentic work like coding, research, and data analysis. It matches GPT-5.4 per-token latency while delivering higher intelligence with significantly fewer tokens. Supports a 1,050,000 token context window and five reasoning effort levels (none through xhigh).