whichllm

Local AI Tools | Free

whichllm: local LLM picker for your actual AI hardware

Last updated Jun 26, 2026

What is whichllm?

whichllm is a command-line model picker for people running LLMs locally. Instead of asking only “what is the biggest model I can fit,” it tries to answer the more useful question: which local model should I actually run on this machine? The README says it ranks models by real, recency-aware benchmarks rather than parameter count, while also accounting for VRAM fit and likely speed.

The workflow is simple. Users can run it through uvx, install it with uv, Homebrew, or pip, and then ask for recommendations for the detected machine or a simulated GPU. The README shows examples for RTX 4090, multi-GPU workstations, GPU-only filtering, speed thresholds, markdown output, JSON output, upgrade comparisons, planning hardware for a target model, starting a chat, and printing copy-paste Python. That makes it useful both as an interactive CLI and as a small automation primitive.

whichllm matters to builders because choosing a local model is messy. A model can technically load but be too slow, too old, or worse on the tasks a developer cares about. The project says it combines sources such as LiveBench, Artificial Analysis, Aider, multimodal or vision benchmarks, Chatbot Arena Elo, and Open LLM Leaderboard data. The exact scoring should still be reviewed from the repository, but the tool’s goal is clear: translate benchmark and hardware data into a practical shortlist.

Pricing is free as an open-source project, but local LLMs still have hardware costs. Teams using whichllm should treat it as a decision aid, not an absolute ranking. Validate recommendations with your own prompts, latency targets, memory limits, and deployment stack before buying GPUs or standardizing on a model family.

The tool is strongest during exploration and hardware planning. A developer can quickly compare whether a smaller current model beats an older larger model on a given card, or whether a GPU upgrade changes the set of practical options. Scriptable JSON output also means teams can fold model recommendations into internal docs, benchmark reports, or setup scripts.

The limitation is that no benchmark mix perfectly predicts your workload. Code generation, retrieval, tool use, chat quality, multimodal tasks, and throughput each stress models differently. Use whichllm to narrow the field, then run your own eval prompts and measure local latency, memory pressure, and failure modes before committing to a default model.

whichllm also helps reduce wasted download time. Local model files are large, and trial-and-error selection can burn hours on models that barely fit or run too slowly. A hardware-aware shortlist gives developers a better first pass before they spend time pulling weights and tuning runtimes.
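To make "VRAM fit" concrete, here is a back-of-envelope sketch of the kind of arithmetic involved. It is not whichllm's actual method, which should be checked in the repository. The function names and the fixed 2 GiB overhead for KV cache and runtime buffers are assumptions chosen for illustration.

```python
def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM.

    overhead_gib is a placeholder for KV cache and runtime buffers; real
    overhead depends on context length and the inference runtime.
    """
    needed = estimate_weights_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


# Illustrative check against a 24 GiB card.
for label, params, bits in [("8B at 4-bit", 8, 4), ("70B at 4-bit", 70, 4)]:
    weights = estimate_weights_gib(params, bits)
    print(f"{label}: ~{weights:.1f} GiB weights, fits: {fits_in_vram(params, bits, 24)}")
```

An estimate like this only says whether a model can load. It says nothing about speed or quality, which is why a fit check alone is a weak way to choose a model.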

whichllm's Top Features

Key capabilities that make whichllm stand out.

Ranks local LLMs by VRAM fit, expected speed, and benchmark quality

Simulates GPUs such as RTX 4090, multi-GPU workstations, or custom VRAM limits

Supports one-off uvx use plus uv, Homebrew, and pip installation paths

Outputs Markdown or JSON for scripts, Slack, Discord, and documentation

Includes planning commands to find the GPU needed for a target model
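The general idea behind the first feature, filtering by fit and speed and then ordering by quality, can be sketched as follows. This is a simplified illustration and not whichllm's scoring code. The `Candidate` fields, the model names, and every number below are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    vram_needed_gib: float  # estimated memory to load and run the model
    tokens_per_sec: float   # expected generation speed on the target GPU
    quality: float          # blended benchmark score on a 0-100 scale


def shortlist(candidates, vram_gib, min_tokens_per_sec=0.0):
    """Drop models that do not fit or are too slow, then sort by quality."""
    usable = [
        c for c in candidates
        if c.vram_needed_gib <= vram_gib and c.tokens_per_sec >= min_tokens_per_sec
    ]
    return sorted(usable, key=lambda c: c.quality, reverse=True)


# Hypothetical data for illustration only.
CANDIDATES = [
    Candidate("older-70b", 40.0, 6.0, 62.0),
    Candidate("newer-14b", 10.0, 45.0, 68.0),
    Candidate("small-3b", 3.0, 120.0, 41.0),
]

for c in shortlist(CANDIDATES, vram_gib=24, min_tokens_per_sec=20):
    print(c.name, c.quality)
```

With these invented numbers, the newer 14B model ranks above the older 70B model even when both fit, which is the "smaller current model beats an older larger model" case the listing describes.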

Use Cases

Who benefits most from this tool.

Local AI builders

Find models that fit your GPU and are likely fast enough before downloading weights.

Hardware buyers

Compare upgrade scenarios and plan which GPU is needed for a target local model.

Tooling teams

Use markdown or JSON output to document local model recommendations in internal workflows.
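For tooling teams, a small script could turn JSON recommendations into a Markdown table for internal docs. The sketch below parses a hard-coded sample instead of calling the CLI. The field names (`model`, `quant`, `vram_gb`, `tokens_per_sec`) are assumptions and not whichllm's documented schema, so check the real output before adapting it.

```python
import json

# Hypothetical payload: these field names are assumptions, not the real schema.
SAMPLE_JSON = """
[
  {"model": "example-14b", "quant": "Q4_K_M", "vram_gb": 10.2, "tokens_per_sec": 45.0},
  {"model": "example-3b", "quant": "Q8_0", "vram_gb": 3.9, "tokens_per_sec": 120.0}
]
"""


def to_markdown_table(rows):
    """Render a list of recommendation dicts as a Markdown table string."""
    lines = [
        "| Model | Quant | VRAM (GB) | Tokens/s |",
        "| --- | --- | --- | --- |",
    ]
    for r in rows:
        lines.append(
            f"| {r['model']} | {r['quant']} | {r['vram_gb']:.1f} | {r['tokens_per_sec']:.0f} |"
        )
    return "\n".join(lines)


print(to_markdown_table(json.loads(SAMPLE_JSON)))
```

Keeping the rendering step separate from the data source means the same function works whether the rows come from a saved file, a CI job, or a subprocess call.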

Tags

local-llm, llm-benchmark, hardware, gpu, vram, cli, ollama, huggingface, gguf, open-source

whichllm's Pricing

Free (open-source project)

Frequently Asked Questions

What is whichllm?
whichllm is a CLI that recommends local LLMs based on your hardware, model fit, speed, and benchmark quality.

How do you run whichllm?
The README shows one-off use with uvx and install options through uv, Homebrew, or pip.

Does whichllm only use parameter count?
No. The project specifically says it ranks by real, recency-aware benchmarks rather than parameter count alone.

Can whichllm simulate other GPUs?
Yes. The README shows GPU simulation, multi-GPU scenarios, VRAM overrides, and upgrade comparisons.
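An upgrade comparison can be pictured as a set difference: which models fit only after the VRAM increase. The sketch below illustrates that idea with invented memory requirements. The function names and the `MODELS` table are hypothetical and do not reflect how whichllm implements the feature.

```python
def usable(models, vram_gib):
    """Names of models whose estimated memory need fits in vram_gib."""
    return {name for name, need_gib in models.items() if need_gib <= vram_gib}


def upgrade_gain(models, current_vram_gib, upgraded_vram_gib):
    """Models that fit only after the upgrade, sorted for stable output."""
    gained = usable(models, upgraded_vram_gib) - usable(models, current_vram_gib)
    return sorted(gained)


# Hypothetical memory requirements in GiB, for illustration only.
MODELS = {"small-3b": 3.0, "mid-14b": 10.0, "large-32b": 20.0, "huge-70b": 40.0}

print(upgrade_gain(MODELS, current_vram_gib=12, upgraded_vram_gib=24))
```

A fit-only comparison like this ignores speed and quality, so it only gives an upper bound on what an upgrade unlocks.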
