Docker Model Runner screenshot

Docker Model Runner

By Docker
DeveloperApplicationFree

Docker Model Runner - Run AI Models Locally With Docker

Last updated May 7, 2026

Claim Tool

What is Docker Model Runner?

Docker Model Runner brings AI model inference directly into the Docker ecosystem. Instead of managing separate Python environments, CUDA installations, and model serving frameworks, you can pull and run AI models with the same docker commands you already know. Running a model is as simple as `docker model run ai/gemma3 "Hello"`. Docker Model Runner handles model downloading, GPU acceleration (including Apple Silicon, NVIDIA CUDA, and AMD ROCm), and serving through an OpenAI-compatible API endpoint. The 100K+ Docker Hub pulls show strong adoption among developers. Docker Model Runner is included in Docker Desktop for macOS and Windows, and in Docker Engine for Linux when installed from Docker official repositories. No separate installation needed for most users. Just verify with `docker model --help` and start running models. The tool supports models from Docker Hub's ai/ namespace and any OCI-compliant registry. It provides an OpenAI-compatible chat completions endpoint so existing applications can switch to local inference by changing the base URL. This makes it straightforward to develop against local models and deploy to cloud endpoints later. For developers building from source, the Go-based codebase builds with a single `make` command that produces the server, CLI plugin, and a convenience wrapper. The project has 560 GitHub stars, 1,868 commits, and active development with recent GPU support additions and E2E testing improvements. Docker Model Runner supports macOS (Apple Silicon GPU via Metal), Linux (NVIDIA CUDA and AMD ROCm), and Windows. It integrates with Docker Compose and Kubernetes through Helm charts for production deployments.

Docker Model Runner's Top Features

Key capabilities that make Docker Model Runner stand out.

Pull and run AI models with familiar docker commands

OpenAI-compatible chat completions API endpoint for easy integration

GPU acceleration: Apple Silicon Metal, NVIDIA CUDA, AMD ROCm

Included in Docker Desktop (macOS/Windows) and Docker Engine (Linux)

Models from Docker Hub ai/ namespace and OCI-compliant registries

Docker Compose and Kubernetes (Helm) deployment support

Single-command model execution: docker model run ai/gemma3

Automatic model downloading and caching from Docker Hub

Use Cases

Who benefits most from this tool.

Tags

dockerlocal-llmai-inferencegpu-accelerationmodel-servingopenai-compatiblellmcontainersapple-siliconnvidia-cuda

Docker Model Runner's Pricing

Free plan available
Docker logo
Dockerenterprise

We simplify the lives of developers building world-changing apps

Founded 2008Palo Alto, California, United States
View full profile

User Reviews

Share your thoughts

If you've used this product, share your thoughts with other builders

Recent reviews

Frequently Asked Questions

How is Docker Model Runner different from Ollama?
Docker Model Runner integrates directly into the Docker ecosystem. If you already use Docker Desktop, it is included with no separate installation. It uses Docker Hub as its model registry (same as container images) and provides an OpenAI-compatible API. Ollama is a standalone tool with its own model library. Both support local LLM inference with GPU acceleration.
Do I need a GPU to use Docker Model Runner?
No, Docker Model Runner works on CPU as well. However, GPU acceleration is supported on Apple Silicon (Metal), NVIDIA (CUDA), and AMD (ROCm) for significantly faster inference. The tool automatically detects and uses available GPU resources.
How do I install Docker Model Runner?
On macOS and Windows, install Docker Desktop and Docker Model Runner is included. On Linux, install Docker Engine from Docker official repositories (not distro packages). Verify with `docker model --help`. If using distro packages, you may need to reinstall from Docker official repos.
Can I use Docker Model Runner with my existing OpenAI-based application?
Yes. Docker Model Runner provides an OpenAI-compatible chat completions endpoint. Point your application base URL to the local Model Runner endpoint and your existing code works with local models instead of cloud APIs.