ComfyUI was designed as a local desktop app. Running it in Docker for production requires solving three problems most tutorials ignore: GPU passthrough, model volume strategy (SDXL checkpoints alone are 7 GB), and the single flag that makes or breaks network connectivity.
Prerequisites
- NVIDIA GPU with CUDA 11.8+
- Docker Engine 19.03+ (for the --gpus flag)
- NVIDIA Container Toolkit (formerly nvidia-docker2) - this is the layer that lets Docker containers access the host GPU through the NVIDIA driver. Without it, --gpus all silently has no effect. Installation docs: docs.nvidia.com/datacenter/cloud-native/container-toolkit
Installing the NVIDIA Container Toolkit
The NVIDIA Container Toolkit must be installed on the Docker host (not inside the container). These are the official installation commands for Ubuntu:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

Verify the toolkit is working correctly before building your ComfyUI image:
```bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

You should see your GPU listed in the nvidia-smi table output. If the command fails with "could not select device driver", the Container Toolkit is not correctly installed or the Docker daemon was not restarted.
The Dockerfile
Build from the official ComfyUI repository (github.com/comfyanonymous/ComfyUI), not from an unofficial Docker Hub image. Unofficial images may lag behind releases, bundle outdated dependencies, or modify default behavior in ways that are hard to debug.
```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y \
        python3 python3-pip git \
        libglib2.0-0 libsm6 libxrender1 libxext6 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN git clone https://github.com/comfyanonymous/ComfyUI.git .

RUN pip3 install --no-cache-dir torch torchvision torchaudio \
        --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install --no-cache-dir -r requirements.txt

VOLUME ["/app/models", "/app/output", "/app/custom_nodes"]
EXPOSE 8188

CMD ["python3", "main.py", "--listen", "--port", "8188"]
```

Build the image:
```bash
docker build -t comfyui:production .
```

The --listen Flag: The Most Common Mistake
By default, ComfyUI binds to 127.0.0.1 (loopback only). Inside a Docker container, this means the port is unreachable from outside the container - even with -p 8188:8188 in your docker run command. The port mapping exists at the Docker network level, but the application refuses connections that do not originate from localhost inside the container itself.
The --listen flag changes the bind address to 0.0.0.0 (all interfaces), making ComfyUI accessible through the Docker network bridge. This is confirmed in ComfyUI's source: github.com/comfyanonymous/ComfyUI/blob/master/main.py - look for the address argument in the server startup code.
This is the single most common reason people report that ComfyUI "doesn't work in Docker": the container starts, the port is mapped, but every connection times out. Adding --listen to the CMD fixes it.
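If you suspect the bind address, a quick reachability probe from the host tells you whether the published port actually accepts connections. A small sketch - the `port_reachable` helper is my own, not part of ComfyUI:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # False here, while `docker exec comfyui curl -s localhost:8188` works,
    # means ComfyUI is bound to 127.0.0.1 inside the container: add --listen.
    print(port_reachable("127.0.0.1", 8188))
```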
Docker Compose for Production
Docker Compose handles the full configuration in one file: GPU reservation, volume mounts, restart policy, and health checks.
```yaml
services:
  comfyui:
    build: .
    restart: unless-stopped
    ports:
      - "127.0.0.1:8188:8188"
    volumes:
      - ./models:/app/models
      - ./output:/app/output
      - ./custom_nodes:/app/custom_nodes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8188/system_stats')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
```

The start_period: 20s gives ComfyUI time to load models into VRAM before the health check begins. Without it, the container may be marked unhealthy during a normal cold start.
Smoke Test the Container
After the stack is up, verify the host bind, GPU visibility, and queue endpoint before you hand the machine to users.
```bash
docker compose up -d --build
docker compose ps
curl -fsS http://127.0.0.1:8188/system_stats
curl -fsS http://127.0.0.1:8188/queue
```

If the first curl fails, fix the bind address or the GPU runtime before exposing the service. If /queue responds but /system_stats does not, the Python process is up but ComfyUI is not ready yet.
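The same check order can be scripted for CI or a deploy pipeline. A sketch - the `readiness` function and its three verdicts are my own naming, not a ComfyUI API:

```python
import urllib.request
import urllib.error

def readiness(base_url: str = "http://127.0.0.1:8188") -> str:
    """Classify ComfyUI readiness from its two health endpoints."""
    def status(path: str):
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                return resp.status
        except (urllib.error.URLError, OSError):
            return None

    stats = status("/system_stats")
    queue = status("/queue")
    if stats == 200 and queue == 200:
        return "ready"
    if queue == 200:
        return "starting"      # process is up, models still loading
    return "unreachable"       # bind address or GPU runtime problem
```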
Model Volume Strategy
Do not bake models into the Docker image. A single SDXL 1.0 checkpoint is approximately 7 GB (fp16). Flux.1 [dev] in fp8 format is approximately 12 GB. Baking them into the image means multi-hour rebuilds on every code change and images too large to push to a container registry efficiently.
The correct strategy:
- Mount models from the host: ./models:/app/models in your Compose file
- For multi-instance deployments, use a shared network volume (NFS or a managed file storage service) so all instances read from the same model store
- Download models to the host before starting the container - not inside the container
- ComfyUI's model directory structure: checkpoints/, vae/, loras/, controlnet/, upscale_models/ - your host-side layout must match exactly
Model downloads can be automated with a separate download script that runs before docker compose up, or with a one-off container that populates the volume. Either approach keeps model management separate from the application image lifecycle.
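A host-side download script can look like this - a sketch where the directory map follows ComfyUI's layout and any model URL you pass in is your own choice (none are baked in here):

```python
import pathlib
import urllib.request

# Subdirectories ComfyUI expects under models/
MODEL_DIRS = {
    "checkpoint": "checkpoints",
    "vae": "vae",
    "lora": "loras",
    "controlnet": "controlnet",
    "upscale": "upscale_models",
}

def fetch_model(url: str, kind: str, models_root: str = "./models") -> pathlib.Path:
    """Download a model file into the host volume before `docker compose up`."""
    dest_dir = pathlib.Path(models_root) / MODEL_DIRS[kind]
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / url.rsplit("/", 1)[-1]
    if dest.exists():
        return dest                        # idempotent: already downloaded
    part = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, part)  # partial file never gets the final name
    part.rename(dest)
    return dest
```

Because the function is idempotent, it is safe to run on every deploy; only missing files are fetched.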
| Model | Format | Size on disk | VRAM required |
|---|---|---|---|
| SDXL 1.0 base | fp16 | 6.9 GB | 8 GB+ |
| Flux.1 [dev] | fp8 | 12 GB | 12 GB+ |
| Flux.1 [schnell] | fp8 | 12 GB | 12 GB+ |
| SDXL ControlNet | fp16 | 2.5 GB | 6 GB+ |
| SD 1.5 | fp16 | 2.0 GB | 4 GB+ |
Security: Never Expose Port 8188 Directly
Docker bypasses most host firewall rules. When you publish a port with -p 8188:8188, Docker adds an iptables rule that lets traffic reach the container even if UFW or firewalld blocks it. Anyone who can reach your machine's IP can call the ComfyUI API - including queue, interrupt, and downloading every image ever generated.
The fix is two lines: bind to loopback at the Docker level, then put Nginx in front for TLS and any auth you need.
```yaml
# docker-compose.yml - correct port binding
services:
  comfyui:
    image: comfyui:production
    ports:
      - "127.0.0.1:8188:8188"   # loopback only - NOT 0.0.0.0:8188:8188
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Then point Nginx at 127.0.0.1:8188 and add your auth layer there (see the ComfyUI as a Production API guide for the full Nginx config with X-API-Key validation). The WebSocket endpoint /ws needs the Upgrade: websocket header forwarded - include it in your proxy_set_header block or real-time events will silently drop.
Troubleshooting: The Five Most Common Docker Failures
CUDA not available inside the container
Symptom: ComfyUI starts but falls back to CPU. The log says "No CUDA runtime is found" or models load with device: cpu.
- Check the NVIDIA Container Toolkit is installed on the host: run nvidia-container-cli info - it should print your GPU model.
- Verify the Docker runtime: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi should print the GPU table.
- Check your Docker image uses a CUDA base: the FROM line should be nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 or similar, not plain Ubuntu.
- Ensure the Compose file has capabilities: [gpu] under device reservations - without it, Compose does not request a GPU from the NVIDIA runtime at all.
Out of memory (OOM) - container killed or CUDA OOM error
Symptom: execution crashes mid-generation with CUDA out of memory or the container is killed silently.
- Model too large for VRAM: Flux.1 fp16 needs ~24 GB VRAM. Use the fp8 quantized version (12 GB) or SDXL (8 GB+). Run nvidia-smi inside the container to see actual VRAM usage per process.
- No VRAM headroom: add --lowvram (or, as a last resort, --novram) to the CMD if the GPU has less than 12 GB. These flags offload model parts to CPU RAM between steps. Note that --medvram is an Automatic1111 flag, not a ComfyUI one.
- System RAM OOM: if the container is killed without a CUDA error, the Linux OOM killer hit it. Add mem_limit: 24g to the Compose service to cap RAM and get a clearer error.
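In Compose terms, the two mitigations look like this - a sketch where the 24g cap and the flag choice are examples to adapt to your hardware, not universal values:

```yaml
services:
  comfyui:
    mem_limit: 24g   # a capped OOM kill gives a clear error instead of a silent death
    command: ["python3", "main.py", "--listen", "--port", "8188", "--lowvram"]
```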
```bash
# Check VRAM live inside the container
docker exec -it comfyui nvidia-smi --query-gpu=memory.used,memory.free --format=csv
```

Model not loading - FileNotFoundError or grey node
Symptom: a node turns grey or red on first run, log shows FileNotFoundError for a checkpoint path.
- The model file is not in the mounted volume. Check: docker exec comfyui ls /app/models/checkpoints/. If empty, the host path in your VOLUME mount is wrong or the model was never downloaded.
- Path case mismatch: Linux is case-sensitive. SDXL_base.safetensors ≠ sdxl_base.safetensors. The filename in your workflow JSON must match exactly.
- Custom nodes missing: some nodes require Python packages not in the base image. Mount a custom_nodes/ volume and run pip install -r requirements.txt for each node at container start, or bake them into a derived image.
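Baking a custom node into a derived image can look like this - a sketch using ComfyUI-Manager as a stand-in; substitute the nodes your workflows actually need:

```dockerfile
FROM comfyui:production
# Clone the node and install its Python deps at build time,
# so container start stays fast and deterministic.
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git \
        /app/custom_nodes/ComfyUI-Manager \
 && pip3 install --no-cache-dir -r /app/custom_nodes/ComfyUI-Manager/requirements.txt
```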
Container starts but /system_stats returns 502
ComfyUI takes 20-40 seconds to load the first model into VRAM after the Python process starts. Your healthcheck start_period must be at least this long, and your reverse proxy should retry on 502 rather than immediately returning an error to the client.
Slow build times - image rebuild takes 8+ minutes
Usually caused by reinstalling PyTorch on every build because the pip install layer is invalidated. Pin your requirements to a hash and put pip install before the COPY . step:
```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
# (system packages and Python setup from the main Dockerfile omitted for brevity)

# Install Python deps first (this layer is cached unless requirements.txt changes)
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy source code second (a source change invalidates only this layer and later ones)
COPY . /app
WORKDIR /app
```

CI/CD: Rebuilding Without Re-downloading Models
Never bake models into the Docker image. An SDXL checkpoint is 7 GB - baking it means every code change triggers a 7 GB image push and a multi-minute pull on the production host. The correct pattern separates the image (code + Python deps) from the data (models).
Keep models on a persistent volume that survives container restarts and image updates:
```yaml
# docker-compose.prod.yml
services:
  comfyui:
    image: your-registry/comfyui:${IMAGE_TAG}
    volumes:
      - comfyui_models:/app/models          # persists across rebuilds
      - comfyui_output:/app/output
      - comfyui_custom_nodes:/app/custom_nodes
    ports:
      - "127.0.0.1:8188:8188"
    restart: unless-stopped

volumes:
  comfyui_models:
    driver: local
    driver_opts:
      type: none
      device: /data/comfyui/models          # host path on the GPU server
      o: bind
  comfyui_output:
    driver: local
  comfyui_custom_nodes:
    driver: local
```

Your CI pipeline builds and pushes only the code image (seconds to minutes). Model management is a separate, manual step run once per model: download the .safetensors file directly to the host volume path. The container never touches model download logic.
For multi-host deployments, mount the models volume from shared storage (NFS, EFS, or a provider-specific block volume) so all GPU workers share the same model files without redundant downloads.
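A Compose named volume backed by NFS can be sketched like this - the server address and export path are placeholders for your environment:

```yaml
volumes:
  comfyui_models:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.5,ro,nfsvers=4"   # read-only: workers never write models
      device: ":/exports/comfyui/models"
```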
Health Checks and Monitoring
ComfyUI exposes a GET /system_stats endpoint that returns current RAM usage, VRAM usage, and device info. Use /queue for backlog and /system_stats for liveness checks.
```bash
curl http://localhost:8188/system_stats
```

A successful response looks like this:
```json
{
  "system": {
    "os": "posix",
    "python_version": "3.11.0 (main, ...) [GCC 11.3.0]",
    "embedded_python": false
  },
  "devices": [
    {
      "name": "NVIDIA RTX 4090",
      "type": "cuda",
      "index": 0,
      "vram_total": 25769803776,
      "vram_free": 18000000000,
      "torch_vram_total": 25769803776,
      "torch_vram_free": 18000000000
    }
  ]
}
```

Monitor vram_free over time - a gradual decrease that does not recover after jobs complete indicates a VRAM leak in a custom node. Set up an alert if vram_free drops below 10% of vram_total.
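That alert rule is straightforward to script against the endpoint's JSON. A sketch - the threshold and function names are my own:

```python
import json
import urllib.request

def vram_warnings(stats: dict, threshold: float = 0.10) -> list:
    """Return one warning string per device below the free-VRAM threshold."""
    warnings = []
    for dev in stats.get("devices", []):
        total, free = dev["vram_total"], dev["vram_free"]
        if total and free / total < threshold:
            warnings.append("%s: only %.0f%% VRAM free" % (dev["name"], 100 * free / total))
    return warnings

def poll(url: str = "http://127.0.0.1:8188/system_stats") -> list:
    """Fetch live stats and apply the alert rule."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return vram_warnings(json.load(resp))
```

Run `poll()` from cron or your metrics agent and page when the returned list is non-empty.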
Authentication: What's Not Included
ComfyUI has no built-in authentication. Do not expose port 8188 directly to the internet. The standard production pattern is a reverse proxy (nginx or Caddy) in front of ComfyUI that handles authentication before forwarding requests.
A minimal nginx configuration with HTTP Basic Auth:
```bash
# Generate the password file (run once)
htpasswd -c /etc/nginx/.htpasswd your_username
```

```nginx
server {
    listen 80;
    server_name your-domain.com;

    location / {
        auth_basic "ComfyUI";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:8188;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 300s;
    }
}
```

For API use with token-based authentication, use nginx with the ngx_http_auth_jwt_module or place an API gateway (Kong, Traefik, AWS API Gateway) in front of ComfyUI. HTTP Basic Auth is sufficient for personal or small-team use but is not appropriate for multi-tenant production.