
ComfyUI Docker: The Production Setup Guide

How to run ComfyUI in Docker for production: the --listen flag most guides skip, volume strategy for large models, GPU passthrough, and health checks.

Updated 2026-05-08 · comfyui docker · comfyui docker production · deploy comfyui

~3 min - typical first-pull startup time for a ComfyUI Docker image on a fresh GPU instance. Most of that is pulling the image layers, not starting ComfyUI itself (workflow/lab estimate based on standard ComfyUI images).

ComfyUI was designed as a local desktop app. Running it in Docker for production requires solving three problems most tutorials ignore: GPU passthrough, model volume strategy (SDXL checkpoints alone are 7 GB), and the single flag that makes or breaks network connectivity.

Prerequisites

  • NVIDIA GPU with CUDA 11.8+
  • Docker Engine 19.03+ (for the --gpus flag)
  • NVIDIA Container Toolkit (formerly nvidia-docker2) - this is the layer that lets Docker containers access the host GPU through the NVIDIA driver. Without it, --gpus all silently has no effect. Installation docs: docs.nvidia.com/datacenter/cloud-native/container-toolkit

Installing the NVIDIA Container Toolkit

The NVIDIA Container Toolkit must be installed on the Docker host (not inside the container). These are the official installation commands for Ubuntu:

$bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify the toolkit is working correctly before building your ComfyUI image:

$bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

You should see your GPU listed in the nvidia-smi table output. If the command fails with "could not select device driver", the Container Toolkit is not correctly installed or the Docker daemon was not restarted.

The Dockerfile

Build from the official ComfyUI repository (github.com/comfyanonymous/ComfyUI), not from an unofficial Docker Hub image. Unofficial images may lag behind releases, bundle outdated dependencies, or modify default behavior in ways that are hard to debug.

$Dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

# git is needed for the clone below; the libglib/libsm/libxrender/libxext
# packages cover the shared libraries OpenCV-based custom nodes commonly need
RUN apt-get update && apt-get install -y \
    python3 python3-pip git \
    libglib2.0-0 libsm6 libxrender1 libxext6 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN git clone https://github.com/comfyanonymous/ComfyUI.git .

# Install the CUDA 12.1 PyTorch wheels first, then ComfyUI's own requirements
RUN pip3 install --no-cache-dir torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu121
RUN pip3 install --no-cache-dir -r requirements.txt

# Models, outputs, and custom nodes live on volumes - never in the image
VOLUME ["/app/models", "/app/output", "/app/custom_nodes"]

EXPOSE 8188

# --listen binds 0.0.0.0 so the mapped port is reachable through the bridge
CMD ["python3", "main.py", "--listen", "--port", "8188"]

Build the image:

$bash
docker build -t comfyui:production .
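
Before moving to Compose, a quick single-container sanity run (a sketch - note that docker run requires absolute host paths for volume mounts, hence $(pwd)):

$bash
docker run -d --name comfyui --gpus all \
  -p 127.0.0.1:8188:8188 \
  -v "$(pwd)/models:/app/models" \
  -v "$(pwd)/output:/app/output" \
  comfyui:production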

The --listen Flag: The Most Common Mistake

By default, ComfyUI binds to 127.0.0.1 (loopback only). Inside a Docker container, this means the port is unreachable from outside the container - even with -p 8188:8188 in your docker run command. The port mapping exists at the Docker network level, but the application is listening only on the container's loopback interface, so connections forwarded through the bridge never reach it.

The --listen flag changes the bind address to 0.0.0.0 (all interfaces), making ComfyUI accessible through the Docker network bridge. This is confirmed in ComfyUI's source: github.com/comfyanonymous/ComfyUI/blob/master/main.py - look for the address argument in the server startup code.

This is the single most common reason people report that ComfyUI "doesn't work in Docker": the container starts, the port is mapped, but every connection is reset or times out. Adding --listen to the CMD fixes it.
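
To see the failure mode for yourself, override the CMD to drop the flag (a sketch using the image built above; expect the curl to fail):

$bash
docker run -d --rm --name comfyui-loopback --gpus all -p 8188:8188 \
  comfyui:production python3 main.py --port 8188   # no --listen: binds 127.0.0.1
sleep 15
curl -m 5 -fsS http://127.0.0.1:8188/system_stats \
  || echo "unreachable: ComfyUI only listens on loopback inside the container"
docker stop comfyui-loopback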

Docker Compose for Production

Docker Compose handles the full configuration in one file: GPU reservation, volume mounts, restart policy, and health checks.

$docker-compose.yml
services:
  comfyui:
    build: .
    restart: unless-stopped
    ports:
      - "127.0.0.1:8188:8188"
    volumes:
      - ./models:/app/models
      - ./output:/app/output
      - ./custom_nodes:/app/custom_nodes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8188/system_stats')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s

The start_period: 20s gives ComfyUI time to load models into VRAM before the health check begins. Without it, the container may be marked unhealthy during a normal cold start.
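
You can watch the health state flip from starting to healthy during a cold start - a one-liner sketch, with the service name taken from the Compose file above:

$bash
docker inspect --format '{{.State.Health.Status}}' "$(docker compose ps -q comfyui)"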

Smoke Test the Container

After the stack is up, verify the host bind, GPU visibility, and queue endpoint before you hand the machine to users.

$smoke-test.sh
docker compose up -d --build
docker compose ps
curl -fsS http://127.0.0.1:8188/system_stats
curl -fsS http://127.0.0.1:8188/queue

If the first curl fails, fix the bind address or the GPU runtime before exposing the service. If /queue responds but /system_stats does not, the Python process is up but ComfyUI is not ready yet.

Model Volume Strategy

Do not bake models into the Docker image. A single SDXL 1.0 checkpoint is approximately 7 GB (fp16); Flux.1 [dev] in fp8 format is approximately 12 GB. Baking them in means every code change rebuilds and re-pushes multi-gigabyte layers, and produces images too large to move through a container registry efficiently.

The correct strategy:

  • Mount models from the host: ./models:/app/models in your Compose file
  • For multi-instance deployments, use a shared network volume (NFS or a managed file storage service) so all instances read from the same model store
  • Download models to the host before starting the container - not inside the container
  • ComfyUI's model directory structure: checkpoints/, vae/, loras/, controlnet/, upscale_models/ - your host-side layout must match exactly

Model downloads can be automated with a separate download script that runs before docker compose up, or with a one-off container that populates the volume. Either approach keeps model management separate from the application image lifecycle.
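
A minimal host-side bootstrap might look like this - a sketch assuming the relative ./models layout from the Compose file; the Hugging Face URL is illustrative, substitute the models you actually use:

$bash
# Create the directory layout ComfyUI expects, then fetch one checkpoint
mkdir -p models/{checkpoints,vae,loras,controlnet,upscale_models} output custom_nodes
wget -c -P models/checkpoints \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"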

Model size reference - plan your VOLUME mounts accordingly

Model             Format  Size on disk  VRAM required
SDXL 1.0 base     fp16    6.9 GB        8 GB+
Flux.1 [dev]      fp8     12 GB         12 GB+
Flux.1 [schnell]  fp8     12 GB         12 GB+
SDXL ControlNet   fp16    2.5 GB        6 GB+
SD 1.5            fp16    2.0 GB        4 GB+

Security: Never Expose Port 8188 Directly

Docker's published ports bypass most host firewall rules. When you publish a port with -p 8188:8188, Docker inserts an iptables rule that lets traffic reach the container even if UFW or firewalld blocks it. Anyone who can reach your machine's IP can call the ComfyUI API - queuing jobs, interrupting runs, and downloading every image ever generated.

The fix is two lines: bind to loopback at the Docker level, then put Nginx in front for TLS and any auth you need.

$docker-compose.yml
# docker-compose.yml - correct port binding
services:
  comfyui:
    image: comfyui:production
    ports:
      - "127.0.0.1:8188:8188"   # loopback only - NOT 0.0.0.0:8188:8188
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Then point Nginx at 127.0.0.1:8188 and add your auth layer there (see the ComfyUI as a Production API guide for the full Nginx config with X-API-Key validation). The WebSocket endpoint /ws needs the Upgrade: websocket header forwarded - include it in your proxy_set_header block or real-time events will silently drop.

NOTE
If you use RunPod, Vast.ai, or any other cloud GPU provider: their network tab shows open ports. Always bind to 127.0.0.1 and use the provider's built-in port forwarding or an SSH tunnel instead of exposing 8188 publicly.

Troubleshooting: The Five Most Common Docker Failures

CUDA not available inside the container

Symptom: ComfyUI starts but falls back to CPU. The log says "No CUDA runtime is found" or models load with device: cpu.

  • Check the NVIDIA Container Toolkit is installed on the host: run nvidia-container-cli info - it should print your GPU model.
  • Verify the Docker runtime: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi should print the GPU table.
  • Check your Docker image uses a CUDA base: the FROM line should be nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 or similar, not plain Ubuntu.
  • Ensure the Compose file has capabilities: [gpu] under device reservations - without it, Compose does not request the NVIDIA runtime and the container sees no GPU.

Out of memory (OOM) - container killed or CUDA OOM error

Symptom: execution crashes mid-generation with CUDA out of memory or the container is killed silently.

  • Model too large for VRAM: Flux.1 fp16 needs ~24 GB VRAM. Use the fp8 quantized version (12 GB) or SDXL (8 GB+). Run nvidia-smi inside the container to see actual VRAM usage per process.
  • No VRAM eviction: add the --lowvram flag (or --novram for very constrained cards) to the CMD if the GPU has less than 12 GB. This offloads model parts to CPU RAM between steps.
  • System RAM OOM: if the container is killed without a CUDA error, the Linux OOM killer hit it. Add mem_limit: 24g to the Compose service to cap RAM and get a clearer error.
$bash
# Check VRAM live inside the container
docker exec -it comfyui nvidia-smi --query-gpu=memory.used,memory.free --format=csv

Model not loading - FileNotFoundError or grey node

Symptom: a node turns grey or red on first run, log shows FileNotFoundError for a checkpoint path.

  • The model file is not in the mounted volume. Check: docker exec comfyui ls /app/models/checkpoints/. If empty, the host path in your VOLUME mount is wrong or the model was never downloaded.
  • Path case mismatch: Linux is case-sensitive, so SDXL_base.safetensors and sdxl_base.safetensors are different files. The filename in your workflow JSON must match exactly.
  • Custom nodes missing: some nodes require Python packages not in the base image. Mount a custom_nodes/ volume and run pip install -r requirements.txt for each node at container start (a sketch follows this list), or bake them into a derived image.
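
A minimal entrypoint sketch for the volume approach (a hypothetical entrypoint.sh baked into a derived image; assumes each custom node ships its own requirements.txt):

$bash
#!/bin/sh
# Install each mounted custom node's Python deps, then hand off to ComfyUI
for req in /app/custom_nodes/*/requirements.txt; do
  [ -f "$req" ] && pip3 install --no-cache-dir -r "$req"
done
exec python3 /app/main.py --listen --port 8188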

Container starts but /system_stats returns 502

ComfyUI takes 20-40 seconds to load the first model into VRAM after the Python process starts. Your healthcheck start_period must be at least this long, and your reverse proxy should retry on 502 rather than immediately returning an error to the client.
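
If your deploy script needs a hard readiness gate rather than proxy retries, a simple wait loop works (a sketch; scale the timeout to your model sizes):

$bash
# Block until ComfyUI answers /system_stats, up to ~2 minutes
for i in $(seq 1 24); do
  curl -fsS http://127.0.0.1:8188/system_stats >/dev/null && break
  echo "waiting for ComfyUI ($i/24)..."; sleep 5
done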

Slow build times - image rebuild takes 8+ minutes

Usually caused by reinstalling PyTorch on every build because the pip install layer is invalidated. Pin your requirements and copy requirements.txt into the image before the COPY . step, so the dependency layer stays cached:

$Dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

# The CUDA base image ships no Python - install it before any pip step
RUN apt-get update && apt-get install -y python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install Python deps first (this layer is cached unless requirements.txt changes)
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt

# Copy source code last - a code change invalidates only this layer and later ones
COPY . /app
WORKDIR /app

CI/CD: Rebuilding Without Re-downloading Models

Never bake models into the Docker image. An SDXL checkpoint is 7 GB - baking it means every code change triggers a 7 GB image push and a multi-minute pull on the production host. The correct pattern separates the image (code + Python deps) from the data (models).

Keep models on a persistent volume that survives container restarts and image updates:

$docker-compose.prod.yml
# docker-compose.prod.yml
services:
  comfyui:
    image: your-registry/comfyui:${IMAGE_TAG}
    volumes:
      - comfyui_models:/app/models       # persists across rebuilds
      - comfyui_output:/app/output
      - comfyui_custom_nodes:/app/custom_nodes
    ports:
      - "127.0.0.1:8188:8188"
    restart: unless-stopped

volumes:
  comfyui_models:
    driver: local
    driver_opts:
      type: none
      device: /data/comfyui/models       # host path on the GPU server
      o: bind
  comfyui_output:
    driver: local
  comfyui_custom_nodes:
    driver: local

Your CI pipeline builds and pushes only the code image (seconds to minutes). Model management is a separate, manual step run once per model: download the .safetensors file directly to the host volume path. The container never touches model download logic.
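
The one-off population step might look like this - a sketch where the host path matches the driver_opts device above and the download URL is illustrative:

$bash
# Populate the model volume once, from a throwaway container
docker run --rm -v /data/comfyui/models:/models alpine sh -c \
  'mkdir -p /models/checkpoints && wget -c \
   -O /models/checkpoints/sd_xl_base_1.0.safetensors \
   "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"'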

For multi-host deployments, mount the models volume from shared storage (NFS, EFS, or a provider-specific block volume) so all GPU workers share the same model files without redundant downloads.

Health Checks and Monitoring

ComfyUI exposes a GET /system_stats endpoint that returns current RAM usage, VRAM usage, and device info. Use /queue for backlog and /system_stats for liveness checks.

$bash
curl http://localhost:8188/system_stats

A successful response looks like this:

$json
{
  "system": {
    "os": "posix",
    "python_version": "3.11.0 (main, ...) [GCC 11.3.0]",
    "embedded_python": false
  },
  "devices": [
    {
      "name": "NVIDIA RTX 4090",
      "type": "cuda",
      "index": 0,
      "vram_total": 25769803776,
      "vram_free": 18000000000,
      "torch_vram_total": 25769803776,
      "torch_vram_free": 18000000000
    }
  ]
}

Monitor vram_free over time - a gradual decrease that does not recover after jobs complete indicates a VRAM leak in a custom node. Set up an alert if vram_free drops below 10% of vram_total.
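
A minimal polling sketch for that alert, assuming jq on the monitoring host (a real deployment would ship these numbers to Prometheus or similar):

$bash
# Print an alert whenever any device reports vram_free below 10% of vram_total
while true; do
  curl -fsS http://127.0.0.1:8188/system_stats \
    | jq -e '.devices[] | select(.vram_free < .vram_total * 0.1)' >/dev/null \
    && echo "ALERT: vram_free below 10% at $(date)"
  sleep 60
done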

Authentication: What's Not Included

ComfyUI has no built-in authentication. Do not expose port 8188 directly to the internet. The standard production pattern is a reverse proxy (nginx or Caddy) in front of ComfyUI that handles authentication before forwarding requests.

A minimal nginx configuration with HTTP Basic Auth:

$nginx-setup.sh
# Generate the password file (run once)
htpasswd -c /etc/nginx/.htpasswd your_username
$nginx.conf
server {
    listen 80;
    server_name your-domain.com;

    location / {
        auth_basic "ComfyUI";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass http://localhost:8188;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 300s;
    }
}

For API use with token-based authentication, use nginx with the ngx_http_auth_jwt_module or place an API gateway (Kong, Traefik, AWS API Gateway) in front of ComfyUI. HTTP Basic Auth is sufficient for personal or small-team use but is not appropriate for multi-tenant production.

Frequently Asked Questions

Does --listen expose ComfyUI to the internet?

No. --listen changes the bind address from 127.0.0.1 to 0.0.0.0 (all interfaces inside the container), which makes ComfyUI reachable through the Docker network. To expose it to the internet you would still need to publish the port beyond loopback, open your firewall, and configure a reverse proxy with authentication. Never expose port 8188 directly to a public network.

How much disk space do I need for models?

Plan for at least 20–30 GB per GPU workload. A single SDXL 1.0 checkpoint is ~7 GB (fp16), a VAE adds ~300 MB, LoRAs are typically 100–400 MB each, and Flux.1 [dev] in fp8 format is ~12 GB. Mount a dedicated volume with plenty of headroom - model storage grows quickly.

Can I run ComfyUI without an NVIDIA GPU?

Yes. ComfyUI supports CPU-only mode (pass the --cpu flag) and Apple Silicon via MPS. CPU inference is 10–50× slower depending on the model - usable for testing but not for production workloads. For AMD GPUs, ROCm support exists but is less tested than CUDA.

How do I update ComfyUI without rebuilding the full image?

Two options: (1) mount the ComfyUI source as a volume and run git pull inside the container, or (2) rebuild the image without the --no-cache flag so Docker reuses the cached CUDA and Python layers. Note that a cached git clone layer will not fetch new commits - bust that layer (for example with a build argument) when you want upstream changes. Option 2 is simpler and keeps the image reproducible.
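
In practice, option 2 is a two-command operation (assuming the Compose setup from earlier in this guide):

$bash
docker compose build comfyui && docker compose up -d comfyui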

What's the difference between the Docker Compose deploy.resources config and --gpus all?

The deploy.resources section is the Compose v3 way to request GPUs, equivalent to docker run --gpus all. Both require the NVIDIA Container Toolkit installed on the host. Use deploy.resources in Compose files; use --gpus in docker run commands. The result is identical: the NVIDIA runtime is activated for that container.

How do I persist custom nodes across container restarts?

Mount a volume at ComfyUI's custom_nodes directory - /app/custom_nodes with the Dockerfile in this guide. In Docker Compose: add "- ./custom_nodes:/app/custom_nodes" under volumes. Custom nodes installed in a running container without a volume mount are lost on restart. Alternatively, install custom nodes in your Dockerfile so they are baked into the image - the production-recommended approach for reproducibility.

What is the recommended base image for a production ComfyUI Docker setup?

The official NVIDIA CUDA base images are the standard starting point - this guide uses nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04; pick a -devel variant if your custom nodes compile CUDA extensions. ComfyUI itself recommends Python 3.11+. The ComfyUI GitHub repo maintains a Dockerfile that can serve as a reference. For production, pin both the CUDA version and the ComfyUI git commit to prevent unexpected behavior from upstream changes.

Why does my container pass the smoke test but fail in production?

The most common cause is model availability: the smoke test only proves the server responds, while production workflows reference specific model filenames. Verify that every model file referenced in your workflow JSON exists in the mounted model volume (a check script follows below). The second most common cause is the --listen flag: ComfyUI must start with --listen to accept connections from outside the container. Without it, it binds to 127.0.0.1 and is unreachable from your reverse proxy.
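
One way to catch the model-availability failure before it reaches production - a sketch assuming jq and the standard ckpt_name input used by checkpoint loader nodes; adapt the field name to the loaders your workflows use:

$bash
# List checkpoints referenced by a workflow that are missing from the host volume
jq -r '.. | .ckpt_name? // empty' workflow.json | sort -u | while read -r f; do
  [ -f "models/checkpoints/$f" ] || echo "MISSING: $f"
done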