| Platform | Cold start (no snapshot) | Memory snapshot | Custom nodes | Pricing model |
|---|---|---|---|---|
| RunPod Serverless | 30–90s | Yes (pod templates) | Full | Per second of GPU time |
| Modal | 15–60s | Via @app.cls warmup | Full (custom containers) | Per second, ~$0.00022/s A10G |
| fal.ai | 5–10 min (first boot); <5s warm | No - keep_alive warm workers | Not confirmed | $0.003/MP Flux Schnell |
| Replicate | 5–30s | Model-dependent | Limited (curated models) | $0.003/image Flux Schnell |
| Comfy Cloud (official) | 8–10s (snapshot) | Yes | Approved nodes only | Credits (0.266 cr/GPU-s) |
Running ComfyUI serverless means you pay only when a workflow is executing. No idle GPU. No 3am restarts. The tradeoff is cold start latency - the time from request to first pixel depends entirely on which platform you choose and whether it has a model snapshot ready.
This guide covers the five main options with verified pricing and cold start data as of May 2026. Numbers are from platform documentation or independently reproduced benchmarks. Where data is unavailable, that is stated explicitly.
What 'Serverless' Means for ComfyUI
A serverless GPU worker:
- Starts when a request arrives (or warms from a snapshot)
- Runs the workflow
- Shuts down after a configurable idle timeout
- Bills only for GPU-seconds from worker start to shutdown
The key constraint: ComfyUI loads model weights into VRAM at startup. A Flux.1 fp8 checkpoint is ~12 GB. That load time is part of your cold start - and on most platforms, it is billed.
Memory snapshotting solves this. A snapshot captures VRAM state after model load, so the next cold start restores from the snapshot (seconds) rather than re-loading weights from disk (tens of seconds).
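To see why this matters on the bill, here is a back-of-envelope sketch. The disk throughput figure is an illustrative assumption, not a measured platform number:

```python
# Back-of-envelope: what loading a 12 GB checkpoint adds to a billed cold start.
checkpoint_gb = 12           # Flux.1 fp8 checkpoint size
disk_gb_per_s = 0.5          # ASSUMED container disk read throughput
a100_rate_per_s = 0.00076    # RunPod A100 80GB Flex rate

load_seconds = checkpoint_gb / disk_gb_per_s
billed_load_cost = load_seconds * a100_rate_per_s

print(f"{load_seconds:.0f}s of model load, ${billed_load_cost:.4f} billed per cold start")
```

The dollar cost per cold start is small; the latency cost (tens of seconds before the first pixel) is what snapshotting actually buys back.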
Platform Comparison
All prices are per GPU-second and verified from platform documentation as of May 2026. Cold start figures are for ComfyUI workflows specifically unless noted.
| Platform | A100 $/sec | Cold start (no opt) | Cold start (optimized) | Custom nodes |
|---|---|---|---|---|
| RunPod | $0.000760 | 6–12s | Under 2s (FlashBoot) | Full support |
| Modal | $0.000583 | 12–15s | Under 3s (memory snap) | Manual deps |
| fal.ai | $0.000275 | 5–10 min (first boot) | keep_alive=300 (warm) | Not confirmed |
| Replicate | $0.001400 | 10–180s | None built-in | Popular pre-installed |
| Comfy Cloud | 0.266 cr/s | Not published | Not published | Approved nodes only |
| ComfyDeploy | Modal infra | ~12–15s (est.) | ~3s snap (Modal baseline) | Full, auto-detected |

Note: ComfyDeploy pricing requires login and is not publicly listed. The Comfy Cloud credit-to-dollar conversion is not published on their pricing page as of May 2026.
RunPod Serverless
Best for: developers who want full Docker control and the widest GPU selection.
RunPod Serverless bills per-second from worker start through full stop - cold start time is included in the bill. Two worker modes:
- Flex: workers scale to zero. Lower availability SLA. RTX 4090: $0.00031/sec. A100 80GB: $0.00076/sec.
- Active: always-on workers. Lower rate. RTX 4090: $0.00021/sec. A100 80GB: $0.00060/sec.
The runpod-workers/worker-comfyui GitHub template handles workflow submission and custom node installation out of the box. Send a workflow JSON to RunPod's serverless endpoint; the worker submits it to the local ComfyUI instance, waits for completion, and returns the output.
```shell
# Submit a workflow to RunPod Serverless
curl -X POST https://api.runpod.io/v2/{endpoint_id}/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "workflow": { /* ComfyUI API JSON */ },
      "images": []
    }
  }'
```

Cold start without optimization: 6–12 seconds for large model containers. RunPod FlashBoot reduces this to under 2 seconds - it is a built-in feature, not a separate add-on.
Custom nodes: fully supported. Include nodes in your Docker image or list them for installation at worker startup via the template's COMFYUI_INSTALL_CUSTOM_NODES environment variable.
Modal
Best for: developers comfortable with Python-first infrastructure who want the lowest cold start with documented memory snapshotting.
Modal bills per GPU-second with no idle charge and no base fee. The Starter plan includes $30/month in free credits. Per-second rates: L4 $0.000222, A100 40GB $0.000583, H100 $0.001097.
Modal's memory snapshotting for ComfyUI achieves under 3 seconds cold start. The technique: force CPU mode during the snapshot phase (override torch.cuda.is_available() to return False), run ComfyUI through model load, snapshot VRAM state, then restore GPU access at runtime. Independently reproduced - not just marketing numbers.
```python
import modal

app = modal.App("comfyui-worker")

comfyui_image = (
    modal.Image.debian_slim()
    .pip_install("torch", "torchvision", "torchaudio")
    .run_commands("git clone https://github.com/comfyanonymous/ComfyUI /app")
    .run_commands("cd /app && pip install -r requirements.txt")
)

@app.cls(
    image=comfyui_image,
    gpu="A100",
    enable_memory_snapshot=True,  # key flag
    container_idle_timeout=300,
)
class ComfyUIWorker:
    @modal.enter(snap=True)  # runs during snapshot phase (no GPU attached)
    def load_models(self):
        # Patch CUDA detection so the snapshot captures CPU-loaded weights
        import torch
        self._original_is_available = torch.cuda.is_available
        torch.cuda.is_available = lambda: False
        # Start ComfyUI in CPU mode; model load happens before the snapshot
        import subprocess
        subprocess.Popen(["python3", "main.py", "--cpu", "--listen"], cwd="/app")

    @modal.enter()  # runs after snapshot restore (GPU available)
    def restore_gpu(self):
        import torch
        torch.cuda.is_available = self._original_is_available

    @modal.method()
    def run_workflow(self, workflow: dict) -> bytes:
        # Submit to the local ComfyUI instance, poll, return output bytes
        ...
```

Custom nodes: supported, but you manage dependencies manually in the Modal image definition. Version conflicts between nodes must be resolved in the image build - there is no auto-detection.
fal.ai
Best for: teams already using fal.ai's model marketplace who want to add custom ComfyUI workflows.
Pricing (verified from fal.ai/pricing, May 2026): fal.ai offers four GPU tiers for custom deployments:
- A100 40 GB: $0.99/hr - $0.000275/sec
- H100 80 GB: $1.89/hr - $0.000525/sec
- H200 141 GB: $2.10/hr - $0.000583/sec
- B200 184 GB: contact sales
No T4, L4, A10G, or A100 80GB tier is offered for custom deployments. If your workflow requires less than 40 GB VRAM, fal.ai is not the most cost-effective option - RunPod and Modal both offer smaller GPU tiers.
fal.ai supports ComfyUI workflows via POST https://fal.run/fal-ai/comfy-server. Send your ComfyUI API JSON as the request body. Model weights are stored in persistent /data storage (not baked into the Docker image), which reduces image pull latency on cold start.
```python
import fal_client

# Submit a ComfyUI workflow to fal.ai
result = fal_client.submit(
    "fal-ai/comfy-server",
    arguments={
        "workflow": {...},   # ComfyUI API JSON
        "keep_alive": 300,   # keep worker warm for 5 minutes after completion
    },
)
```

The keep_alive parameter is the main cold start mitigation. After a job completes, the worker stays live for the configured number of seconds, so subsequent requests hit a warm worker. Without it, every request may cold start.
Cold start (official fal.ai docs): the first cold start for a ComfyUI deployment on fal.ai takes 5–10 minutes - the runner downloads model weights and installs custom nodes at container boot. This is not a typo. fal.ai's ComfyUI runner does not use pre-warmed memory snapshots. From their documentation: "Runner stuck in SETUP for several minutes - Normal on first cold start while weights download (5–10 min)." Subsequent runs on a warm instance execute in seconds.
keep_alive defaults to 0, so a worker shuts down the moment a job finishes unless you set it; keep_alive=300 keeps the worker alive for 5 minutes, and requests within that window skip the cold start entirely. For low-traffic batch workloads, occasional 5–10 minute cold starts may be acceptable. For user-facing applications, keep_alive is essential.
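A rough sketch of the economics, assuming idle keep_alive time and first-boot setup time are both billed at the listed A100 rate (how fal.ai bills SETUP time is an assumption here, not a documented fact):

```python
# Cost of keeping a fal.ai A100 worker warm vs. eating a cold start.
a100_rate = 0.000275   # $/sec, A100 40 GB, from the pricing list above
keep_alive_s = 300

idle_cost_per_window = keep_alive_s * a100_rate   # worst case: no follow-up job arrives
cold_start_s = 600                                # upper end of the 5-10 min range
cold_start_cost = cold_start_s * a100_rate        # ASSUMES setup time is billed

print(f"warm window: ${idle_cost_per_window:.4f}, cold start: ${cold_start_cost:.4f}")
```

Either way the dollar amounts are cents; the real cost of skipping keep_alive is the user waiting minutes for a response.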
Custom nodes: support for arbitrary custom nodes when deploying your own ComfyUI to fal is not confirmed in official documentation as of May 2026. Verify before building a workflow that depends on non-standard nodes.
Replicate
Best for: teams that need popular pre-installed ComfyUI nodes without Docker image management.
Replicate bills per-second from initialization through completion. L40S: $0.000975/sec. A100 80GB: $0.001400/sec. H100: $0.001525/sec. Note: Replicate was acquired by Cloudflare in early 2026 and operates as an independent brand with unchanged APIs.
The fastest path to running a ComfyUI workflow on Replicate is fofr/any-comfyui-workflow - send your workflow JSON directly. Popular nodes are pre-installed: IPAdapter Plus, ControlNet Aux, AnimateDiff Evolved, VideoHelperSuite, Efficiency Nodes, and others.
For custom nodes not in the pre-installed list, provide a custom_nodes.json file specifying the repo URL and commit hash. Replicate rebuilds the container with those nodes added. This is a one-time cost per node version.
Cold start: 10–180 seconds depending on model size. Replicate has no built-in snapshotting mechanism. For latency-sensitive applications, keep a warm instance via periodic pings or use an Active worker.
Pricing note: at $0.0014/sec, an A100 on Replicate costs $5.04/hr - significantly more than RunPod ($2.74/hr Flex) or Modal ($2.10/hr) for the same hardware. The premium is for the managed workflow and pre-installed node ecosystem.
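The hourly figures follow directly from the quoted per-second rates:

```python
# Hourly A100 cost from per-second rates quoted earlier in this guide
rates_per_sec = {
    "Replicate A100 80GB": 0.001400,
    "RunPod A100 80GB Flex": 0.000760,
    "Modal A100 40GB": 0.000583,
}
hourly = {name: rate * 3600 for name, rate in rates_per_sec.items()}
# Replicate ~$5.04/hr, RunPod ~$2.74/hr, Modal ~$2.10/hr
```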
Comfy Cloud (Official)
Comfy Cloud (comfy.org/cloud) is the official ComfyUI cloud, operated by the same team behind the ComfyUI open-source project.
- GPU: Blackwell RTX Pro 6000, 96 GB VRAM - roughly 2× faster than A100 for most workloads
- Billing: 0.266 credits/GPU-second (reduced 30% from 0.39 credits/second in January 2026)
- Free tier: 400 credits/month, no credit card required. Standard plan ~$20/month.
- Custom nodes: limited to nodes approved/available on the platform - no arbitrary git repos
The credit-to-dollar conversion rate is not published on their pricing page as of May 2026. Check the official blog (blog.comfy.org) for current rates.
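Even without a dollar conversion, the free tier can be expressed in GPU time using the published credit rate:

```python
# What the Comfy Cloud free tier buys in GPU time, from published figures
credits_per_month = 400
credits_per_gpu_second = 0.266

free_gpu_seconds = credits_per_month / credits_per_gpu_second
free_gpu_minutes = free_gpu_seconds / 60
# ~1504 GPU-seconds, roughly 25 minutes of RTX Pro 6000 time per month
```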
Cold Start in Practice: Memory Snapshotting
The platforms that offer memory snapshotting - ViewComfy (one-click), Modal (code-level), RunPod (FlashBoot, built-in), ComfyDeploy (one-click) - all bring ComfyUI cold starts into the 3–10 second range or below. Without snapshotting, cold starts for production ComfyUI workflows with large models range from roughly 12 seconds (RunPod, Modal) to over 2 minutes (Replicate worst case).
If cold start latency matters for your product - anything where users wait for a response - choose a platform with snapshotting and measure your specific workflow. The delta between snapshot and no-snapshot is 5–15× depending on model size.
When to Use Serverless vs a Persistent Instance
Use serverless when:
- Traffic is bursty - jobs come in batches with idle periods between
- You don't want to manage GPU scaling or restart crashed workers
- Monthly GPU usage is under ~600 hours (above this, serverless per-second pricing becomes more expensive than a persistent instance)
Use a persistent instance (VPS or dedicated GPU) when:
- Queue is always non-empty - the GPU is busy essentially 24/7
- You need VRAM to persist between jobs (model stays loaded, no reload penalty)
- Cold start is completely unacceptable - always-on workers have no cold start
The crossover point for RunPod: an RTX 4090 Active worker at $0.00021/sec costs $0.756/hr - close to the on-demand pod rate for the same card. At full utilization, a persistent RunPod GPU pod ($0.44–$0.74/hr depending on contract) is cheaper than serverless. Serverless is cost-effective when utilization is below roughly 50%.
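The ~50% figure falls out of the rates above (ignoring cold start overhead):

```python
# Utilization crossover: persistent pod vs. Flex serverless for an RTX 4090.
flex_rate_hr = 0.00031 * 3600                     # ~$1.12/hr, billed only while busy
pod_rate_hr_low, pod_rate_hr_high = 0.44, 0.74    # persistent pod range, billed 24/7

# Serverless monthly cost = flex_rate_hr * busy_hours; a pod costs a flat rate.
# Crossover utilization = pod hourly rate / serverless hourly rate.
crossover_low = pod_rate_hr_low / flex_rate_hr    # ~39% at the cheapest pod contract
crossover_high = pod_rate_hr_high / flex_rate_hr  # ~66% at the priciest

print(f"crossover between {crossover_low:.0%} and {crossover_high:.0%} utilization")
```

The midpoint of that band is where the ~50% rule of thumb comes from; your own crossover depends on which pod contract you can get.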