ComfyUI is a node-based workflow interface for running diffusion models locally or on a remote GPU. Flux is the current state-of-the-art open-weight text-to-image model family from Black Forest Labs. Together they are the standard toolchain for engineers building AI image pipelines in 2026. This guide covers installation, model selection, and your first working workflow - with the specific decisions you need to make for your hardware.
Before you install: GPU requirements by Flux model variant
Flux has two open-weight models, Dev and Schnell (a third tier, Flux Pro, is available only through an API), and each comes in several quantization levels. The variant you choose determines your VRAM requirement and your image quality. There is no single right answer - it depends on your GPU.
| Variant | Precision | VRAM required | Quality | Speed |
|---|---|---|---|---|
| Flux Dev FP16 | Full | 24 GB | Best | Slowest |
| Flux Dev Q8 | 8-bit quant | 16 GB | Near-identical to FP16 | Moderate |
| Flux Dev NF4 | 4-bit quant | 8 GB | Good, minor detail loss | Faster |
| Flux Schnell FP16 | Full, distilled | 24 GB | Good, 4-step | Very fast |
| Flux Schnell NF4 | 4-bit, distilled | 8 GB | Acceptable | Fastest |
Practical recommendation: with an 8-12 GB card (for example an RTX 4070 or a 12 GB RTX 3080), use Flux Dev NF4. With 16 GB (RTX 4080), use Flux Dev Q8. With 24 GB or more (RTX 3090, RTX 4090), use Flux Dev FP16 for production-quality output.
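If you provision machines with a script, the same decision can be made automatically. The following is a sketch that assumes an NVIDIA GPU with nvidia-smi on the PATH. The thresholds mirror the table above, and the printed variant labels are illustrative names, not model file names.

```shell
#!/bin/sh
# Sketch: map total VRAM to the Flux Dev variant recommended above.
# Printed labels are illustrative names, not official model file names.

mib_to_gb() {
  # Cards often report slightly less than their nominal size in MiB,
  # so round to the nearest GB instead of truncating.
  echo $(( ($1 + 512) / 1024 ))
}

detect_vram_gb() {
  # Total memory of the first NVIDIA GPU, in MiB, without header or units.
  mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -n 1)
  [ -n "$mib" ] || return 1
  mib_to_gb "$mib"
}

recommend_flux_variant() {
  case $1 in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;   # not a whole number of GB
  esac
  if [ "$1" -ge 24 ]; then
    echo "flux-dev-fp16"
  elif [ "$1" -ge 16 ]; then
    echo "flux-dev-q8"
  elif [ "$1" -ge 8 ]; then
    echo "flux-dev-nf4"
  else
    echo "flux-dev-nf4+offload"   # below the table's 8 GB floor: add --lowvram/--novram
  fi
}
```

Usage: `recommend_flux_variant "$(detect_vram_gb)"`. On a machine without nvidia-smi (Apple Silicon, for example), it prints `unknown` and returns a non-zero status, so a provisioning script can stop there rather than guess.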
Installing ComfyUI: the three-command path
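ComfyUI runs on Linux, macOS, and Windows, and the installation steps are the same on each. You need Python 3.10-3.12 and either a CUDA-compatible NVIDIA GPU or Apple Silicon for MPS acceleration. One Windows-specific caveat: install a CUDA build of PyTorch first (pytorch.org generates the exact pip command), because the default PyPI wheel on Windows is CPU-only.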
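Before running the install commands, you can confirm the interpreter version from a script. The following is a sketch that checks against the 3.10-3.12 range stated in this guide. The ComfyUI README remains the authority on which versions are currently supported, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: check that python3 falls in the 3.10-3.12 range this guide lists
# before running pip.

python_version_supported() {
  # Accepts "3.11" or "3.11.9" style strings.
  case $1 in
    3.10|3.10.*|3.11|3.11.*|3.12|3.12.*) return 0 ;;
    *) return 1 ;;
  esac
}

current_python_version() {
  # "python3 --version" prints e.g. "Python 3.11.9"; keep the second field.
  python3 --version 2>&1 | awk '{print $2}'
}
```

Usage: `python_version_supported "$(current_python_version)" || echo "Install Python 3.10-3.12 first"`.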
```shell
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py
```

This starts the ComfyUI server on localhost:8188. Open http://127.0.0.1:8188 in your browser. You will see the canvas with a default workflow loaded. Delete it (Ctrl+A, Delete) and load the Flux workflow below.
One flag worth knowing on first launch: if you are on a machine with limited VRAM, add --lowvram or --novram to the main.py command. These tell ComfyUI to aggressively offload model components to system RAM between inference steps. Expect slower generation but successful runs on 8 GB cards that would otherwise OOM.
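If you launch ComfyUI from a script, the flag can be chosen from the card's VRAM. The following is a sketch that uses the thresholds in this section: --novram below 6 GB and --lowvram at 8 GB. The treatment of the 6-8 GB band, and the mapping of 0 GB (no usable GPU) to ComfyUI's --cpu flag, are assumptions of this example.

```shell
#!/bin/sh
# Sketch: choose a ComfyUI memory flag from total VRAM in whole GB.
# Prints nothing when no flag is needed.
comfy_memory_flag() {
  case $1 in
    ''|*[!0-9]*) echo "usage: comfy_memory_flag VRAM_GB" >&2; return 1 ;;
  esac
  if [ "$1" -eq 0 ]; then
    echo "--cpu"        # assumption: 0 means no usable GPU
  elif [ "$1" -lt 6 ]; then
    echo "--novram"
  elif [ "$1" -le 8 ]; then
    echo "--lowvram"
  fi
}
```

Usage: `python main.py $(comfy_memory_flag 8)` runs `python main.py --lowvram`. The substitution is deliberately unquoted so that an empty result adds no argument.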
```shell
# For 8 GB VRAM cards
python main.py --lowvram
# For cards with less than 6 GB VRAM
python main.py --novram
```

Downloading the Flux model files
Flux Dev needs four files in three categories: the main transformer, the two text encoders (CLIP-L and T5-XXL), and the VAE. Each goes in a specific directory inside your ComfyUI installation.
```shell
# From your ComfyUI directory
# Main model - choose ONE based on your VRAM
# FP16 (24 GB) - gated, so the request carries your Hugging Face token
wget --header="Authorization: Bearer $HF_TOKEN" -P models/unet/ https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors
# FP8 (8-bit, ~12 GB file) - community conversion by Kijai
wget -P models/unet/ https://huggingface.co/Kijai/flux-fp8/resolve/main/flux1-dev-fp8.safetensors
# Text encoders (same for all variants)
wget -P models/clip/ https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget -P models/clip/ https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
# VAE - also in the gated FLUX.1-dev repository
wget --header="Authorization: Bearer $HF_TOKEN" -P models/vae/ https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors
```

Note: Flux Dev requires accepting the license agreement on Hugging Face before you can download. Log in to Hugging Face and accept the license on the FLUX.1-dev model page. Then create an access token in your account settings and export it as HF_TOKEN before running the commands above. wget does not read the token stored by huggingface-cli login. If you prefer the CLI, run huggingface-cli login and then huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/unet/ instead. Flux Schnell does not require this: it is Apache 2.0 licensed and freely downloadable.
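After the downloads finish, a quick check that every file landed where ComfyUI looks for it saves a confusing error in the loader nodes later. The following is a sketch that assumes the paths from the download commands above. If you chose the FP8 download, pass its file name as the second argument. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: report which Flux files are missing from a ComfyUI checkout.
# Returns 0 only when every file exists and is non-empty (-s).
check_flux_files() {
  comfy_dir=$1
  model_file=${2:-flux1-dev.safetensors}
  missing=0
  for rel in "models/unet/$model_file" \
             models/clip/clip_l.safetensors \
             models/clip/t5xxl_fp16.safetensors \
             models/vae/ae.safetensors; do
    if [ ! -s "$comfy_dir/$rel" ]; then
      echo "missing or empty: $rel"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `check_flux_files . flux1-dev-fp8.safetensors` from inside the ComfyUI directory.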
Your first workflow: the minimal Flux node graph
A working Flux text-to-image workflow in ComfyUI uses these nodes: UNETLoader (loads the main transformer), DualCLIPLoader (loads both text encoders), VAELoader (loads the VAE), CLIPTextEncode (encodes your prompt), an empty latent image node (sets the output resolution), KSampler (runs the diffusion), VAEDecode (decodes the latent into an image), and SaveImage (writes it to disk).
The simplest way to get this workflow is to load the official Flux example from the ComfyUI_examples repository, or from the ComfyUI Manager. The example images embed their workflow, so dragging one onto the canvas loads it. Once loaded, you will see the node graph pre-connected. Before your first run, you only need to do two things. First, select your downloaded files in the loader nodes, with the DualCLIPLoader type set to flux. Second, write your prompt in the CLIPTextEncode node.
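Flux Dev sampler settings that produce reliable results: 20 steps, Euler sampler, Simple scheduler, CFG scale 1.0. Yes, CFG 1.0. Flux Dev is guidance-distilled, so it does not need the classifier-free guidance that SDXL relies on. Guidance strength is instead an input to the model, which ComfyUI exposes through the FluxGuidance node (3.5 by default). That 3.5 belongs there, not in KSampler. Running KSampler at CFG 3.5, or at the 7-8 people often copy over from SDXL workflows, doubles sampling time and tends to produce burned, oversaturated results. Keep it at 1.0 unless you have a specific reason to change it.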
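Once the workflow runs in the browser, you can also queue the same graph from the command line. The following is a sketch that assumes you exported the workflow in ComfyUI's API format to a file of your choosing (the menu entry's name varies between ComfyUI versions) and that the server is on the default port. ComfyUI's built-in HTTP server accepts a POST to /prompt with the graph under a "prompt" key. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: queue an API-format workflow on a running ComfyUI server.

build_prompt_payload() {
  # /prompt expects the API-format graph wrapped as {"prompt": <graph>}.
  printf '{"prompt": %s}\n' "$(cat "$1")"
}

queue_workflow() {
  workflow_file=$1
  host=${2:-127.0.0.1:8188}   # default ComfyUI address
  build_prompt_payload "$workflow_file" |
    curl -sS -X POST -H "Content-Type: application/json" \
      --data-binary @- "http://$host/prompt"
}
```

Usage: `queue_workflow workflow_api.json`. The JSON response includes a prompt_id, and the finished images are written to ComfyUI's output directory, just as they are when you queue from the browser.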
ComfyUI Manager: the package manager for custom nodes
ComfyUI by itself has a limited node set. Most production workflows rely on custom nodes from the community - additional preprocessing, ControlNet support, LoRA chaining, upscaling, and more. ComfyUI Manager is the package manager that installs and updates these.
```shell
# Install ComfyUI Manager
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
# Restart ComfyUI - the Manager button appears in the interface
```

With Manager installed, you can install any custom node pack from the interface without touching the command line again. The most useful packs for Flux workflows: ComfyUI-Impact-Pack (segmentation, detailing), ComfyUI_essentials (utility nodes), and was-node-suite-comfyui (image processing). Install only what your workflow needs - each custom node pack is an additional dependency that can break on ComfyUI updates.
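Because node packs can break on ComfyUI updates, it helps to record which revision of each pack you had when things worked. The following is a sketch that assumes the packs are git checkouts, which is how git clone and, typically, ComfyUI Manager install them. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: list installed custom node packs with their current git revision.
list_node_packs() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue            # glob matched nothing
    name=$(basename "$d")
    if [ -e "$d/.git" ]; then
      rev=$(git -C "$d" rev-parse --short HEAD 2>/dev/null || echo unknown)
    else
      rev="not-a-git-checkout"
    fi
    printf '%s %s\n' "$name" "$rev"
  done
}
```

Usage: `list_node_packs ComfyUI/custom_nodes > node-packs.txt` before updating. If a pack breaks afterwards, `git -C ComfyUI/custom_nodes/<pack> checkout <revision>` returns it to the recorded state.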
Next steps: from local to production
Once you have a working local Flux workflow in ComfyUI, the next decision is whether to keep running it locally or move it to a production environment with an API layer. Local is fine for personal use and experimentation. For anything serving external traffic - a product feature, a batch job, a client project - you need GPU infrastructure that is reliable, scalable, and has no cold start penalty when you need it.
ComfyUI as a Production API covers the architecture of exposing your workflow as a REST endpoint. GPU Provider Cost Comparison 2026 has current pricing across the major GPU clouds. ComfyUI Hosting 2026 compares managed hosting options for teams that do not want to operate GPU infrastructure themselves.
| Method | OS | Setup time | Best for |
|---|---|---|---|
| Git clone + pip | Linux / macOS / Windows | 15-30 min | Developers who want full control |
| ComfyUI portable (Windows) | Windows only | 5-10 min | Non-technical users on Windows |
| Docker image | Linux / macOS | 30-60 min | Reproducible environments, CI/CD |
| Managed (Runflow, ComfyDeploy) | Any (cloud) | 1-2 days | Teams skipping local GPU entirely |
Flux sampler settings that actually work
The CFG values people carry over from SD 1.5 and SDXL workflows (typically 7-8) produce bad results with Flux. Flux needs different settings for two reasons. It uses a rectified flow formulation rather than DDPM. And both open models are distilled (Dev for guidance, Schnell for few-step sampling), so KSampler's classifier-free guidance stays at 1.0. The correct Flux Dev settings: Euler sampler, Simple scheduler, 20 steps, CFG 1.0. For Flux Schnell: Euler, Simple, 4 steps, CFG 1.0.
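The following sketch encodes these settings, plus a guard against SD-era CFG values, for scripts that generate or patch workflows. The helper names are hypothetical. awk is used only because POSIX shell arithmetic cannot compare decimals.

```shell
#!/bin/sh
# Sketch: KSampler settings from this guide, per Flux variant.
flux_sampler_settings() {
  case $1 in
    dev)     echo "sampler=euler scheduler=simple steps=20 cfg=1.0" ;;
    schnell) echo "sampler=euler scheduler=simple steps=4 cfg=1.0" ;;
    *)       echo "unknown variant: $1 (expected dev or schnell)" >&2; return 1 ;;
  esac
}

# Returns 1 (with a warning on stderr) when a CFG value is above 1.0.
warn_if_flux_cfg_high() {
  if awk -v c="$1" 'BEGIN { exit !(c + 0 > 1.0) }'; then
    echo "warning: CFG $1 > 1.0; keep KSampler at 1.0 for Flux Dev/Schnell" >&2
    return 1
  fi
  return 0
}
```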
Image resolution: Flux is trained on multiple resolutions and handles non-square images well. The base training resolution is 1024x1024. For portrait images use 832x1216; for landscape, use 1216x832. Unlike SDXL, you do not need to stay close to 1024x1024 (about one megapixel). Flux at 1280x1280 still produces coherent results, without the duplicated subjects and repeated patterns that plagued SD 1.5 at high resolutions.
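For other aspect ratios, the arithmetic is: keep width x height near the target area and round each side. The following sketch assumes both sides should be multiples of 16, because of 8x VAE downsampling followed by 2x2 latent patches in Flux's transformer. Treat that rule as an assumption rather than a documented ComfyUI requirement. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: width x height near side*side pixels for aspect ratio a:b,
# with each side rounded to the nearest multiple of 16 (assumption).
flux_dims() {
  aspect_w=$1
  aspect_h=$2
  side=${3:-1024}   # target area = side x side (1024 -> about 1 MP)
  awk -v a="$aspect_w" -v b="$aspect_h" -v s="$side" 'BEGIN {
    w = s * sqrt(a / b)   # keeps w*h == s*s while w/h == a/b
    h = w * b / a
    printf "%dx%d\n", int(w / 16 + 0.5) * 16, int(h / 16 + 0.5) * 16
  }'
}
```

For example, `flux_dims 2 3` prints 832x1248, slightly taller than the 832x1216 used above. Both sizes are multiples of 16 and close to one megapixel. `flux_dims 1 1 1280` gives the 1280x1280 case.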
LoRAs and ControlNets for Flux in 2026
The Flux LoRA ecosystem is growing rapidly in 2026 but is still smaller than SDXL's. For fine-tuning style, the most common approach is rank-16 to rank-64 LoRA training using SimpleTuner or OneTrainer. Key difference from SDXL training: Flux LoRAs train on the transformer layers directly, not on a UNet. Training 1,000 steps on 30-50 reference images typically produces usable style LoRAs.
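The rank and step numbers become concrete with a little arithmetic. A LoRA on a linear layer with input size d_in and output size d_out trains two small matrices, holding rank * (d_in + d_out) weights in total. The following sketch assumes a 3072-wide layer, chosen to match FLUX.1's hidden size. At rank 16 that is 98,304 weights against 9,437,184 for the full 3072x3072 matrix, about 1%. The epoch helper shows what the 1,000-step guideline means for a 40-image dataset at batch size 1. Both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: back-of-the-envelope LoRA numbers.

# Weights added to one d_in x d_out linear layer: A (rank x d_in) + B (d_out x rank).
lora_params() {
  echo $(( $3 * ($1 + $2) ))   # args: d_in d_out rank
}

# Passes over the dataset implied by a step budget (integer division).
lora_epochs() {
  echo $(( $1 * $3 / $2 ))     # args: steps images batch_size
}
```

With these helpers, `lora_params 3072 3072 64` gives 393216, four times the rank-16 figure, and `lora_epochs 1000 40 1` gives 25 epochs.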
ControlNet for Flux: as of May 2026, several ControlNet implementations exist for Flux including Canny, Depth, and Pose variants. The most stable are the XLabs-AI ControlNet models available on Hugging Face. Install via ComfyUI Manager as part of the ComfyUI-FluxControlNet node pack. Performance is slightly slower than SDXL ControlNets due to the larger model size.
Troubleshooting the most common Flux setup errors
Three errors appear in almost every first Flux setup. First: CUDA out of memory - usually caused by loading Flux Dev FP16 on a card with less than 24 GB VRAM. Fix: switch to the NF4 variant or add --lowvram to your launch command. Second: NaNsException (tensor with all NaNs) - typically caused by running Flux with CFG above 2.0 or using a sampler incompatible with rectified flow. Fix: set CFG to 1.0 and use Euler with Simple scheduler. Third: black or gray output images - usually caused by the VAE not loading correctly or a mismatch between the model and VAE versions. Fix: explicitly load the ae.safetensors VAE in a VAELoader node rather than relying on automatic detection.
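The first and third errors can often be spotted before you queue anything. The following sketch is a preflight check that assumes the directory layout from the download commands earlier in this guide and a VRAM figure in whole GB. It only inspects files, so it cannot catch the NaN case, which depends on sampler settings. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: static checks for the OOM and black-image cases described above.
flux_preflight() {
  comfy_dir=$1
  vram_gb=$2
  status=0
  # OOM risk: full FP16 Flux Dev on a card below 24 GB.
  if [ -e "$comfy_dir/models/unet/flux1-dev.safetensors" ] && [ "$vram_gb" -lt 24 ]; then
    echo "warn: FP16 flux1-dev on ${vram_gb} GB VRAM - expect OOM; use a quantized variant or --lowvram"
    status=1
  fi
  # Black or gray output: the Flux VAE must be present to select in VAELoader.
  if [ ! -s "$comfy_dir/models/vae/ae.safetensors" ]; then
    echo "warn: models/vae/ae.safetensors missing - load it explicitly in a VAELoader node"
    status=1
  fi
  return $status
}
```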
If you are running on a machine without a GPU or with an unsupported GPU, ComfyUI will fall back to CPU inference. CPU inference for Flux Dev is extremely slow - expect 10-30 minutes per image. This is only useful for testing that the workflow configuration is correct, not for actual generation.
One final note on reproducibility: ComfyUI uses a seed value in the KSampler node to control randomness. Set a fixed seed during development to get consistent outputs while you tune other parameters. Switch to a random seed for production to generate variety across requests. Documenting your seed alongside the workflow JSON is good practice when sharing reproducible results.
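The following sketch implements that seed workflow as shell helpers: a fixed seed while tuning, a random one in production, and a sidecar file that records the seed next to the workflow JSON. The helper names and the .seed sidecar naming are hypothetical conventions for this example; ComfyUI does not read them. The seed still has to be set in the KSampler node, or in the API-format JSON, to take effect.

```shell
#!/bin/sh
# Sketch: seed selection and recording (hypothetical conventions).
choose_seed() {
  mode=$1
  fixed=${2:-42}
  if [ "$mode" = "dev" ]; then
    echo "$fixed"                                # reproducible while tuning
  else
    od -An -N4 -tu4 /dev/urandom | tr -d ' '     # random 32-bit seed
  fi
}

record_seed() {
  # args: seed workflow_json; writes <workflow_json>.seed
  printf '%s\n' "$1" > "$2.seed"
}
```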