NaNsException: Tensor With All NaNs Was Produced is one of the more confusing ComfyUI errors because it can have five completely different causes, each requiring a different fix. This guide covers systematic diagnosis so you identify the root cause in your specific setup rather than trying fixes at random.
What NaNs Actually Mean
NaN stands for "Not a Number" - a floating-point value that results from invalid operations like dividing zero by zero, subtracting infinity from infinity, or taking the square root of a negative number. In neural network inference, NaN values propagate: any arithmetic that touches a NaN returns NaN, so a single bad value spreads through matrix multiplications and normalization layers until the whole tensor is NaN. By the time ComfyUI reports the error, the original source may be several layers back.
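The propagation described above can be reproduced with plain Python floats. This is a minimal sketch, not ComfyUI code: `propagate` is a hypothetical stand-in for a few layers of arithmetic applied to one activation.

```python
import math

inf = float("inf")
nan = inf - inf  # invalid operation: infinity minus infinity is undefined


def propagate(x: float) -> float:
    """Hypothetical stand-in for a few layers of arithmetic on one value."""
    y = x * 0.5 + 1.0  # affine step
    y = y * y          # nonlinearity
    return y - 3.0     # bias


healthy = propagate(2.0)   # ordinary input gives an ordinary number
poisoned = propagate(nan)  # a NaN input survives every step

print(healthy, poisoned, math.isnan(poisoned))
```

Note that `nan == nan` is false, so NaN has to be detected with `math.isnan` (or `torch.isnan` on tensors) rather than by comparison.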
The error is thrown by ComfyUI when it detects that the output tensor from a sampler step contains exclusively NaN values. This is a safety check added in a 2023 update - older versions would silently generate black or noise images instead of crashing.
| Cause | Probability | Affects | Quick test |
|---|---|---|---|
| Wrong VAE for model | High | Flux + custom VAEs | Use bundled VAE |
| FP16 overflow | High | Flux on older GPUs | Switch to bf16 or fp32 |
| CFG scale too high | Medium | Flux specifically | Set CFG to 1.0 |
| Corrupt checkpoint | Medium | Any model | Re-download and verify hash |
| Wrong sampler/scheduler | Medium | Flux schnell | Use euler + simple |
Cause 1: Wrong VAE
Flux uses a custom 16-channel VAE, different from the standard SD 1.5 4-channel VAE and the SDXL 4-channel VAE. Using any VAE not designed for Flux will produce NaNs immediately during the decode step. This is the most common cause of NaNsException in Flux workflows.
The correct VAE files for Flux: ae.safetensors (the official Flux VAE, available from black-forest-labs on Hugging Face). Do not use vae-ft-mse-840000-ema-pruned.safetensors (SD 1.5 VAE) or sdxl_vae.safetensors (SDXL VAE) with Flux models.
# List your VAE files and their sizes
ls -lh ComfyUI/models/vae/
# Flux ae.safetensors should be approximately 335 MB
# SD 1.5 VAE is approximately 334 MB (similar size - check filename, not size)
# SDXL VAE is approximately 334 MB (same issue)
# Download the correct Flux VAE
# Place in ComfyUI/models/vae/ae.safetensors
In your workflow, the VAELoader node must point to ae.safetensors, not any other VAE. If you use a CheckpointLoaderSimple with an all-in-one Flux checkpoint that includes the VAE, ensure the checkpoint was not compiled with a substituted VAE.
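Because file size cannot tell the three VAEs apart, one option is to read the latent channel count from the file header. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). It also assumes the decoder's first convolution is stored under a key ending in `decoder.conv_in.weight` with shape `[out, latent_channels, kH, kW]`. That key name is an assumption and may differ between VAE exports. `vae_latent_channels` is a hypothetical helper name.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def vae_latent_channels(path):
    """Return the decoder's input channel count, or None if the key is absent.

    Assumption: the first decoder convolution is stored under a key ending in
    'decoder.conv_in.weight' with shape [out, latent_channels, kH, kW].
    """
    header = read_safetensors_header(path)
    for name, info in header.items():
        if name.endswith("decoder.conv_in.weight"):
            return info["shape"][1]
    return None
```

Under that assumption, a Flux VAE should report 16 and an SD 1.5 or SDXL VAE should report 4, for example `vae_latent_channels("ComfyUI/models/vae/ae.safetensors")`.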
Cause 2: FP16 Overflow on Older GPUs
FP16 (half-precision float) has a maximum representable value of approximately 65,504. Flux attention layers can produce intermediate values that exceed this limit on certain GPU architectures, causing overflow to infinity and then NaN. This is particularly common on Turing (RTX 20xx) and older Ampere (RTX 30xx on some drivers) GPUs.
The fix is to use bf16 (bfloat16) instead of fp16. BF16 has the same range as fp32 (approximately 3.4 x 10^38) but with reduced precision - it cannot overflow in the same way. Most Flux fp8 checkpoints are internally stored in fp8 but compute in bf16 on supported hardware.
# Run the VAE in bf16 (add to ComfyUI startup); --bf16-unet does the same for the diffusion model
python main.py --bf16-vae --listen
# Alternatively, force fp32 for maximum stability (slower)
python main.py --force-fp32 --listen
# Check which precision ComfyUI is using
# Look in startup logs for: "VAE dtype: torch.bfloat16" or similar
Note: BF16 requires Ampere (RTX 30xx) or newer GPU. On Turing (RTX 20xx), bf16 falls back to fp32 automatically. If you are on a Turing GPU and seeing NaN with fp16, the solution is --force-fp32 at the cost of higher VRAM usage.
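The overflow-to-infinity-to-NaN chain from this section can be shown without a GPU. This is a sketch using only the standard library: `to_fp16` rounds through the `struct` half-precision format, and `to_bf16` approximates bfloat16 by truncating a float32 to its top 16 bits (real hardware rounds to nearest, so the last digit may differ). Both helper names are hypothetical.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision, returning +/-inf on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses values beyond the fp16 range
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping only the top 16 bits of a float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


big = 70000.0                         # just above the fp16 maximum of 65504
overflowed = to_fp16(big)             # becomes infinity
nan_result = overflowed - overflowed  # inf - inf is NaN

print("fp16 max survives:", to_fp16(65504.0))
print("fp16 of 70000:", overflowed, "-> inf - inf =", nan_result)
print("bf16 of 70000:", to_bf16(big))  # finite, but coarser than fp32
```

The bf16 result stays finite because bfloat16 keeps the 8-bit exponent of fp32 and gives up mantissa bits instead.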
Cause 3: CFG Scale Set for Standard Diffusion
This is a common mistake when migrating a workflow from SDXL or SD 1.5 to Flux. Standard diffusion models use CFG (classifier-free guidance) with typical values of 5-12. Flux uses a different guidance mechanism called guidance_scale with typical values of 1.0-4.0 - and the parameter is in a different node.
If you connect a KSampler CFG value of 7.0 (typical for SDXL) to a Flux model, the extremely high guidance can cause NaN through numerical instability. Flux dev is designed to work with CFG at exactly 1.0 in the KSampler (which effectively disables classifier-free guidance), with guidance controlled instead via the FluxGuidance node.
{
  "ksampler": {
    "class_type": "KSampler",
    "inputs": {
      "cfg": 1.0,
      "sampler_name": "euler",
      "scheduler": "simple",
      "steps": 20
    }
  },
  "flux_guidance": {
    "class_type": "FluxGuidance",
    "inputs": {
      "conditioning": ["clip_encode", 0],
      "guidance": 3.5
    }
  }
}
For Flux schnell (the 4-step distilled variant), set guidance to 0 or do not use the FluxGuidance node at all. Schnell was distilled without guidance and adding it causes NaN or degraded results.
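The statement that CFG at 1.0 effectively disables classifier-free guidance follows from the standard CFG combination formula, `uncond + cfg * (cond - uncond)`. The sketch below applies that textbook formula to short Python lists. It is an illustration of the arithmetic, not ComfyUI's sampler code, and `cfg_combine` is a hypothetical name.

```python
def cfg_combine(uncond, cond, cfg):
    """Textbook classifier-free guidance: push the prediction away from uncond."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.25, -0.5]  # prediction for the empty/negative prompt
cond = [0.5, 0.125]    # prediction for the positive prompt

at_one = cfg_combine(uncond, cond, 1.0)    # collapses to cond exactly
at_seven = cfg_combine(uncond, cond, 7.0)  # difference is amplified 7x

print(at_one, at_seven)
```

At `cfg = 1.0` the unconditional term cancels, so only the conditional prediction remains. At 7.0 the gap between the two predictions is multiplied by seven on every step, which is where the headroom for overflow comes from.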
Cause 4: Corrupt Checkpoint File
A partially downloaded or corrupt checkpoint will load without error but produce NaN during inference. The safetensors format does not validate checksums on load - it trusts the file contents. Corruption can happen from an interrupted download, a disk write error, or antivirus software modifying the file.
To verify a checkpoint, check the SHA-256 hash against the value published on the model card. Most Flux checkpoints on Hugging Face include a sha256 hash in the model card or .sha256 sidecar file.
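On systems without `sha256sum` (for example a stock Windows install), the same check can be done from Python. This is a minimal sketch that streams the file in chunks so a multi-gigabyte checkpoint does not have to fit in memory. The function names are illustrative, and the expected hash must still come from the model's actual source.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest to a published hash, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `matches_expected("ComfyUI/models/checkpoints/flux1-dev-fp8.safetensors", published_hash)` returns `False` for any file that differs from the published one by even a single byte.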
# Compute SHA-256 of your checkpoint
sha256sum ComfyUI/models/checkpoints/flux1-dev-fp8.safetensors
# Compare to expected hash from Hugging Face model card
# Example expected hash (always verify against the actual source):
# 31a2e4c74ac13c5dd87de49f6b03e02b77e7b97e1ca4d2e5c1c0fb2f02e4a8f3
# If hashes don't match, re-download:
# wget https://huggingface.co/.../flux1-dev-fp8.safetensors
# Quick structural check (no reference hash needed)
# Catches truncated or unreadable files; it does not detect altered tensor values
python3 -c "
from safetensors import safe_open
import sys
try:
    f = safe_open(sys.argv[1], framework='pt')
    keys = list(f.keys())
    print(f'OK: {len(keys)} tensors found')
except Exception as e:
    print(f'CORRUPT: {e}')
" ComfyUI/models/checkpoints/flux1-dev-fp8.safetensors
Cause 5: Incompatible Sampler and Scheduler Combination
Not all sampler and scheduler combinations work with Flux. Certain combinations produce unstable sampling trajectories that generate NaN, especially on the first or last denoising step. This is more common with Flux than with older diffusion models because of the flow-matching training objective.
Verified working combinations for Flux dev and schnell:
- euler + simple (recommended for Flux dev, 20 steps, guidance 3.5)
- euler + sgm_uniform (alternative for Flux dev)
- dpmpp_2m + sgm_uniform (longer generations, Flux dev)
- euler + simple (Flux schnell, 4 steps, guidance 0)
Known problematic combinations that can produce NaN:
- dpm_adaptive (adaptive step count - unstable with Flux at default settings)
- uni_pc + karras (karras schedule incompatible with flow-matching models)
- dpmpp_sde + karras (can overflow at high guidance values)
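The two lists above can be encoded as a small lookup for checking a workflow's settings before running it. This sketch contains only the combinations named in this guide. Anything else is reported as "unlisted" rather than guessed at, and `classify_combo` is a hypothetical helper.

```python
# Combinations this guide lists as working for Flux
WORKING = {
    ("euler", "simple"),
    ("euler", "sgm_uniform"),
    ("dpmpp_2m", "sgm_uniform"),
}

# Combinations this guide lists as able to produce NaN
PROBLEMATIC = {
    ("uni_pc", "karras"),
    ("dpmpp_sde", "karras"),
}

# Samplers flagged regardless of scheduler
PROBLEMATIC_SAMPLERS = {"dpm_adaptive"}


def classify_combo(sampler: str, scheduler: str) -> str:
    """Return 'working', 'problematic', or 'unlisted' per the lists above."""
    if sampler in PROBLEMATIC_SAMPLERS or (sampler, scheduler) in PROBLEMATIC:
        return "problematic"
    if (sampler, scheduler) in WORKING:
        return "working"
    return "unlisted"


print(classify_combo("euler", "simple"))
print(classify_combo("uni_pc", "karras"))
```

An "unlisted" result does not mean the pair is unsafe, only that it has not been covered here.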
Systematic Diagnosis Workflow
If you are unsure which cause applies, follow this sequence in order. Each step eliminates one cause:
- Step 1: Load a simple test workflow - just CheckpointLoaderSimple + VAELoader + CLIPTextEncode x2 + KSampler + VAEDecode + SaveImage. No custom nodes, no LoRA, no ControlNet.
- Step 2: Use ae.safetensors as your VAE. If the simple workflow succeeds, your original VAE was the problem.
- Step 3: Set KSampler CFG to 1.0 and add a FluxGuidance node set to 3.5. Use euler + simple. If this fixes it, your CFG settings were the problem.
- Step 4: Add --force-fp32 to your startup flags. If this fixes it, you had a precision overflow issue.
- Step 5: Verify your checkpoint SHA-256. If the hash does not match, re-download the checkpoint.
# Step 4: Launch ComfyUI with fp32 forced
python main.py --force-fp32 --listen
# Monitor for the error during generation
# If it succeeds, you had an fp16 overflow - switch to bf16 for better performance
python main.py --bf16-vae --listen
Preventing Future NaNs
Once you identify and fix your NaN source, these habits prevent recurrence: always use the bundled VAE when available (all-in-one Flux checkpoints include the correct VAE), keep CFG at 1.0 for Flux and use FluxGuidance for guidance control, and verify checkpoint hashes after downloading. When adding custom nodes, add them one at a time so you can identify which node introduces instability.
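The CFG habit can be checked automatically against a workflow exported in ComfyUI's API format, which is a JSON object mapping node ids to `class_type` and `inputs`. The sketch below is a hypothetical lint helper, not part of ComfyUI. It assumes `cfg` is stored as a literal number on KSampler nodes and only inspects the node types named in this guide.

```python
def lint_flux_workflow(workflow: dict) -> list:
    """Return human-readable warnings for Flux-unfriendly settings."""
    warnings = []
    has_flux_guidance = False
    for node_id, node in workflow.items():
        class_type = node.get("class_type")
        inputs = node.get("inputs", {})
        if class_type == "KSampler":
            cfg = inputs.get("cfg")
            # Linked inputs are lists like ["node", 0]; only check literals
            if isinstance(cfg, (int, float)) and cfg != 1.0:
                warnings.append(f"node {node_id}: KSampler cfg is {cfg}, expected 1.0")
        elif class_type == "FluxGuidance":
            has_flux_guidance = True
    if not has_flux_guidance:
        warnings.append("no FluxGuidance node (expected for Flux dev, fine for schnell)")
    return warnings


example = {
    "3": {"class_type": "KSampler", "inputs": {"cfg": 7.0, "steps": 20}},
    "5": {"class_type": "FluxGuidance", "inputs": {"guidance": 3.5}},
}
for message in lint_flux_workflow(example):
    print(message)
```

Running it on a workflow saved with "Save (API Format)" and loaded with `json.load` gives a quick pre-flight check after migrating from SDXL or SD 1.5.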