# The Real Cost of 1,000 AI Images in 2026
All prices verified May 11, 2026. Self-hosted throughput figures are estimates based on known GPU performance characteristics - actual performance varies by model, resolution, batch size, and system configuration.
| Option | Model | Cost / 1K images | Infrastructure | Notes |
|---|---|---|---|---|
| Self-hosted, Vast.ai RTX 4090 | Flux Schnell | ~$0.31 | Managed by you | $0.31/hr at ~1,000 imgs/hr |
| Self-hosted, RunPod RTX 4090 | Flux Schnell | ~$0.69 | Managed by you | $0.69/hr at ~1,000 imgs/hr |
| Self-hosted, Salad RTX 4090 | Flux Schnell | ~$0.16 | Managed by you | $0.16/hr at ~1,000 imgs/hr |
| Together AI | Flux Schnell | $2.70 | Zero infra | Verified May 2026 |
| Replicate | Flux Schnell | $3.00 | Zero infra | Verified May 2026 |
| fal.ai | Flux Schnell | $3.00 | Zero infra | At 1024×1024 (1MP) |
| Replicate | Flux Dev | $25.00 | Zero infra | Higher quality, non-commercial license |
| Self-hosted, RunPod RTX 4090 | SDXL | ~$1.53 | Managed by you | $0.69/hr at ~450 imgs/hr |
## Why Cost Per GPU Hour Misleads You
GPU rental prices are intuitive but dangerous as a cost metric. Two setups can have the same GPU cost per hour and radically different costs per image: a well-optimized inference container on an RTX 4090 generating 1,500 Flux Schnell images per hour costs half as much per image as an under-optimized container generating 750.
The metric that matters is cost per image at your target quality and resolution. To calculate it: measure images generated per hour on your actual workload (not theoretical benchmarks), then divide the hourly GPU cost. A 50% improvement in inference throughput through batching, torch.compile, or model quantization cuts your effective cost per image in half without changing the GPU.
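The calculation above fits in a few lines; the throughput figures below are illustrative, not benchmarks:

```python
def cost_per_image(hourly_rate_usd: float, images_per_hour: float) -> float:
    """Effective cost per image for a self-hosted GPU at a given throughput."""
    return hourly_rate_usd / images_per_hour

# Same GPU, same hourly rate, very different cost per image
# (RTX 4090 on RunPod at $0.69/hr, illustrative throughputs)
well_optimized = cost_per_image(0.69, 1500)   # ~$0.00046/image
under_optimized = cost_per_image(0.69, 750)   # ~$0.00092/image

# Doubling throughput halves the effective cost per image
assert abs(under_optimized / well_optimized - 2.0) < 1e-9
```

The only input worth trusting is `images_per_hour` measured on your own workload; plugging in a vendor benchmark defeats the point.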
## Flux.1 Schnell: The Speed-Cost King
Flux.1 Schnell uses a distilled architecture that generates high-quality images in 4 diffusion steps - compared to 20–50 steps for older models. This is why Schnell dominates production deployments: it is fast, commercially licensed (Apache 2.0), and produces results that satisfy most real-world image generation use cases.
On a managed API, Flux Schnell is approximately $0.003 per image on Replicate and fal.ai, and $0.0027 on Together AI. These prices have been stable through early 2026, though API providers adjust pricing as compute costs change. For self-hosted, an RTX 4090 running Flux Schnell generates an estimated 1,000 images per hour at standard settings - your actual throughput may be higher with batching and optimization.
### Flux Schnell Cost by Provider (1024×1024)
| Provider | 100 images | 1,000 images | 10,000 images | 100,000 images |
|---|---|---|---|---|
| Together AI | $0.27 | $2.70 | $27 | $270 |
| Replicate | $0.30 | $3.00 | $30 | $300 |
| fal.ai | $0.30 | $3.00 | $30 | $300 |
| RunPod self-hosted* | $0.069 | $0.69 | $6.90 | $69 |
| Salad self-hosted* | $0.016 | $0.16 | $1.60 | $16 |
* Self-hosted estimates assume continuous GPU utilization. RunPod bills for idle time; Salad adds no charge for scheduling overhead.
## Flux.1 Dev: High Quality, Higher Price
Flux Dev generates noticeably sharper images in complex scenes - fine text rendering, intricate patterns, and detailed backgrounds benefit most. The trade-off is 28–50 inference steps versus 4 for Schnell, making it 7–12× slower and proportionally more expensive. On Replicate, Flux Dev costs $0.025/image - over 8× the price of Schnell.
The other critical constraint: Flux Dev has a non-commercial license. You cannot use it in a product that generates revenue without purchasing a commercial license from Black Forest Labs. Many developers inadvertently violate this by testing with Dev and forgetting to switch to Schnell for production. Use Schnell for commercial products unless you have a specific quality requirement that only Dev satisfies.
| Use case | Recommendation | Reason |
|---|---|---|
| Commercial product, standard photography | Schnell | Apache 2.0 license, 8× cheaper |
| Non-commercial artwork generation | Dev | Quality improvement worth the cost |
| Text-heavy images (posters, mockups) | Dev | Better text rendering |
| Portrait and lifestyle photography | Schnell | Quality gap invisible to most users |
| Product photography on white background | Schnell | Simple scenes, quality parity |
| High-detail illustrations, architecture | Dev or Pro | Complex scenes reveal quality difference |
## SDXL: The Self-Hosted Workhorse
SDXL (Stable Diffusion XL) predates Flux but remains relevant in 2026 for teams with existing SDXL fine-tunes and custom workflows. The model needs roughly 20 steps for good results, making self-hosted throughput lower than Flux Schnell: approximately 450 images per hour on an RTX 4090 at 1024×1024 resolution (estimate). The effective cost on RunPod is roughly $0.0015/image ($1.53/1K) - cheaper than any managed API in this comparison but more expensive than self-hosted Flux Schnell.
SDXL shines when you have fine-tuned checkpoint files (LoRA, full fine-tunes) that took significant compute to produce. Migrating those fine-tunes to Flux requires retraining from scratch. If your production quality depends on SDXL fine-tunes, the calculus changes: the right model is the one that matches your trained weights, not the newest architecture.
## Managed API vs Self-Hosted: The Break-Even Point
The decision comes down to two factors: volume and engineering capacity. Below a certain volume, managed APIs are cheaper when you factor in the engineering time to build and maintain self-hosted infrastructure. Above that volume, self-hosted savings justify the overhead.
| Monthly volume | Replicate | RunPod (compute) | Eng. overhead* | RunPod total | Winner |
|---|---|---|---|---|---|
| 10K images | $30 | $6.90 | ~$200 | ~$207 | Replicate |
| 50K images | $150 | $34.50 | ~$200 | ~$235 | Replicate |
| 100K images | $300 | $69 | ~$200 | ~$269 | RunPod |
| 500K images | $1,500 | $345 | ~$200 | ~$545 | RunPod saves $955/mo |
| 1M images | $3,000 | $690 | ~$200 | ~$890 | RunPod saves $2,110/mo |
* Engineering overhead is a rough estimate of amortized cost for a developer maintaining self-hosted infrastructure - model management, monitoring, uptime, updates. Actual cost depends heavily on team expertise and automation investment.
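The break-even volume behind this table can be derived directly: find the monthly volume where the managed-API bill equals self-hosted compute plus overhead. The per-image prices are the ones used above; the overhead figure is the same rough $200/month estimate:

```python
API_PER_IMAGE = 0.003      # Replicate Flux Schnell, $3.00 per 1K
SELF_PER_IMAGE = 0.00069   # RunPod RTX 4090 at ~1,000 imgs/hr (estimate)
MONTHLY_OVERHEAD = 200.0   # rough amortized engineering overhead

# Solve API_PER_IMAGE * v = SELF_PER_IMAGE * v + MONTHLY_OVERHEAD for v
break_even = MONTHLY_OVERHEAD / (API_PER_IMAGE - SELF_PER_IMAGE)
print(f"Break-even: ~{break_even:,.0f} images/month")  # ~86,580
```

At these rates the crossover lands near 87K images/month, which is why the table flips between the 50K and 100K rows.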
### When Managed API Wins
- Volume under 50,000 images/month - engineering overhead exceeds savings
- Prototype and early-stage products - move fast, validate first
- Teams without infrastructure expertise - the learning curve has real costs
- Latency requirements below 2 seconds - pre-warmed API pools beat cold-start self-hosted
### When Self-Hosted Wins
- Volume above 100,000 images/month - clear savings after overhead
- Custom model weights (fine-tunes, LoRA) that are not deployable on standard APIs
- Data privacy requirements - images cannot leave your infrastructure
- Custom inference pipelines (multi-step ComfyUI workflows, chained models)
## The Costs You Forget to Count
The sticker price per image or per GPU hour is only part of the total cost. Hidden costs can double the real price if you do not account for them upfront.
1. Failed and retried jobs: on any system, some jobs fail (OOM errors, timeouts, network issues). Budget a 5–15% failure rate depending on platform reliability.
2. Storage and egress: generated images need to be stored and served. S3 storage is cheap ($0.023/GB/month) but egress is $0.09/GB. For high-resolution images, egress adds up.
3. Model loading time: on RunPod pods (not Serverless), you pay for GPU time while loading model weights. A 12 GB Flux checkpoint takes 30–60 seconds on NVMe - real cost at $0.69/hr.
4. Engineering and maintenance: monitoring containers, handling model updates, debugging silent failures. Even 2 hours/week × $80/hr = $640/month in opportunity cost.
5. Idle GPU time: if you run a persistent RunPod pod for real-time inference, it bills even when no requests come in. Use RunPod Serverless for scale-to-zero to eliminate idle costs.
6. Bandwidth for model downloads: if you re-download 12 GB of model weights every pod restart, at high volume this adds up in developer time and cloud bandwidth costs.
## Resolution and Cost: The Megapixel Effect
Not all "images" cost the same. Resolution directly affects inference time (and therefore cost on self-hosted), and directly affects price on fal.ai's per-megapixel billing.
| Resolution | Megapixels | Cost per image (fal.ai, $0.003/MP) | Common use case |
|---|---|---|---|
| 512×512 | 0.26 MP | $0.00078 | Thumbnails, previews |
| 768×768 | 0.59 MP | $0.00177 | Social media icons |
| 1024×1024 | 1.05 MP | $0.00315 | Standard generation |
| 1024×1536 | 1.57 MP | $0.00472 | Portrait format |
| 1536×1024 | 1.57 MP | $0.00472 | Landscape format |
| 2048×2048 | 4.19 MP | $0.01258 | High-resolution output |
For self-hosted workloads, higher resolution also means fewer images per hour. A 2048×2048 Flux Schnell generation takes roughly 4× longer than 1024×1024 on the same GPU (VRAM and computation both scale with resolution). The effective cost per image at 2048×2048 on RunPod RTX 4090 would be approximately $0.00276/image - similar to managed API pricing for standard resolution, eliminating the self-hosted cost advantage.
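Both effects scale with pixel count, which a short sketch makes concrete. The $0.003/MP rate and the ~1,000 imgs/hr baseline are the figures used above; linear scaling of generation time with pixels is an assumption, not a benchmark:

```python
FAL_PER_MEGAPIXEL = 0.003    # fal.ai per-MP rate used in the table
BASE_THROUGHPUT_1MP = 1000   # est. RTX 4090 Flux Schnell imgs/hr at ~1 MP
RUNPOD_HOURLY = 0.69

def fal_cost(w: int, h: int) -> float:
    """Managed-API cost under per-megapixel billing."""
    return (w * h / 1_000_000) * FAL_PER_MEGAPIXEL

def self_hosted_cost(w: int, h: int) -> float:
    """Assumes generation time scales linearly with pixel count."""
    scale = (w * h) / (1024 * 1024)
    return RUNPOD_HOURLY / (BASE_THROUGHPUT_1MP / scale)

print(f"{fal_cost(2048, 2048):.5f}")          # ~$0.01258, as in the table
print(f"{self_hosted_cost(2048, 2048):.5f}")  # ~$0.00276, 4x the 1MP cost
```

At 2048×2048 the self-hosted estimate lands within a factor of two of managed-API pricing for 1MP output, which is the squeeze described above.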
## Cost Optimization Playbook
If you're running AI image generation at scale, these six tactics produce the biggest cost reductions:
1. Use Flux Schnell instead of Dev for commercial products: 8× cheaper, commercially licensed, quality difference imperceptible for most use cases.
2. Batch requests on self-hosted: processing multiple images in a single forward pass (batch size 2–4) can improve throughput 30–60% without linear VRAM scaling on Schnell.
3. Use RunPod Serverless or Salad: scale-to-zero eliminates idle GPU billing. A pod running 24/7 at $0.69/hr costs $497/month even with zero requests.
4. Cache identical prompts: if your product generates the same image repeatedly (same prompt, same seed), store the output and skip regeneration entirely.
5. Generate at the display resolution, not higher: upscaling is cheap (Real-ESRGAN at < $0.001/image); generating at 2× the needed resolution is expensive and usually unnecessary.
6. Benchmark before committing: your specific workflow (ComfyUI nodes, ControlNet, custom samplers) will have different throughput than generic benchmarks. Measure your actual images/hour before choosing a pricing tier.
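A minimal harness for that last tactic might look like this; `generate_batch` is a hypothetical stand-in for whatever actually runs your pipeline:

```python
import time

def benchmark_cost(generate_batch, n_images: int, hourly_rate_usd: float) -> dict:
    """Time n_images through your real pipeline and derive cost per image."""
    start = time.perf_counter()
    generate_batch(n_images)   # swap in your actual generation call
    elapsed = time.perf_counter() - start
    imgs_per_hour = n_images / elapsed * 3600
    return {
        "images_per_hour": imgs_per_hour,
        "cost_per_image": hourly_rate_usd / imgs_per_hour,
        "cost_per_1k": hourly_rate_usd / imgs_per_hour * 1000,
    }

# Dummy workload standing in for real generation, priced at $0.69/hr
stats = benchmark_cost(lambda n: time.sleep(0.001 * n), 100, 0.69)
print(f"~{stats['images_per_hour']:.0f} imgs/hr, ${stats['cost_per_1k']:.2f}/1K")
```

Run it against your real ComfyUI or diffusers pipeline at your production resolution and batch size; anything else measures the wrong workload.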
## Our Verdict
There is no single cheapest option - the right answer depends on your volume, engineering capacity, and quality requirements. The framework:
- Under 50K images/month: start with Together AI ($2.70/1K) or Replicate ($3.00/1K). Zero infrastructure, iterate fast.
- 50K–200K images/month: evaluate RunPod Serverless. The engineering investment starts paying off, and Serverless eliminates idle costs.
- Above 200K images/month: self-hosted on Salad ($0.16/1K) or Vast.ai ($0.31/1K) delivers 90%+ savings vs managed APIs.
- Custom model weights: always self-hosted - managed APIs only support standard checkpoints.
- Data privacy / on-premise requirement: self-hosted only. Consider RunPod Secure Cloud or a dedicated bare-metal setup.
Need exact VRAM requirements for your configuration? Use our VRAM calculator to instantly get numbers for your resolution, batch size, and quantization level.