
Cost Per 1,000 AI Images in 2026: APIs vs Self-Hosted

Flux Schnell costs $2.70 per 1,000 images on the cheapest managed API. Self-hosted on Salad costs about $0.16. Real numbers for every path - inference APIs vs GPU rental - verified May 2026.

Published 2026-05-11 · Tags: ai inference cost, stable diffusion cost, replicate pricing

The Real Cost of 1,000 AI Images in 2026

All prices verified May 11, 2026. Self-hosted throughput figures are estimates based on known GPU performance characteristics - actual performance varies by model, resolution, batch size, and system configuration.

Cost per 1,000 AI images - all options
Option | Model | Cost / 1K images | Infrastructure | Notes
Self-hosted, Vast.ai RTX 4090 | Flux Schnell | ~$0.31 | Managed by you | $0.31/hr ÷ ~1,000 imgs/hr
Self-hosted, RunPod RTX 4090 | Flux Schnell | ~$0.69 | Managed by you | $0.69/hr ÷ ~1,000 imgs/hr
Self-hosted, Salad RTX 4090 | Flux Schnell | ~$0.16 | Managed by you | $0.16/hr ÷ ~1,000 imgs/hr
Together AI | Flux Schnell | $2.70 | Zero infra | Verified May 2026
Replicate | Flux Schnell | $3.00 | Zero infra | Verified May 2026
fal.ai | Flux Schnell | $3.00 | Zero infra | At 1024×1024 (1MP)
Replicate | Flux Dev | $25.00 | Zero infra | Higher quality, non-commercial license
Self-hosted, RunPod RTX 4090 | SDXL | ~$1.53 | Managed by you | $0.69/hr ÷ ~450 imgs/hr

$0.16 - lowest cost per 1,000 Flux Schnell images: Salad RTX 4090 self-hosted (vs $3.00 on managed APIs). Source: Salad pricing page, May 11, 2026, plus estimated throughput.

Why Cost Per GPU Hour Misleads You

GPU rental prices are intuitive but dangerous as a cost metric. Two setups can have the same GPU cost per hour and radically different costs per image: a well-optimized inference container on an RTX 4090 generating 1,500 Flux Schnell images per hour costs half as much per image as an under-optimized container generating 750.

The metric that matters is cost per image at your target quality and resolution. To calculate it: measure images generated per hour on your actual workload (not theoretical benchmarks), then divide the hourly GPU cost. A 50% improvement in inference throughput through batching, torch.compile, or model quantization cuts your effective cost per image in half without changing the GPU.
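The arithmetic is trivial but worth encoding once, so throughput experiments map directly to dollars. A minimal sketch - the $0.69/hr rate and throughput figures are this article's examples, not measurements:

```python
def cost_per_image(gpu_cost_per_hour: float, images_per_hour: float) -> float:
    """Effective cost per image: hourly GPU price divided by measured throughput."""
    return gpu_cost_per_hour / images_per_hour

# Same $0.69/hr RTX 4090, two different inference containers:
baseline = cost_per_image(0.69, 750)     # under-optimized
optimized = cost_per_image(0.69, 1500)   # with batching / torch.compile

# Doubling throughput halves the effective cost per image on the same GPU.
assert abs(optimized - baseline / 2) < 1e-12
```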

NOTE
Self-hosted throughput estimates in this article use conservative figures - actual performance can be higher with optimization. Before using these numbers for financial projections, benchmark your specific workflow on target hardware.

Flux.1 Schnell: The Speed-Cost King

Flux.1 Schnell uses a distilled architecture that generates high-quality images in 4 diffusion steps - compared to 20–50 steps for older models. This is why Schnell dominates production deployments: it is fast, commercially licensed (Apache 2.0), and produces results that satisfy most real-world image generation use cases.

On a managed API, Flux Schnell is approximately $0.003 per image on Replicate and fal.ai, and $0.0027 on Together AI. These prices have been stable through early 2026, though API providers adjust pricing as compute costs change. For self-hosted, an RTX 4090 running Flux Schnell generates an estimated 1,000 images per hour at standard settings - your actual throughput may be higher with batching and optimization.

Flux Schnell Cost by Provider (1,024 × 1,024)

Flux Schnell managed API prices - verified May 11, 2026
Provider | 100 images | 1,000 images | 10,000 images | 100,000 images
Together AI | $0.27 | $2.70 | $27 | $270
Replicate | $0.30 | $3.00 | $30 | $300
fal.ai | $0.30 | $3.00 | $30 | $300
RunPod self-hosted* | $0.069 | $0.69 | $6.90 | $69
Salad self-hosted* | $0.016 | $0.16 | $1.60 | $16

* Self-hosted estimates assume continuous GPU utilization. RunPod bills for idle time; Salad does not bill for scheduling overhead.

Flux.1 Dev: High Quality, Higher Price

Flux Dev generates noticeably sharper images in complex scenes - fine text rendering, intricate patterns, and detailed backgrounds benefit most. The trade-off is 28–50 inference steps versus 4 for Schnell, making it 7–12× slower and proportionally more expensive. On Replicate, Flux Dev costs $0.025/image - over 8× the price of Schnell.

The other critical constraint: Flux Dev has a non-commercial license. You cannot use it in a product that generates revenue without purchasing a commercial license from Black Forest Labs. Many developers inadvertently violate this by testing with Dev and forgetting to switch to Schnell for production. Use Schnell for commercial products unless you have a specific quality requirement that only Dev satisfies.

When to use Flux Schnell vs Dev
Use case | Recommendation | Reason
Commercial product, standard photography | Schnell | Apache 2.0 license, 8× cheaper
Non-commercial artwork generation | Dev | Quality improvement worth the cost
Text-heavy images (posters, mockups) | Dev | Better text rendering
Portrait and lifestyle photography | Schnell | Quality gap invisible to most users
Product photography on white background | Schnell | Simple scenes, quality parity
High-detail illustrations, architecture | Dev or Pro | Complex scenes reveal quality difference

SDXL: The Self-Hosted Workhorse

SDXL (Stable Diffusion XL) predates Flux but remains relevant in 2026 for teams with existing SDXL fine-tunes and custom workflows. The model requires 20 steps for good results, making self-hosted throughput lower than Flux Schnell: approximately 450 images per hour on an RTX 4090 at 1024×1024 resolution (estimate). The effective cost per image on RunPod is roughly $0.0015/image - cheaper than any managed API but more expensive than self-hosted Flux Schnell.

SDXL shines when you have fine-tuned checkpoint files (LoRA, full fine-tunes) that took significant compute to produce. Migrating those fine-tunes to Flux requires retraining from scratch. If your production quality depends on SDXL fine-tunes, the calculus changes: the right model is the one that matches your trained weights, not the newest architecture.

Managed API vs Self-Hosted: The Break-Even Point

The decision comes down to two factors: volume and engineering capacity. Below a certain volume, managed APIs are cheaper when you factor in the engineering time to build and maintain self-hosted infrastructure. Above that volume, self-hosted savings justify the overhead.

Break-even analysis: Replicate vs RunPod self-hosted (Flux Schnell)
Monthly volume | Replicate | RunPod (compute) | Eng. overhead* | RunPod total | Winner
10K images | $30 | $6.90 | ~$200 | ~$207 | Replicate
50K images | $150 | $34.50 | ~$200 | ~$235 | Roughly equal
100K images | $300 | $69 | ~$200 | ~$269 | RunPod
500K images | $1,500 | $345 | ~$200 | ~$545 | RunPod saves $955/mo
1M images | $3,000 | $690 | ~$200 | ~$890 | RunPod saves $2,110/mo

* Engineering overhead is a rough estimate of amortized cost for a developer maintaining self-hosted infrastructure - model management, monitoring, uptime, updates. Actual cost depends heavily on team expertise and automation investment.
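The break-even rows above reduce to two formulas. A hedged sketch using this article's figures - the ~$200 overhead and ~1,000 imgs/hr are the estimates from the table, not universal constants:

```python
def monthly_cost_api(images: int, price_per_1k: float) -> float:
    """Managed API: pure per-image billing, zero infrastructure."""
    return images / 1000 * price_per_1k

def monthly_cost_self_hosted(images: int, gpu_cost_per_hour: float,
                             images_per_hour: float,
                             eng_overhead: float = 200.0) -> float:
    """Self-hosted: GPU hours at full utilization plus amortized engineering time."""
    return images / images_per_hour * gpu_cost_per_hour + eng_overhead

# Reproduce the 100K-images row: Replicate $300 vs RunPod ~$269.
api = monthly_cost_api(100_000, 3.00)
runpod = monthly_cost_self_hosted(100_000, 0.69, 1000)
```

Swap in your own measured throughput and overhead estimate; the crossover volume moves with both.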

When Managed API Wins

  • Volume under 50,000 images/month - engineering overhead exceeds savings
  • Prototype and early-stage products - move fast, validate first
  • Teams without infrastructure expertise - the learning curve has real costs
  • Latency requirements below 2 seconds - pre-warmed API pools beat cold-start self-hosted

When Self-Hosted Wins

  • Volume above 100,000 images/month - clear savings after overhead
  • Custom model weights (fine-tunes, LoRA) that are not deployable on standard APIs
  • Data privacy requirements - images cannot leave your infrastructure
  • Custom inference pipelines (multi-step ComfyUI workflows, chained models)

The Costs You Forget to Count

The sticker price per image or per GPU hour is only part of the total cost. Hidden costs can double the real price if you do not account for them upfront.

  1. Failed and retried jobs: on any system, some jobs fail (OOM errors, timeouts, network issues). Budget a 5–15% failure rate depending on platform reliability.
  2. Storage and egress: generated images need to be stored and served. S3 storage is cheap ($0.023/GB/month) but egress is $0.09/GB. For high-resolution images, egress adds up.
  3. Model loading time: on RunPod pods (not Serverless), you pay for GPU time while loading model weights. A 12 GB Flux checkpoint takes 30–60 seconds on NVMe - real cost at $0.69/hr.
  4. Engineering and maintenance: monitoring containers, handling model updates, debugging silent failures. Even 2 hours/week × $80/hr = $640/month in opportunity cost.
  5. Idle GPU time: if you run a persistent RunPod pod for real-time inference, it bills even when no requests come in. Use RunPod Serverless for scale-to-zero to eliminate idle costs.
  6. Bandwidth for model downloads: if you re-download 12 GB of model weights every pod restart, at high volume this adds up in developer time and cloud bandwidth costs.
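Item 2 is easy to estimate up front. A sketch using the S3 list prices quoted above - the average image size is an assumption you should replace with your own:

```python
def storage_egress_cost(num_images: int, avg_image_mb: float,
                        storage_per_gb_month: float = 0.023,
                        egress_per_gb: float = 0.09) -> dict:
    """Rough monthly S3 cost, assuming each image is stored and served once."""
    gb = num_images * avg_image_mb / 1024
    return {"storage": gb * storage_per_gb_month, "egress": gb * egress_per_gb}

# 1M images/month at an assumed ~1.5 MB each (1024x1024 PNG/JPEG mix):
costs = storage_egress_cost(1_000_000, 1.5)
# Egress (~$132/mo) dwarfs storage (~$34/mo).
```

At that volume, egress alone can rival a Salad compute bill - which is why it belongs in the budget.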

Resolution and Cost: The Megapixel Effect

Not all "images" cost the same. Resolution directly affects inference time (and therefore cost on self-hosted), and directly affects price on fal.ai's per-megapixel billing.

Cost per image by resolution - fal.ai Flux Schnell ($0.003/MP)
Resolution | Megapixels | Cost per image | Common use case
512×512 | 0.26 MP | $0.00078 | Thumbnails, previews
768×768 | 0.59 MP | $0.00177 | Social media icons
1024×1024 | 1.05 MP | $0.00315 | Standard generation
1024×1536 | 1.57 MP | $0.00472 | Portrait format
1536×1024 | 1.57 MP | $0.00472 | Landscape format
2048×2048 | 4.19 MP | $0.01258 | High-resolution output

For self-hosted workloads, higher resolution also means fewer images per hour. A 2048×2048 Flux Schnell generation takes roughly 4× longer than 1024×1024 on the same GPU (VRAM and computation both scale with resolution). The effective cost per image at 2048×2048 on RunPod RTX 4090 would be approximately $0.00276/image - similar to managed API pricing for standard resolution, eliminating the self-hosted cost advantage.
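Per-megapixel billing reduces to one line, which makes the table above easy to extend to any resolution. The $0.003/MP rate is fal.ai's verified Flux Schnell price; the function itself is our sketch:

```python
def fal_cost_per_image(width: int, height: int, price_per_mp: float = 0.003) -> float:
    """Cost per image under per-megapixel billing (1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000 * price_per_mp

assert round(fal_cost_per_image(1024, 1024), 5) == 0.00315   # standard generation
assert round(fal_cost_per_image(2048, 2048), 5) == 0.01258   # 4x pixels, 4x price
```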

Cost Optimization Playbook

If you're running AI image generation at scale, these six tactics produce the biggest cost reductions:

  1. Use Flux Schnell instead of Dev for commercial products: 8× cheaper, commercially licensed, quality difference imperceptible for most use cases.
  2. Batch requests on self-hosted: processing multiple images in a single forward pass (batch size 2–4) can improve throughput 30–60% without linear VRAM scaling on Schnell.
  3. Use RunPod Serverless or Salad: scale-to-zero eliminates idle GPU billing. A pod running 24/7 at $0.69/hr costs $497/month even with zero requests.
  4. Cache identical prompts: if your product generates the same image repeatedly (same prompt, same seed), store the output and skip regeneration entirely.
  5. Generate at the display resolution, not higher: upscaling is cheap (Real-ESRGAN at < $0.001/image); generating at 2× the needed resolution is expensive and usually unnecessary.
  6. Benchmark before committing: your specific workflow (ComfyUI nodes, ControlNet, custom samplers) will have different throughput than generic benchmarks. Measure your actual images/hour before choosing a pricing tier.
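Tactic 4 can be as little as a dozen lines: key the output on a hash of every parameter that affects the pixels, and skip the GPU call on a hit. A minimal in-memory sketch - `generate` here is a stand-in for your actual inference call, and production would use S3 or Redis instead of a dict:

```python
import hashlib
import json

_cache: dict[str, bytes] = {}

def cached_generate(generate, prompt: str, seed: int, **params) -> bytes:
    """Return a cached image when prompt, seed, and all params match exactly."""
    key = hashlib.sha256(
        json.dumps({"prompt": prompt, "seed": seed, **params}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = generate(prompt=prompt, seed=seed, **params)
    return _cache[key]
```

Every cache hit is a generation you do not pay for, so hit rate translates directly into per-image savings.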

Our Verdict

There is no single cheapest option - the right answer depends on your volume, engineering capacity, and quality requirements. The framework:

  • Under 50K images/month: start with Together AI ($2.70/1K) or Replicate ($3.00/1K). Zero infrastructure, iterate fast.
  • 50K–200K images/month: evaluate RunPod Serverless. The engineering investment starts paying off, and Serverless eliminates idle costs.
  • Above 200K images/month: self-hosted on Salad ($0.16/1K) or Vast.ai ($0.31/1K) delivers 90%+ savings vs managed APIs.
  • Custom model weights: self-host - managed APIs generally run only standard checkpoints.
  • Data privacy / on-premise requirement: self-hosted only. Consider RunPod Secure Cloud or a dedicated bare-metal setup.

Need exact VRAM requirements for your configuration? Use our VRAM calculator to instantly get numbers for your resolution, batch size, and quantization level.

Frequently Asked Questions

What does it cost to run 1 million AI images per month?

On Together AI (Flux Schnell): $2,700/month. On Replicate: $3,000/month. Self-hosted on RunPod RTX 4090 (estimated ~1,000 imgs/hr, $0.69/hr): roughly $690/month for compute, plus ~$200/month engineering overhead = $890/month. Self-hosted on Salad RTX 4090 ($0.16/hr): ~$160/month for compute + overhead = ~$360/month total. At 1M images/month, self-hosting produces 80–90% savings vs managed APIs.

Is it cheaper to use SDXL or Flux for production?

For self-hosted workloads, Flux Schnell is cheaper than SDXL because Schnell generates images in 4 steps vs SDXL's 20 - at this article's estimates, roughly twice the images per GPU hour (~1,000 vs ~450 on an RTX 4090). For managed APIs, Flux Schnell on Replicate ($0.003/image) is typically comparable to or cheaper than SDXL depending on the specific API and model variant. Flux Schnell is the better choice for new production systems on both cost and quality grounds.

How accurate are the throughput estimates in this article?

The estimates (e.g., ~1,000 Flux Schnell images/hr on RTX 4090) are conservative benchmarks at standard settings with batch size 1. Actual throughput depends on your inference server, batching configuration, image resolution, and ComfyUI node overhead. Before scaling, benchmark your specific workflow - your numbers may be 20–50% higher with optimization, or lower with complex multi-step pipelines.

Does image resolution affect managed API costs?

It depends on the provider. Replicate charges per image regardless of resolution for Flux Schnell ($0.003/image whether 512×512 or 2048×2048). fal.ai charges per megapixel, so a 2048×2048 image costs 4× more than a 1024×1024 image. Together AI charges per image. For high-resolution outputs at volume, Replicate's flat per-image pricing becomes advantageous over fal.ai.

What is the cheapest way to generate AI images in 2026?

Self-hosted on Salad Cloud using an RTX 4090 ($0.16/hr). At an estimated 1,000 Flux Schnell images per hour, that is $0.00016/image - roughly 19× cheaper than Replicate's $0.003/image. The trade-off is infrastructure management and designing your system for Salad's distributed, container-restart-friendly architecture.

Are there free options for AI image generation?

Replicate, fal.ai, and Together AI all offer free credits for new accounts - enough for small-scale testing. Stable Diffusion can be run locally on your own hardware for free if you have a capable GPU. For production at any meaningful volume, free tiers are insufficient and paid infrastructure is required.

How does ComfyUI affect cost vs a simple API call?

ComfyUI introduces pipeline overhead - node execution, image saving, output retrieval - that adds a few hundred milliseconds per image. For cost calculation, this reduces effective throughput slightly compared to raw diffusion benchmarks. A complex multi-model ComfyUI workflow (upscaling, face enhancement, background removal) may generate only 200–400 images/hr on an RTX 4090 vs 1,000 for simple Flux Schnell generation. Cost per image scales accordingly.

When should I switch from managed API to self-hosted?

The rough rule of thumb: when your monthly API bill exceeds ~$200–$300 and you have 2–4 weeks of engineering capacity to invest in self-hosted setup and ongoing maintenance. At that point, RunPod Serverless or Salad starts saving real money. Below that volume, the managed API's zero-overhead simplicity is worth the premium.