
Cheapest Flux API in 2026: Every Provider Compared

Flux Schnell costs from $0.0027 to $0.003 per image depending on provider. Real prices for Flux Schnell and Flux Dev across Replicate, Together AI, and fal.ai, plus self-hosted GPU options - verified May 2026.

Published 2026-05-11 · Tags: flux api, flux schnell api, ai image generation api

All Flux API Providers - Price Comparison

Prices verified May 11, 2026. All providers below offer serverless access with no infrastructure to manage.

Flux API pricing - verified May 11, 2026

| Provider | Flux Schnell | Flux Dev | Billing model |
|---|---|---|---|
| Replicate | $0.003/image | $0.025/image | Per image |
| Together AI | $0.0027/image | $0.0154/image | Per image |
| fal.ai | $0.003/MP | $0.003/MP (dev tier) | Per megapixel |
| Self-hosted (RunPod RTX 4090) | ~$0.00069/img | ~$0.0015/img | Per GPU hour ($0.69/hr) |
| Self-hosted (Vast.ai RTX 4090) | ~$0.00031/img | ~$0.00069/img | Per GPU hour ($0.31/hr avg) |
$0.0027 - cheapest managed Flux Schnell per image: Together AI (verified May 11, 2026; source: together.ai/pricing)

Together AI is currently the cheapest managed option for Flux Schnell at $0.0027/image - 10% cheaper than Replicate and fal.ai. For Flux Dev, Together AI charges $0.0154/image versus Replicate's $0.025/image - a 38% saving. The self-hosted options on RunPod and Vast.ai are cheaper still at scale, but require infrastructure management and have higher per-job latency due to model loading.
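The savings figures above follow directly from the listed per-image prices. A minimal sketch, using the May 2026 prices quoted in this article (these will drift; verify against each provider's pricing page before relying on them):

```python
# Per-image savings between managed Flux providers.
# PRICES holds the May 2026 rates quoted in this article (assumptions).

PRICES = {  # USD per image at 1024x1024
    "replicate": {"schnell": 0.003, "dev": 0.025},
    "together": {"schnell": 0.0027, "dev": 0.0154},
}

def savings_pct(expensive: float, cheap: float) -> float:
    """Percent saved by choosing the cheaper provider."""
    return (expensive - cheap) / expensive * 100

schnell = savings_pct(PRICES["replicate"]["schnell"],
                      PRICES["together"]["schnell"])
dev = savings_pct(PRICES["replicate"]["dev"],
                  PRICES["together"]["dev"])
print(f"Schnell: {schnell:.0f}% cheaper on Together AI")  # 10%
print(f"Dev: {dev:.0f}% cheaper on Together AI")          # 38%
```

The same function works for any pair of per-image rates, which makes it easy to re-check these claims when prices change.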

Why Flux Became the Standard for Production AI Images

Black Forest Labs released Flux.1 in August 2024. Within months, it had displaced SDXL as the default choice for production image generation workloads. The reason is the combination: better prompt adherence than SDXL, faster inference with Schnell's 4-step distillation, and a permissive commercial license on the Schnell variant. SDXL required 20–50 steps for good results; Flux Schnell produces comparable quality in 4 steps.

The Flux family has three tiers: Flux.1 Schnell (fastest, permissive license, ideal for production), Flux.1 Dev (higher quality, non-commercial license - check your use case carefully), and Flux.1 Pro (highest quality, via BFL API only, not available for self-hosting). Most production systems use Schnell for its speed and licensing clarity.

NOTE
Flux.1 Dev has a non-commercial license. If you're building a commercial product, use Flux.1 Schnell or Flux.1 Pro via the official BFL API. Running Flux Dev via a provider API for a commercial product may violate the license terms - check with your legal team.

Replicate - The Safe Default

Replicate is the go-to choice for developers who want to start generating images with minimal setup. You post a JSON payload, get back a URL. Flux Schnell costs $3.00 per 1,000 images ($0.003/image), and Flux Dev is $0.025/image. The API is well-documented, reliable, and the Python/Node SDKs are mature.

The main drawbacks are cost at scale and cold start latency. Replicate runs serverless workers that spin up on demand - if you have not run a model recently, the first request of a session takes longer while the worker initializes. For bursty workloads this is usually acceptable; for real-time applications with strict SLAs, it can be problematic. Replicate also has rate limits on free and starter plans that you need to check before relying on it for high-volume production use.

Replicate: Pros and Cons

  • Easiest onboarding: working API in under 5 minutes, no infrastructure decisions
  • Reliable and well-maintained: Replicate handles model updates and infrastructure
  • Cold starts: first request per session can take 5–30 seconds
  • Price at scale: $0.003/image becomes expensive above 100K images/month vs self-hosted
  • Rate limits: check current limits before building high-volume production pipelines

fal.ai - Speed-First, Per-Megapixel Pricing

fal.ai prices Flux Schnell at $0.003 per megapixel, so the effective cost per image depends on resolution. A 1024×1024 image is roughly 1 megapixel ≈ $0.003. A 1536×1024 image is 1.57 megapixels ≈ $0.0047. A 2048×2048 image is 4.19 megapixels ≈ $0.0126. If you're generating standard 1024×1024 images, fal.ai is price-competitive with Replicate. If you're generating high-resolution outputs, fal.ai becomes more expensive.
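The per-megapixel math is easy to script. A minimal sketch, assuming a flat $0.003/MP rate with no rounding (fal.ai may round resolutions when billing; check their pricing page for current rates and rules):

```python
# Per-megapixel image cost, fal.ai style.
# RATE_PER_MP is the rate quoted in this article (an assumption).

RATE_PER_MP = 0.003  # USD per megapixel

def per_mp_cost(width: int, height: int, rate: float = RATE_PER_MP) -> float:
    """Approximate cost of one image billed by raw pixel count."""
    megapixels = width * height / 1_000_000
    return megapixels * rate

for w, h in [(1024, 1024), (1536, 1024), (2048, 2048)]:
    print(f"{w}x{h}: ${per_mp_cost(w, h):.4f}")
```

Note that 1024×1024 is technically 1.05 MP, so the raw math gives ≈$0.0031 rather than an even $0.003; in practice the difference is negligible at standard resolutions.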

fal.ai's infrastructure is optimized for low latency. They run dedicated GPU pools with models pre-loaded in memory, which eliminates cold starts on popular models. For Flux Schnell, fal.ai typically returns the first image in under 2 seconds after submission. They also support queue-based async processing and webhooks for high-volume batch generation.

fal.ai Pros and Cons

  • No cold starts on popular models: pre-warmed workers mean consistent latency
  • Per-megapixel billing: fair for standard resolutions, expensive for high-res outputs
  • Async queue and webhooks: good for batch workloads with callback-based architectures
  • Strong model selection: broad support for Flux variants, ControlNet, and image-to-image

Together AI - Cheapest Managed Flux

Together AI is primarily known for LLM inference, but their image generation endpoint offers the lowest managed price for Flux Schnell: $0.0027/image - matching the quality of Replicate at 10% lower cost. Flux Dev is $0.0154/image, also cheaper than Replicate's $0.025. For teams already using Together AI for language models, adding image generation requires no new vendor.

The main consideration with Together AI for image generation is that their core product is LLMs - image generation is a secondary offering. Documentation is less detailed than Replicate's, model selection is narrower, and advanced features like ControlNet or img2img have more limited support. If you need straightforward text-to-image with Flux Schnell and want the lowest managed price, Together AI is the right choice.

Together AI Pros and Cons

  • Lowest managed price for Flux Schnell at $0.0027/image - 10% cheaper than Replicate
  • Same vendor as LLM inference - simplified billing if you're already on Together AI
  • Narrower model support vs Replicate or fal.ai for image-specific features
  • Less image-generation documentation - expect more trial and error during integration

Self-Hosted on RunPod: The Math

Self-hosting Flux on RunPod makes sense when you're generating more than ~50,000 images per month. Below that volume, the engineering overhead of maintaining a container, handling model downloads, and monitoring uptime typically costs more in developer time than the API savings deliver.

A RunPod RTX 4090 at $0.69/hr generates approximately 1,000 Flux Schnell images per hour (conservative estimate at 4 steps with batch size 1 and standard resolution). That works out to $0.00069/image - a 77% saving vs Replicate's $0.003/image. At 100,000 images/month, self-hosted costs roughly $69 vs $300 on Replicate. At 500,000 images/month, you save over $1,000 per month.

Break-even: managed API vs RunPod self-hosted

| Monthly volume | Replicate cost | RunPod RTX 4090 | Savings |
|---|---|---|---|
| 10,000 images | $30 | $6.90 + setup | Negligible after overhead |
| 50,000 images | $150 | $34.50 | ~$100/mo after ops time |
| 100,000 images | $300 | $69 | $231/mo |
| 500,000 images | $1,500 | $345 | $1,155/mo |
| 1,000,000 images | $3,000 | $690 | $2,310/mo |
NOTE
These self-hosted calculations assume continuous utilization. A RunPod pod sitting idle still costs $0.69/hr. Use RunPod Serverless for scale-to-zero behavior - workers only cost money when actively processing jobs.
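The break-even figures above reduce to two numbers: the API's per-image price and the GPU's effective per-image cost. A minimal sketch, assuming the article's figures ($0.003/image on Replicate, a $0.69/hr RunPod RTX 4090 producing ~1,000 Schnell images/hr, continuous utilization); measure your own throughput before committing:

```python
# Managed-API vs self-hosted monthly cost at a given volume.
# All three constants are assumptions taken from this article.

API_PRICE = 0.003        # USD/image (Replicate Flux Schnell)
GPU_HOURLY = 0.69        # USD/hr (RunPod RTX 4090)
IMAGES_PER_HOUR = 1000   # conservative Flux Schnell throughput

def monthly_costs(volume: int) -> tuple[float, float]:
    """Return (api_cost, self_hosted_cost) in USD for one month."""
    api = volume * API_PRICE
    self_hosted = volume / IMAGES_PER_HOUR * GPU_HOURLY
    return api, self_hosted

for volume in (10_000, 50_000, 100_000, 500_000, 1_000_000):
    api, hosted = monthly_costs(volume)
    print(f"{volume:>9,} imgs: API ${api:,.0f} vs GPU ${hosted:,.2f}"
          f" (save ${api - hosted:,.2f}/mo)")
```

Plugging in 100,000 images reproduces the $300 vs $69 row in the table; the idle-cost caveat in the note still applies, since this model assumes the pod only runs while generating.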

Flux Schnell vs Flux Dev: Which Should You Use?

| | Flux.1 Schnell | Flux.1 Dev |
|---|---|---|
| Inference steps | 4 steps | 28–50 steps |
| Relative speed | ~6–8x faster | Slower, more detail |
| Image quality | Excellent for most use cases | Noticeably sharper detail in complex scenes |
| License | Apache 2.0 (commercial OK) | Non-commercial only |
| Replicate price | $0.003/image | $0.025/image |
| Together AI price | $0.0027/image | $0.0154/image |
| Best for | Production APIs, batch generation | Portfolio pieces, non-commercial projects |

For most production use cases, Flux Schnell is the correct choice. It is faster, cheaper, and commercially licensed. The quality gap between Schnell and Dev is visible in highly complex scenes with fine text or intricate backgrounds, but imperceptible in standard portrait, product, and lifestyle photography use cases. Use Flux Dev only when quality in complex scenes is critical and your use case is non-commercial.

Latency: What to Expect Per Provider

Latency matters for real-time applications. Here is what to expect for Flux Schnell at 1024×1024 resolution under normal load (not at capacity limits):

Expected latency per provider - Flux Schnell

| Provider | Cold start (first request) | Warm latency | Notes |
|---|---|---|---|
| Replicate | 5–30 seconds | 1.5–4 seconds | Cold start on first call per session |
| fal.ai | <1 second | 0.8–2 seconds | Pre-warmed workers, consistent |
| Together AI | 1–5 seconds | 1–3 seconds | Varies with model load |
| Self-hosted RunPod | 30–120 seconds (model load) | 1–3 seconds (after load) | One-time cost per container start |

Our Recommendation

Start here

  • Building a prototype or low-volume product: use Replicate. Best docs, easiest setup, reliable.
  • Cost is your top priority and volume is modest: Together AI saves 10% over Replicate with no quality trade-off.
  • Real-time UX with no cold starts: fal.ai. Pre-warmed workers = consistent sub-2-second latency.
  • High volume (100K+ images/month): self-hosted on RunPod Serverless - 77% cheaper at scale.
  • Batch generation with cost as top priority: self-hosted on Vast.ai RTX 4090 - lowest cost per image.
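The recommendations above can be collapsed into a simple decision function. A sketch that encodes this article's guidance; the thresholds and provider picks are editorial judgment calls, not hard rules:

```python
# Provider picker encoding the recommendations in this article.
# The 100K/month threshold and provider choices mirror the list above.

def pick_flux_provider(monthly_images: int,
                       needs_low_latency: bool = False,
                       cost_is_top_priority: bool = False) -> str:
    if monthly_images >= 100_000:
        # Volume dominates: self-hosting saves ~77% vs managed APIs
        return "self-hosted (RunPod Serverless or Vast.ai)"
    if needs_low_latency:
        return "fal.ai"       # pre-warmed workers, no cold starts
    if cost_is_top_priority:
        return "Together AI"  # cheapest managed Schnell
    return "Replicate"        # best docs, easiest setup

print(pick_flux_provider(5_000))                           # Replicate
print(pick_flux_provider(20_000, needs_low_latency=True))  # fal.ai
print(pick_flux_provider(250_000))   # self-hosted (RunPod Serverless ...)
```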

Need exact VRAM requirements for your configuration? Use our VRAM calculator to instantly get numbers for your resolution, batch size, and quantization level.

Frequently Asked Questions

Is Flux.1 Schnell free to use commercially?

Yes. Flux.1 Schnell is released under the Apache 2.0 license, which permits commercial use. You can run it in a product, charge customers, and use the outputs commercially. Flux.1 Dev is non-commercial - check the license before using it in a revenue-generating product.

How does fal.ai's per-megapixel pricing work in practice?

fal.ai charges $0.003 per megapixel. A 1024×1024 image ≈ 1MP ≈ $0.003. A 1280×720 image = 0.9216MP ≈ $0.0028. A 1536×1024 image = 1.573MP ≈ $0.0047. For standard 1024×1024 generation, fal.ai roughly matches Replicate's price. For higher resolutions, you pay proportionally more - budget accordingly.

What is Flux.1 Pro and how does it compare?

Flux.1 Pro is the highest-quality Flux variant, available only through Black Forest Labs' own API (api.bfl.ml) and licensed providers. It is not available for self-hosting. Quality is noticeably better than Dev in complex scenes, and it has a commercial license. Check api.bfl.ml for current pricing as rates evolve frequently.

Can I run Flux on an older GPU like an RTX 3090?

Yes. Flux Schnell runs on 24GB VRAM GPUs including the RTX 3090. Performance is lower than the RTX 4090 - expect roughly 20–30% fewer images per hour due to slower memory bandwidth. At Vast.ai's RTX 3090 price of $0.13/hr, the cost per image is still very competitive even at lower throughput.
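A quick sketch of the RTX 3090 economics, assuming the article's baseline of ~1,000 Schnell images/hr on an RTX 4090 and a 25% throughput penalty (the midpoint of the 20–30% range above); your measured throughput will vary:

```python
# Effective cost per image on a cheaper, slower GPU (RTX 3090).
# All constants are assumptions taken from this article.

BASELINE_IMGS_PER_HOUR = 1000   # RTX 4090, Flux Schnell
THROUGHPUT_PENALTY = 0.25       # assumed midpoint of 20-30% range
RTX3090_HOURLY = 0.13           # USD/hr on Vast.ai (quoted above)

imgs_per_hour = BASELINE_IMGS_PER_HOUR * (1 - THROUGHPUT_PENALTY)  # 750
cost_per_image = RTX3090_HOURLY / imgs_per_hour
print(f"~{imgs_per_hour:.0f} imgs/hr -> ${cost_per_image:.5f}/image")
```

Even with the throughput penalty, the result comes out well under the ~$0.00031/image figure listed for a Vast.ai RTX 4090, because the 3090's hourly rate is so much lower.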

Does Together AI support ControlNet or img2img with Flux?

Together AI's image generation offering focuses on standard text-to-image with Flux. ControlNet, inpainting, and img2img have limited or no support. For advanced image manipulation, Replicate or fal.ai offer broader feature sets. Check current Together AI documentation for the latest model capabilities.

What are the rate limits on Replicate for Flux?

Replicate rate limits vary by plan. Free tier has strict limits unsuitable for production. Paid plans include higher request rates, but specific limits are shown in your Replicate dashboard rather than the public pricing page. For high-volume production use, contact Replicate about enterprise plans with dedicated capacity.

How much does it cost to run 1 million images per month?

At 1 million images/month on Replicate Flux Schnell: $3,000/month. On Together AI: $2,700/month. Self-hosted on RunPod RTX 4090 (at $0.69/hr and ~1,000 imgs/hr): approximately $690/month for compute - a 77% saving vs Replicate. Self-hosted costs require adding storage, engineering time, and monitoring overhead, but the economics are compelling at that volume.

Is fal.ai reliable enough for production?

fal.ai runs production inference infrastructure for many commercial products. Their pre-warmed worker model provides consistent latency without cold starts - a genuine advantage over Replicate for latency-sensitive applications. They have a status page and offer SLA tiers for enterprise customers. For most production AI image applications, fal.ai is a solid choice.