Azure offers H100 80GB instances at $6.98/hr per GPU (Standard_NC40ads_H100_v5). The instance contains a single GPU, so the per-GPU and per-instance rates are the same. For teams running AI image workloads on existing Azure infrastructure, this is the on-demand rate as of May 2026. Spot instances reduce cost by approximately 60-80% but can be evicted at short notice. This page covers the full cost structure, egress costs, spot savings, and how Azure compares to GPU-specialist providers for the same hardware.
Azure H100 80GB Pricing: Instance Types and Rates
The H100 80GB is available on Azure via the Standard_NC40ads_H100_v5 instance, part of the NC H100 v5 series, which is the current H100 offering. On-demand pricing is $6.98/hr for the single-GPU instance. Azure is often the preferred choice for teams with existing Microsoft enterprise agreements. Azure Spot VMs offer 60-80% savings. The table below compares per-GPU hourly rates across hyperscalers and GPU-specialist providers.
| Provider | Type | Price/GPU/hr | Notes |
|---|---|---|---|
| Oracle Cloud (OCI) | Hyperscaler | $2.00 | Best hyperscaler value for H100. Hidden gem - rare |
| AWS | Hyperscaler | $6.76 | Post price cut. ~5x more expensive than Thunder Co |
| Azure | Hyperscaler | $6.98 | Single GPU. 8x H100 (ND96isr_H100_v5): $12.29/GPU/ |
| GCP | Hyperscaler | $9.80 | Most expensive H100 option across all tiers |
| Salad | Community Edge | $0.990 | Batch inference pricing. Published benchmark: 5,24 |
| Vast.ai | Marketplace | $1.38 | Wide range - depends on verified vs unverified hos |
| Thunder Compute | Datacenter | $1.38 | Cheapest H100 on market as of May 2026. Virtualize |
| TensorDock | Marketplace | $2.25 | Spot price $1.91/hr |
On-Demand vs Spot Pricing on Azure
Azure Spot VMs offer the H100 80GB at approximately 60-80% off on-demand pricing. That range works out to roughly $1.40-$2.79/GPU/hr, or about $2.44/GPU/hr at a 65% discount. Spot instances are interruptible: Azure can evict a Spot VM with as little as 30 seconds' notice. For batch inference jobs with checkpointing, spot significantly reduces cost. For real-time APIs, spot is not suitable.
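The spot arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `spot_price` is illustrative, and the discount levels are the ones quoted on this page, not live Azure prices.

```python
ON_DEMAND_PER_GPU_HR = 6.98  # Azure on-demand rate quoted on this page


def spot_price(on_demand: float, discount: float) -> float:
    """Hourly price after a fractional discount (0.60 means 60% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand * (1.0 - discount)


if __name__ == "__main__":
    # Low end, midpoint, and high end of the 60-80% range
    for discount in (0.60, 0.65, 0.80):
        price = spot_price(ON_DEMAND_PER_GPU_HR, discount)
        print(f"{discount:.0%} off: ${price:.2f}/GPU/hr")
```

Actual spot prices move with regional capacity, so treat the output as a planning range.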
Reserved instances (1-year or 3-year commitments) offer predictable savings of roughly 30-60% off on-demand, depending on term length. If your GPU workload is continuous, a 1-year reserved instance is almost always cheaper than on-demand. The trade-off is commitment: you pay whether or not you use the instance. Hybrid strategies (reserved for baseline, spot for burst) work well for workloads with a predictable base load and variable peaks.
Azure H100 80GB vs GPU-Specialist Providers
GPU-specialist providers (RunPod, Vast.ai, Salad, Thunder Compute) typically offer the H100 80GB at significantly lower on-demand rates than Azure. The table above shows the full comparison. The cheapest specialist option for H100 80GB is currently Salad at $0.990/hr versus $6.98/hr on Azure. Teams still choose Azure despite the premium for the reasons below.
| Reason | Detail |
|---|---|
| Compliance (SOC 2, HIPAA, FedRAMP) | Azure carries enterprise compliance certifications that GPU-specialist providers generally do not. |
| Existing cloud contract | Teams already running workloads on Azure avoid vendor complexity and billing fragmentation. |
| Network proximity | If your data pipeline is on Azure, keeping GPU inference in the same cloud eliminates cross-provider egress fees. |
| SLA uptime guarantee | Azure offers contractual uptime SLAs. Specialist providers typically offer best-effort availability. |
| Enterprise support | Azure provides 24/7 enterprise support with defined response times. Specialist providers vary. |
If none of the above reasons apply to your team, a GPU-specialist provider is likely 3-7x cheaper than Azure for the same H100 80GB workload, based on the rates in the pricing table above. These providers are purpose-built for GPU compute, and some have invested in diffusion-model-specific optimisations that general-purpose clouds typically have not.
Egress and Hidden Costs on Azure
Azure charges $0.0870/GB for outbound data transfer. For AI image workloads, egress applies when transferring generated images out of Azure to your own servers, a CDN, or end users. A 1 MB output image costs $0.00009 in egress fees. At 10,000 images/month at 1 MB each, egress adds approximately $0.87/month to your compute bill. At 100,000 images/month, egress costs $8.70/month.
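To rerun the egress figures for a different image size or volume, a small helper is enough. This is a sketch under the page's assumptions: the $0.087/GB rate and decimal units (1 GB = 1,000 MB). The function name is illustrative.

```python
EGRESS_PER_GB = 0.087  # USD per GB outbound, rate quoted on this page


def monthly_egress_cost(images: int, image_mb: float = 1.0,
                        rate_per_gb: float = EGRESS_PER_GB) -> float:
    """Egress cost in USD, using decimal units (1 GB = 1,000 MB)."""
    gigabytes = images * image_mb / 1000.0
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    for volume in (10_000, 100_000):
        print(f"{volume:,} images: ${monthly_egress_cost(volume):.2f}/month")
```

Larger outputs scale linearly: passing `image_mb=4.0` multiplies each figure by four.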
Egress within the same Azure region is free or near-free. If you store generated images in Azure Blob Storage in the same region, you pay only the storage rate (typically $0.02-$0.023/GB/month) and avoid transfer fees entirely until the images leave the cloud. OCI has notably lower egress at $0.0085/GB (with the first 10 TB/month free), making it the cheapest hyperscaler for output-heavy AI workloads.
When Azure Makes Sense for AI Image Workloads
Azure is the right choice for H100 80GB AI inference when your team already operates significant infrastructure on Azure and values vendor consolidation, when your workload must meet compliance requirements that only major cloud providers satisfy, or when your data pipeline is already in Azure and cross-provider egress would cost more than the price premium on GPU compute.
Azure is the wrong choice when your sole criterion is GPU cost per hour. Specialist providers list the same H100 80GB hardware from $0.990/hr versus $6.98/hr on Azure. For a team generating 100,000 Flux Dev images per month (about 62 GPU-hours), the difference is approximately $374/month. The gap grows in proportion to volume, so at higher scale the savings can fund meaningful engineering effort.
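The monthly gap follows from GPU-hours multiplied by the difference in hourly rates. The sketch below assumes the throughput figure of about 1,600 Flux Dev images per hour used elsewhere on this page. The function names are illustrative.

```python
def gpu_hours(images: int, images_per_hour: float) -> float:
    """GPU-hours needed to generate a given number of images."""
    return images / images_per_hour


def monthly_gap(images: int, images_per_hour: float,
                rate_a: float, rate_b: float) -> float:
    """Extra monthly spend on provider A relative to provider B, in USD."""
    return gpu_hours(images, images_per_hour) * (rate_a - rate_b)


if __name__ == "__main__":
    hours = gpu_hours(100_000, 1600)
    gap = monthly_gap(100_000, 1600, rate_a=6.98, rate_b=0.99)
    print(f"{hours:.1f} GPU-hours, gap of about ${gap:.0f}/month")
```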
Reserved Instances and Committed Use on Azure
Azure offers discounted pricing for reserved capacity. 1-year reservations typically save 30-40% versus on-demand; 3-year reservations save 50-60%. For teams with stable, predictable GPU workloads that must stay on Azure, reserved capacity is almost always cheaper than on-demand. Run the numbers: a reserved instance at $4.54/hr (estimated 1-year rate, roughly 35% off) costs less than Azure on-demand once the GPU is busy more than about 65% of the time. Even at that rate it remains well above the specialist on-demand prices in the comparison table, so a reservation narrows the gap but does not close it.
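One way to check where a reservation starts to pay off against Azure's own on-demand rate is to compare the two monthly bills. This is a sketch: the $4.54/hr figure is this page's estimated 1-year rate, and `HOURS_PER_MONTH = 730` is an assumed average month length, not an Azure billing constant.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month


def break_even_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Fraction of hours a GPU must be busy before reserving beats on-demand."""
    return reserved_rate / on_demand_rate


def monthly_cost(rate: float, utilisation: float = 1.0,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Monthly bill in USD; a reservation is billed at utilisation=1.0."""
    return rate * hours * utilisation


if __name__ == "__main__":
    print(f"Break-even utilisation: {break_even_utilisation(4.54, 6.98):.0%}")
    for u in (0.5, 0.7, 0.9):
        on_demand = monthly_cost(6.98, u)
        reserved = monthly_cost(4.54)
        print(f"{u:.0%} busy: on-demand ${on_demand:,.0f} vs reserved ${reserved:,.0f}")
```

Below the break-even point, on-demand (or spot, for interruptible work) is the cheaper way to buy the same hours.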
Free tier and credits: most major cloud providers offer $200-$300 in free credits for new accounts. Azure occasionally runs GPU credit promotions for startups and academic researchers. For early-stage teams, these credits can fund initial AI workload development before committing to a long-term cloud provider.
Provisioning H100 80GB on Azure: Practical Steps
Provisioning an H100 80GB instance on Azure requires a quota increase in most accounts. Default GPU quotas are often zero; submit a quota increase request to Azure with your intended use case and expected usage volume. Quota approvals typically take 1-3 business days. Request the quota in the region where you intend to run workloads: GPU availability varies significantly by region, and lead times can differ between regions.
Once quota is granted, provisioning takes 5-10 minutes. Use a deep learning base image (Azure maintains official GPU-ready images with CUDA, cuDNN, and PyTorch pre-installed). Install ComfyUI or your inference server on top of the base image, download model weights (Flux Dev is ~24 GB, SDXL is ~7 GB), and configure network access to the inference port. For persistent production deployments, build a custom container image with weights baked in to eliminate the weight download step on each instance start.
Cost control on Azure: set budget alerts at 80% of your monthly GPU budget and configure instance auto-termination for batch jobs. Most hyperscalers offer cost anomaly detection that alerts you to unexpected spend spikes within hours. For spot instances, save a checkpoint every 10-15 minutes so an interrupted job can resume from the last checkpoint rather than restarting from scratch. An 8-hour batch job interrupted near the end without checkpointing can waste up to $55.84 in GPU compute at the on-demand rate (proportionally less at spot rates) with nothing to show for it.
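The checkpointing pattern can be sketched as a resumable batch loop. This is a minimal illustration, not a production job runner: the function names and the JSON checkpoint format are hypothetical, and `process` stands in for whatever generates and stores one image. The checkpoint file should live on storage that survives eviction.

```python
import json
import os
import tempfile
import time


def load_checkpoint(path):
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write to a temp file, then rename, so a mid-write eviction cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(items, process, path, interval_s=600.0, clock=time.monotonic):
    """Process items in order, checkpointing every interval_s seconds and at the end."""
    start = load_checkpoint(path)
    last_save = clock()
    for i in range(start, len(items)):
        process(items[i])
        if clock() - last_save >= interval_s:
            save_checkpoint(path, i + 1)
            last_save = clock()
    save_checkpoint(path, len(items))
```

With `interval_s=600.0`, an eviction costs at most about ten minutes of repeated work. Items processed after the last checkpoint are run again on resume, so `process` should be safe to repeat.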
Estimated Cost Running Flux Dev on Azure H100 80GB
A full cost estimate for running Flux Dev on Azure H100 80GB: GPU compute at $6.98/GPU/hr, throughput approximately 1,600 images/hr, giving $0.0044/image. At 10,000 images/month, that is $43.62 in GPU compute plus storage and egress costs.
| Cost component | Monthly estimate | Notes |
|---|---|---|
| GPU compute (10K images) | $43.62 | At ~1,600 imgs/hr, $6.98/hr |
| Storage (model weights) | ~$15-30 | Flux Dev ~24 GB; stored persistently |
| Egress (10K x 1 MB images) | $0.87 | $0.0870/GB x 10 GB output |
| Total estimate | ~$67.00 | Compute + storage (midpoint, ~$22.50) + egress |
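The table can be reproduced, and re-run for other volumes, with a short calculation. This sketch uses the page's own inputs. Taking the midpoint of the $15-30 storage range is an assumption made here to arrive at a single total, and the function name is illustrative.

```python
def flux_cost_estimate(images, images_per_hour=1600, gpu_rate=6.98,
                       egress_per_gb=0.087, image_mb=1.0,
                       storage_range=(15.0, 30.0)):
    """Monthly cost breakdown in USD for self-hosted Flux Dev inference."""
    compute = images / images_per_hour * gpu_rate
    egress = images * image_mb / 1000.0 * egress_per_gb
    storage = sum(storage_range) / 2.0  # assumption: midpoint of the quoted range
    return {
        "compute": compute,
        "storage": storage,
        "egress": egress,
        "total": compute + storage + egress,
        "compute_per_image": gpu_rate / images_per_hour,
    }


if __name__ == "__main__":
    for name, value in flux_cost_estimate(10_000).items():
        print(f"{name}: ${value:.4f}")
```

Storage is a fixed cost in this model, so the per-image total falls as volume rises while the compute cost per image stays flat.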
Compare this to managed inference APIs: Replicate charges $0.025/image for Flux Dev, giving $250 for 10,000 images. The Azure self-hosted option costs approximately $43.62 in GPU compute alone (about $67 including storage and egress), which is lower than the managed API cost. Specialist providers such as Salad list the same H100 80GB hardware from $0.990/hr, making them significantly more cost-effective for pure inference workloads.
Bottom line on Azure for AI image workloads: it makes financial sense when your team already runs significant workloads on Azure and values vendor consolidation, compliance, or network proximity. It does not make sense when GPU cost per hour is your primary criterion. The hyperscaler premium for H100 80GB at $6.98/hr versus $0.990/hr on specialist providers represents a significant monthly cost at any meaningful scale. A hybrid approach works well for many teams: run baseline inference on a cost-optimised specialist provider and use Azure only for data-sensitive workloads that require its compliance certifications or for burst capacity when specialist providers are fully allocated.
Review pricing quarterly: hyperscaler GPU pricing has declined steadily since 2023 as the market has become more competitive. Azure pricing has changed materially within 12-month periods in response to competition from specialist GPU clouds. The figures on this page were verified in May 2026; recalculate your cost model every quarter if GPU compute is a significant line item.
One often-overlooked advantage of hyperscalers for AI workloads: the ability to combine GPU compute with other managed services in the same billing account. Object storage, queuing services, monitoring, and content delivery are all available within Azure without cross-provider network costs. For a production AI image pipeline with a multi-stage architecture (upload, process, store, deliver), running everything within Azure can simplify operations even if the GPU compute cost is higher than a specialist provider. The decision ultimately comes down to whether operational simplicity or unit economics is the higher priority for your team at your current stage.