Every MLS listing uploaded by a real estate agent has the same problem: the photo is technically correct but visually unremarkable. A blown-out window. A flat grey sky. A kitchen that looks dingy in artificial light. A bathroom with a photographer's flash reflected in the mirror. These are not artistic failures -- they are the predictable output of a 30-minute photo session done in a hurry. The fix for each one is a documented pipeline step. The tools to run those steps exist. Nobody has turned them into a B2B API.
Manual photo editing services -- PhotoUp, BoxBrownie, FixThePhoto -- charge $3-8 per image and deliver in 24 hours. At individual agent scale that is tolerable. At MLS platform scale, processing 50,000 listing photos per month through a human editing queue is an operational constraint, not a workflow. This article is about building the API that removes that constraint.








| Stack | Infra /mo | AI team | Total cost | Revenue | Margin |
|---|---|---|---|---|---|
| Runflow (10% volume discount applied) | $900 | $0 | $900 | $15K | 94% |
| Cloud API + manual QA (similar pricing, no auto-QA, part-time engineer needed) | $1.0K | ~$5K | $6.0K | $15K | 60% |
| Self-hosted GPU (raw compute, full-time AI engineer required) | $400 | $12K | $12.4K | $15K | 17% |

Runflow Sentinel: a built-in quality control layer that automatically detects and discards failed or low-quality outputs before delivery. You only pay for images that pass QA. No engineer needed to babysit the pipeline.

Pricing based on Runflow published rates (June 2026) with automatic volume discounts. Revenue column is illustrative; actual client pricing varies by vertical and contract size. GPU self-hosted estimate uses $0.04/img raw compute cost.
The four problems every listing photo has
Listing photos fail in predictable ways. Understanding each failure mode helps you scope the pipeline and explain it to buyers without using technical jargon.
Blown-out windows (HDR problem). A camera exposes for the interior or the exterior, not both. The result is either a dark room with a bright white rectangle where the window should be, or a correctly exposed view with a pitch-black interior. The fix is HDR blending: capture or synthesize multiple exposures and merge them. The view through the window becomes visible while the room stays bright.
Flat or overcast sky (replacement problem). A grey sky makes an exterior look uninviting regardless of how good the property is. Sky replacement -- segmenting the sky region and compositing a better sky -- is the most requested single enhancement in real estate editing. Done well, it is undetectable. Done badly, it looks like a stock photo cut-and-paste.
Wrong color temperature (color grade problem). Kitchens and bathrooms lit by warm incandescent bulbs look yellow in photos. Living rooms under fluorescent lighting look cold and clinical. Color grading corrects white balance, lifts shadows, and restores the balanced tone that makes a space look liveable.
Flash artifacts (reflection and highlight removal). On-camera flash creates harsh shadows, specular highlights on glossy surfaces, and, in bathrooms, a photographer's reflection in the mirror. These artifacts are immediately visible and unprofessional. Removing them requires detecting reflection regions and inpainting natural-looking replacements.
Who actually buys this
The ideal customer profile (ICP) for photo enhancement is identical to virtual staging and day-to-dusk: MLS platforms, prop-tech SaaS, and real estate photography agencies. The difference is that enhancement is higher-volume and lower-margin per image than staging, which makes it more suitable as a high-frequency platform feature than as a standalone product.
MLS platforms process every photo on every listing. If enhancement is built into the upload flow -- automatic correction applied to every photo before it goes live -- it becomes a platform quality feature, not an optional add-on. That is worth $0.20-0.80 per image to a platform processing 50,000 photos per month. Annual contract value: $120,000-480,000.
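The annual contract value range above is plain arithmetic: per-image price times monthly volume times twelve. A minimal sketch that reproduces it (the function name is illustrative, not part of any product):

```python
def annual_contract_value(price_per_image: float, images_per_month: int) -> float:
    """Annual contract value at a steady monthly volume."""
    return price_per_image * images_per_month * 12


# Figures from the paragraph above: $0.20-0.80 per image, 50,000 photos/month.
low = annual_contract_value(0.20, 50_000)
high = annual_contract_value(0.80, 50_000)
print(f"ACV range: ${low:,.0f} to ${high:,.0f}")
```

Swapping in a different volume or price shows quickly how sensitive the contract value is to the per-image rate.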
Photography agencies are the faster path to initial revenue. A regional agency doing 300 shoots per month is currently paying $900-2,400 per month to BoxBrownie for manual corrections. You can replace that workflow at lower cost with same-day turnaround instead of 24 hours. Contracts close in 2-4 weeks rather than the 3-9 months typical for platform deals.
What the market looks like today
The photo enhancement market is larger than virtual staging but equally fragmented at the service level. Every significant competitor is a human editing operation. None have built an API.
| Product | Price per Image | API Available | Target Customer | Turnaround |
|---|---|---|---|---|
| PhotoUp | $3-8 | No | Agents, agencies | 24h (human) |
| BoxBrownie | $4-10 | No | Agents, photographers | 24h (human) |
| FixThePhoto | $3-7 | No | Photographers, agents | 24-48h (human) |
| Phixer | $5-9 | No | Real estate agents | 24h (human) |
| API service (proposed in this article) | $0.20-1.00 (B2B) | Yes (core product) | MLS platforms, agencies | Under 20 seconds |
The pattern is consistent: every market participant is a service business that charges per image and uses human labor. The API layer does not exist. At the platform price of $0.40 per image and compute cost of $0.06-0.10, gross margins run 75-85% before fixed overhead.
The tech stack to build it
The pipeline has four independent modules. Each can be run selectively depending on which corrections the input photo needs -- not every image needs all four. A classification step at the top of the pipeline determines which corrections to apply before any processing begins.
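One way to picture the classification step is as a small planner that turns per-photo measurements into a list of modules to run. The sketch below is hypothetical: the `PhotoTraits` fields, module names, and all thresholds except the 2.5-stop window figure are assumptions made for illustration, and the classifier that would produce these values is not shown.

```python
from dataclasses import dataclass


@dataclass
class PhotoTraits:
    """Hypothetical output of the upfront classification step."""
    is_exterior: bool
    window_overexposure_stops: float  # 0.0 when no window is detected
    sky_quality: float                # 0.0 (grey) to 1.0 (clear blue)
    color_cast: float                 # 0.0 (neutral) to 1.0 (strong cast)
    artifact_confidence: float        # flash/reflection detector confidence


# Assumed thresholds; only the 2.5-stop figure appears in the article.
WINDOW_STOPS = 2.5
SKY_QUALITY_MIN = 0.5
COLOR_CAST_MIN = 0.1
ARTIFACT_CONF_MIN = 0.8


def plan_modules(traits: PhotoTraits) -> list[str]:
    """Return the modules to run, in pipeline order."""
    modules = []
    if traits.window_overexposure_stops > WINDOW_STOPS:
        modules.append("hdr_window_blend")
    if traits.is_exterior and traits.sky_quality < SKY_QUALITY_MIN:
        modules.append("sky_replacement")
    if traits.color_cast >= COLOR_CAST_MIN:
        modules.append("color_grade")
    if traits.artifact_confidence >= ARTIFACT_CONF_MIN:
        modules.append("flash_artifact_removal")
    return modules
```

Keeping the plan as plain data makes it easy to log which modules ran per image, which is also what billing and QA need later.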
Module 1: HDR window blend. Detect windows using a segmentation model (SAM or a real estate-fine-tuned YOLO). If a window region is more than 2.5 stops overexposed relative to the interior mean, apply exposure fusion: synthesize an underexposed version of the window region using ControlNet depth conditioning and merge with a luminosity mask. The result preserves the interior exposure while making the exterior visible through the window.
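The 2.5-stop test and the luminosity-mask merge can be sketched in a few lines. This assumes mean luminance values are already linear (not gamma-encoded) and on a 0-1 scale; the mask ramp points (0.7 and 0.95) are illustrative choices, and the segmentation and ControlNet synthesis steps are outside the sketch.

```python
import math


def overexposure_stops(window_mean: float, interior_mean: float) -> float:
    """Exposure gap in stops (each stop is a doubling of linear luminance)."""
    if window_mean <= 0 or interior_mean <= 0:
        raise ValueError("mean luminance must be positive")
    return math.log2(window_mean / interior_mean)


def needs_hdr_blend(window_mean: float, interior_mean: float,
                    threshold_stops: float = 2.5) -> bool:
    """True when the window is more than `threshold_stops` brighter than the room."""
    return overexposure_stops(window_mean, interior_mean) > threshold_stops


def luminosity_mask(luminance: float, low: float = 0.7, high: float = 0.95) -> float:
    """Soft 0-1 weight that ramps up as a pixel approaches clipping."""
    return min(1.0, max(0.0, (luminance - low) / (high - low)))


def blend(original: float, synthesized: float, mask: float) -> float:
    """Weighted merge: mask=1 takes the synthesized (darker) window pixel."""
    return mask * synthesized + (1.0 - mask) * original
```

A window averaging 0.5 against an interior averaging 0.0625 is three stops over and would trigger the blend; one averaging 0.25 is two stops over and would not.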
Module 2: Sky replacement. For exterior shots, classify sky condition (clear blue, overcast white, problematic grey) and apply sky replacement only when the sky quality is below threshold. Use SAM for segmentation and Flux Fill or SDXL Inpainting for sky synthesis. This is the same module used in a day-to-dusk pipeline, configured for a natural-light blue-sky style rather than twilight.
Module 3: Color grading. Apply a room-type-aware color grade: kitchens get a brightness lift and cooler white balance; living rooms get a warmer tone with shadow recovery; bathrooms get a clean neutral grade. The room type classifier from your virtual staging pipeline can be reused here. Color grading is the lowest-compute step -- no diffusion inference required, just a parametric color transform calibrated per room type.
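A parametric color transform of this kind can be as small as a per-room preset applied to each pixel. The sketch below assumes RGB values as floats in the 0-1 range; the preset numbers are placeholders chosen to show the direction of each adjustment, not calibrated values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grade:
    exposure: float     # overall brightness multiplier
    red_gain: float     # above 1 warms, below 1 cools
    blue_gain: float    # above 1 cools, below 1 warms
    shadow_lift: float  # 0 disables; higher brightens dark pixels more


# Illustrative presets only; real values would be calibrated per room type.
PRESETS = {
    "kitchen": Grade(exposure=1.10, red_gain=0.95, blue_gain=1.05, shadow_lift=0.05),
    "living_room": Grade(exposure=1.00, red_gain=1.05, blue_gain=0.95, shadow_lift=0.15),
    "bathroom": Grade(exposure=1.05, red_gain=1.00, blue_gain=1.00, shadow_lift=0.05),
}


def grade_pixel(rgb, grade: Grade):
    """Apply white-balance gains, exposure, and a shadow lift to one pixel."""
    out = []
    for value, gain in zip(rgb, (grade.red_gain, 1.0, grade.blue_gain)):
        v = value * gain * grade.exposure
        v += grade.shadow_lift * (1.0 - v) ** 2  # strongest on dark values
        out.append(min(1.0, max(0.0, v)))
    return tuple(out)


def grade_image(pixels, room_type: str):
    """Grade a 2D list of RGB tuples using the preset for `room_type`."""
    grade = PRESETS[room_type]
    return [[grade_pixel(p, grade) for p in row] for row in pixels]
```

Because this is pure arithmetic per pixel, it runs on CPU and can be applied before or after the diffusion-based modules without affecting GPU cost.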
Module 4: Flash artifact removal. Detect specular highlights (white regions that exceed the local luminosity mean by a threshold) and mirror reflections (bilateral symmetry in bathroom mirror regions). For specular highlights, use inpainting to replace with natural-looking material textures. For mirror reflections, detect the photographer silhouette and inpaint the mirror region. This is the most compute-intensive module -- run it only when artifact detection confidence is high.
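The specular-highlight rule (near-white pixels that exceed the local luminosity mean by a threshold) can be sketched without any imaging library. The window radius, threshold, and near-white cutoff below are assumed values, and the input is taken to be a 2D list of luminance floats in the 0-1 range; mirror-reflection detection and the inpainting step are not covered.

```python
def local_mean(lum, y, x, radius):
    """Mean luminance in a square window around (y, x), clipped at the borders."""
    height, width = len(lum), len(lum[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            total += lum[yy][xx]
            count += 1
    return total / count


def specular_mask(lum, radius=2, threshold=0.4, min_value=0.9):
    """Mark pixels that are near-white AND much brighter than their surroundings."""
    return [
        [
            lum[y][x] >= min_value
            and lum[y][x] - local_mean(lum, y, x, radius) > threshold
            for x in range(len(lum[0]))
        ]
        for y in range(len(lum))
    ]
```

Comparing against the local mean, rather than a global cutoff, is what keeps an evenly lit white wall from being flagged as a flash hotspot.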
What it takes to build: the infrastructure problem
Photo enhancement is a higher-frequency pipeline than virtual staging. An MLS platform might process 2,000 listing photos in a single morning when agents upload before a deadline. The bursty load profile makes GPU provisioning harder -- you need to autoscale from near-zero to high throughput in seconds, then scale back down. Static GPU reservation means paying for idle capacity most of the day.
| Path | Setup time | Infra cost/mo | Team cost/mo (est.) | True TCO/mo |
|---|---|---|---|---|
| Runflow managed API | 1-3 days | ~$2,500-3,500 | $0 (no GPU engineer) | ~$2,500-3,500 |
| Self-hosted ComfyUI (RunPod) | 3-6 weeks | ~$1,800-2,800 | ~$8,000-12,000 | ~$10,000-15,000 |
| Build from scratch (Replicate) | 6-10 weeks | ~$2,000-3,200 | ~$15,000-25,000 | ~$17,000-28,000 |
For bursty workloads, managed APIs with built-in autoscaling are significantly more cost-effective than static GPU reservation. Runflow scales to zero when idle and to multiple workers under load, with sub-2s cold starts. For a pipeline that might process 5,000 images in two hours then nothing for the rest of the day, usage-based billing is dramatically cheaper than reserving GPUs around the clock.
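To see why a reserved fleet sits idle under this load shape, it helps to size the fleet for the burst and then look at utilization. The sketch below is a back-of-envelope model: the 20 seconds of GPU time per image and the hourly GPU rate are placeholder assumptions, and whether usage-based billing comes out cheaper depends on the real rates you plug in.

```python
import math


def gpus_for_burst(images: int, window_hours: float, seconds_per_image: float) -> int:
    """GPUs needed to clear `images` within `window_hours`, one image at a time per GPU."""
    images_per_gpu = window_hours * 3600 / seconds_per_image
    return math.ceil(images / images_per_gpu)


def reserved_utilization(busy_hours_per_day: float) -> float:
    """Fraction of a 24-hour reservation that does useful work."""
    return busy_hours_per_day / 24


def reserved_cost_per_image(gpus: int, hourly_rate: float, images_per_day: int) -> float:
    """Effective per-image cost when the fleet is reserved around the clock."""
    return gpus * hourly_rate * 24 / images_per_day


# Burst from the paragraph above: 5,000 images in two hours, then idle.
fleet = gpus_for_burst(5_000, window_hours=2, seconds_per_image=20)  # assumed 20 s/image
print(fleet, f"{reserved_utilization(2):.1%}")
```

Comparing `reserved_cost_per_image` at your actual GPU rate against the usage-based per-image price gives the break-even point for your own traffic pattern.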








Unit economics
Enhancement economics depend heavily on how many modules run per image. Running all four modules costs more than running just sky replacement. Design your pricing around a base correction tier (HDR + color grade) and a premium tier (all four modules). Base tier compute runs $0.05-0.08 per image. Premium tier with flash artifact removal runs $0.10-0.18.
| Volume | Runflow (usage-based) | Manual (PhotoUp equiv) |
|---|---|---|
| 5K images/mo | ~$350-600 | $15,000-40,000 |
| 25K images/mo | ~$1,500-2,800 | $75,000-200,000 |
| 100K images/mo | ~$5,500-10,000 | $300K-800K |
At a platform price of $0.40 per image (base tier) and $0.07 compute cost, you are running an 82.5% gross margin. At 25,000 images per month, that is $8,250 per month in gross profit from a single platform contract, before fixed overhead of $2,000-3,000.
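The margin figures above follow from two short formulas. A minimal sketch that reproduces them, using only the numbers quoted in the paragraph (function names are illustrative):

```python
def gross_margin(price: float, compute_cost: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - compute_cost) / price


def monthly_gross_profit(price: float, compute_cost: float, images_per_month: int) -> float:
    """Gross profit in dollars per month, before fixed overhead."""
    return (price - compute_cost) * images_per_month


margin = gross_margin(0.40, 0.07)
profit = monthly_gross_profit(0.40, 0.07, 25_000)
print(f"margin {margin:.1%}, gross profit ${profit:,.0f}/mo")
# After the $2,000-3,000 fixed overhead quoted above:
print(f"net of overhead: ${profit - 3_000:,.0f} to ${profit - 2_000:,.0f}/mo")
```

Rerunning it with the premium-tier compute cost of $0.10-0.18 shows how much of the margin depends on keeping flash artifact removal conditional.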
Pricing and packaging for B2B
Enhancement services package better as tiered bundles than as per-module pricing. Platform buyers do not want to specify which modules to run per image -- they want to submit a photo and receive an enhanced version. Offer two tiers:
Standard enhancement: HDR blend + color grade + sky replacement where needed. Covers 85% of listing photo problems. $0.25-0.50 per image B2B. Under 15 seconds processing.
Premium enhancement: all four modules, including flash artifact removal, plus perspective correction. $0.60-1.20 per image B2B. Under 30 seconds processing. Appropriate for high-value listings where photo quality directly impacts sale price.
Consider offering an "auto-select" mode where your pipeline classifies the input photo and applies whichever modules are needed, billed at the tier that includes all applied modules. This removes the decision burden from the platform buyer while ensuring every photo gets the right treatment.
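The auto-select billing rule (bill at the tier that includes all applied modules) amounts to finding the cheapest tier whose module set covers what actually ran. The sketch below is one possible reading: the module names and the single price per tier ($0.40 and $0.90, picked from inside the quoted ranges) are assumptions, and whether a photo that needed no correction is billed at all is a product decision the article leaves open.

```python
# Tiers ordered cheapest first. Prices are illustrative points within the
# ranges quoted above; module names are hypothetical identifiers.
STANDARD = frozenset({"hdr_window_blend", "color_grade", "sky_replacement"})
PREMIUM = STANDARD | {"flash_artifact_removal"}
TIERS = [("standard", STANDARD, 0.40), ("premium", PREMIUM, 0.90)]


def billing_tier(applied_modules):
    """Return (tier_name, price) for the cheapest tier covering every applied module."""
    applied = set(applied_modules)
    for name, modules, price in TIERS:
        if applied <= modules:
            return name, price
    unknown = sorted(applied - PREMIUM)
    raise ValueError(f"modules not covered by any tier: {unknown}")
```

With this rule, a photo that only needed a color grade bills as standard, while one that also needed flash artifact removal bills as premium.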
The hardest technical problem: HDR window reconstruction
Window HDR blending is the most technically demanding module. The challenge is not segmentation -- windows are easy to detect -- it is synthesizing a believable exterior view in the window region when the original is completely blown out. You are not blending two exposures of the same scene; you are generating what should be visible through the window based on context clues: property location, listing description, other photos in the set.
The practical approach that passes production review: use depth-conditioned inpainting to generate exterior content consistent with the window frame geometry and size. Do not attempt to synthesize a specific view -- generate a plausible generic exterior (sky, trees, neighbouring buildings at appropriate depth) that is contextually consistent with the interior. Buyers evaluate whether the result looks real, not whether it matches the actual view. A believable generic exterior consistently passes review.
The edge case to handle explicitly: multiple windows in a single frame. Each window needs independent detection and processing. A living room with three windows of different sizes and exposures requires per-window classification and per-window HDR treatment. Batch processing all windows with a single mask produces visually inconsistent results.
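Per-window treatment can be sketched as a loop that evaluates each detected window against the interior on its own, rather than pooling all windows into one mask. The `Window` structure and labels are hypothetical, luminance values are assumed linear on a 0-1 scale, and the 2.5-stop threshold is reused from Module 1.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    label: str
    mean_luminance: float  # linear, 0-1, measured inside this window's own mask


def plan_window_treatment(windows, interior_mean: float, threshold_stops: float = 2.5):
    """Decide independently for each window whether it needs HDR reconstruction."""
    plan = {}
    for window in windows:
        stops = math.log2(window.mean_luminance / interior_mean)
        plan[window.label] = {
            "stops_over": round(stops, 2),
            "blend": stops > threshold_stops,
        }
    return plan


room = [Window("left", 0.5), Window("centre", 0.25), Window("right", 1.0)]
print(plan_window_treatment(room, interior_mean=0.0625))
```

In this example only the left and right windows exceed the threshold, so the centre window is left untouched instead of being darkened along with the others.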
Bundling with virtual staging and day-to-dusk
Photo enhancement, virtual staging, and day-to-dusk serve the same ICP through the same integration point (the listing editor photo upload flow). Selling all three as a bundled "listing media API" dramatically increases contract value without increasing sales effort. A platform that pays $0.40 per image for enhancement, $1.00 per image for day-to-dusk on selected exteriors, and $2.00 per image for virtual staging on empty interiors generates average revenue of $0.80-1.20 per listing photo depending on property type.
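The blended figure depends on how many photos receive each service. A rough model, assuming enhancement runs on every photo while day-to-dusk and staging apply to a share of them; the shares below are hypothetical mixes chosen to show how the two ends of the quoted range could arise, not measured data.

```python
ENHANCEMENT_PRICE = 0.40  # per image, applied to every photo (assumption)
DUSK_PRICE = 1.00         # per image, selected exteriors
STAGING_PRICE = 2.00      # per image, empty interiors


def blended_revenue_per_photo(dusk_share: float, staging_share: float) -> float:
    """Average revenue per listing photo for a given service mix."""
    return (ENHANCEMENT_PRICE
            + dusk_share * DUSK_PRICE
            + staging_share * STAGING_PRICE)


# Hypothetical mixes: a lightly staged portfolio and a heavily staged one.
print(round(blended_revenue_per_photo(0.10, 0.15), 2))
print(round(blended_revenue_per_photo(0.20, 0.30), 2))
```

The staging share dominates the result, which is why property type (vacant versus furnished listings) moves the average more than anything else.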
The bundle argument to platform buyers: one vendor, one integration, one support contract. Instead of managing separate relationships with a photo editor, a twilight conversion service, and a staging vendor, the platform buys a single API that handles all three. The operational simplification is worth paying a small premium over the sum of the individual service prices.
The simplest way to deploy this pipeline
Photo enhancement is a multi-module pipeline that creates more infrastructure complexity than a single-purpose workflow like day-to-dusk. Four independent processing modules, conditional execution logic, per-module quality checks, and bursty load patterns all add up. Building and maintaining this on self-hosted GPUs requires dedicated engineering effort that most teams do not want to commit before validating the business.
Runflow removes the infrastructure problem: upload your ComfyUI workflow, call the API, get the enhanced image back. The autoscaling and queue management handle the bursty load pattern without any configuration. Usage-based billing means you pay for what you process, not for what you provision.
What you get: custom ComfyUI workflows via REST API, sub-2s cold starts, built-in autoscaling for bursty workloads, per-image billing with no minimum commitment to start. The same stack that handles virtual staging and day-to-dusk pipelines runs photo enhancement -- different workflows, identical infrastructure.