Virtual staging is one of the clearest inefficiencies in real estate marketing. A photographer spends two hours shooting an empty apartment. The listing agent then pays a staging company $150-400 per photo to add furniture digitally. The result is a JPEG that takes 24-72 hours to arrive. No API. No integration with the MLS platform. No programmatic access of any kind.
The software exists to do this in seconds. The business gap is not technical: it is that every current vendor built a web dashboard for individual agents instead of an API for platforms.
The problem nobody has solved cleanly
Real estate agents currently pay $150-400 per photo for virtual staging. The workflow is: upload photos to a vendor portal, select a style, wait 24-72 hours, download the results. This is entirely manual, and that is not a coincidence - it is a business model choice. Virtual staging companies built B2C dashboards because the unit economics are simple: charge per photo, collect payment upfront, no need for engineering integrations.
The gap is at the platform level. MLS systems, real estate portals, and prop-tech SaaS products process tens of thousands of listings. If any of them could offer virtual staging as a native feature - click a button in the listing editor, receive staged photos in under a minute - they would. None of them can, because no current vendor exposes a real API. The existing solutions are:
- Virtual Staging AI: web dashboard, no API, no white-label, B2C only
- REimagineHome: web dashboard with a basic beta API, rate-limited, no SLA
- Styldod: manual human staging workflow, no automation at all
- Bella Virtual Staging: human designers, turnaround 24-48h, no API
The gap is clear: no vendor offers a production-grade, white-label API that a platform can integrate. This is the opportunity.
Who actually buys this
The individual agent is not the right customer. There are roughly 3 million licensed real estate agents in the US, mostly independent or in small brokerages. Selling to them means high acquisition cost, high churn, and an average contract value of $50-200 per month. The sales cycle is short but the economics are terrible at scale.
The real buyer is the platform layer above the agent: MLS systems, real estate portals (think Zillow-tier but regional or vertical), and prop-tech companies that sell software to brokerages. These buyers have different economics entirely:
- Average contract value: $50,000–500,000 per year
- Sales cycle: 3-9 months, but contracts are typically multi-year once closed
- Churn: very low - switching costs are high once integrated into a workflow
- Volume: 10,000–100,000 listings per month per MLS platform
This is a classic B2B2C model. The platform buys the API, embeds it in their product, and their agent-customers use it without knowing who powers it. The platform captures margin between what they charge agents and what they pay per API call. This model also means the platform handles all agent onboarding, support, and billing - your only relationship is with the platform.
What the market looks like today
The current competitive landscape shows a uniform gap: every significant player targets individual agents through a web interface. None offer a white-label API suitable for platform integration.
| Product | Price per Photo | API Available | Target Customer | Turnaround |
|---|---|---|---|---|
| Virtual Staging AI | $19-29 | No | Individual agents (B2C) | Minutes (AI) |
| REimagineHome | $0.75-1.50 | Beta (limited) | Agents, small teams | Minutes (AI) |
| Styldod | $16-24 | No | Agents, agencies | 24-48h (human) |
| Bella Virtual Staging | $25-35 | No | Agencies, developers | 24-48h (human) |
| BoxBrownie | $24-32 | No | Agents, photographers | 24h (human) |
| Proposed API service (the gap) | $3-12 (B2B) | Yes (core product) | MLS platforms, portals | Under 60 seconds |
The REimagineHome beta API is worth monitoring. It exists, but it has no documented SLA, no white-label offering, and restrictive rate limits. It is not positioned as a B2B infrastructure product - it is a consumer feature they bolted on. That is very different from what platforms actually need.
The tech stack to build it
The core pipeline has four steps. None of them require custom model training - you are orchestrating existing capabilities, not doing ML research.
Step 1: Room type detection. Before generating anything, classify the input photo: living room, bedroom, kitchen, bathroom, dining room, home office. This step determines which prompt template to use. A bad staging result almost always traces back to wrong room classification - the model applied bedroom furniture to a living room because nobody checked first.
Step 2: Empty room confirmation. Check that the photo actually shows an empty or near-empty room. Photos with existing furniture need different handling - either rejection or a furniture removal step before staging. Accepting furnished rooms and returning garbage results will kill your B2B relationships.
Step 3: Inpainting generation. Use a diffusion inpainting model with a room-appropriate prompt. Flux inpainting and SDXL inpainting are the current quality leaders. The prompt structure matters: style (modern, Scandinavian, traditional), room type, specific furniture items, lighting description. Good prompts are 40-80 tokens, not two words.
Step 4: Post-processing. Perspective correction, edge blending between generated furniture and original floor/walls, color grading to match the original photo lighting. This step separates results that look real from results that look like AI. It is not optional for B2B use.
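The four steps above can be sketched as a thin orchestration layer. This is a minimal sketch under stated assumptions, not a production implementation: `stage_photo`, `build_prompt`, `StagingResult`, and the `FURNITURE` lists are hypothetical names introduced for illustration, and the four callables stand in for whatever classifier, empty-room check, inpainting endpoint, and post-processing code you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical furniture lists per room type; real templates would be tuned per style.
FURNITURE = {
    "living room": "sofa, coffee table, area rug, floor lamp",
    "bedroom": "bed with headboard, two nightstands, dresser, bedside lamps",
    "kitchen": "bar stools, pendant lights, small potted herbs",
    "bathroom": "folded towels, bath mat, small plant",
    "dining room": "dining table, six chairs, sideboard, pendant light",
    "home office": "desk, office chair, bookshelf, desk lamp",
}


@dataclass
class StagingResult:
    ok: bool
    image: Optional[bytes] = None
    reason: str = ""


def build_prompt(room_type: str, style: str) -> str:
    """Combine style, room type, furniture items, and lighting into one prompt."""
    if room_type not in FURNITURE:
        raise ValueError(f"unsupported room type: {room_type}")
    return (
        f"{style} {room_type} interior, furnished with {FURNITURE[room_type]}, "
        "soft natural daylight matching the existing windows, "
        "photorealistic real estate photograph"
    )


def stage_photo(
    photo: bytes,
    style: str,
    classify_room: Callable[[bytes], str],
    is_empty: Callable[[bytes], bool],
    inpaint: Callable[[bytes, str], bytes],
    postprocess: Callable[[bytes, bytes], bytes],
) -> StagingResult:
    room_type = classify_room(photo)                  # Step 1: room type detection
    if not is_empty(photo):                           # Step 2: empty room confirmation
        return StagingResult(ok=False, reason="room must be empty")
    prompt = build_prompt(room_type, style)           # Step 3: inpainting generation
    staged = inpaint(photo, prompt)
    final = postprocess(photo, staged)                # Step 4: post-processing
    return StagingResult(ok=True, image=final)
```

Passing the four stages in as callables keeps the orchestration independent of the inference provider, so swapping one hosted model for another only changes the `inpaint` function.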
For the inference layer, you have three practical options to start: Runflow (managed ComfyUI workflows via API), Replicate (model hosting with pay-per-run), and fal.ai (fast inference with low cold starts). All three support Flux and SDXL inpainting. The choice depends on your volume and latency requirements.
What it takes to build: do you need a ComfyUI engineer?
This is the cost that most implementation articles skip. The per-image API cost is only part of the equation. The real question is who builds and maintains the pipeline. Depending on the path you choose, that answer ranges from zero internal effort to a six-figure engineering hire.
A ComfyUI workflow engineer is a specialist who builds, debugs, and maintains image generation pipelines in ComfyUI. They understand model behavior, node dependencies, custom node conflicts, and how to handle edge cases like unusual room angles or mixed lighting. On platforms like Toptal or Upwork, experienced ComfyUI engineers charge $80-150/hr. A production-ready virtual staging workflow typically takes 40-120 hours to build and stabilize, plus ongoing maintenance as models update.
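As a quick sanity check on those figures, the build cost range follows directly from hours times hourly rate. The function name below is hypothetical and the sketch covers the initial build only, not ongoing maintenance.

```python
def build_cost_range(hours=(40, 120), hourly_rate=(80, 150)):
    """Return (low, high) build cost in dollars from the hour and rate ranges."""
    low = hours[0] * hourly_rate[0]    # fewest hours at the cheapest rate
    high = hours[1] * hourly_rate[1]   # most hours at the most expensive rate
    return low, high


low, high = build_cost_range()
print(f"${low:,} - ${high:,}")  # $3,200 - $18,000
```

The upper bound lines up with the top of the self-hosted build estimate in the comparison table; the lower bound only applies if the work really does finish in 40 hours at the cheapest rate.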
| Path | ComfyUI expertise needed | Who handles errors | Internal team cost | Time to working demo |
|---|---|---|---|---|
| ★ Best - Runflow managed API | None | Runflow team - workflow is pre-built and error-corrected before reaching your API | $0 | 1-3 days |
| Self-hosted ComfyUI | Yes - dedicated engineer or freelancer | Your team debugs model failures, node conflicts, OOM errors, bad outputs | $5,000-18,000 to build + $2,000-4,000/month to maintain | 6-14 weeks |
| Replicate (fofr/any-comfyui-workflow) | Partial - you still build and test the workflow locally first | Replicate surfaces errors but you diagnose and fix them | $2,000-8,000 to build + dev time per model update | 3-8 weeks |
| fal.ai custom deployment | Partial - Docker + ComfyUI config required | Your team, fal.ai support for infrastructure issues only | $3,000-10,000 to build | 4-10 weeks |
The managed API path removes the workflow engineering problem entirely. Runflow prepares the ComfyUI workflow, handles error detection, and validates outputs before they reach your integration. What you receive is a clean API call that returns a correctly staged image - no model debugging, no node version conflicts, no OOM failures to handle. For teams without an in-house AI engineer, this difference is the line between shipping in a week and spending a quarter on infrastructure.
For teams that do have ComfyUI expertise, self-hosting on RunPod or Modal gives you the most control over model selection and fine-tuning. The economics favor self-hosting at volumes above ~10,000 images/month, but the engineering overhead is real and ongoing - every Flux model update or custom node version bump requires testing and re-validation.
Unit economics
The economics of this business are excellent at scale. Input cost per image is low and drops with volume. The B2B price floor is determined by what platforms will pay, not by your cost - and platforms price staging to their agent-customers at $5-20 per photo.
| Volume/Month | ★ Runflow (API only, no team) | Replicate API + engineer retainer | fal.ai API + engineer retainer | ★ Runflow monthly total | Replicate monthly total | Monthly saving with Runflow |
|---|---|---|---|---|---|---|
| 100 images | ~$0.45/img (no team needed) | $50 API + $1,500-3,000 retainer | $40 API + $1,500-3,000 retainer | ~$45 ★ | $1,550-3,050 | $1,505-3,005 saved |
| 1,000 images | ~$0.35/img (no team needed) | $400 API + $3,000-5,000 retainer | $300 API + $3,000-5,000 retainer | ~$350 ★ | $3,400-5,400 | $3,050-5,050 saved |
| 10,000 images | ~$0.20/img (no team needed) | $2,500 API + $5,000-10,000 engineer | $1,800 API + $5,000-10,000 engineer | ~$2,000 ★ | $7,500-12,500 | $5,500-10,500 saved |
| 50,000 images | ~$0.12/img (no team needed) | $7,500 API + $12,000-20,000 engineer | $5,000 API + $12,000-20,000 engineer | ~$6,000 ★ | $19,500-27,500 | $13,500-21,500 saved |
Runflow wins at every volume below 200,000 images/month - by $1,500 to $21,500 per month across the volumes in the table. Team overhead figures assume:
- 100-1,000 images: freelancer retainer with on-call availability ($1,500-3,000/month at 100 images, $3,000-5,000/month at 1,000 images, as in the table)
- 10,000 images: dedicated part-time or junior AI engineer ($5,000-10,000/month)
- 50,000 images: full-time AI engineer plus possible DevOps support ($12,000-20,000/month)
These are not hypothetical - a ComfyUI workflow engineer on Toptal or Upwork costs $80-150/hr, and on-call availability for production incidents requires a minimum monthly retainer regardless of volume.
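The monthly totals and savings in the table can be reproduced with a few lines of arithmetic. This is a sketch using the table's own inputs; `ROWS` and `monthly_saving` are hypothetical names, and the comparison shown is Runflow versus the Replicate column only.

```python
# (images/month, Runflow $/image, Replicate API $/month, (retainer low, retainer high))
# At 100 images the Replicate API cost is $50/month ($0.50 per image).
ROWS = [
    (100, 0.45, 50, (1_500, 3_000)),
    (1_000, 0.35, 400, (3_000, 5_000)),
    (10_000, 0.20, 2_500, (5_000, 10_000)),
    (50_000, 0.12, 7_500, (12_000, 20_000)),
]


def monthly_saving(images, runflow_per_image, api_cost, retainer):
    """Return (managed total, (DIY low, DIY high), (saving low, saving high))."""
    managed = round(images * runflow_per_image)   # API only, no team
    diy_low = api_cost + retainer[0]              # API plus cheapest engineer cost
    diy_high = api_cost + retainer[1]             # API plus most expensive engineer cost
    return managed, (diy_low, diy_high), (diy_low - managed, diy_high - managed)


for row in ROWS:
    managed, diy, saving = monthly_saving(*row)
    print(f"{row[0]:>6} images: managed ${managed:,}, "
          f"DIY ${diy[0]:,}-{diy[1]:,}, saving ${saving[0]:,}-{saving[1]:,}")
```

Almost all of the gap comes from the engineer retainer rather than the per-image price, which is why the saving grows with team size more than with volume.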
These margins are real, but they assume you are selling to platforms at B2B rates. If you try to compete at the B2C agent level ($1-3 per photo), the margins collapse and you are in a pricing war with players who have years of head start. Stay B2B.
The important cost line is not per-image inference - it is the fixed overhead: post-processing infrastructure, room classification model hosting, API gateway, monitoring, and the sales/support cost of enterprise contracts. Budget $5,000-15,000 per month in fixed costs before hitting profitability at meaningful scale.
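To see what that fixed overhead implies, a break-even volume can be estimated as fixed cost divided by per-image margin. This is an illustrative sketch: the $3.00 price is an assumption taken from the low end of the B2B range in the comparison table, the $0.20 inference cost is the 10,000-image figure from the unit economics table, and `break_even_images` is a hypothetical helper.

```python
import math


def break_even_images(fixed_monthly, price_per_image, cost_per_image):
    """Images per month needed for per-image margin to cover fixed costs."""
    margin = price_per_image - cost_per_image
    if margin <= 0:
        raise ValueError("price must exceed per-image cost")
    return math.ceil(fixed_monthly / margin)


# Assumed price of $3.00 per image and inference cost of $0.20 per image.
print(break_even_images(5_000, 3.00, 0.20))   # 1786
print(break_even_images(15_000, 3.00, 0.20))  # 5358
```

Under these assumptions, a single platform running a few thousand images per month covers the fixed costs, which is consistent with the argument for selling to platforms rather than agents.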
Pricing and packaging for B2B
Prop-tech platforms have strong opinions about how they want to pay for API services, shaped by years of integrating third-party tools. The three common models are: per-image processed, per-active-agency per month, and revenue share. Each has different implications.
Per-image processed is the cleanest model and the one that works best for virtual staging specifically. The platform understands exactly what they are paying for, they can pass costs through directly to agents, and there is no ambiguity about what "active" means. The downside: revenue varies with listing volume, which is seasonal in real estate.
Per-active-agency per month (a seat-based model) provides predictable revenue, which platforms often prefer for their own budgeting. The problem is defining "active": an agency that staged zero photos this month still counts as a seat, which feels wrong to buyers. This model works better for general SaaS features than for a per-use capability like staging.
Revenue share is occasionally requested by large platforms that want to tie costs to their own success. Structurally it is fine, but it creates reporting complexity and opens disputes about what counts as a completed transaction. Avoid it unless the platform insists and the deal size justifies the overhead.
Recommendation: per-image with a monthly minimum. Example: $3 per image processed at scale (the low end of the $3-12 B2B range in the comparison table, and comfortably above per-image inference cost), minimum $2,000/month. The minimum protects you from integrations that go live but never reach meaningful volume. The per-image model is transparent, easy to audit, and aligns incentives. Add a setup fee ($5,000-15,000) to cover integration support and separate serious buyers from tire-kickers.
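The billing rule for per-image pricing with a monthly minimum is simply the larger of the usage charge and the minimum. A minimal sketch, assuming a per-image rate of $3.00 (an illustrative value from the B2B range in the comparison table; substitute your negotiated rate) and the $2,000 minimum; `monthly_invoice` is a hypothetical name.

```python
def monthly_invoice(images, rate_per_image, monthly_minimum=2_000.0):
    """Bill usage at the per-image rate, but never less than the minimum."""
    usage_charge = images * rate_per_image
    return max(usage_charge, monthly_minimum)


# Assumed rate: $3.00 per image.
print(monthly_invoice(500, 3.00))    # 2000.0 (usage of $1,500 is below the minimum)
print(monthly_invoice(1_000, 3.00))  # 3000.0 (usage exceeds the minimum)
```

The one-time setup fee would be invoiced separately, so it is left out of the monthly calculation.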
The hardest technical problem: room type detection
Room type detection is the step that determines result quality, and it is harder than it looks. A photo of an empty living room can look very similar to an empty dining room, especially in open-plan apartments. A narrow bedroom can look like a hotel room or a home office. Get the classification wrong and the generated staging will use the wrong furniture set - a common failure mode that produces obviously wrong results.
The approach that works: a two-stage classification pipeline. First, a fast visual classifier (ViT-base or CLIP zero-shot) scores the image across room type categories. Second, if the top score is below a confidence threshold (typically 0.75-0.85), route to a second-pass model or flag for human review. Do not try to stage low-confidence inputs - the failure rate is too high.
The confidence threshold is a business decision, not just a technical one. A threshold of 0.90 means fewer errors but more rejections, which means more agent friction. A threshold of 0.70 means higher throughput but more bad results. For B2B platform customers, start conservative (0.85) and loosen it as you validate quality metrics.
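The routing decision in the two-stage pipeline can be sketched as a single function over the classifier's scores. This is a sketch under assumptions: `route` is a hypothetical name, the scores are assumed to be a mapping from room type to a confidence in [0, 1] produced by whatever first-pass classifier you use, and "review" stands for either the second-pass model or human review.

```python
from typing import Dict, Tuple


def route(scores: Dict[str, float], threshold: float = 0.85) -> Tuple[str, str]:
    """Return ("stage", room) if the top score clears the threshold, else ("review", room)."""
    if not scores:
        raise ValueError("classifier returned no scores")
    room, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return ("stage", room)
    # Low confidence: do not stage; send to a second-pass model or a human.
    return ("review", room)


print(route({"bedroom": 0.93, "home office": 0.05}))      # ('stage', 'bedroom')
print(route({"living room": 0.62, "dining room": 0.35}))  # ('review', 'living room')
```

Keeping the threshold as a parameter makes it easy to start at 0.85 and loosen it per platform once you have measured rejection and error rates.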
The other classification challenge is distinguishing between truly empty rooms and rooms with minimal furniture. An empty room gets inpainting from scratch. A room with a single chair in the corner needs furniture removal first, then staging - or rejection with a "room must be empty" error. Building a binary "is this room empty enough" classifier is a separate step that most competitors skip, and the results show.
Practical tooling: Replicate hosts ViT and CLIP models you can call via API without managing infrastructure. For production volume, fine-tune a ViT-base model on a labeled dataset of real estate photos - 500-1,000 labeled examples per room type is enough to reach 92-95% accuracy on the long tail of unusual spaces.
Go-to-market: why you cannot sell to agents
The temptation is to launch a web dashboard and let individual agents sign up. It is faster, it proves the product works, and it generates early revenue. The problem is that it pulls you into a market you do not want to be in.
Real estate agents are fragmented buyers. There are 3 million of them in the US, mostly working independently or in small offices. Each sale is a small contract, marketing to them is expensive, and support overhead per customer is high. The B2C virtual staging market is already competitive and commoditizing. Every dollar you spend acquiring agents is a dollar not spent on platform sales.
Platforms buy differently. The decision makers are CTOs and CPOs, not agents. They care about three things: reliability (SLA for response time and uptime), data privacy (no using their photos to retrain your models), and integration simplicity (clean REST API, good documentation, sandbox environment). They do not need a pretty web dashboard - they need a well-documented API with a 99.5% uptime SLA and a support channel.
What you need before approaching a platform buyer:
- Demo: processes their own sample photos, not a gallery of hand-picked results
- Documented SLA: response time P95 under 60 seconds, uptime 99.5% or better
- Data policy: you do not use customer photos for model training
- Sandbox API key: they can test without signing a contract
- References or case studies, even if early-stage
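Before committing to the SLA numbers in that checklist, it helps to measure them. The sketch below computes a nearest-rank P95 from latency samples and converts an uptime percentage into a monthly downtime budget; `p95`, `allowed_downtime_minutes`, and the 30-day month are assumptions for illustration, not a standard SLA definition.

```python
def p95(latencies_s):
    """Nearest-rank 95th percentile of response times in seconds."""
    ordered = sorted(latencies_s)
    if not ordered:
        raise ValueError("no latency samples")
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def allowed_downtime_minutes(uptime_pct, days=30):
    """Downtime budget in minutes for a given uptime percentage over `days`."""
    return days * 24 * 60 * (100 - uptime_pct) / 100


samples = list(range(1, 101))            # stand-in data: 1..100 seconds
print(p95(samples))                      # 95
print(p95(samples) < 60)                 # False: this sample set would miss the 60 s target
print(allowed_downtime_minutes(99.5))    # 216.0 minutes per 30-day month
```

A 99.5% uptime commitment leaves about 3.6 hours of downtime per month, which is worth comparing against the availability your inference provider itself commits to.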
The right sales channels for prop-tech platforms are different from B2C. RESO (Real Estate Standards Organization) and RETS ecosystem events are where MLS technology buyers gather. PropTech conferences (Inman Connect, NAR NXT) have a platform/enterprise track separate from the agent-facing content. Real estate photographers are an underrated channel: they already sell post-processing services to agents and brokerages, and a partnership gives you warm introductions to platform buyers.
The sales cycle is 3-9 months. Budget for it. The first platform deal is the hardest because you have no references. Consider a pilot structure: 3 months at reduced rate (or free), with defined success metrics, in exchange for a case study and reference call. A single successful platform pilot is worth more than 500 individual agent subscribers for closing the next enterprise deal.
The simplest way to deploy this pipeline
Most teams building on this pattern hit the same three walls: GPU provisioning, cold start latency, and maintaining a ComfyUI instance that breaks whenever a model update ships. The path of least resistance is a managed ComfyUI API provider rather than self-hosting.
Runflow is built specifically for this use case: upload your workflow, call the REST API with your source image, get the staged result back. No AI engineer required. No GPU fleet to manage. No cold start tuning. The free tier covers prototyping; production pricing scales with actual volume.
What you get: custom ComfyUI workflows via REST API, sub-2s cold starts, built-in queue management, and pricing per image rather than per GPU hour. Try Runflow free - no credit card required.