Do I need to train my own AI model to build a virtual staging API?

No. You are orchestrating existing models, not building new ones. Flux inpainting and SDXL inpainting are available via Replicate, fal.ai, and Runflow. The differentiation is in the pipeline: room type detection, empty room validation, prompt engineering, and post-processing. The inference models are commodities - the workflow around them is where you build a defensible product.

What is the minimum viable product for approaching an MLS platform?

A working API endpoint that accepts a photo URL and returns a staged photo URL in under 60 seconds, with at least 3 room type styles, a sandbox environment, basic documentation, and a written data privacy policy. You do not need a web dashboard. You do not need a billing system yet. You need something a platform engineer can test in an afternoon and show to their product team.
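As a rough illustration of how small that endpoint contract can be, here is a minimal sketch of the request and response shapes plus input validation. The field names (`photo_url`, `style`, `staged_photo_url`) and the three style names are assumptions made for this example, not a published schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical style names; the only requirement stated above is "at least 3".
SUPPORTED_STYLES = {"modern", "scandinavian", "traditional"}


@dataclass(frozen=True)
class StagingRequest:
    photo_url: str
    style: str


@dataclass(frozen=True)
class StagingResponse:
    staged_photo_url: str
    elapsed_seconds: float


def validate_request(payload: dict) -> StagingRequest:
    """Turn a decoded JSON body into a StagingRequest, or raise ValueError."""
    photo_url = payload.get("photo_url", "")
    if not isinstance(photo_url, str):
        raise ValueError("photo_url must be a string")
    parsed = urlparse(photo_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("photo_url must be an absolute http(s) URL")
    style = str(payload.get("style", "")).lower()
    if style not in SUPPORTED_STYLES:
        raise ValueError(f"style must be one of {sorted(SUPPORTED_STYLES)}")
    return StagingRequest(photo_url=photo_url, style=style)
```

Validating before any GPU work starts means a platform engineer testing in the sandbox gets an immediate, readable error instead of a slow failure.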

How do I handle photos where the room is not empty?

You have three options: reject with a clear error message ("room must be empty or near-empty for staging"), add a furniture removal step before staging, or allow partial staging with lower quality expectations. For a B2B product, rejection with a clear message is the right default. A confusing staged result that a platform shows to its agent-customers will destroy trust faster than a clean rejection.
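A minimal sketch of the "clean rejection" default, assuming a hypothetical classifier that returns an emptiness score between 0 and 1. The score, the 0.8 cutoff, the 422 status, and the error code name are all illustrative assumptions.

```python
from typing import Optional

ROOM_NOT_EMPTY_MESSAGE = "room must be empty or near-empty for staging"


def check_room_is_empty(empty_score: float, min_score: float = 0.8) -> Optional[dict]:
    """Return None when staging may proceed, otherwise a structured rejection.

    empty_score is assumed to come from a separate "is this room empty
    enough" classifier; 1.0 means certainly empty.
    """
    if not 0.0 <= empty_score <= 1.0:
        raise ValueError("empty_score must be between 0 and 1")
    if empty_score >= min_score:
        return None
    return {
        "status": 422,
        "error": {
            "code": "room_not_empty",
            "message": ROOM_NOT_EMPTY_MESSAGE,
            "retryable": False,
        },
    }
```

A stable machine-readable code alongside the human-readable message lets the platform show its own wording to agents.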

What are the data privacy requirements for real estate photo processing?

The core requirement that every serious platform buyer will ask about: a written policy stating that you do not use customer photos for model training or fine-tuning, and that photos are deleted after processing (or within a defined retention window). Some buyers will also require SOC 2 Type II compliance or a GDPR data processing agreement (DPA), especially European platforms. Budget 6-12 months to get SOC 2 if you need it.

What latency SLA should I offer platforms?

P95 response time under 60 seconds for standard staging (one room type, one style). If you can get P95 under 30 seconds, that is a competitive differentiator. Platforms integrate staging into listing creation workflows where agents are actively waiting, so latency matters more than in batch processing scenarios. Uptime SLA of 99.5% is the minimum enterprise buyers will accept.
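To make the P95 target concrete, here is a small sketch that checks a batch of measured latencies against the limit. It uses the nearest-rank percentile definition, which is one common convention; monitoring tools may interpolate and report slightly different values.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_sla(latencies_s: Sequence[float], p95_limit_s: float = 60.0) -> bool:
    """True when the P95 of the observed latencies is within the limit."""
    return percentile(latencies_s, 95) <= p95_limit_s
```

With 20 requests, one slow outlier still passes (the 19th-ranked sample is the P95), while two slow outliers fail.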

How is virtual staging different from AI image generation in general?

Virtual staging has specific requirements that make general image generation tools unsuitable out of the box: the original room structure (walls, windows, floors, ceiling) must remain pixel-accurate, and only furniture and decor may be added. This requires inpainting with a precise mask, not text-to-image generation. The mask generation - accurately separating floor and wall space from existing objects - is a separate computer vision step that general-purpose tools do not do automatically.
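The role of the mask can be sketched with a toy compositing function: wherever the mask is 0, the original pixel is copied through unchanged, so the room structure cannot drift. This is a pure-Python illustration on nested lists. A real pipeline would presumably work on image arrays and feather the mask edge, which is not shown here.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Take generated pixels where mask is 1, keep original pixels where mask is 0."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")
    out: Image = []
    for o_row, g_row, m_row in zip(original, generated, mask):
        if not (len(o_row) == len(g_row) == len(m_row)):
            raise ValueError("all rows must have the same width")
        out.append([g if m else o for o, g, m in zip(o_row, g_row, m_row)])
    return out
```

Whatever the diffusion model produces outside the mask is discarded, which is what keeps walls and windows identical to the source photo.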

What volume is needed to reach profitability?

With a B2B pricing model at $5-8 per image and inference costs of $0.25-0.40 per image, each image contributes roughly $4.60-7.75 toward fixed costs, so you need roughly 650-1,100 images per month to cover $5,000 in fixed infrastructure costs and break even. A single mid-size platform integration (one regional MLS) can generate 5,000-20,000 images per month. The economics work well once you have one paying platform - the challenge is getting that first deal.
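The break-even arithmetic is simple enough to check yourself. The sketch below takes the price, inference cost, and fixed cost as inputs; the function name and structure are illustrative, and it ignores sales and support costs.

```python
import math


def break_even_images(fixed_cost: float, price_per_image: float,
                      inference_cost_per_image: float) -> int:
    """Images per month needed for contribution margin to cover fixed costs."""
    margin = price_per_image - inference_cost_per_image
    if margin <= 0:
        raise ValueError("price must exceed inference cost to ever break even")
    return math.ceil(fixed_cost / margin)


# Pessimistic and optimistic corners of the ranges quoted above.
worst_case = break_even_images(5_000, price_per_image=5.00, inference_cost_per_image=0.40)
best_case = break_even_images(5_000, price_per_image=8.00, inference_cost_per_image=0.25)
```

Running both corners gives the range of monthly volumes at which the fixed costs are covered.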

Can I use the same pipeline for exterior staging, not just interior rooms?

Exterior staging (adding landscaping, sky replacement, curb appeal improvements) is a different pipeline. The room type classifier does not apply, and the inpainting masks are more complex due to organic shapes. It is a valid expansion but should be treated as a separate product line. Interior staging first - the market is larger and the technical problems are better defined.

Virtual Staging API: Build the Real Estate Service

Virtual staging is one of the clearest inefficiencies in real estate marketing. A photographer spends two hours shooting an empty apartment. The listing agent then pays a staging company $150-400 per photo to add furniture digitally. The result is a JPEG that takes 24-72 hours to arrive. No API. No integration with the MLS platform. No programmatic access of any kind.

The software exists to do this in seconds. The business gap is not technical: it is that every current vendor built a web dashboard for individual agents instead of an API for platforms.

NOTE

TL;DR: You do not need to hire an AI engineer to deploy this pipeline. Runflow handles the ComfyUI hosting, GPU management, and API layer. Upload your workflow, call the API, get the staged image back. No infrastructure to maintain.

Virtual Staging AI · Example Workflow Pipeline

$150-400

Typical cost per photo for manual virtual staging, paid by real estate agents per listing

Industry pricing, May 2026

The problem nobody has solved cleanly

Real estate agents currently pay $150-400 per photo for virtual staging. The workflow is: upload photos to a vendor portal, select a style, wait 24-72 hours, download the results. This is entirely manual, and that is not a coincidence - it is a business model choice. Virtual staging companies built B2C dashboards because the unit economics are simple: charge per photo, collect payment upfront, no need for engineering integrations.

The gap is at the platform level. MLS systems, real estate portals, and prop-tech SaaS products process tens of thousands of listings. If any of them could offer virtual staging as a native feature - click a button in the listing editor, receive staged photos in under a minute - they would. None of them can, because no current vendor exposes a production-grade API. The existing solutions are:

Virtual Staging AI: web dashboard, no API, no white-label, B2C only
REimagineHome: web dashboard with basic API beta, restrictive rate limits, no SLA
Styldod: manual human staging workflow, no automation at all
Bella Virtual Staging: human designers, turnaround 24-48h, no API

The gap is clear: no vendor offers a production-grade, white-label API that a platform can integrate. This is the opportunity.

Who actually buys this

The individual agent is not the right customer. There are roughly 3 million licensed real estate agents in the US, mostly independent or in small brokerages. Selling to them means high acquisition cost, high churn, and an average contract value of $50-200 per month. The sales cycle is short but the economics are terrible at scale.

The real buyer is the platform layer above the agent: MLS systems, real estate portals (think Zillow-tier but regional or vertical), and prop-tech companies that sell software to brokerages. These buyers have different economics entirely:

Average contract value: $50,000–500,000 per year
Sales cycle: 3-9 months, but once closed, multi-year contracts
Churn: very low - switching costs are high once integrated into a workflow
Volume: 10,000–100,000 listings per month per MLS platform

This is a classic B2B2C model. The platform buys the API, embeds it in their product, and their agent-customers use it without knowing who powers it. The platform captures margin between what they charge agents and what they pay per API call. This model also means the platform handles all agent onboarding, support, and billing - your only relationship is with the platform.

3-9 months

Typical B2B sales cycle for prop-tech platform integrations, but contract values of $50k-500k/year make this viable

Prop-tech industry norms, 2026

What the market looks like today

The current competitive landscape shows a uniform gap: every significant player targets individual agents through a web interface. None offer a white-label API suitable for platform integration.

Virtual Staging Competitors - May 2026

Product	Price per Photo	API Available	Target Customer	Turnaround
Virtual Staging AI	$19-29	No	Individual agents (B2C)	Minutes (AI)
REimagineHome	$0.75-1.50	Beta (limited)	Agents, small teams	Minutes (AI)
Styldod	$16-24	No	Agents, agencies	24-48h (human)
Bella Virtual Staging	$25-35	No	Agencies, developers	24-48h (human)
BoxBrownie	$24-32	No	Agents, photographers	24h (human)
Proposed API service (this article)	$3-12 (B2B)	Yes (core product)	MLS platforms, portals	Under 60 seconds

The REimagineHome beta API is worth monitoring. It exists, but it has no documented SLA, no white-label offering, and restrictive rate limits. It is not positioned as a B2B infrastructure product - it is a consumer feature they bolted on. That is very different from what platforms actually need.

The tech stack to build it

The core pipeline has four steps. None of them require custom model training - you are orchestrating existing capabilities, not doing ML research.

Step 1: Room type detection. Before generating anything, classify the input photo: living room, bedroom, kitchen, bathroom, dining room, home office. This step determines which prompt template to use. A bad staging result almost always traces back to wrong room classification - the model applied bedroom furniture to a living room because nobody checked first.

Step 2: Empty room confirmation. Check that the photo actually shows an empty or near-empty room. Photos with existing furniture need different handling - either rejection or a furniture removal step before staging. Accepting furnished rooms and returning garbage results will kill your B2B relationships.

Step 3: Inpainting generation. Use a diffusion inpainting model with a room-appropriate prompt. Flux inpainting and SDXL inpainting are the current quality leaders. The prompt structure matters: style (modern, Scandinavian, traditional), room type, specific furniture items, lighting description. Good prompts are 40-80 tokens, not two words.

Step 4: Post-processing. Perspective correction, edge blending between generated furniture and original floor/walls, color grading to match the original photo lighting. This step separates results that look real from results that look like AI. It is not optional for B2B use.
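The four steps above can be sketched as a single orchestration function. Everything here is an assumption for illustration: the classifier, emptiness check, inpainting call, and post-processor are passed in as plain callables so the sketch stays independent of any provider, and the furniture lists and prompt wording are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical furniture sets per room type; a real product would tune these per style.
FURNITURE: Dict[str, List[str]] = {
    "living room": ["sofa", "coffee table", "area rug", "floor lamp"],
    "bedroom": ["queen bed", "two nightstands", "dresser"],
}


@dataclass
class StagingResult:
    ok: bool
    image: Optional[bytes] = None
    error: Optional[str] = None


def build_prompt(style: str, room_type: str, lighting: str) -> str:
    """Combine style, room type, furniture items, and lighting into one prompt."""
    items = ", ".join(FURNITURE[room_type])
    return f"{style} {room_type} staged with {items}, {lighting}, photorealistic interior photograph"


def stage_photo(
    photo: bytes,
    style: str,
    classify_room: Callable[[bytes], str],
    is_empty_room: Callable[[bytes], bool],
    inpaint: Callable[[bytes, str], bytes],
    post_process: Callable[[bytes, bytes], bytes],
    lighting: str = "soft natural daylight",
) -> StagingResult:
    room_type = classify_room(photo)                      # Step 1: room type detection
    if room_type not in FURNITURE:
        return StagingResult(ok=False, error=f"unsupported room type: {room_type}")
    if not is_empty_room(photo):                          # Step 2: empty room confirmation
        return StagingResult(ok=False, error="room must be empty or near-empty for staging")
    prompt = build_prompt(style, room_type, lighting)
    staged = inpaint(photo, prompt)                       # Step 3: inpainting generation
    return StagingResult(ok=True, image=post_process(photo, staged))  # Step 4: post-processing
```

Because both checks run before the inpainting call, a misclassified or furnished photo never costs an inference run.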

For the inference layer, you have three practical options to start: Runflow (managed ComfyUI workflows via API), Replicate (model hosting with pay-per-run), and fal.ai (fast inference with low cold starts). All three support Flux and SDXL inpainting. The choice depends on your volume and latency requirements.

What it takes to build: do you need a ComfyUI engineer?

This is the cost that most implementation articles skip. The per-image API cost is only part of the equation. The real question is who builds and maintains the pipeline. Depending on the path you choose, that answer ranges from zero internal effort to a six-figure engineering hire.

A ComfyUI workflow engineer is a specialist who builds, debugs, and maintains image generation pipelines in ComfyUI. They understand model behavior, node dependencies, custom node conflicts, and how to handle edge cases like unusual room angles or mixed lighting. On platforms like Toptal or Upwork, experienced ComfyUI engineers charge $80-150/hr. A production-ready virtual staging workflow typically takes 40-120 hours to build and stabilize, plus ongoing maintenance as models update.

Implementation paths: team requirements and real costs - May 2026

Path	ComfyUI expertise needed	Who handles errors	Internal team cost	Time to working demo
★ Best - Runflow managed API	None	Runflow team - workflow is pre-built and error-corrected before reaching your API	$0	1-3 days
Self-hosted ComfyUI	Yes - dedicated engineer or freelancer	Your team debugs model failures, node conflicts, OOM errors, bad outputs	$5,000-18,000 to build + $2,000-4,000/month to maintain	6-14 weeks
Replicate (fofr/any-comfyui-workflow)	Partial - you still build and test the workflow locally first	Replicate surfaces errors but you diagnose and fix them	$2,000-8,000 to build + dev time per model update	3-8 weeks
fal.ai custom deployment	Partial - Docker + ComfyUI config required	Your team, fal.ai support for infrastructure issues only	$3,000-10,000 to build	4-10 weeks

The managed API path removes the workflow engineering problem entirely. Runflow prepares the ComfyUI workflow, handles error detection, and validates outputs before they reach your integration. What you receive is a clean API call that returns a correctly staged image - no model debugging, no node version conflicts, no OOM failures to handle. For teams without an in-house AI engineer, this difference is the line between shipping in a week and spending a quarter on infrastructure.

For teams that do have ComfyUI expertise, self-hosting on RunPod or Modal gives you the most control over model selection and fine-tuning. The economics favor self-hosting at volumes above ~10,000 images/month, but the engineering overhead is real and ongoing - every Flux model update or custom node version bump requires testing and re-validation.

Unit economics

The economics of this business are excellent at scale. Input cost per image is low and drops with volume. The B2B price floor is determined by what platforms will pay, not by your cost - and platforms price staging to their agent-customers at $5-20 per photo.

True total monthly cost including realistic team overhead - May 2026

Volume/Month	★ Runflow (API only, no team)	Replicate API + engineer retainer	fal.ai API + engineer retainer	★ Runflow monthly total	Replicate monthly total	Monthly saving with Runflow
100 images	~$0.45/img (no team needed)	$50 API + $1,500-3,000 retainer	$40 API + $1,500-3,000 retainer	~$45 ★	$1,550-3,050	$1,505-3,005 saved
1,000 images	~$0.35/img (no team needed)	$400 API + $3,000-5,000 retainer	$300 API + $3,000-5,000 retainer	~$350 ★	$3,400-5,400	$3,050-5,050 saved
10,000 images	~$0.20/img (no team needed)	$2,500 API + $5,000-10,000 engineer	$1,800 API + $5,000-10,000 engineer	~$2,000 ★	$7,500-12,500	$5,500-10,500 saved
50,000 images	~$0.12/img (no team needed)	$7,500 API + $12,000-20,000 engineer	$5,000 API + $12,000-20,000 engineer	~$6,000 ★	$19,500-27,500	$13,500-21,500 saved

Runflow wins at every volume below 200,000 images/month; across the volumes in the table, the saving against the Replicate path runs from about $1,500 to $21,500 per month. Team overhead figures assume:

100-1,000 images: freelancer retainer with on-call availability ($1,500-3,000/month at 100 images, $3,000-5,000/month at 1,000 images)
10,000 images: dedicated part-time or junior AI engineer ($5,000-10,000/month)
50,000 images: full-time AI engineer plus possible DevOps support ($12,000-20,000/month)

These are not hypothetical - a ComfyUI workflow engineer on Toptal or Upwork costs $80-150/hr, and on-call availability for production incidents requires a minimum monthly retainer regardless of volume.

These margins are real, but they assume you are selling to platforms at B2B rates. If you try to compete at the B2C agent level ($1-3 per photo), the margins collapse and you are in a pricing war with players who have years of head start. Stay B2B.

The important cost line is not per-image inference - it is the fixed overhead: post-processing infrastructure, room classification model hosting, API gateway, monitoring, and the sales/support cost of enterprise contracts. Budget $5,000-15,000 per month in fixed costs before hitting profitability at meaningful scale.

93-97%

Gross margin per image at 1,000 images/month selling B2B at $6-10 per photo, against inference costs of $0.30-0.40

Cost estimates based on provider pricing, May 2026
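The gross margin figure can be reproduced from the price and inference cost ranges in the callout. This sketch counts only per-image inference as cost of goods and evaluates every corner of the two ranges; the function names are illustrative.

```python
from typing import Iterable, Tuple


def gross_margin(price_per_image: float, inference_cost_per_image: float) -> float:
    """Gross margin as a fraction of price, counting inference as the only direct cost."""
    if price_per_image <= 0:
        raise ValueError("price must be positive")
    return (price_per_image - inference_cost_per_image) / price_per_image


def margin_range(prices: Iterable[float], costs: Iterable[float]) -> Tuple[float, float]:
    """Lowest and highest margin over every price/cost combination."""
    margins = [gross_margin(p, c) for p in prices for c in list(costs)]
    return min(margins), max(margins)


low, high = margin_range(prices=(6.00, 10.00), costs=(0.30, 0.40))
```

The low end pairs the cheapest price with the highest inference cost, and the high end pairs the highest price with the cheapest inference cost.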

Pricing and packaging for B2B

Prop-tech platforms have strong opinions about how they want to pay for API services, shaped by years of integrating third-party tools. The three common models are: per-image processed, per-active-agency per month, and revenue share. Each has different implications.

Per-image processed is the cleanest model and the one that works best for virtual staging specifically. The platform understands exactly what they are paying for, they can pass costs through directly to agents, and there is no ambiguity about what "active" means. The downside: revenue varies with listing volume, which is seasonal in real estate.

Per-active-agency per month (a seat-based model) provides predictable revenue, which platforms often prefer for their own budgeting. The problem is defining "active": an agency that staged zero photos this month still counts as a seat, which feels wrong to buyers. This model works better for general SaaS features than for a per-use capability like staging.

Revenue share is occasionally requested by large platforms that want to tie costs to their own success. Structurally it is fine, but it creates reporting complexity and opens disputes about what counts as a completed transaction. Avoid it unless the platform insists and the deal size justifies the overhead.

Recommendation: per-image with a monthly minimum. Example: a per-image price inside the $3-12 B2B range (toward the low end at scale), with a minimum of $2,000/month. The minimum protects you from integrations that go live but never reach meaningful volume. The per-image model is transparent, easy to audit, and aligns incentives. Add a setup fee ($5,000-15,000) to cover integration support and to separate serious buyers from tire-kickers.
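A per-image price with a monthly minimum reduces to one line of billing logic. The sketch below is a hedged illustration: the function name is made up, the per-image price is left as a parameter rather than a recommendation, and the one-time setup fee is assumed to be invoiced separately.

```python
def monthly_invoice(images_processed: int, price_per_image: float,
                    monthly_minimum: float) -> float:
    """Bill usage at the per-image price, but never less than the monthly minimum."""
    if images_processed < 0:
        raise ValueError("images_processed cannot be negative")
    if price_per_image < 0 or monthly_minimum < 0:
        raise ValueError("prices cannot be negative")
    usage_charge = images_processed * price_per_image
    return round(max(usage_charge, monthly_minimum), 2)
```

With a $2,000 minimum, an integration that goes live but processes no images in a month is still billed $2,000; once usage exceeds the minimum, the invoice tracks volume directly.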

The hardest technical problem: room type detection

Room type detection is the step that determines result quality, and it is harder than it looks. A photo of an empty living room can look very similar to an empty dining room, especially in open-plan apartments. A narrow bedroom can look like a hotel room or a home office. Get the classification wrong and the generated staging will use the wrong furniture set - a common failure mode that produces obviously wrong results.

The approach that works: a two-stage classification pipeline. First, a fast visual classifier (ViT-base or CLIP zero-shot) scores the image across room type categories. Second, if the top score is below a confidence threshold (typically 0.75-0.85), route to a second-pass model or flag for human review. Do not try to stage low-confidence inputs - the failure rate is too high.

The confidence threshold is a business decision, not just a technical one. A threshold of 0.90 means fewer errors but more rejections, which means more agent friction. A threshold of 0.70 means higher throughput but more bad results. For B2B platform customers, start conservative (0.85) and loosen it as you validate quality metrics.
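The routing decision described above might look like the following sketch. It assumes the first-stage classifier returns a dictionary of room type scores; the action labels `stage` and `review` are illustrative names, and 0.85 is the conservative starting threshold suggested in the text.

```python
from typing import Dict, Tuple


def route_classification(scores: Dict[str, float],
                         threshold: float = 0.85) -> Tuple[str, str]:
    """Return (action, room_type): stage confident inputs, send the rest to review."""
    if not scores:
        raise ValueError("scores must contain at least one room type")
    room_type, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return ("stage", room_type)
    # Low confidence: hand off to a second-pass model or human review instead of staging.
    return ("review", room_type)


confident = route_classification({"living room": 0.91, "dining room": 0.06})
ambiguous = route_classification({"living room": 0.55, "dining room": 0.40})
```

Keeping the threshold as a parameter makes it easy to loosen per platform once quality metrics justify it.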

The other classification challenge is distinguishing between truly empty rooms and rooms with minimal furniture. An empty room gets inpainting from scratch. A room with a single chair in the corner needs furniture removal first, then staging - or rejection with a "room must be empty" error. Building a binary "is this room empty enough" classifier is a separate step that most competitors skip, and the results show.

Practical tooling: Replicate hosts ViT and CLIP models you can call via API without managing infrastructure. For production volume, fine-tune a ViT-base model on a labeled dataset of real estate photos - 500-1,000 labeled examples per room type is enough to reach 92-95% accuracy on the long tail of unusual spaces.

Go-to-market: why you cannot sell to agents

The temptation is to launch a web dashboard and let individual agents sign up. It is faster, it proves the product works, and it generates early revenue. The problem is that it pulls you into a market you do not want to be in.

Real estate agents are fragmented buyers. There are 3 million of them in the US, mostly working independently or in small offices. Each sale is a small contract, marketing to them is expensive, and support overhead per customer is high. The B2C virtual staging market is already competitive and commoditizing. Every dollar you spend acquiring agents is a dollar not spent on platform sales.

Platforms buy differently. The decision makers are CTOs and CPOs, not agents. They care about three things: reliability (SLA for response time and uptime), data privacy (no using their photos to retrain your models), and integration simplicity (clean REST API, good documentation, sandbox environment). They do not need a pretty web dashboard - they need a well-documented API with a 99.5% uptime SLA and a support channel.

What you need before approaching a platform buyer:

Demo: processes their own sample photos, not a gallery of hand-picked results
Documented SLA: response time P95 under 60 seconds, uptime 99.5% or better
Data policy: you do not use customer photos for model training
Sandbox API key: they can test without signing a contract
References or case studies, even if early-stage
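Before committing to an uptime number in a contract, it helps to translate the percentage into a downtime budget. A minimal sketch, assuming a 30-day month as the measurement window:

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in the window at the given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


budget = allowed_downtime_minutes(99.5)  # 216.0 minutes, i.e. 3.6 hours per 30-day month
```

At 99.5%, a single multi-hour GPU provider outage can consume the whole month's budget, which is worth knowing before the SLA goes into a contract.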

The right sales channels for prop-tech platforms are different from B2C. RESO (Real Estate Standards Organization) and RETS ecosystem events are where MLS technology buyers gather. PropTech conferences (Inman Connect, NAR NXT) have a platform/enterprise track separate from the agent-facing content. Real estate photographers are an underrated channel: they already sell post-processing services to agents and brokerages, and a partnership gives you warm introductions to platform buyers.

The sales cycle is 3-9 months. Budget for it. The first platform deal is the hardest because you have no references. Consider a pilot structure: 3 months at reduced rate (or free), with defined success metrics, in exchange for a case study and reference call. A single successful platform pilot is worth more than 500 individual agent subscribers for closing the next enterprise deal.

The Simplest Way to Deploy This Pipeline

Most teams building on this pattern hit the same three walls: GPU provisioning, cold start latency, and maintaining a ComfyUI instance that breaks whenever a model update ships. The path of least resistance is a managed ComfyUI API provider rather than self-hosting.

Runflow is built specifically for this use case: upload your workflow, call the REST API with your source image, get the staged result back. No AI engineer required. No GPU fleet to manage. No cold start tuning. The free tier covers prototyping; production pricing scales with actual volume.

What you get: custom ComfyUI workflows via REST API, sub-2s cold starts, built-in queue management, and pricing per image rather than per GPU hour. Try Runflow free - no credit card required.