What makes jewelry try-on technically harder than clothing try-on?

Clothing try-on requires fitting a 2D textile pattern to a 3D body shape - computationally complex but physically well-understood. Jewelry try-on requires simulating the optical physics of metal and gemstone surfaces: reflection, refraction, and light transmission. A gold ring reflects the skin beneath it, the ambient environment, and any light sources in the scene. A diamond refracts and transmits light based on its cut geometry. Generic diffusion models learn statistical image patterns and cannot reliably simulate these physical interactions. A physically-based rendering component is required for output quality that reaches the product-page bar.

How does the pipeline handle different metal types?

Each metal type has different reflectance properties, described by its BRDF (bidirectional reflectance distribution function). Yellow gold has warm specular highlights, white gold and platinum have cooler, neutral reflections, and rose gold has a pinkish cast that interacts differently with different skin tones. The reflection map generator in Stage 3 must be parameterized separately for each metal type. In practice, this means training or calibrating the reflection model on reference photography for each metal category. Supporting five metal types (yellow gold, white gold, rose gold, silver, platinum) covers the majority of fine jewelry sold online.
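As a minimal sketch of what per-metal parameterization could look like, the Python below stores a normal-incidence reflectance (F0) and a roughness value for each of the five metals and applies Schlick's approximation to the Fresnel term. The MetalProfile class, the METALS table, and every numeric value are illustrative assumptions, not calibrated data; a production model would fit them to reference photography for each metal category.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

RGB = Tuple[float, float, float]


@dataclass(frozen=True)
class MetalProfile:
    f0: RGB           # reflectance at normal incidence, linear RGB (placeholder)
    roughness: float  # 0.0 = mirror polish, 1.0 = fully matte (placeholder)


# Hypothetical starting values; replace with calibrated measurements.
METALS: Dict[str, MetalProfile] = {
    "yellow_gold": MetalProfile((1.00, 0.77, 0.34), 0.15),
    "white_gold": MetalProfile((0.85, 0.83, 0.78), 0.15),
    "rose_gold": MetalProfile((0.95, 0.70, 0.60), 0.15),
    "silver": MetalProfile((0.97, 0.96, 0.92), 0.10),
    "platinum": MetalProfile((0.67, 0.64, 0.59), 0.20),
}


def schlick_fresnel(f0: RGB, cos_theta: float) -> RGB:
    """Schlick's approximation: F = F0 + (1 - F0) * (1 - cos_theta)^5."""
    cos_theta = min(max(cos_theta, 0.0), 1.0)  # clamp to the valid range
    weight = (1.0 - cos_theta) ** 5
    r, g, b = (c + (1.0 - c) * weight for c in f0)
    return (r, g, b)


# Head-on view returns F0 unchanged; grazing angles push every channel toward 1.0.
head_on = schlick_fresnel(METALS["yellow_gold"].f0, 1.0)
grazing = schlick_fresnel(METALS["yellow_gold"].f0, 0.0)
```

Keeping the per-metal values in one table means a new alloy is added as data, not as a new code path.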

Can the API handle gemstone rendering, or just plain metal?

Plain metal pieces (bands, chain necklaces, hoops) are the simplest starting point and should be the first-version scope. Faceted gemstones (diamonds, sapphires, rubies, emeralds) require a refraction model that simulates light transmission through the crystal structure - this is substantially more complex than metal rendering. A first version that handles plain metal and simple bezel-set stones covers the majority of minimalist and fine jewelry DTC brands. Add complex faceted stone rendering as a second-version capability once the metal rendering pipeline is validated with brand customers.
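To illustrate why faceted stones are harder than metal, the sketch below applies Snell's law to a single air-to-diamond interface and computes the critical angle for total internal reflection. It is a one-ray illustration, not a gemstone renderer; the function names are hypothetical, and a real refraction model would trace many rays through the full cut geometry.

```python
import math
from typing import Optional

AIR_IOR = 1.0
DIAMOND_IOR = 2.417  # refractive index of diamond for yellow light


def refraction_angle_deg(incidence_deg: float, n_from: float, n_to: float) -> Optional[float]:
    """Snell's law: n_from * sin(theta_i) = n_to * sin(theta_t).

    Returns None when the ray undergoes total internal reflection.
    """
    s = n_from / n_to * math.sin(math.radians(incidence_deg))
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def critical_angle_deg(n_inside: float, n_outside: float = AIR_IOR) -> float:
    """Incidence angle (from the facet normal) above which light cannot exit."""
    return math.degrees(math.asin(n_outside / n_inside))


entering = refraction_angle_deg(30.0, AIR_IOR, DIAMOND_IOR)  # bends toward the normal
trapped = refraction_angle_deg(30.0, DIAMOND_IOR, AIR_IOR)   # None: reflected internally
limit = critical_angle_deg(DIAMOND_IOR)                      # roughly 24.4 degrees
```

The small critical angle means most rays inside a diamond bounce between facets several times before exiting, which is why the result depends so strongly on cut geometry and why a flat overlay cannot reproduce it.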

What photo requirements should customers follow for best results?

For ring and bracelet try-on: a clear, well-lit photo of the hand or wrist against a clean background, with the relevant body part facing the camera. Natural daylight or soft studio lighting produces the best lighting estimation. For necklace try-on: a photo showing the neck and upper chest with good lighting on the skin. For earring try-on: a profile or three-quarter view of the face showing the ear clearly. All photos should be taken without other jewelry in frame to avoid confusion in the landmark detection step. These requirements should be enforced with a photo quality scorer that provides clear guidance when a submitted photo will not produce a quality result.

How should the API be priced for jewelry DTC brands?

Per-render pricing at $0.15-0.40 per successful render is the standard model for jewelry try-on. The higher price point versus other image API use cases is justified by the physics rendering complexity and the high conversion value - the revenue from a single incremental sale of a $300 ring equals the cost of 750 renders at $0.40 and 2,000 renders at $0.15. Volume tiers are standard: lower per-render rates above 5,000 and 20,000 renders per month. For platform-level integrations (marketplace or jeweler software vendor), a volume contract with monthly minimums and a significantly lower per-render rate is more appropriate than the per-render retail price.
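The tier structure can be sketched as a small billing function. The tier boundaries (5,000 and 20,000 renders per month) come from the text above; the per-tier rates and the choice of graduated billing (each render charged at the rate of the tier it falls in) are assumptions for illustration. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from typing import List, Tuple

# (upper bound of the tier in renders per month, price per render in cents).
# Hypothetical rates chosen inside the $0.15-0.40 range.
TIERS: List[Tuple[float, int]] = [
    (5_000, 40),
    (20_000, 25),
    (float("inf"), 15),
]


def monthly_bill_cents(renders: int) -> int:
    """Graduated billing: each render is charged at the rate of its tier."""
    total, lower = 0, 0
    for upper, rate in TIERS:
        in_tier = max(0, min(renders, upper) - lower)
        total += in_tier * rate
        lower = upper
    return total


def renders_covered_by_sale(sale_price_cents: int, rate_cents: int) -> int:
    """How many renders the revenue from one sale pays for at a given rate."""
    return sale_price_cents // rate_cents


bill = monthly_bill_cents(25_000)              # 650000 cents, i.e. $6,500
low = renders_covered_by_sale(30_000, 40)      # 750 renders at $0.40
high = renders_covered_by_sale(30_000, 15)     # 2000 renders at $0.15
```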

What is the latency requirement for a checkout-integrated try-on experience?

Under 10 seconds for the initial render is the threshold for acceptable checkout integration. Above 10 seconds, customer drop-off increases significantly during the preview wait. For a browsing or product-page try-on (not inside the checkout flow), up to 15 seconds is acceptable. For a real-time interactive try-on where the customer can move the jewelry around the frame, under 3 seconds per render is required - this is not achievable with a diffusion-based pipeline and requires a real-time rendering approach that is a different technical problem.
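These thresholds can be written down as a latency budget that a deployment check reads. The sketch below is illustrative only: the BUDGETS table and function name are hypothetical, and it treats each limit as inclusive for simplicity.

```python
from typing import List, Tuple

# Budgets in seconds, taken from the thresholds stated above.
BUDGETS: List[Tuple[str, float]] = [
    ("interactive", 3.0),
    ("checkout", 10.0),
    ("product_page", 15.0),
]


def contexts_within_budget(render_seconds: float) -> List[str]:
    """Return the integration contexts a measured render time can serve."""
    return [name for name, limit in BUDGETS if render_seconds <= limit]


supported = contexts_within_budget(6.0)  # ["checkout", "product_page"]
```

In practice the input would be a high percentile of measured render time, not the average, since the slowest renders are the ones customers abandon.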

How does the pipeline handle size accuracy for rings and bracelets?

Ring and bracelet sizing requires estimating the physical dimensions of the finger or wrist from the photo. A monocular photo provides no direct depth information, so the pipeline must estimate relative scale from finger proportions and known anatomical averages. This estimation is accurate enough for a visual representation but is not a sizing tool - it cannot replace a ring sizer or wrist measurement. The API should clearly communicate that the output is a visual representation of how the piece will look, not a precise size simulation. Brands that need size accuracy should collect ring size separately in the checkout flow.
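A minimal sketch of the scale estimate, assuming the landmark step returns the two edge points of the finger at the ring position. The assumed finger width is a placeholder population average, not a measurement, and all names here are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

ASSUMED_FINGER_WIDTH_MM = 17.0  # placeholder average, not a measured value


def pixels_per_mm(left_edge: Point, right_edge: Point,
                  finger_width_mm: float = ASSUMED_FINGER_WIDTH_MM) -> float:
    """Estimate image scale from the finger's width in pixels."""
    width_px = math.dist(left_edge, right_edge)
    return width_px / finger_width_mm


def band_width_px(band_width_mm: float, scale_px_per_mm: float) -> float:
    """Convert a catalog dimension in millimetres to pixels in the photo."""
    return band_width_mm * scale_px_per_mm


scale = pixels_per_mm((100.0, 200.0), (185.0, 200.0))  # 85 px / 17 mm = 5.0 px per mm
band = band_width_px(2.0, scale)                        # a 2 mm band drawn 10.0 px wide
```

If the customer's finger is actually 15 mm wide, the scale is off by about 13%, which is acceptable for a visual preview and is exactly why the output cannot serve as a sizing tool.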

What is the fastest path to a first brand integration?

Build the metal-only pipeline first (no gemstones), test it on plain gold and silver pieces, and approach minimalist jewelry DTC brands as your first customers. Minimalist jewelry (thin bands, simple chains, small hoops) is the easiest product category to render correctly and the fastest-growing segment of DTC jewelry. A working demo on 10-20 pieces from a target brand's catalog is more persuasive than any specification document. Offer a 30-day pilot at no cost in exchange for quality feedback and a case study - the feedback improves the model and the case study opens the next brand conversation.

Jewelry Virtual Try-On: The Reflection Physics Gap Nobody Has Solved

Virtual try-on works well for clothing. It works passably for eyewear. For jewelry it consistently fails, and the reason is physics. Metal surfaces reflect their environment. Gemstones refract and transmit light in ways that depend on the stone's cut, the light source angle, and the background behind the stone. Every generic virtual try-on model treats jewelry like a flat image overlay - which means the gold ring on the rendered finger has no environmental reflections, the diamond has no sparkle, and the result looks like a sticker placed on a photo. The quality bar for a jewelry DTC brand is not a sticker. It is an image that could appear in a campaign shoot.

No dedicated B2B API solves this today. The consumer AR try-on apps (Snapchat Lens Studio, some jeweler plugins) approximate it well enough for a mobile filter but not for a product page or a conversion-optimized shopping experience. The jewelry brands and the e-commerce platforms that serve them have a specific need: a customer uploads a photo of their hand, wrist, neck, or ear, selects a piece, and receives a photorealistic composite that accounts for their skin tone, lighting, and the physical properties of the metal and stone. That pipeline requires a reflection physics model, not a diffusion overlay.

NOTE

TL;DR: The pipeline runs on ComfyUI with a body landmark node, a physically-based reflection map, and a compositing step that places the jewelry with correct light interaction. Runflow handles the API layer so you ship the jeweler integration, not the GPU infrastructure.

Jewelry Try-On AI · Example Workflow Pipeline

$70B+

Global online jewelry market size - the addressable market for a try-on API that converts browsers into buyers

Statista global jewelry market data, May 2026

Why generic virtual try-on fails for jewelry

The failure mode is visible in every current implementation. A gold necklace rendered on a model's neck in a virtual try-on looks like a flat PNG with a golden color fill. Real gold necklaces catch light, reflect the skin and clothing beneath them, and shift in appearance as the ambient light changes. A solitaire diamond ring placed on a hand with a generic overlay has none of the light transmission that makes a diamond identifiable as a diamond. The result does not help a customer decide whether to buy. It often hurts conversion by making the product look cheaper than it is.

The technical gap is the absence of a physically-based rendering step. Standard diffusion models are trained on photographs and learn statistical correlations between visual elements. They do not model the physics of light interaction with metal surfaces or gemstone crystal structures. Generating a photorealistic jewelry try-on requires either a physics-based renderer integrated into the pipeline, or a diffusion model fine-tuned specifically on high-quality jewelry photography with enough coverage of metal types, stone cuts, and lighting conditions to approximate the physics correctly.

This is the gap. No dedicated API solves it. The brands and platforms that need it either build custom internal solutions (expensive, slow) or use the generic AR overlays that do not meet their quality bar. The window to build the first production-quality jewelry try-on API with physics-correct rendering is open.

Conversion rate lift from virtual try-on reported by jewelry brands that implemented custom AR solutions - when the rendering quality is photorealistic

Jewelry e-commerce case studies, May 2026

The technical pipeline: reflection maps and physics-based compositing

The jewelry try-on pipeline has four stages. The first two are shared with other try-on applications. The third and fourth are specific to jewelry and are where the quality differentiation happens.

Stage 1 - Body landmark detection. A pose estimation model identifies the relevant anatomical landmarks for the jewelry type: wrist position for bracelets, finger joints for rings, neck and collarbone geometry for necklaces, earlobe position and orientation for earrings. The landmark detection must account for the photo angle and produce a 3D coordinate estimate, not just a 2D position.

Stage 2 - Skin tone and lighting extraction. A skin analysis node extracts the dominant skin tone from the body area and estimates the ambient lighting direction and intensity from the photo. These parameters feed the reflection map computation in Stage 3. Inaccurate lighting estimation is the most common source of quality failure in jewelry try-on.

Stage 3 - Physically-based reflection map generation. A reflection map is computed for the specific metal type (yellow gold, white gold, rose gold, silver, platinum) based on the extracted lighting parameters. For gemstones, a refraction and transmission model computes the light behavior based on the stone cut and clarity grade. This stage is the technical differentiator - it is what makes the rendered jewelry look like real jewelry in the photo context rather than a generic overlay.

Stage 4 - Compositing with shadow and depth integration. The jewelry asset is composited onto the body landmark with correct scaling, perspective foreshortening, and a shadow/contact shadow layer that grounds it physically in the scene. The final output includes a soft shadow under the piece and any skin occlusion from rings or bracelets that are partially hidden by the finger or wrist.
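The four stages can be sketched as typed functions chained in order. This is a skeleton under stated assumptions, not the actual ComfyUI graph: every class and function name is hypothetical, and each stage body is a stub that returns fixed values where a real node would run a model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Landmarks:       # Stage 1 output: named 3D landmark estimates
    points: Dict[str, Vec3]


@dataclass
class Lighting:        # Stage 2 output: skin tone plus ambient light estimate
    skin_tone_rgb: Vec3
    direction: Vec3
    intensity: float


@dataclass
class ReflectionMap:   # Stage 3 output: per-metal reflection parameters
    metal: str
    lighting: Lighting


@dataclass
class Composite:       # Stage 4 output: placement plus shadow flag
    metal: str
    anchor: Vec3
    contact_shadow: bool


def detect_landmarks(photo: bytes, jewelry_type: str) -> Landmarks:
    # Stub: a real node would run a pose or hand model on the photo.
    return Landmarks({"anchor": (0.5, 0.6, 0.1)})


def extract_lighting(photo: bytes, landmarks: Landmarks) -> Lighting:
    # Stub: fixed values stand in for skin-tone and light estimation.
    return Lighting((0.8, 0.6, 0.5), (0.0, -1.0, 0.5), 1.0)


def build_reflection_map(metal: str, lighting: Lighting) -> ReflectionMap:
    # Stub: a real node would compute the map from the lighting estimate.
    return ReflectionMap(metal, lighting)


def composite(landmarks: Landmarks, reflection: ReflectionMap) -> Composite:
    # Stub: a real node would scale, foreshorten, shadow, and occlude.
    return Composite(reflection.metal, landmarks.points["anchor"], True)


def run_pipeline(photo: bytes, jewelry_type: str, metal: str) -> Composite:
    landmarks = detect_landmarks(photo, jewelry_type)
    lighting = extract_lighting(photo, landmarks)
    reflection = build_reflection_map(metal, lighting)
    return composite(landmarks, reflection)


result = run_pipeline(b"", "ring", "yellow_gold")
```

Passing the Stage 2 lighting estimate explicitly into Stage 3 makes the dependency visible: a bad lighting estimate corrupts everything downstream, which matches the failure pattern described in Stage 2.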

Total pipeline latency on a dedicated A100: 4-8 seconds per render depending on stone complexity. Simple metal pieces (plain bands, chain necklaces) process faster than multi-stone pieces with complex refraction. Both stay under the 10-second threshold for a synchronous checkout preview.

Unit economics: the jewelry e-commerce model

The pricing model for a jewelry try-on API is a per-render fee charged to the platform or brand, with the option to charge a session fee for multi-angle exploration. At $0.15-0.40 per render (your API price) and $200-2,000+ per product (typical jewelry price point), the API cost is a rounding error relative to the conversion value. The revenue from a single incremental sale of a $500 ring equals the cost of 1,250 renders at $0.40 and about 3,300 renders at $0.15.

Full cost comparison - managed API vs self-hosted:

TCO: Managed API vs Self-Hosted GPU for Jewelry Try-On Pipeline - May 2026

Cost component	Runflow (managed)	fal.ai (managed)	Self-hosted (RunPod A100)
Inference per render	~$0.08	~$0.08-0.12	~$0.05 (hardware only)
Cold start latency	None (warm)	2-8s	60-120s
Engineer overhead	$0/mo	$0/mo	$8,000-12,000/mo
Monthly cost at 20K renders	~$1,600	~$1,800	~$9,500 (infra + 0.5 engineer)
Min. volume to break even	Any volume	Any volume	~400,000 renders/mo

Jewelry try-on volume per brand is lower than apparel try-on volume - a jewelry DTC brand with 500 SKUs and moderate traffic generates 5,000-20,000 renders per month. At 20,000 renders, the managed API wins on total cost by a factor of 5-6 versus self-hosted once engineer overhead is included. Self-hosting only makes sense for a platform aggregating try-on across many brands at high total volume.
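The table's figures can be reproduced with two small functions. The per-render rates come from the table; the $8,500 fixed overhead is inferred from the table's ~$9,500 self-hosted total at 20,000 renders, and the function names are hypothetical.

```python
def monthly_cost(renders: int, per_render: float, fixed: float = 0.0) -> float:
    """Total monthly cost: variable inference cost plus fixed overhead."""
    return renders * per_render + fixed


def break_even_renders(managed_rate: float, self_hosted_rate: float,
                       fixed_overhead: float) -> float:
    """Volume at which self-hosting's fixed overhead is paid back."""
    if managed_rate <= self_hosted_rate:
        raise ValueError("self-hosting never breaks even at these rates")
    return fixed_overhead / (managed_rate - self_hosted_rate)


managed = monthly_cost(20_000, 0.08)               # about $1,600
self_hosted = monthly_cost(20_000, 0.05, 8_500.0)  # about $9,500
ratio = self_hosted / managed                      # a little under 6x

upper = break_even_renders(0.08, 0.05, 12_000.0)   # about 400,000 renders/mo
lower = break_even_renders(0.08, 0.05, 8_000.0)    # about 267,000 renders/mo
```

The ~400,000 break-even in the table corresponds to the upper end of the engineer overhead range ($12,000 per month). At the lower end ($8,000 per month), the same arithmetic gives roughly 267,000 renders per month, still far above the volume of a single DTC brand.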

~$0.08

Cost per jewelry try-on render on a managed A100 API - versus $0 apparent cost for a static product photo that converts at 1/3 the rate

Runflow inference pricing, May 2026

The ICP: who pays and how distribution works

There are three distinct buyer types for a jewelry try-on API, each with different integration depth and commercial structure.

The first is the jewelry DTC brand. Brands selling direct at price points above $100 per piece have a strong conversion incentive. A try-on feature that is demonstrably better than existing AR overlays is a marketing differentiator and a conversion tool. Direct integration into their Shopify or custom e-commerce stack - upload photo, select piece, receive composite - is the standard flow. Charge per render or a monthly volume subscription.

The second is the jewelry e-commerce platform. Etsy, Not On The High Street, and specialty jewelry marketplaces serve thousands of independent jewelers simultaneously. An API integration at the platform level means one commercial agreement that covers the entire seller base. Platform integrations take longer to close but generate higher volume and more predictable revenue than individual brand deals.

The third is the jeweler software provider. POS and inventory software for physical jewelers (like Jewel360, Podium for jewelers, or custom ERP systems) often has a customer-facing consultation component. A try-on API embedded in the consultation flow - a customer sits with a sales associate, tries on pieces virtually - adds a product capability that independent jewelers cannot build themselves.

What this is not: the AR filter trap

Snapchat Lens Studio, Instagram AR effects, and mobile AR try-on are in the market and work reasonably well for consumer discovery. Building a competitive consumer AR filter is not the opportunity here. The goal is not to beat Snapchat at consumer AR - it is to build the back-end API that powers production-quality try-on for brands that need better output than a mobile filter provides.

The B2B API route targets a different quality level and a different buyer. A brand using your API wants output they can put on their product page or in a campaign - not output that looks like a social media filter. That quality requirement is the moat. It requires the physics-based rendering step that no consumer AR tool invests in, and it is what justifies the $0.15-0.40 per render price point versus a free filter.

How to build it: the 30-day path to a working API

Week 1: Source or build the physically-based reflection model. The core technical component is the reflection map generator for metal surfaces. Research physically-based rendering (PBR) approaches and identify whether a fine-tuned diffusion model or a hybrid PBR-plus-diffusion approach better fits your quality target. Test against reference jewelry photography to establish a quality baseline before building the full pipeline.

Week 2: Build the body landmark detection and skin analysis nodes. Test accuracy across a diverse range of skin tones, hand orientations, and photo qualities. The landmark detection for ring placement must handle partially obscured fingers, nail polish, and different ring positions on the finger. Build the lighting estimation node and validate it against photos taken in different lighting conditions.

Week 3: Build the full compositing pipeline and quality scoring. Wire together the four stages. Build a quality scoring node that checks: reflection plausibility (does the metal look like metal), scale accuracy (is the jewelry the right size for the body landmark), and edge integration (are the contact points between jewelry and skin physically plausible). Define the rejection threshold by sampling 200 renders and identifying failures.
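A quality gate over the three checks can be sketched as follows. The score names mirror the checks listed above; the 0-to-1 scale, the 0.7 threshold, and the rule that the weakest check decides are assumptions that would be tuned against the 200-render sample.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QualityScores:
    reflection_plausibility: float  # does the metal look like metal
    scale_accuracy: float           # is the piece the right size for the landmark
    edge_integration: float         # are jewelry-to-skin contact points plausible


REJECTION_THRESHOLD = 0.7  # hypothetical; calibrate on sampled renders


def passes(scores: QualityScores, threshold: float = REJECTION_THRESHOLD) -> bool:
    """A render passes only if its weakest check clears the threshold."""
    weakest = min(scores.reflection_plausibility,
                  scores.scale_accuracy,
                  scores.edge_integration)
    return weakest >= threshold


def pass_rate(batch: Iterable[QualityScores],
              threshold: float = REJECTION_THRESHOLD) -> float:
    """Fraction of a sampled batch that clears the gate."""
    results = [passes(s, threshold) for s in batch]
    return sum(results) / len(results) if results else 0.0


sample = [QualityScores(0.9, 0.8, 0.85), QualityScores(0.9, 0.5, 0.9)]
rate = pass_rate(sample)  # 0.5: the second render fails on scale
```

Gating on the minimum instead of the average prevents a strong reflection score from hiding a visibly wrong scale.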

Week 4: First brand demo. Approach 3-5 jewelry DTC brands with a working demo using their actual product catalog. Render 10 pieces on different customer photo types. If 8 of 10 pass the brand's quality bar, the commercial conversation follows. Jewelry brands are highly quality-sensitive and will not integrate a tool that makes their products look worse than the product photography.

The technical constraints to know before you start

Three constraints that will slow you down if you do not account for them upfront:

Metal type coverage. Yellow gold, white gold, rose gold, silver, and platinum all have different reflectance properties. The reflection map model must be trained or parameterized separately for each metal type. Rose gold on warm skin tones requires a different color balance than white gold does. Building a single generic metal model will produce obviously wrong results for at least some metal types.

Gemstone complexity. Faceted stones (diamonds, sapphires, emeralds) require a refraction model to look photorealistic. Cabochon stones (opals, moonstones, turquoise) require a surface scattering model. Plain metal pieces without stones are the simplest starting point. Scope the stone types you support explicitly and build the refraction model for diamonds first - they are the highest-value category and the most demanding to render correctly.

Customer photo quality. Jewelry try-on requires the body landmark to be clearly visible, well-lit, and photographed against a relatively clean background. Heavily tattooed hands, elaborate nail art, very dark or very light skin in poor lighting, and cluttered backgrounds all degrade landmark detection and lighting estimation accuracy. Define the input quality requirements, build a photo quality scorer, and reject poor inputs with a clear error before spending GPU time on a render that will fail.
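A photo quality scorer can start with two cheap checks that run before any GPU work: overall brightness and sharpness (variance of a Laplacian response). The sketch below uses only the standard library and takes a grayscale image as a list of rows; the thresholds and messages are hypothetical and would need tuning on real customer photos. It assumes the image is at least 3 by 3 pixels.

```python
from statistics import mean, pvariance
from typing import List

Gray = List[List[int]]  # rows of 0-255 luminance values


def laplacian_variance(gray: Gray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    h, w = len(gray), len(gray[0])
    responses = [
        4 * gray[y][x] - gray[y - 1][x] - gray[y + 1][x]
        - gray[y][x - 1] - gray[y][x + 1]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return float(pvariance(responses))


def photo_issues(gray: Gray, min_brightness: float = 60.0,
                 max_brightness: float = 200.0,
                 min_sharpness: float = 100.0) -> List[str]:
    """Return guidance messages; an empty list means the photo is accepted."""
    issues = []
    brightness = mean(v for row in gray for v in row)
    if brightness < min_brightness:
        issues.append("Photo is too dark: retake in daylight or soft lighting.")
    elif brightness > max_brightness:
        issues.append("Photo is overexposed: avoid direct flash or harsh light.")
    if laplacian_variance(gray) < min_sharpness:
        issues.append("Photo looks blurry: hold the camera steady and refocus.")
    return issues


dark_flat = [[10] * 8 for _ in range(8)]
messages = photo_issues(dark_flat)  # flags both darkness and blur
```

Returning messages instead of a bare score lets the API reject a poor input with the clear guidance described above.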

What the competitive landscape looks like today

As of May 2026, no company offers a dedicated REST API for physics-correct jewelry virtual try-on marketed to DTC brands or e-commerce platforms. Snapchat's Camera Kit has jewelry try-on templates, and Meta's Spark AR had them until Meta shut that platform down in January 2025, but these are consumer AR tools, not B2B APIs, and do not produce the render quality required for product pages. Several Shopify app vendors offer basic AR jewelry overlays, but all use flat-image compositing without reflection physics. The result quality is uniformly below what jewelry brands would use for conversion-optimized product pages.

The window is open but requires technical depth. The reflection physics component is the real barrier to entry - it requires either PBR expertise or substantial fine-tuning of a diffusion model on high-quality jewelry photography. That barrier is also the moat. A competitor that builds a flat-overlay jewelry try-on API is not competing with a physics-based one. Quality is the differentiator and the reason a brand will pay $0.30 per render instead of using a free filter.

Where to start

For most builders, Runflow is the right starting point. The platform runs full custom ComfyUI workflows natively, which means the multi-stage pipeline - landmark detection, reflection map, compositing, quality scoring - deploys without rewriting nodes for a proprietary system. Inference cost is approximately $0.08 per render on a dedicated A100, which makes the unit economics viable at jewelry DTC volumes.

Cold start latency matters here. Jewelry try-on is a synchronous checkout action - a customer waiting 90 seconds for a preview during checkout drops off. Warm instances eliminate that problem and are the operational requirement for any checkout-integrated try-on experience.

Self-hosting becomes economical at around 400,000 renders per month - a volume that typically requires aggregating try-on across multiple brand partners rather than a single DTC brand. Until you reach platform scale, the managed path keeps cost predictable and lets you focus on the technical differentiation (the reflection model) rather than GPU infrastructure operations.

The per-render API model used in jewelry try-on follows the same structure as pet portrait generation and ghost mannequin photography. If you are evaluating multiple verticals, the unit economics and infrastructure decisions are comparable across all three - the pipelines differ, but the hosting and pricing architecture are the same.