What input image format does a footwear on-foot API require?

Most APIs accept JPEG or PNG. The ideal input is a white or transparent background catalog image with the shoe in side or 3/4 view. Front-facing catalog shots produce lower-quality results because the foot geometry is harder to infer from a head-on perspective.
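A cheap pre-upload guard can reject unsupported files before they cost an API call. The sketch below assumes the provider accepts only JPEG and PNG. It reads the file signature instead of trusting the extension. The function names are hypothetical, and background colour and viewing angle still need their own checks.

```python
from typing import Optional

# File signatures ("magic bytes") for the two formats named above.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_image_format(data: bytes) -> Optional[str]:
    """Return 'png' or 'jpeg' from the file header, or None if neither."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def check_catalog_input(data: bytes) -> str:
    """Reject unsupported files before they are sent to the API (hypothetical helper)."""
    fmt = detect_image_format(data)
    if fmt is None:
        raise ValueError("unsupported input: expected a JPEG or PNG catalog image")
    return fmt
```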

Can the API generate on-foot variants for any shoe type?

Running shoes, sneakers, loafers, dress shoes, and ankle boots are well-supported by current models. Very high-heeled shoes, sandals with thin straps, and shoes with unusual geometry (e.g., split-toe running shoes) are harder and may require pipeline tuning.

How is colorway fidelity maintained in the generated image?

Production pipelines use the segmented shoe as a hard constraint during compositing - the original shoe is composited onto the generated foot, not regenerated. This preserves exact colorway, texture, and branding details. Pipelines that regenerate the shoe from a text description cannot reliably maintain colorway fidelity.

What scene types can be specified?

Common scene categories include indoor (wood floor, tile, concrete), outdoor (sidewalk, grass, gravel, trail), and athletic (gym floor, track, court). More specific prompts (e.g., 'Brooklyn sidewalk, golden hour') require a diffusion model with strong prompt-following. Scene diversity is a key differentiator between API providers.

How does processing time compare to a studio shoot?

API generation takes 8-45 seconds per image depending on provider and resolution. A full colorway set of 8 variants takes under 10 minutes. A studio shoot for the equivalent set takes 6-8 weeks including booking, production, and post-processing.
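The timing figures come down to simple arithmetic. The hypothetical helper below assumes two things: requests run in parallel batches of `concurrency`, and every image takes the same latency. Real providers may rate-limit or queue requests differently.

```python
def colorway_set_minutes(variants: int, seconds_per_image: float,
                         concurrency: int = 1) -> float:
    """Wall-clock minutes for a colorway set, assuming `concurrency` parallel
    requests and a fixed per-image latency."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    batches = -(-variants // concurrency)  # ceiling division
    return batches * seconds_per_image / 60


# Worst case from the figures above: 8 variants at 45 s each.
print(colorway_set_minutes(8, 45))                 # 6.0 (sequential)
print(colorway_set_minutes(8, 45, concurrency=4))  # 1.5
```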

What is the typical cost per image for a managed API?

At standard resolution (1024x1024 to 2048x2048), managed footwear on-foot APIs typically price at $0.12-0.35 per image. Volume discounts apply above 1,000 images per month. Self-hosted ComfyUI pipelines can reduce inference cost to $0.02-0.06, but require a dedicated engineer to maintain.

Can the generated images be used in commercial advertising?

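Usage rights depend on the specific API provider's terms of service. Most managed APIs grant full commercial rights to outputs. If you use a self-hosted pipeline with open-weight models, check the model license, because terms vary by model and variant. FLUX.1 [schnell] is released under Apache 2.0. Stable Diffusion 3.5 is covered by the Stability AI Community License, which allows free commercial use only below an annual revenue threshold. The FLUX.1 [dev] weights are under a non-commercial license. Always verify before using outputs in paid advertising.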

Is this suitable for generating images for Amazon and other marketplaces?

Amazon's main image requirements (white background, 85% fill) apply to the primary image. Lifestyle images in the secondary image slots have fewer restrictions and AI-generated on-foot variants are acceptable. Check each marketplace's AI content policy, as requirements are evolving. For Amazon specifically, lifestyle variants belong in positions 2-7, not the hero image.

Footwear On-Foot Variants API: Catalog to Lifestyle in One Call

Footwear brands photograph each silhouette once in a controlled studio. When the same shoe launches in six colorways, the standard answer is six more studio sessions: book a model, rent a space, pay a photographer, wait two weeks. An on-foot variant API collapses that to a single API call per colorway.

What an on-foot variant API does

The pipeline takes a catalog image (white background, side or 3/4 view) and outputs a lifestyle shot of the same shoe worn by a model in a contextually appropriate environment. It does not require a model, studio, or photographer. The shoe geometry, colorway, and details are preserved exactly. Only the background and foot/leg context are generated.

This is different from virtual try-on, which places a specific customer's foot into a shoe. On-foot variant generation creates generic but realistic lifestyle imagery for product pages, social ads, and editorial use - output that would otherwise require a physical shoot.

Example generation pipeline (figure): catalog → concrete / jeans

Cost, revenue, and margin: what you pay, what you charge, what you keep

Stack	Infra /mo	AI team	Total cost	Revenue	Margin	Notes
Runflow	$900	$0	$900	$6.0K	85%	10% volume discount applied
Cloud API + manual QA	$1.0K	~$5K	$6.0K	$6.0K	0%	Similar pricing; no auto-QA; part-time engineer needed
Self-hosted GPU	$400	$12K	$12.4K	$6.0K	Loss	Raw compute; full-time AI engineer required

Runflow Sentinel: a built-in quality control layer that automatically detects and discards failed or low-quality outputs before delivery. You only pay for images that pass QA. No engineer needed to babysit the pipeline.

Pricing based on Runflow published rates (June 2026) with automatic volume discounts. Revenue column is illustrative — actual client pricing varies by vertical and contract size. GPU self-hosted estimate uses $0.04/img raw compute cost.

The pipeline architecture

A production-grade footwear on-foot pipeline runs four steps: shoe segmentation, foot and leg synthesis, compositing, and post-processing.

Step 1: Shoe segmentation

A segmentation model isolates the shoe from the catalog background and extracts its silhouette, sole angle, and toe-box orientation. This determines how the foot geometry will be synthesized - a running shoe at a forward angle needs different foot positioning than a loafer in a 3/4 view.
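The sketch below covers only part of what this step extracts: the bounding box and the sole line. It assumes the segmentation model returns a binary mask, given here as a list of rows with 1 for shoe pixels. Sole angle and toe-box orientation would need more work, such as fitting a line to the bottom contour, and that is not shown. `ShoeGeometry` and `geometry_from_mask` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShoeGeometry:
    top: int
    bottom: int  # lowest row containing shoe pixels: the sole line
    left: int
    right: int

    @property
    def aspect_ratio(self) -> float:
        """Bounding-box width divided by height."""
        return (self.right - self.left + 1) / (self.bottom - self.top + 1)


def geometry_from_mask(mask: List[List[int]]) -> ShoeGeometry:
    """Derive the bounding box and sole line from a binary mask (1 = shoe)."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no shoe pixels")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return ShoeGeometry(top=rows[0], bottom=rows[-1], left=cols[0], right=cols[-1])
```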

Step 2: Foot and scene synthesis

A diffusion model generates the foot, leg, and background conditioned on the shoe geometry and a scene prompt. The prompt can specify context: concrete sidewalk, gym floor, outdoor trail, wood interior. Style parameters control whether the output targets athletic, lifestyle, or editorial aesthetics.
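A request for this step mainly carries the catalog image, a scene prompt, and a style. The sketch below validates those parameters before serializing them. Only the three style values come from the text. The field names and the optional webhook field are hypothetical and do not reflect any specific provider's schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# The three aesthetics named in the text; the field names below are hypothetical.
ALLOWED_STYLES = {"athletic", "lifestyle", "editorial"}


@dataclass
class OnFootRequest:
    image_url: str                     # clean catalog image (white background)
    scene_prompt: str                  # e.g. "concrete sidewalk", "outdoor trail"
    style: str = "lifestyle"           # one of ALLOWED_STYLES
    webhook_url: Optional[str] = None  # optional async delivery target

    def to_json(self) -> str:
        """Validate the parameters and serialize the request body."""
        if self.style not in ALLOWED_STYLES:
            raise ValueError(f"style must be one of {sorted(ALLOWED_STYLES)}")
        if not self.scene_prompt.strip():
            raise ValueError("scene_prompt must not be empty")
        return json.dumps(asdict(self))
```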

Step 3: Compositing

The segmented shoe is composited onto the generated foot at the correct perspective and scale. Shadow generation and edge blending are applied to make the result photorealistic. Sole contact with the surface is critical - misaligned shadows are the most common failure mode in naive implementations.
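The "hard constraint" idea reduces to a per-pixel choice: copy the original shoe pixel wherever the mask says shoe, and take the generated scene everywhere else. The minimal sketch below assumes the shoe layer has already been warped to the scene's perspective and scale. It uses plain nested lists of RGB tuples. Shadow generation and edge blending outside the shoe region are not shown.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def composite_hard_mask(shoe: Image, scene: Image, mask: List[List[int]]) -> Image:
    """Keep original shoe pixels wherever mask == 1; use the generated scene elsewhere.

    Assumes the shoe layer is already aligned to the scene, so all three
    inputs share the same dimensions.
    """
    if not (len(shoe) == len(scene) == len(mask)):
        raise ValueError("shoe, scene and mask must have the same height")
    out: Image = []
    for shoe_row, scene_row, mask_row in zip(shoe, scene, mask):
        if not (len(shoe_row) == len(scene_row) == len(mask_row)):
            raise ValueError("shoe, scene and mask must have the same width")
        out.append([s if m else g for s, g, m in zip(shoe_row, scene_row, mask_row)])
    return out
```

Because shoe pixels are copied rather than regenerated, colorway and branding in the masked region cannot drift at this step. Any drift has to come from later steps such as color grading.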

Step 4: Post-processing

Color grading matches the shoe's lighting to the generated background. Specular highlights on leather, mesh, and rubber soles require different treatment. A final quality classifier rejects outputs where shoe geometry is distorted or the sole-ground contact looks physically implausible.

Who builds this

The primary buyers of footwear on-foot variant APIs are footwear brands launching multi-colorway product lines, marketplaces aggregating footwear inventory from multiple brands, and e-commerce agencies managing seasonal catalog production at volume.

Footwear on-foot API - implementation options, May 2026

Approach	Cost per image	Time to first output	Control
Managed API (Runflow or similar)	$0.12-0.35	< 1 day integration	Scene prompt, style, model type
Self-hosted ComfyUI pipeline	$0.02-0.06 (GPU cost)	2-4 weeks to production	Full - any model or LoRA
Traditional studio shoot	$200-580 per image*	6-8 weeks	Full creative control
Stock photography licensing	$15-80 per image	Immediate, limited selection	None - pre-shot images only

*Per-image studio cost calculated from half-day session rate divided by typical deliverable count (6-12 images per half day).

TCO: managed API vs self-hosted

At low volume (under 500 images per month), a managed API is almost always cheaper than self-hosted once engineering time is factored in. Self-hosting adds a fixed engineering and infrastructure cost of $10,000 or more per month. It only pays off at high volume, where per-image inference savings outweigh that fixed cost.

TCO comparison - 1,000 images per month, May 2026

Cost category	Managed API	Self-hosted ComfyUI
Inference cost	$120-350	$20-60
Engineering (1 FTE)	$0	$8,000-12,000/mo
Infra / GPU servers	$0	$400-800/mo
Maintenance	$0	$2,000-4,000/mo (part-time)
Total monthly cost	$120-350	$10,420-16,860

Self-hosted becomes cost-effective above roughly 30,000-50,000 images per month, where inference savings outweigh the fixed engineering cost. Below that threshold, a managed API is the correct default.
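The threshold follows from setting the two monthly costs equal and solving for volume. The sketch below uses the fixed costs from the TCO table. It pairs the top of the managed price range with the bottom of the self-hosted range, which is the most favorable case for self-hosting.

```python
def break_even_images(fixed_monthly: float, managed_per_image: float,
                      self_hosted_per_image: float) -> float:
    """Monthly volume n at which the two options cost the same:

        n * managed_per_image == fixed_monthly + n * self_hosted_per_image
    """
    saving = managed_per_image - self_hosted_per_image
    if saving <= 0:
        raise ValueError("no per-image saving, so self-hosting never breaks even")
    return fixed_monthly / saving


# Fixed self-hosted cost = engineering + infra + maintenance from the table.
low_fixed = 8_000 + 400 + 2_000       # $10,400/mo
high_fixed = 12_000 + 800 + 4_000     # $16,800/mo
# Most favorable case for self-hosting: $0.35 managed vs $0.02 self-hosted.
print(round(break_even_images(low_fixed, 0.35, 0.02)))   # 31515
print(round(break_even_images(high_fixed, 0.35, 0.02)))  # 50909
```

These two results bracket the 30,000-50,000 range. At the low end of managed pricing ($0.12 per image), the same fixed costs push break-even above 100,000 images per month.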

Quality benchmarks

On-foot variant generation is harder than background removal or virtual staging because it requires synthesizing new geometry (feet, legs) that must be geometrically consistent with the shoe. Quality varies significantly across pipeline configurations.

The most reliable quality signal is sole-ground contact accuracy: does the shoe's sole sit flush with the surface, with correct shadow direction and softness? Secondary signals are shoe geometry preservation (no distortion of sole shape or toe box) and colorway fidelity (the generated image matches the original colorway without color shift).

What to look for in an API

When evaluating a footwear on-foot API, request test outputs for at least four shoe categories: running shoes (mesh, complex sole), leather dress shoes (smooth surfaces, stitching), canvas sneakers (flat sole, minimal detail), and boots (height, shaft geometry). Each category tests different aspects of the pipeline.

Reject any API that cannot demonstrate colorway fidelity. If the generated shoe shifts from navy to black, the output is unusable for catalog purposes. This is the most common failure mode in generic image generation APIs that are not specifically tuned for footwear.

Integration pattern

The standard integration pattern for a footwear brand's existing catalog pipeline adds the on-foot generation step after the background removal step. The catalog image is already clean (white background, shoe isolated). The API call adds a scene parameter and returns a lifestyle variant. No changes to the upstream catalog production workflow are required.

For marketplaces aggregating inventory from multiple brands, the integration point is the product ingestion pipeline. When a new SKU is ingested, the on-foot variant is generated and stored alongside the original catalog image. The API call can be triggered asynchronously with a webhook callback.

Both patterns share the same requirement: the API must accept a webhook URL for asynchronous delivery. On-foot generation takes 8-45 seconds per image, so calling it synchronously inside a product ingestion flow will cause timeouts. Design the integration as a background job with a status check endpoint, and show the lifestyle variant on the product page only after the webhook confirms delivery. This is standard practice in production image generation pipelines, and it keeps product listings from being blocked on image generation.
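The background-job pattern can be sketched provider-agnostically, as below. The `submit` and `check_status` callables stand in for whatever client your provider offers. The job statuses, payload fields, and in-memory `JOBS` store are all hypothetical; in production that store would be a database table.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VariantJob:
    sku: str
    job_id: str
    status: str = "pending"          # hypothetical: pending -> delivered | failed
    output_url: Optional[str] = None


JOBS: Dict[str, VariantJob] = {}     # stand-in for a jobs table


def enqueue_variant(sku: str, submit: Callable[[str], str]) -> VariantJob:
    """Submit a generation request without waiting; `submit` returns a job id."""
    job = VariantJob(sku=sku, job_id=submit(sku))
    JOBS[job.job_id] = job
    return job


def handle_webhook(payload: dict) -> None:
    """Webhook receiver: record the result the provider pushes back."""
    job = JOBS.get(payload["job_id"])
    if job is None:
        return  # unknown job id; ignore
    job.status = payload["status"]
    job.output_url = payload.get("output_url")


def poll_until_done(job: VariantJob, check_status: Callable[[str], dict],
                    timeout_s: float = 120, interval_s: float = 5) -> VariantJob:
    """Fallback for missed webhooks: poll the status endpoint until a deadline."""
    deadline = time.monotonic() + timeout_s
    while job.status == "pending" and time.monotonic() < deadline:
        result = check_status(job.job_id)
        job.status = result["status"]
        job.output_url = result.get("output_url")
        if job.status == "pending":
            time.sleep(interval_s)
    return job


def lifestyle_image_for(sku: str) -> Optional[str]:
    """The product page shows a variant only once delivery is confirmed."""
    for job in JOBS.values():
        if job.sku == sku and job.status == "delivered":
            return job.output_url
    return None
```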

Failure modes to plan for

On-foot variant generation has predictable failure modes that any production deployment needs to handle. Building a quality gate that rejects bad outputs before they reach your product page is as important as building the generation pipeline itself.

Sole-ground contact failure is the most visually obvious problem. The shoe appears to float above the surface, or the shadow is cast in the wrong direction relative to the background lighting. This is caused by a mismatch between the composite angle of the shoe and the generated ground plane. Fix by adding a shadow-matching post-processing step or by constraining scene generation to specific camera angles that match the catalog image's perspective.

Colorway drift occurs when the compositing step introduces a color shift - a white sneaker becomes off-white or a navy shoe shifts toward black. This is most common when the background generation step affects the shoe region through imprecise masking. The fix is to use hard segmentation masks with no feathering in the shoe region, then apply color correction post-composite to bring luminance back to the source image.
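One simple form of that post-composite correction is a single gain applied to the shoe region. The gain rescales the output's mean luma to match the source. The sketch below assumes both pixel lists come from the same hard shoe mask. It uses Rec. 709 luma coefficients on gamma-encoded RGB. This global gain is a deliberately crude stand-in; it will not fix a hue shift, only a brightness shift.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def luma(p: Pixel) -> float:
    """Rec. 709 luma from gamma-encoded RGB components."""
    r, g, b = p
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def mean_luma(pixels: List[Pixel]) -> float:
    return sum(luma(p) for p in pixels) / len(pixels)


def match_luminance(source_shoe: List[Pixel], output_shoe: List[Pixel]) -> List[Pixel]:
    """Scale output shoe-region pixels so their mean luma matches the source.

    Both lists hold the pixels inside the same hard shoe mask.
    """
    if not source_shoe or not output_shoe:
        return list(output_shoe)
    current = mean_luma(output_shoe)
    if current == 0:
        return list(output_shoe)
    gain = mean_luma(source_shoe) / current
    # Clamp to the valid 8-bit range after scaling.
    return [tuple(min(255, round(c * gain)) for c in p) for p in output_shoe]
```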

Shoe distortion is a third failure mode specific to shoes with complex geometry: chunky soles, platform heels, or toe-box detail. The compositing step can introduce perspective warping if the generated foot angle does not match the catalog image's viewing angle. The fix is to pre-classify catalog images by viewing angle and route them to different generation prompts - side-view images and 3/4-view images need different foot geometry constraints.
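A crude pre-classifier can route images by the aspect ratio of the shoe's bounding box, because side views of low-cut shoes tend to be much wider than they are tall. The 2.2 threshold and the prompt fragments below are hypothetical and would need calibrating on labeled SKUs. Boots are tall in any view, so a production system would more likely use a trained angle classifier.

```python
# Hypothetical threshold; calibrate against catalog images with known angles.
SIDE_VIEW_MIN_ASPECT = 2.2

# Hypothetical foot-geometry constraints appended to the scene prompt.
FOOT_CONSTRAINTS = {
    "side": "profile view, foot parallel to camera, sole flat on ground",
    "three_quarter": "three-quarter view, foot angled toward camera, heel visible",
}


def classify_view(bbox_width: int, bbox_height: int) -> str:
    """Label a catalog image as 'side' or 'three_quarter' from its aspect ratio."""
    if bbox_height <= 0:
        raise ValueError("bounding box height must be positive")
    ratio = bbox_width / bbox_height
    return "side" if ratio >= SIDE_VIEW_MIN_ASPECT else "three_quarter"


def build_prompt(scene: str, bbox_width: int, bbox_height: int) -> str:
    """Route the image to the foot constraint that matches its viewing angle."""
    view = classify_view(bbox_width, bbox_height)
    return f"{scene}, {FOOT_CONSTRAINTS[view]}"
```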

A simple automated quality gate runs three checks: (1) color histogram comparison between the shoe region in the input and output to catch colorway drift, (2) edge detection on the sole-ground contact line to verify shadow direction, (3) segmentation model rerun on the output to confirm shoe geometry is intact. Outputs that fail any check are flagged for human review rather than published automatically.
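The first check can be written as a histogram intersection over the shoe-region pixels. The sketch below implements only that check. The sole-contact and geometry checks are passed in as booleans from hypothetical upstream steps. The 8-bins-per-channel quantization and the 0.85 threshold are assumptions to tune.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 8) -> Counter:
    """Normalized joint RGB histogram with `bins_per_channel` bins per channel."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return Counter({k: v / total for k, v in counts.items()})


def histogram_intersection(h1: Counter, h2: Counter) -> float:
    """1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(h1[k], h2[k]) for k in h1.keys() & h2.keys())


def quality_gate(src_shoe: list, out_shoe: list, contact_ok: bool,
                 geometry_ok: bool, min_similarity: float = 0.85) -> str:
    """Return 'publish' only if all three checks pass, else 'human_review'."""
    similarity = histogram_intersection(color_histogram(src_shoe),
                                        color_histogram(out_shoe))
    if similarity >= min_similarity and contact_ok and geometry_ok:
        return "publish"
    return "human_review"
```

Coarse bins catch gross drift, such as navy becoming black. They can miss subtle shifts like white to off-white, which may land in the same bin, so finer bins or a luminance comparison would be needed for those.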

Realistic output expectations

Current on-foot variant generation produces commercially usable output for sneakers, running shoes, loafers, and Chelsea boots in approximately 80-90% of cases when the input catalog image is clean and the shoe is a standard silhouette. The remaining 10-20% require either manual retouching or a regeneration with different scene parameters.

For sandals and shoes with thin straps, usable output rates drop to 60-70% because the strap geometry is harder to preserve through compositing. High-heeled shoes with stiletto heels show similar success rates. If your catalog includes a high proportion of these shoe types, budget for a higher manual review rate.
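To turn these rates into a review budget, weight each silhouette group by its share of the catalog. The sketch below uses the usable-output ranges quoted above. The group names and the example catalog mix are hypothetical.

```python
# Usable-output ranges quoted above as (low, high), per silhouette group.
USABLE_RATE = {
    "standard": (0.80, 0.90),             # sneakers, running shoes, loafers, Chelsea boots
    "strappy_or_stiletto": (0.60, 0.70),  # thin-strap sandals, stilettos
}


def review_budget(sku_counts: dict) -> tuple:
    """Expected (min, max) images needing retouching or regeneration."""
    low = high = 0.0
    for group, count in sku_counts.items():
        worst, best = USABLE_RATE[group]
        low += count * (1 - best)
        high += count * (1 - worst)
    return round(low), round(high)


# Hypothetical catalog: 400 standard SKUs and 100 sandals/stilettos.
print(review_budget({"standard": 400, "strappy_or_stiletto": 100}))  # (70, 120)
```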

The clearest signal that an API is production-ready for your catalog is not benchmark numbers - it is a test run on 50-100 of your own SKUs across all silhouettes you sell. Any provider that will not let you run a paid test batch before committing to a contract is worth treating with caution. Insist on a test batch, evaluate against your actual quality bar, and only then commit to a production contract.
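The test batch should cover every silhouette you sell, not just the best sellers. One way to build it is stratified sampling with a floor of one SKU per silhouette. The sketch below is hypothetical: it assumes a mapping from SKU to silhouette label, and because of rounding and the floor, the batch size lands near `total` rather than exactly on it.

```python
import random
from collections import defaultdict


def stratified_test_batch(skus: list, silhouette_of: dict, total: int = 100,
                          seed: int = 0) -> list:
    """Pick about `total` SKUs so that every silhouette is represented.

    Each silhouette gets a share proportional to its catalog size, with a
    floor of one SKU, so rare categories (e.g. sandals) are not skipped.
    """
    if not skus:
        return []
    by_silhouette = defaultdict(list)
    for sku in skus:
        by_silhouette[silhouette_of[sku]].append(sku)
    rng = random.Random(seed)  # fixed seed makes the batch reproducible
    batch = []
    for _, members in sorted(by_silhouette.items()):
        share = max(1, round(total * len(members) / len(skus)))
        batch.extend(rng.sample(members, min(share, len(members))))
    return batch
```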