Refract Studio

A personal, full-stack agentic video studio.

Higgsfield-class — but mine: cheaper, smarter, under my control.
The Product-Lock ad is the crown. Everything else is a ring around it.

Part 1 — The Vision
"Create my own video system that competes with Higgsfield across every domain — most professional, most comprehensive, smartest, complexity hidden behind the scenes, at a cheap price, with sound, on WaveSpeed + Replicate, all mine. The product ad stays the crown — but also general video, sound, talking-head. Later, an editing suite. OpenMontage is the vision — I just want it mine: cheap, special, excellent, under my control." — Ben, 2026-06-22

North Star

End-to-end production across every genre — product ad · cinematic · talking-head/avatar · sound & music · montage · explainer.

Not a model wrapper. A skill + orchestration layer that turns a short prompt + the real product into a finished, claim-safe, Hebrew-first ad.

The model is a swappable commodity. The genius is the orchestration, craft, critics, and cost — not any single model.

Positioning — the unoccupied triangle

Product-Lock video

i2v from the EXACT product still. The AI never invents the product. Arcads/Creatify can't show the real SKU in use.

Claim-safe + dual critics

Claim ledger + Red Flag Expert (motion/slop) + Copy Director (script/VO) before SHIP. The compliance gap nobody fills.

Hebrew-first

$1.58B Israeli SMB market, WhatsApp-native, zero Hebrew-first competitor.

The moat — consistency without distortion

The single hardest thing in AI video, and our crown discipline: the same thing never drifts.

  • Product consistency — never let a model redraw the real product (i2v from the exact still; first+last keyframe lock).
  • Character consistency — a built, named, ownable mascot that looks identical across every clip & episode → write ongoing stories about him.
  • A brand that owns a consistent character owns a franchise, not a one-off ad.

Part 2 — The Creation Process

How a film actually gets made

It looks like "type a prompt." It is really a complex, multi-step pipeline with adversarial ping-pong loops — that's where the quality comes from.

The pipeline — one brief, ten orchestrated stages

  1. Brief — Hebrew prompt + pick the real product + claim attestation
  2. Prompt craft — video-prompt-improver fills MCSLA slots + a named camera move (iterative)
  3. Still / Product-Lock — Flux still from the exact product, or a locked character ref
  4. Composite (P4) — if product-in-scene: cutout + placement (on_table / in_hand)
  5. Motion (i2v) — first+last keyframe lock; cheapest provider that clears the bar
  6. Dual critics — Red Flag + Copy Director attack → revise → regenerate (the ping-pong)
  7. Sound stage — VO + ambient + SFX + music duck → video_final
  8. Montage — stitch multiple beats into one 15s piece
  9. Finish — upscale · grade · captions (Hebrew RTL) · platform aspect
  10. SHIP gate — claim-safe verdict, delivered as a link
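
The ten stages above can be read as an ordered plan in which some stages are conditional. The following is a minimal sketch, not the studio's actual orchestrator: the stage identifiers, the `plan_stages` function, and the rule that a single-beat job skips montage are all assumptions made for illustration. Only the composite condition ("if product-in-scene") comes from the list itself.

```python
# Stage names mirror the ten-stage list; identifiers are hypothetical.
STAGES = (
    "brief", "prompt_craft", "still", "composite", "motion",
    "critics", "sound", "montage", "finish", "ship_gate",
)


def plan_stages(product_in_scene: bool, beats: int = 1) -> list[str]:
    """Return the ordered stages for one job, dropping stages that do not apply."""
    plan = []
    for stage in STAGES:
        if stage == "composite" and not product_in_scene:
            continue  # composite runs only when the product sits in a scene
        if stage == "montage" and beats < 2:
            continue  # assumption: a single beat needs no stitching
        plan.append(stage)
    return plan


print(plan_stages(product_in_scene=True, beats=3))
```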

Stage 2 · Prompt ping-pong

The prompt is drafted, attacked, sharpened

The skill doesn't take the prompt at face value. It fills a structured slot framework (subject · action · camera · lighting · atmosphere), then critiques its weakest slot and rewrites — logging the winning delta so prompts compound across every future job.

A named camera move (push-in, rack, dolly-out…) is mandatory — bias toward cinematic craft, never a safe default push-in.
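
The draft, attack, sharpen loop described above can be sketched as follows. This is an illustration under stated assumptions, not the video-prompt-improver skill itself: `PromptDraft`, `weakest_slot`, `refine`, and the `score` / `rewrite` callables are hypothetical names, and `NAMED_MOVES` contains only the three moves the text mentions.

```python
from dataclasses import dataclass, field
from typing import Callable

SLOTS = ("subject", "action", "camera", "lighting", "atmosphere")
NAMED_MOVES = ("push-in", "rack", "dolly-out")  # only the moves named in the text


@dataclass
class PromptDraft:
    slots: dict[str, str]
    # Winning deltas as (slot, before, after), kept so later jobs can reuse them.
    deltas: list[tuple[str, str, str]] = field(default_factory=list)


def weakest_slot(draft: PromptDraft, score: Callable[[str, str], float]) -> str:
    """Pick the slot whose current text scores lowest."""
    return min(SLOTS, key=lambda s: score(s, draft.slots.get(s, "")))


def refine(draft: PromptDraft, score, rewrite, rounds: int = 3) -> PromptDraft:
    """Rewrite the weakest slot each round; keep a rewrite only if it scores higher."""
    for _ in range(rounds):
        slot = weakest_slot(draft, score)
        before = draft.slots.get(slot, "")
        after = rewrite(slot, before)
        if score(slot, after) > score(slot, before):
            draft.slots[slot] = after
            draft.deltas.append((slot, before, after))
    # A named camera move is mandatory.
    if not any(move in draft.slots.get("camera", "") for move in NAMED_MOVES):
        raise ValueError("camera slot must name a camera move")
    return draft
```

In practice `score` and `rewrite` would be model calls; here they are left as plain callables so the loop can be exercised with toy functions.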

Stages 3–5 · The lock chain

Still → Composite → Motion, identity pinned at every hop

  • Still — generated from the real SKU or a canonical character reference.
  • Composite (P4) — product cut out & placed into the scene; never re-drawn.
  • Motion — i2v with both endpoints pinned (first+last keyframe) = minimal drift.

Scored provider selector

Each job routes to the cheapest provider that clears the quality bar — scored on task-fit · quality · control · reliability · cost · latency · continuity. Replicate + WaveSpeed behind one seam.
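
One way to read "cheapest provider that clears the quality bar" is a two-step rule: score every option, discard those below the bar, then take the lowest cost. The sketch below assumes that reading. The source lists cost among the scored criteria; here it is used as the final selection key instead, which is an interpretation. `ProviderOption`, `quality_score`, `pick_provider`, the 0..1 scale, and all numbers are hypothetical.

```python
from dataclasses import dataclass

# Non-cost criteria from the text; cost is handled separately below.
CRITERIA = ("task_fit", "quality", "control", "reliability", "latency", "continuity")


@dataclass(frozen=True)
class ProviderOption:
    name: str
    cost_usd: float
    scores: dict  # criterion -> value in 0..1 (assumed scale)


def quality_score(opt: ProviderOption, weights: dict) -> float:
    """Weighted mean of the criterion scores; missing criteria count as 0."""
    total = sum(weights.values())
    return sum(w * opt.scores.get(c, 0.0) for c, w in weights.items()) / total


def pick_provider(options: list, weights: dict, bar: float) -> ProviderOption:
    """Return the cheapest option whose weighted score is at least `bar`."""
    eligible = [o for o in options if quality_score(o, weights) >= bar]
    if not eligible:
        raise LookupError("no provider clears the quality bar")
    # Ties on cost go to the higher-scoring option.
    return min(eligible, key=lambda o: (o.cost_usd, -quality_score(o, weights)))


weights = {c: 1.0 for c in CRITERIA}
options = [
    ProviderOption("option-budget", 0.01, {c: 0.4 for c in CRITERIA}),
    ProviderOption("option-mid", 0.02, {c: 0.7 for c in CRITERIA}),
    ProviderOption("option-premium", 0.05, {c: 0.9 for c in CRITERIA}),
]
print(pick_provider(options, weights, bar=0.6).name)
```

Raising the bar changes the winner without touching the options, which is what lets a job opt in to a premium provider.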

Stage 6 · The ping-pong that guarantees quality

Adversarial dual-critic loop → SHIP

Designer drafts → Red Flag Expert attacks the image → Revise → Copy Director attacks the words → Revise & regenerate → SHIP (≤ 5 versions)

Two independent critics, research-based rubrics. They try to break the work — slop, distortion, weak hook, unsafe claim. The piece only ships when both are satisfied. The model never gets to redraw the product to "fix" a critique.
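
A bounded version of this loop might look like the sketch below. It assumes each critic returns a list of objections (empty means satisfied) and that the work is a plain dict with a `product_still` key. These shapes, the function names, and the `HOLD` verdict returned when the version cap is reached are assumptions, since the source does not say what happens after the fifth version.

```python
from typing import Callable

MAX_VERSIONS = 5  # the cap stated in the text


def critic_loop(
    draft: dict,
    red_flag: Callable[[dict], list],
    copy_director: Callable[[dict], list],
    revise: Callable[[dict, list], dict],
    max_versions: int = MAX_VERSIONS,
) -> dict:
    """Revise until both critics are satisfied or the version cap is hit."""
    work = dict(draft)
    locked_still = work["product_still"]  # the product image must never change
    issues: list = []
    for version in range(1, max_versions + 1):
        issues = red_flag(work) + copy_director(work)
        if not issues:
            return {"verdict": "SHIP", "version": version, "work": work}
        if version == max_versions:
            break
        work = revise(work, issues)
        if work["product_still"] != locked_still:
            raise ValueError("revision tried to redraw the locked product")
    # Assumption: an unresolved piece is held rather than shipped.
    return {"verdict": "HOLD", "version": max_versions, "issues": issues, "work": work}
```

The lock check sits inside the loop so that a revision which swaps the product still fails loudly instead of satisfying a critic.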

Stages 7–9 · Finishing

Sound, montage, and the edit layer

  • Sound stack — edge-tts VO + ambient bed + SFX overlay + music ducking → video_final.mp4. Sound is a named Higgsfield gap we close.
  • Montage — deterministic ffmpeg stitch of several beats → one 15s piece (no drift, measured durations).
  • Finish — upscale · color grade · Hebrew RTL captions (ffmpeg ASS) · per-platform aspect presets.
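
For the montage step, a deterministic stitch of mixed-aspect clips usually means normalizing every clip to one canvas before concatenating. The sketch below only builds the command lines and does not run them. The 1080x1920 canvas, 30 fps, video-only output, and the function names are assumptions; the `scale`, `pad`, `setsar`, `fps`, and `concat` filters and the `ffprobe` options are standard ffmpeg features.

```python
def probe_duration_cmd(path: str) -> list[str]:
    """ffprobe command that prints a clip's container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def montage_cmd(clips: list[str], out: str,
                width: int = 1080, height: int = 1920, fps: int = 30) -> list[str]:
    """ffmpeg command that letterboxes each clip to one canvas, then concatenates."""
    if not clips:
        raise ValueError("need at least one clip")
    chains = []
    for i in range(len(clips)):
        chains.append(
            # Fit inside the canvas, pad to exact size, square pixels, fixed frame rate.
            f"[{i}:v]scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1,fps={fps}[v{i}]"
        )
    labels = "".join(f"[v{i}]" for i in range(len(clips)))
    graph = ";".join(chains) + f";{labels}concat=n={len(clips)}:v=1:a=0[vout]"
    cmd = ["ffmpeg", "-y"]
    for clip in clips:
        cmd += ["-i", clip]
    cmd += ["-filter_complex", graph, "-map", "[vout]", out]
    return cmd


cmd = montage_cmd(["beat1.mp4", "beat2.mp4", "beat3.mp4"], "montage.mp4")
print(" ".join(cmd))
```

Passing the result to `subprocess.run(cmd, check=True)` would execute it; probing each input and the output with `probe_duration_cmd` is one way to obtain measured durations.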

Stage 10 · The human ping-pong

PRISM gates — Ben decides, agents execute

The studio is built inside a governed factory. The creative ping-pong with the human is deliberate and bounded:

  • P6 gates (Ben only): Vision · Brand · Naming · Logo · Positioning — each delivered as a deployed link, not chat.
  • Everything else autonomous — agents decide technical matters with documented reasoning.
  • Prove-before-build — real output before more spec; over-build is flagged as a decision.

Part 3 — The Product

Built as capability rings

Staged, not small. Each ring is shippable; together they reach the most-comprehensive system without collapsing.

Capability rings — the build order

R0 Foundation — lock_policy + cheapest-provider selector (Replicate + WaveSpeed)
R1 Crown — Product-Lock ad, end-to-end, P1–P5 (done-ish)
R2 General video + Sound — scene modes · VO/SFX/music mix (live)
R3 Talking-head / Avatar — HeyGen/OmniHuman-class + lip-sync
R4 Montage / multi-shot — stitch · ShotSpec · beat editor backend (live)
R5 Captions + Export — ffmpeg ASS (Hebrew RTL) · platform profiles
R6 Editing suite — timeline UI (horizon)

The cheap-but-premium engine

How we beat Higgsfield on price:

  • Provider seam (ports.py) routes each job to the cheapest provider that clears the quality bar.
  • Cheap by default, premium opt-in — WaveSpeed undercuts cost; Replicate stays for reach.
  • Still-first, cost-efficient — storyboard → generate → orchestrate. Complexity stays hidden: you type a brief, the studio picks pipeline → provider → critics → finish.
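
A provider seam of this kind is often expressed as a small interface that every adapter implements, so the rest of the pipeline never imports a vendor SDK directly. The sketch below is not the contents of `ports.py`: the `VideoProvider` protocol, `StubProvider`, `route`, and the per-second rates are hypothetical, and treating "premium" as the highest-priced adapter is an assumption.

```python
from typing import Protocol


class VideoProvider(Protocol):
    """The seam: anything with these members can serve an i2v job."""
    name: str

    def estimate_cost_usd(self, seconds: float) -> float: ...

    def image_to_video(self, still_path: str, prompt: str, seconds: float) -> str: ...


class StubProvider:
    """Stand-in adapter; a real one would call the vendor API in image_to_video."""

    def __init__(self, name: str, usd_per_second: float) -> None:
        self.name = name
        self.usd_per_second = usd_per_second  # made-up rate for illustration

    def estimate_cost_usd(self, seconds: float) -> float:
        return self.usd_per_second * seconds

    def image_to_video(self, still_path: str, prompt: str, seconds: float) -> str:
        # Returns a fake output reference instead of rendering anything.
        return f"{self.name}/{seconds:g}s/{still_path}"


def route(providers: list, seconds: float, premium: bool = False) -> VideoProvider:
    """Cheap by default; premium is an explicit opt-in."""
    ranked = sorted(providers, key=lambda p: p.estimate_cost_usd(seconds))
    return ranked[-1] if premium else ranked[0]


providers = [StubProvider("stub-premium", 0.05), StubProvider("stub-cheap", 0.01)]
chosen = route(providers, seconds=5)
print(chosen.name, chosen.image_to_video("sku.png", "slow rack focus", 5))
```

Because callers depend only on the protocol, adding or swapping a provider is a change to one adapter rather than to the pipeline.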

Part 4 — The Result

What's actually built

Shipped & verified

  • Ring 1 Crown — Product-Lock ad pipeline live (P1–P5) · done
  • Ring 4 Montage — 3 mixed-aspect clips → one 15.02s montage, render-verified.
  • Ring 2 Sound — VO + ambient + SFX → video_final; live ben-family smoke ≈ $0.053.
  • P7 Character — reference-image plumbing + cast refs; ref_fallback verified on a live flux-schnell run ($0.003).

146 tests passing · ~$0.05 full sound film · 15.02s montage, verified

Gate decisions — approved

  • Ambition — approved in full: full-stack Higgsfield-class studio, all rings, product-ad crown.
  • Providers — approved: add WaveSpeed alongside Replicate (cheapest-clears-bar routing).
  • OpenMontage — approved: clean-room blueprint, no fork, AGPL-safe.
  • First ring built — Ring 4 (montage), then R0 → R2 (sound) → R3 (talking-head) → R5/6.

What's next

The product ad is the crown.
The studio is the franchise.

Next: name pick & hero workflow (Ben) · Ring 3 lip-sync · WaveSpeed live · the editing suite.

Refract Studio · Vision → Creation → Result · PRISM