Refract Studio
A personal, full-stack agentic video studio.
Higgsfield-class — but mine: cheaper, smarter, under my control.
The Product-Lock ad is the crown. Everything else is a ring around it.
Part 1 — The Vision
"Create my own video system that competes with Higgsfield across every domain — most professional, most comprehensive, smartest, complexity hidden behind the scenes, at a cheap price, with sound, on WaveSpeed + Replicate, all mine. The product ad stays the crown — but also general video, sound, talking-head. Later, an editing suite. OpenMontage is the vision — I just want it mine: cheap, special, excellent, under my control."
— Ben, 2026-06-22
North Star
End-to-end production across every genre — product ad · cinematic · talking-head/avatar · sound & music · montage · explainer.
Not a model wrapper. A skill + orchestration layer that turns a short prompt + the real product into a finished, claim-safe, Hebrew-first ad.
The model is a swappable commodity. The genius is the orchestration, craft, critics, and cost — not any single model.
Positioning — the unoccupied triangle
Product-Lock video
i2v from the EXACT product still. The AI never invents the product. Arcads/Creatify can't show the real SKU in use.
Claim-safe + dual critics
Claim ledger + Red Flag Expert (motion/slop) + Copy Director (script/VO) before SHIP. The compliance gap nobody fills.
Hebrew-first
$1.58B Israeli SMB market, WhatsApp-native, zero Hebrew-first competitor.
The moat — consistency without distortion
The single hardest thing in AI video, and our crown discipline: the same thing never drifts.
- Product consistency — never let a model redraw the real product (i2v from the exact still; first+last keyframe lock).
- Character consistency — a built, named, ownable mascot that looks identical across every clip & episode → write ongoing stories about him.
- A brand that owns a consistent character owns a franchise, not a one-off ad.
Part 2 — The Creation Process
How a film actually gets made
It looks like "type a prompt." It is really a complex, multi-step pipeline with adversarial ping-pong loops — that's where the quality comes from.
The pipeline — one brief, ten orchestrated stages
- Brief — Hebrew prompt + pick the real product + claim attestation
- Prompt craft — video-prompt-improver fills MCSLA slots + a named camera move (iterative)
- Still / Product-Lock — Flux still from the exact product, or a locked character ref
- Composite (P4) — if product-in-scene: cutout + placement (on_table / in_hand)
- Motion (i2v) — first+last keyframe lock; cheapest provider that clears the bar
- Dual critics — Red Flag + Copy Director attack → revise → regenerate (the ping-pong)
- Sound stage — VO + ambient + SFX + music duck → video_final
- Montage — stitch multiple beats into one 15s piece
- Finish — upscale · grade · captions (Hebrew RTL) · platform aspect
- SHIP gate — claim-safe verdict, delivered as a link
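The ten stages above can be read as an ordered runner in which only the Composite step is conditional. The following is a minimal sketch under that reading, not the studio's actual code: the `Job` fields, the stage identifiers, and the handler signature are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage identifiers mirror the ten-stage list; the names themselves are assumptions.
STAGES = [
    "brief", "prompt_craft", "still", "composite", "motion",
    "dual_critics", "sound", "montage", "finish", "ship_gate",
]


@dataclass
class Job:
    brief: str
    product_in_scene: bool = False          # hypothetical flag driving the P4 step
    log: list = field(default_factory=list)  # stages that actually ran, in order


def run_pipeline(job: Job, handlers: dict[str, Callable[[Job], None]]) -> Job:
    """Run each stage in order, skipping Composite when no scene placement is needed."""
    for stage in STAGES:
        if stage == "composite" and not job.product_in_scene:
            continue
        handlers[stage](job)
        job.log.append(stage)
    return job


# Usage with no-op handlers standing in for the real stage implementations.
noop_handlers = {stage: (lambda job: None) for stage in STAGES}
plain_job = run_pipeline(Job(brief="product on white"), noop_handlers)
scene_job = run_pipeline(Job(brief="product on table", product_in_scene=True), noop_handlers)
```

Keeping the order in one list makes the "complexity hidden behind one brief" idea concrete: the caller supplies a brief, and the stage sequence is decided in a single place.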
Stage 2 · Prompt ping-pong
The prompt is drafted, attacked, sharpened
The skill doesn't take the prompt at face value. It fills a structured slot framework (subject · action · camera · lighting · atmosphere), then critiques its weakest slot and rewrites — logging the winning delta so prompts compound across every future job.
A named camera move (push-in, rack, dolly-out…) is mandatory. The bias is toward cinematic craft: a push-in has to be a deliberate choice, never the safe default.
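The draft, critique, rewrite cycle described here could be sketched as follows. This is an illustrative assumption rather than the video-prompt-improver skill itself: the slot names come from the text, while `PromptDraft`, `weakest_slot`, `refine`, the score scale, and the shape of the delta log are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# The five slots named in the text.
SLOTS = ("subject", "action", "camera", "lighting", "atmosphere")


@dataclass
class PromptDraft:
    slots: dict[str, str]

    def render(self) -> str:
        """Join the filled slots, in slot order, into one prompt string."""
        return ", ".join(self.slots[s] for s in SLOTS if self.slots.get(s))


def weakest_slot(scores: dict[str, float]) -> str:
    """Return the lowest-scoring slot; ties go to the earlier slot in SLOTS."""
    return min(SLOTS, key=lambda s: scores.get(s, 0.0))


def refine(draft: PromptDraft, scores: dict[str, float],
           rewrite: Callable[[str, str], str], delta_log: list) -> PromptDraft:
    """Rewrite only the weakest slot and record the before/after delta."""
    slot = weakest_slot(scores)
    before = draft.slots.get(slot, "")
    after = rewrite(slot, before)
    delta_log.append({"slot": slot, "before": before, "after": after})
    return PromptDraft({**draft.slots, slot: after})


# Usage: the scores and the rewrite function are stand-ins for a real critic.
draft = PromptDraft({
    "subject": "ceramic mug on an oak table",
    "action": "steam rises",
    "camera": "push-in",
    "lighting": "soft window light",
    "atmosphere": "quiet morning",
})
scores = {"subject": 0.9, "action": 0.8, "camera": 0.3, "lighting": 0.7, "atmosphere": 0.6}
delta_log = []
improved = refine(draft, scores, lambda slot, text: "slow dolly-out", delta_log)
```

Logging the delta rather than the whole prompt is what would let winning rewrites be reused on later jobs.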
Stages 3–5 · The lock chain
Still → Composite → Motion, identity pinned at every hop
- Still — generated from the real SKU or a canonical character reference.
- Composite (P4) — product cut out & placed into the scene; never re-drawn.
- Motion — i2v with both endpoints pinned (first+last keyframe) = minimal drift.
Scored provider selector
Each job routes to the cheapest provider that clears the quality bar — scored on task-fit · quality · control · reliability · cost · latency · continuity. Replicate + WaveSpeed behind one seam.
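One possible reading of "cheapest provider that clears the quality bar" is a two-step rule: score every candidate on the non-cost criteria, drop those below the bar, then take the lowest price. The sketch below assumes that reading. The weights, the 0.7 bar, the provider names, and all prices are made up for illustration and are not taken from `ports.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Criteria from the text, minus cost, which is used only as the tie-breaker.
# The weights are illustrative assumptions and sum to 1.0.
WEIGHTS = {
    "task_fit": 0.25, "quality": 0.25, "control": 0.15,
    "reliability": 0.15, "latency": 0.10, "continuity": 0.10,
}


@dataclass
class Provider:
    name: str
    cost_usd: float
    scores: dict[str, float]  # each criterion scored in [0, 1]


def bar_score(provider: Provider) -> float:
    """Weighted score over the non-cost criteria; missing criteria count as 0."""
    return sum(w * provider.scores.get(k, 0.0) for k, w in WEIGHTS.items())


def pick_provider(candidates: list[Provider], bar: float = 0.7) -> Optional[Provider]:
    """Return the cheapest candidate whose score clears the bar, or None."""
    eligible = [p for p in candidates if bar_score(p) >= bar]
    return min(eligible, key=lambda p: p.cost_usd, default=None)


def flat(value: float) -> dict[str, float]:
    """Helper for the example: the same score on every criterion."""
    return {k: value for k in WEIGHTS}


# Usage with hypothetical candidates.
candidates = [
    Provider("budget-model", 0.01, flat(0.5)),
    Provider("mid-model", 0.03, flat(0.8)),
    Provider("premium-model", 0.10, flat(0.95)),
]
choice = pick_provider(candidates)
```

Here the budget option is cheapest but fails the bar, so the mid-priced one wins. Raising `bar` per job is one way a "premium opt-in" could be expressed.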
Stage 6 · The ping-pong that guarantees quality
Adversarial dual-critic loop → SHIP
Designer drafts → Red Flag Expert attacks the image → Revise → Copy Director attacks the words → Revise & regenerate ↺ → SHIP (≤ 5 versions)
Two independent critics, research-based rubrics. They try to break the work — slop, distortion, weak hook, unsafe claim. The piece only ships when both are satisfied. The model never gets to redraw the product to "fix" a critique.
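The bounded loop can be sketched as below. It is a simplification under stated assumptions: both critics review each version in the same round instead of strictly in sequence, a critic is modeled as a function returning a list of objections, and `critic_loop`, `revise`, and the string-based "versions" are hypothetical stand-ins for real renders.

```python
from typing import Callable, Optional

MAX_VERSIONS = 5  # from "SHIP (≤ 5 versions)"

# A critic returns its objections; an empty list means it is satisfied.
Critic = Callable[[str], list]


def critic_loop(draft: str, critics: list, revise: Callable[[str, list], str],
                max_versions: int = MAX_VERSIONS) -> tuple[Optional[str], int]:
    """Return (shipped_version, versions_used); the version is None if nothing shipped."""
    version = draft
    for n in range(1, max_versions + 1):
        objections = [o for critic in critics for o in critic(version)]
        if not objections:
            return version, n          # every critic is satisfied: SHIP
        if n < max_versions:
            version = revise(version, objections)
    return None, max_versions          # budget exhausted without a clean pass


# Usage with toy critics that look for marker words in the draft text.
def red_flag(version: str) -> list:
    return [] if "no-slop" in version else ["slop"]


def copy_director(version: str) -> list:
    return [] if "hook" in version else ["weak hook"]


def revise(version: str, objections: list) -> str:
    # Fix one objection per round to show the ping-pong.
    return version + (" no-slop" if "slop" in objections else " hook")


shipped, used = critic_loop("ad", [red_flag, copy_director], revise)
stuck, stuck_used = critic_loop("ad", [lambda v: ["never happy"]], revise)
```

Returning `None` when the budget runs out keeps the SHIP decision explicit: a piece that never satisfies both critics is not delivered by default.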
Stages 7–9 · Finishing
Sound, montage, and the edit layer
- Sound stack — edge-tts VO + ambient bed + SFX overlay + music ducking → video_final.mp4. Sound is a named Higgsfield gap we close.
- Montage — deterministic ffmpeg stitch of several beats → one 15s piece (no drift, measured durations).
- Finish — upscale · color grade · Hebrew RTL captions (ffmpeg ASS) · per-platform aspect presets.
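A deterministic stitch with captions might be assembled as a single ffmpeg invocation. The sketch below only builds the argument list and does not run ffmpeg. It assumes video-only concatenation, a 1080×1920 target, 30 fps, and an ffmpeg build with libass for the `ass` filter; `montage_cmd` and the file names are hypothetical, and a caption path with special characters would need filter-graph escaping that is not handled here.

```python
from typing import Optional


def montage_cmd(clips: list, out_path: str, width: int = 1080, height: int = 1920,
                fps: int = 30, ass_path: Optional[str] = None) -> list:
    """Build ffmpeg arguments that normalize mixed-aspect clips and concatenate them."""
    cmd = ["ffmpeg", "-y"]
    for clip in clips:
        cmd += ["-i", clip]

    chains = []
    for i in range(len(clips)):
        # Fit each clip inside the target frame, letterbox it, and align
        # sample aspect ratio, frame rate, and pixel format so the concat
        # filter sees identical streams.
        chains.append(
            f"[{i}:v]scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,"
            f"setsar=1,fps={fps},format=yuv420p[v{i}]"
        )

    labels = "".join(f"[v{i}]" for i in range(len(clips)))
    tail = f"{labels}concat=n={len(clips)}:v=1:a=0"
    if ass_path:
        # Burn in captions from an ASS subtitle file after the stitch.
        tail += f"[cat];[cat]ass={ass_path}"
    chains.append(tail + "[out]")

    cmd += ["-filter_complex", ";".join(chains),
            "-map", "[out]", "-c:v", "libx264", out_path]
    return cmd


# Usage: pass the list to subprocess.run(cmd, check=True) to actually render.
plain = montage_cmd(["a.mp4", "b.mp4", "c.mp4"], "montage.mp4")
captioned = montage_cmd(["a.mp4", "b.mp4"], "montage.mp4", ass_path="captions.ass")
```

Building the command as a list avoids shell quoting issues, and normalizing every input before `concat` is what lets mixed-aspect clips stitch without re-generation.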
Stage 10 · The human ping-pong
PRISM gates — Ben decides, agents execute
The studio is built inside a governed factory. The creative ping-pong with the human is deliberate and bounded:
- P6 gates (Ben only): Vision · Brand · Naming · Logo · Positioning — each delivered as a deployed link, not chat.
- Everything else autonomous — agents decide technical matters with documented reasoning.
- Prove-before-build — real output before more spec; over-build is flagged as a decision.
Part 3 — The Product
Built as capability rings
Staged, not small. Each ring is shippable; together they reach the most-comprehensive system without collapsing.
Capability rings — the build order
R0 Foundation — lock_policy + cheapest-provider selector (Replicate + WaveSpeed)
R1 Crown — Product-Lock ad, end-to-end, done-ish (P1–P5)
R2 General video + Sound — scene modes · VO/SFX/music mix live
R3 Talking-head / Avatar — HeyGen/OmniHuman-class + lip-sync
R4 Montage / multi-shot — stitch · ShotSpec · beat editor backend live
R5 Captions + Export — ffmpeg ASS (Hebrew RTL) · platform profiles
R6 Editing suite — timeline UI (horizon)
The cheap-but-premium engine
How we beat Higgsfield on price:
- Provider seam (ports.py) routes each job to the cheapest provider that clears the quality bar.
- Cheap by default, premium opt-in — WaveSpeed undercuts on cost; Replicate stays for reach.
- Still-first, cost-efficient — storyboard → generate → orchestrate. Complexity stays hidden: you type a brief, the studio picks pipeline → provider → critics → finish.
Part 4 — The Result
What's actually built
Shipped & verified
- Ring 1 Crown — Product-Lock ad pipeline live (P1–P5). Done.
- Ring 4 Montage — 3 mixed-aspect clips → one 15.02s montage, render-verified.
- Ring 2 Sound — VO + ambient + SFX → video_final; live ben-family smoke test ≈ $0.053.
- P7 Character — reference-image plumbing + cast refs; ref_fallback verified on a live flux-schnell run ($0.003).
146 tests passing
~$0.05 full sound film
15.02s montage, verified
Gate decisions — approved
- Ambition — approved in full: full-stack Higgsfield-class studio, all rings, product-ad crown.
- Providers — approved: add WaveSpeed alongside Replicate (cheapest-clears-bar routing).
- OpenMontage — approved: clean-room blueprint, no fork, AGPL-safe.
- First ring built — Ring 4 (montage), then R0 → R2 (sound) → R3 (talking-head) → R5/6.
What's next
The product ad is the crown.
The studio is the franchise.
Next: name pick & hero workflow (Ben) · Ring 3 lip-sync · WaveSpeed live · the editing suite.