Auto-Caption Design System
A code-truthful spec for a self-hosted video captioning tool. The palette is derived directly from app.py's embedded stylesheet — the same hex values that appear in the running app appear in these tokens. When code and this doc disagree, code wins.
A focused editor, not a consumer app.
An in-browser video editor built on top of a Python pipeline. Three principles hold the design together — break any one and it stops feeling like a professional tool.
Deep navy, one blue accent, semantic signals.
The surface palette runs from near-black #0D1117 to the card's #161E2C. Saturation is reserved for the accent and status signals — everything else is desaturated.
Surfaces — dark panel hierarchy
Text — readable on dark
Accent & status colours
Inter for UI, JetBrains Mono for data.
Inter is the root family at weight 400/500 for body and 700 for headings. JetBrains Mono appears only on timecodes, hex values, CLI snippets, and section eyebrows — anything that needs to feel like data output rather than prose.
A base-4 scale.
Eight steps, doubling loosely from a 4px hairline to 64px big-screen rhythm.
Crisp editor corners — tighter than a consumer app.
Tool UIs benefit from slightly more angular corners than games or marketing sites. Four steps from pill status badges down to barely-rounded timeline segments.
Every shadow is deep navy — never grey.
On a dark background, grey shadows disappear. Deep navy alpha keeps layering legible and maintains the depth hierarchy between the page and its panels.
All rendered on the dark editor surface.
Representative components rendered on the app's actual #0b0d10 background so every colour decision reads in context.
Timeline segments
Action buttons
Caption overlay in preview
in the live preview
Editor settings panel
Feedback, not decoration.
Short, purposeful, and respectful. With prefers-reduced-motion on, every animation collapses to a plain opacity fade.
Direct, plain, no jargon that isn't necessary.
This is a tool, not a product. Copy is functional, not cheerful — tell the user exactly what happened and what to do next.
"Transcription complete — 12 segments."
"Drag the overlay to reposition."
"Right-click to clear position."
"Awesome! Your captions are ready! 🎉"
"Pro tip: try the AI-powered refine!"
"Upgrade for more features."
Five ways to break the feel.
If a new screen does any of these, it doesn't belong in Auto-Caption.
- Don't use light backgrounds inside the editor. Every panel lives on the dark surface hierarchy. A white card looks like a bug, not a design choice.
- Don't use the blue accent as a decoration. Blue is reserved for interactive elements and active states. Decorative blue obscures what's clickable.
- Don't use Inter for timecodes or hex values. Anything that is "data" — timestamps, colour values, model names — must use JetBrains Mono so it reads as machine output.
- Don't invent new motion patterns. Use the four documented patterns; chaining novel animations inside a tool UI reads as jitter, not polish.
- Don't add real-money features or upgrade nags. This is a self-hosted tool. The appropriate response to scope limits is a clear error message, not a paywall.
Where every token actually lives.
This page documents the system; the code defines it. When they disagree, the code wins.