Auto-Caption Design System

A code-truthful spec for a self-hosted video captioning tool. The palette is derived directly from app.py's embedded stylesheet — the same hex values that appear in the running app appear in these tokens. When code and this doc disagree, code wins.

2 typefaces 1 blue accent dark base v1.0
01 · Identity

A focused editor, not a consumer app.

An in-browser video editor built on top of a Python pipeline. Three principles hold the design together — break any one and it stops feeling like a professional tool.

What you see is what gets burned.The preview overlay matches the libass render to the pixel — same position, same wrap, same font size.
Dark surfaces, one bright accent.The dark panel hierarchy comes from the running app. The blue accent (#5B8DEF) is the only loud colour in the chrome.
Mono for data, sans for prose.JetBrains Mono appears only for timecodes, hex values, and system labels. Everything else is Inter.
02 · Color

Deep navy, one blue accent, semantic signals.

The surface palette runs from near-black #0D1117 to the card's #161E2C. Saturation is reserved for the accent and status signals — everything else is desaturated.

Surfaces — dark panel hierarchy

bg
#0D1117
Page base, deepest layer
bg-subtle
#111620
Section wash
bg-card
#161E2C
Editor panels, cards
bg-elevated
#1A2438
Elevated / hover surfaces

Text — readable on dark

text
#E8EAED
Primary — titles, scores
text-dim
#A8B3C0
Body, descriptions
text-muted
#6E7D8E
Captions, helper text

Accent & status colours

blue (primary)
#5B8DEF
Brand accent — active, CTA
blue-mid
#79A5FF
Hover, bright labels
purple
#8B7DEF
Claude / AI-refine indicator
green
#4CB87A
Positioned segment, success
amber
#F0A030
Timeline playhead, warning
red
#F07060
Error, danger, counter
03 · Typography

Inter for UI, JetBrains Mono for data.

Inter is the root family at weight 400/500 for body and 700 for headings. JetBrains Mono appears only on timecodes, hex values, CLI snippets, and section eyebrows — anything that needs to feel like data output rather than prose.

Inter 800Display · hero · large headings
Captions ready.
Inter 700Panel header · card titles · 24–28px
Editor & Timeline
Inter 400/500Body · UI labels · 16px default
Drop a video, adjust the caption style, hit Burn — get back the same file with subtitles burned in by FFmpeg. No round-trip.
JetBrains Mono 600Timecodes · hex · CLI · eyebrows
00:01.420 → 00:04.890  ·  model: base  ·  PlayResY=288
04 · Spacing

A base-4 scale.

Eight steps, doubling loosely from a 4px hairline to 64px big-screen rhythm.

sp-1
4px
sp-2
8px
sp-3
12px
sp-4
16px
sp-5
24px
sp-6
32px
sp-7
48px
sp-8
64px
05 · Radius

Crisp editor corners — tighter than a consumer app.

Tool UIs benefit from slightly more angular corners than games or marketing sites. Four steps from pill status badges down to barely-rounded timeline segments.

Pill Status badges, eyebrow chips
r-pill · 999px
Card Editor panels, upload zone
r-lg · 16px
Button Action buttons, inputs
r-md · 12px
Chip Timeline segments, badges
r-sm · 6px
06 · Elevation

Every shadow is deep navy — never grey.

On a dark background, grey shadows disappear. Deep navy alpha keeps layering legible and maintains the depth hierarchy between the page and its panels.

sh-sm
sh-md
glow-blue
glow-green
07 · Components

All rendered on the dark editor surface.

Representative components rendered on the app's actual #0b0d10 background so every colour decision reads in context.

Timeline segments

00:00.000 — Hello, welcome
00:04.200 — to Auto-Caption
00:07.800 — drop your video
default active has custom position

Action buttons

Caption overlay in preview

video preview
This caption is draggable
in the live preview

Editor settings panel

Caption Style
Font size 24px
Alignment Bottom-center
Polish with Claude ON
08 · Motion

Feedback, not decoration.

Short, purposeful, and respectful. With prefers-reduced-motion on, every animation collapses to a plain opacity fade.

Segment activate block highlights as clip plays, 280ms ease
Burn
Burn progress button pulses while FFmpeg is running
panel
Panel open scale 0.96 → 1, fade in, 220ms
Error shake input shakes when validation fails, 450ms
09 · Voice

Direct, plain, no jargon that isn't necessary.

This is a tool, not a product. Copy is functional, not cheerful — tell the user exactly what happened and what to do next.

We say

"Transcription complete — 12 segments."

"Drag the overlay to reposition."

"Right-click to clear position."

We don't say

"Awesome! Your captions are ready! 🎉"

"Pro tip: try the AI-powered refine!"

"Upgrade for more features."

10 · Doing it wrong

Five ways to break the feel.

If a new screen does any of these, it doesn't belong in Auto-Caption.

  • Don't use light backgrounds inside the editor. Every panel lives on the dark surface hierarchy. A white card looks like a bug, not a design choice.
  • Don't use the blue accent as a decoration. Blue is reserved for interactive elements and active states. Decorative blue obscures what's clickable.
  • Don't use Inter for timecodes or hex values. Anything that is "data" — timestamps, colour values, model names — must use JetBrains Mono so it reads as machine output.
  • Don't invent new motion patterns. Use the four documented patterns; chaining novel animations inside a tool UI reads as jitter, not polish.
  • Don't add real-money features or upgrade nags. This is a self-hosted tool. The appropriate response to scope limits is a clear error message, not a paywall.
11 · Source of truth

Where every token actually lives.

This page documents the system; the code defines it. When they disagree, the code wins.

CSS tokenscaption-case/css/main.css
App stylesapp.py (embedded <style>)
Pipelinecaption_pipeline.py
Serverapp.py (FastAPI)
Repositorygithub.com/ilan-stack/caption-app →