Self-hosted · runs on localhost

The editor, live on your own machine.

Auto-Caption is a self-hosted FastAPI app - not a public SaaS. The screenshots below show the real running UI. To use it, clone the repo, install dependencies, and start the server. The source code is on GitHub.

View on GitHub

Clone → pip install -r requirements.txt → uvicorn app:app --port 8765

localhost:8765

Timeline

Segments turn green when you've set a custom position.

The timeline shows one block per Whisper segment. Clicking a block activates it (shown in blue). Dragging the overlay in the preview above sets pos_x/pos_y on the active segment, and the block turns green. Right-click a block to clear its custom position and return it to the global alignment.
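The block states described above can be sketched as a small data model. This is only an illustration: the `Segment` class, `block_colour`, and the rule that a custom position takes priority over the active highlight are assumptions drawn from the description, not the app's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical model of one timeline block (one Whisper segment)."""
    start: float
    text: str
    pos_x: Optional[int] = None  # None means "use the global alignment"
    pos_y: Optional[int] = None

    def has_custom_pos(self) -> bool:
        return self.pos_x is not None and self.pos_y is not None

    def clear_pos(self) -> None:
        # What a right-click is described as doing: drop the custom position.
        self.pos_x = None
        self.pos_y = None


def block_colour(seg: Segment, is_active: bool) -> str:
    """Pick the block colour; custom position wins over active (assumed)."""
    if seg.has_custom_pos():
        return "green"
    if is_active:
        return "blue"
    return "default"


seg = Segment(start=0.0, text="Hello")
seg.pos_x, seg.pos_y = 360, 168      # simulated drag on the active segment
print(block_colour(seg, is_active=True))
seg.clear_pos()                      # simulated right-click
print(block_colour(seg, is_active=True))
```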

Timeline - 4 segments

00:00.0 – Hello

00:02.4 – welcome

00:04.8 – to Auto

00:06.1 – Caption

default ▶ active ⬤ custom pos

Caption overlay

Drag the caption to reposition. The burn matches exactly.

The overlay is an absolutely-positioned div that tracks the video frame. Drag it anywhere - the coordinates are stored in script units (PlayResY=288) and handed to libass as a \pos(x,y) override in the ASS file. The burn lands on the same pixel you dragged to.
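As a rough sketch of that mapping, the snippet below scales a drag position from preview pixels into script units and writes it as a `\pos(x,y)` override on an ASS Dialogue line. Only PlayResY=288 comes from the description above. PlayResX=384, the 1280×720 preview size, and the helper names are assumptions made for illustration.

```python
def to_script_units(px_x, px_y, frame_w, frame_h,
                    play_res_x=384, play_res_y=288):
    """Scale preview-pixel coordinates to ASS script units.

    play_res_x=384 is an assumed value; only PlayResY=288 is documented.
    """
    x = round(px_x * play_res_x / frame_w)
    y = round(px_y * play_res_y / frame_h)
    return x, y


def ass_time(seconds):
    """Format seconds as the H:MM:SS.cc timestamp used in ASS events."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def dialogue_line(start, end, text, x, y, style="Default"):
    """Build one Dialogue event with a \\pos override in front of the text."""
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,"
            f"{{\\pos({x},{y})}}{text}")


# Dragging to pixel (1200, 420) in an assumed 1280x720 preview:
x, y = to_script_units(1200, 420, frame_w=1280, frame_h=720)
print(x, y)  # 360 168
print(dialogue_line(0.0, 2.4, "Hello", x, y))
```

Working in script units rather than pixels means the same stored coordinates stay valid whatever size the preview happens to be displayed at.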

video frame

Drag me anywhere ↕↔

pos_x: 360 · pos_y: 168

Pipeline

Whisper + optional Claude, then FFmpeg burns.

The server runs Whisper with word_timestamps=True, snaps segment boundaries to actual word onsets, and (if toggled) pipes the text through Claude Sonnet for grammar and filler cleanup. The final step chains subtitles filters in FFmpeg: one in the usual case, two when both a background box and an outline are requested.
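The snapping step might look something like the sketch below. It assumes the segment layout that openai-whisper returns when `word_timestamps=True` (each segment carries a `words` list with `start` and `end` times). The `snap_segments` helper and the choice to also snap the segment end to the last word are illustrative assumptions.

```python
def snap_segments(segments):
    """Return copies of the segments with boundaries moved onto word times.

    The start snaps to the first word's onset. Snapping the end to the last
    word's end is an assumption; segments without words are left unchanged.
    """
    snapped = []
    for seg in segments:
        words = seg.get("words") or []
        start, end = seg["start"], seg["end"]
        if words:
            start = words[0]["start"]
            end = words[-1]["end"]
        snapped.append({**seg, "start": start, "end": end})
    return snapped


# With openai-whisper installed, the input would come from something like:
#   import whisper
#   result = whisper.load_model("base").transcribe(path, word_timestamps=True)
#   segments = snap_segments(result["segments"])
sample = [{
    "start": 0.0, "end": 2.4, "text": " Hello",
    "words": [{"word": " Hello", "start": 0.32, "end": 0.9}],
}]
print(snap_segments(sample)[0]["start"])  # 0.32
```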

● Transcription complete

12 segments · word-level timestamps

● Claude: grammar pass done

Reduced filler words: 4

● FFmpeg burn — pass 1/2
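One way the burn command could be assembled is sketched here. It uses FFmpeg's `subtitles` filter and chains one filter per ASS file. The file names `box.ass` and `outline.ass`, the `build_burn_command` helper, and the choice to copy the audio stream are assumptions, not the app's actual invocation.

```python
def build_burn_command(src, dst, ass_files):
    """Build an ffmpeg argument list that burns one or two ASS files.

    Assumes simple file names: paths containing characters such as ':' or
    ',' would need filtergraph escaping, which this sketch does not do.
    """
    if not 1 <= len(ass_files) <= 2:
        raise ValueError("expected one or two ASS files")
    # Filters in a chain are comma-separated and applied left to right.
    vf = ",".join(f"subtitles={name}" for name in ass_files)
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, "-c:a", "copy", dst]


# Box and outline both requested: two chained filters (hypothetical names).
cmd = build_burn_command("in.mp4", "out.mp4", ["box.ass", "outline.ass"])
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```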

Style controls

Font, size, colour, wrap width, alignment - all editable.

The style panel on the right exposes font size, font name, primary colour, outline width, alignment (1-9 numpad), margin, and whether to show a background box. Adjusting any control re-renders the preview overlay immediately - you see the result before committing a burn.
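To make the controls concrete, here is a sketch of how such settings could be turned into an ASS `Style:` line in the standard V4+ field order. The `style_line` and `ass_colour` helpers, the fixed secondary/outline/back colours, and the margin value are assumptions. BorderStyle 3 (opaque box) versus 1 (outline) is how ASS itself distinguishes the two looks.

```python
def ass_colour(hex_rgb, alpha=0):
    """Convert '#RRGGBB' to the ASS '&HAABBGGRR' form (alpha 0 = opaque)."""
    r, g, b = hex_rgb[1:3], hex_rgb[3:5], hex_rgb[5:7]
    return f"&H{alpha:02X}{b}{g}{r}".upper()


def style_line(font_name, font_size, primary, outline_width,
               alignment, margin, box):
    """Build a V4+ Style line; unexposed fields use assumed defaults."""
    if alignment not in range(1, 10):
        raise ValueError("alignment must be 1-9 (numpad layout)")
    border_style = 3 if box else 1  # 3 = opaque box, 1 = outline + shadow
    fields = [
        "Default", font_name, font_size,
        ass_colour(primary),       # PrimaryColour
        "&H000000FF",              # SecondaryColour (assumed)
        "&H00000000",              # OutlineColour (assumed black)
        "&H80000000",              # BackColour (assumed half-transparent)
        0, 0, 0, 0,                # Bold, Italic, Underline, StrikeOut
        100, 100, 0, 0,            # ScaleX, ScaleY, Spacing, Angle
        border_style, outline_width, 0,  # BorderStyle, Outline, Shadow
        alignment, margin, margin, margin,  # Alignment, MarginL/R/V
        1,                         # Encoding
    ]
    return "Style: " + ",".join(str(f) for f in fields)


print(style_line("Arial", 24, "#FFFFFF", 2, alignment=2, margin=10, box=True))
```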

Caption Style

Font size: 24px

Font name: Arial

Alignment: 2 (bottom-center)

Box background: ON

Outline width: 2px

Whisper model: base

Run it yourself.

Auto-Caption is fully open source. Clone the repo, install the Python dependencies, and have a self-hosted captioning studio running on your machine in under five minutes. No API keys are required. You only need your own Anthropic key if you enable the optional Claude polish step.

View on GitHub