The editor, live on your own machine.
Auto-Caption is a self-hosted FastAPI app — not a public SaaS. The screenshots below show the real running UI. To use it, clone the repo, install dependencies, and start the server. The source code is on GitHub.
pip install -r requirements.txt → uvicorn app:app --port 8765
Segments turn green when you've set a custom position.
The timeline shows one block per Whisper segment. Clicking a block activates it (shown in blue). Dragging the overlay in the preview above sets pos_x/pos_y on the active segment, and its block turns green. Right-click a block to clear its custom position and fall back to the global alignment.
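The segment state described above can be modelled in a few lines. This is a minimal sketch rather than the app's actual code: only pos_x and pos_y come from the text, while the Segment class, its methods, and block_colour are hypothetical names. It also assumes a custom position takes precedence over the active highlight, since a dragged block is described as green.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One Whisper segment as the timeline might track it (hypothetical model)."""
    start: float
    end: float
    text: str
    pos_x: Optional[int] = None  # None means "use the global alignment"
    pos_y: Optional[int] = None

    @property
    def has_custom_position(self) -> bool:
        return self.pos_x is not None and self.pos_y is not None

    def set_position(self, x: int, y: int) -> None:
        """Called when the overlay is dragged while this segment is active."""
        self.pos_x, self.pos_y = x, y

    def clear_position(self) -> None:
        """Right-click behaviour: drop the override, fall back to global alignment."""
        self.pos_x = self.pos_y = None


def block_colour(segment: Segment, active: bool) -> str:
    """Assumed precedence: custom position (green) wins over active (blue)."""
    if segment.has_custom_position:
        return "green"
    return "blue" if active else "default"
```

Storing None for both coordinates keeps "no override" distinct from a real position at (0, 0).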
Drag the caption to reposition. The burn matches exactly.
The overlay is an absolutely-positioned div that tracks the video frame. Drag it anywhere — the coordinates are stored in script units (PlayResY=288) and handed to libass as a \pos(x,y) override in the ASS file. The burn lands on the same pixel you dragged to.
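To make the coordinate hand-off concrete, here is a rough sketch of mapping a drag position to script units and emitting the \pos override. PlayResY=288 comes from the text. The PlayResX value, the displayed frame size, and the function names are assumptions for illustration, not the app's real code.

```python
PLAY_RES_Y = 288  # script height stated in the text


def to_script_units(px_x, px_y, frame_w, frame_h, play_res_x, play_res_y=PLAY_RES_Y):
    """Scale a point in displayed-frame pixels to ASS script units.

    play_res_x is an assumption here; the source only states PlayResY.
    """
    return round(px_x * play_res_x / frame_w), round(px_y * play_res_y / frame_h)


def pos_override(pos_x, pos_y):
    """Build the ASS override tag, e.g. {\\pos(256,144)}."""
    return f"{{\\pos({pos_x},{pos_y})}}"


def dialogue_line(start, end, text, pos_x=None, pos_y=None, style="Default"):
    """Format one ASS Dialogue event; the override is added only when a position is set."""
    prefix = ""
    if pos_x is not None and pos_y is not None:
        prefix = pos_override(pos_x, pos_y)
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{prefix}{text}"


# Example: centre of a 1280x720 preview, assuming PlayResX=512 (16:9 at 288 high).
x, y = to_script_units(640, 360, frame_w=1280, frame_h=720, play_res_x=512)
line = dialogue_line("0:00:01.00", "0:00:03.50", "Hello", x, y)
```

Because the preview and the burn both use the same script coordinates, the position scales with the output resolution rather than being tied to the preview's pixel size.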
Whisper + optional Claude, then FFmpeg burns.
The server runs Whisper with word_timestamps=True, snaps segment boundaries to actual word onsets, and (if toggled) pipes the text through Claude Sonnet for grammar and filler-word cleanup. The final step chains one or two subtitles filters in FFmpeg, depending on whether a background box and an outline are both requested.
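The snapping step might look something like the sketch below. It assumes segments shaped like the openai-whisper transcribe() result when word_timestamps=True, where each segment carries a "words" list with "start" and "end" times. The exact rule is a guess at what "snapping" means here: start at the first word's onset, end at the last word's end, and trim any overlap with the next segment. The function name is hypothetical.

```python
def snap_to_word_onsets(segments):
    """Return copies of the segments with boundaries tightened to their words.

    Assumed input: a list of dicts with "start", "end", "text" and an
    optional "words" list of {"word", "start", "end"} dicts.
    """
    snapped = []
    for seg in segments:
        new_seg = dict(seg)  # shallow copy so the caller's data is untouched
        words = seg.get("words") or []
        if words:
            new_seg["start"] = words[0]["start"]
            new_seg["end"] = words[-1]["end"]
        snapped.append(new_seg)

    # Keep consecutive captions from overlapping on screen.
    for prev, nxt in zip(snapped, snapped[1:]):
        if prev["end"] > nxt["start"]:
            prev["end"] = nxt["start"]
    return snapped


example = [
    {"start": 0.0, "end": 2.0, "text": " Hello there",
     "words": [{"word": " Hello", "start": 0.4, "end": 0.8},
               {"word": " there", "start": 0.9, "end": 1.6}]},
    {"start": 2.0, "end": 4.0, "text": " ok",
     "words": [{"word": " ok", "start": 1.5, "end": 3.2}]},
]
result = snap_to_word_onsets(example)
```

Segments that arrive without word data keep their original boundaries, so the function is safe to call even if word timestamps are missing for part of the audio.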
Font, size, colour, wrap width, alignment — all editable.
The style panel on the right exposes font size, font name, primary colour, outline width, alignment (1-9, numpad layout), margin, and whether to show a background box. Adjusting any control re-renders the preview overlay immediately, so you see the result before committing a burn.
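As an illustration of how those controls could map onto an ASS file, the sketch below builds a V4+ Style line from the panel values. The field order and the &HAABBGGRR colour layout follow the ASS format. The function names, the default values, and the choice of BorderStyle=3 for the background box are assumptions, not confirmed details of the app.

```python
def hex_to_ass_colour(hex_rgb, alpha=0):
    """Convert "#RRGGBB" to ASS "&HAABBGGRR" (alpha 0 means fully opaque)."""
    hex_rgb = hex_rgb.lstrip("#")
    r, g, b = hex_rgb[0:2], hex_rgb[2:4], hex_rgb[4:6]
    return f"&H{alpha:02X}{b}{g}{r}".upper()


def ass_style_line(font_name="Arial", font_size=18, primary="#FFFFFF",
                   outline_width=1, alignment=2, margin_v=10, box=False):
    """Build a V4+ "Style:" line from style-panel values (defaults are hypothetical)."""
    if alignment not in range(1, 10):
        raise ValueError("alignment must be 1-9 (numpad layout)")
    border_style = 3 if box else 1  # 3 = opaque box, 1 = outline + shadow
    fields = [
        "Default", font_name, font_size,
        hex_to_ass_colour(primary),   # PrimaryColour
        "&H000000FF",                 # SecondaryColour
        "&H00000000",                 # OutlineColour
        "&H80000000",                 # BackColour
        0, 0, 0, 0,                   # Bold, Italic, Underline, StrikeOut
        100, 100, 0, 0,               # ScaleX, ScaleY, Spacing, Angle
        border_style, outline_width, 0,  # BorderStyle, Outline, Shadow
        alignment, 10, 10, margin_v,  # Alignment, MarginL, MarginR, MarginV
        1,                            # Encoding
    ]
    return "Style: " + ",".join(str(f) for f in fields)


style = ass_style_line(font_name="Inter", font_size=20, primary="#ffcc00",
                       outline_width=2, alignment=8, margin_v=12, box=True)
```

With numpad alignment, 2 is bottom-centre and 8 is top-centre, so the example above places captions at the top of the frame.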
Run it yourself.
Auto-Caption is fully open source. Clone the repo, install the Python dependencies, and have a self-hosted captioning studio running on your machine in under five minutes. No API keys are required, apart from your own Anthropic key if you want the optional Claude polish step.