
Whisper Transcription

Identity

Service: Whisper Transcription
Container Type: lxc
VMID: 105
IP:Port: 10.1.10.105

Host

Host: Jack

Network

VLAN: VLAN 10 — Production

Resources

vCPU: 1
RAM: 512 MB
Disk: 2 GB
OS: Debian 13 (trixie)
Domain

Depends On: None
Depended On By: None

Overview

Whisper Transcription is an automatic video transcription service running on a lightweight Debian 13 LXC container. It uses the OpenAI Whisper API to convert video files into SRT subtitle files without any local GPU requirements.

How It Works

The workflow is simple: drop a video file into the SMB share, and the service handles the rest.

  1. File detection — A systemd polling service scans the incoming directory every 30 seconds for new video files (.mp4, .mkv, .webm, .mov)
  2. Stability check — Before processing, the service verifies the file has finished copying by comparing file sizes 10 seconds apart. This prevents processing partially transferred files over SMB
  3. Audio extraction — ffmpeg extracts audio as a mono 16kHz MP3 at 48kbps, keeping file sizes small for the API
  4. Chunking — If the audio exceeds the 25MB API limit, it's automatically split into 20-minute chunks
  5. Transcription — Each chunk is sent to the OpenAI Whisper API (whisper-1 model) which returns SRT-formatted subtitles
  6. Merging — For chunked files, the SRT segments are merged with corrected timestamps to produce a single continuous transcript
  7. Output routing — The original video, extracted audio, and SRT file are moved to a completed directory organized by filename
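The stability check in step 2 amounts to sampling the file size twice and comparing. A minimal sketch (the function name and the `wait_seconds` parameter are illustrative; the service uses a 10-second interval):

```python
import os
import time

def is_stable(path: str, wait_seconds: float = 10.0) -> bool:
    """Return True if the file's size is unchanged after wait_seconds,
    i.e. the SMB copy has (probably) finished."""
    size_before = os.path.getsize(path)
    time.sleep(wait_seconds)
    return os.path.getsize(path) == size_before
```

A still-growing file fails the check and is simply picked up again on the next 30-second poll, so no transfer is ever lost, only delayed.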
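The audio extraction in step 3 maps directly onto standard ffmpeg flags (`-vn` drops video, `-ac 1` downmixes to mono, `-ar 16000` resamples to 16 kHz, `-b:a 48k` sets the bitrate). A sketch, with assumed function names:

```python
import subprocess

def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build the ffmpeg command for step 3: drop the video stream,
    downmix to mono, resample to 16 kHz, encode MP3 at 48 kbps."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",           # no video stream
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz sample rate
        "-b:a", "48k",   # 48 kbps bitrate
        audio_path,
    ]

def extract_audio(video_path: str, audio_path: str) -> None:
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)
```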
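Steps 4 and 5 could look like the following, assuming the OpenAI Python SDK (v1+) with `OPENAI_API_KEY` set in the environment; the constants mirror the 25 MB limit and 20-minute chunk length described above:

```python
from pathlib import Path

API_LIMIT_BYTES = 25 * 1024 * 1024  # 25 MB API upload limit
CHUNK_SECONDS = 20 * 60             # 20-minute chunks

def needs_chunking(audio_path: str) -> bool:
    """Step 4: decide whether the extracted audio must be split."""
    return Path(audio_path).stat().st_size > API_LIMIT_BYTES

def transcribe_srt(audio_path: str) -> str:
    """Step 5: send one audio file/chunk to the Whisper API,
    requesting SRT-formatted output."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0
    client = OpenAI()
    with open(audio_path, "rb") as f:
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="srt",
        )
```

At 48 kbps, a 20-minute chunk is about 48,000 × 1,200 / 8 ≈ 7.2 MB, comfortably under the 25 MB limit.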
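The merge in step 6 shifts each chunk's timestamps by its position in the original audio and renumbers the cues. A simplified sketch (it assumes every chunk is exactly `chunk_seconds` long; a real implementation would offset by each chunk's actual duration):

```python
import re

TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_ts(match: re.Match, offset_ms: int) -> str:
    """Shift one HH:MM:SS,mmm timestamp forward by offset_ms."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def merge_srt(chunks: list[str], chunk_seconds: int = 1200) -> str:
    """Merge per-chunk SRT texts into one continuous transcript:
    shift timestamps per chunk, renumber cues sequentially."""
    out, cue_no = [], 1
    for i, chunk in enumerate(chunks):
        offset_ms = i * chunk_seconds * 1000
        for block in chunk.strip().split("\n\n"):
            lines = block.splitlines()
            if len(lines) < 2:
                continue
            # lines[0] is the cue number, lines[1] the timing line
            timing = TS.sub(lambda m: shift_ts(m, offset_ms), lines[1])
            out.append("\n".join([str(cue_no), timing] + lines[2:]))
            cue_no += 1
    return "\n\n".join(out) + "\n"
```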

Architecture

The service runs as an unprivileged LXC container with bind mounts to the same ZFS datasets used by the Samba file server. This means files dropped via SMB are immediately visible to the transcription service with zero network overhead — both containers access the underlying ZFS storage directly.
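On the Proxmox host, that arrangement boils down to a bind mount shared with the Samba container; the dataset and mount paths below are illustrative, not the actual ones:

```shell
# On the Proxmox host (Jack): bind-mount the same ZFS dataset into
# this container (VMID 105) that the Samba container already exports.
# Paths are examples only.
pct set 105 -mp0 /tank/transcribe,mp=/srv/transcribe
```

Because the container is unprivileged, host UIDs are shifted, so the dataset's ownership has to line up with the mapped IDs (or an idmap entry has to be configured) for both containers to read and write the same files.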

A separate AI-powered correction step (run on demand) reviews the raw transcripts to fix common Whisper mistakes like misheard technical terms and hallucinated content before the transcripts are published.
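The on-demand correction pass could be a single chat-model call over the raw SRT; everything below (function names, prompt wording, the `gpt-4o-mini` model choice) is an illustrative assumption, not the actual implementation:

```python
def build_correction_prompt(srt_text: str) -> list[dict]:
    """Messages for a proofreading pass over a raw Whisper transcript."""
    system = (
        "You are a subtitle proofreader. Fix misheard technical terms and "
        "remove hallucinated lines. Keep SRT numbering and timestamps unchanged."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": srt_text},
    ]

def correct_transcript(srt_text: str, model: str = "gpt-4o-mini") -> str:
    """Run the correction pass; requires OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_correction_prompt(srt_text),
    )
    return resp.choices[0].message.content
```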

Why the OpenAI API

A previous setup used whisper.cpp with local GPU acceleration, which required building from source for a specific GPU architecture, maintaining the ROCm stack, and keeping the inference host online. The API approach trades a small per-file cost (~$0.006/minute) for zero maintenance and consistent results. At that rate, a two-hour video costs roughly $0.72 to transcribe.