hwdsl2 / docker-whisper

Docker image for a self-hosted Whisper speech-to-text server with an OpenAI-compatible audio transcription API. Powered by faster-whisper. Supports all Whisper models, multiple response formats (JSON, SRT, VTT), offline/air-gapped mode, and multi-arch (amd64/arm64).

hub.docker.com/r/hwdsl2/whisper-server

deep-learning, inference, openai, quantization, self-hosted

100% credibility

Found Apr 12, 2026 at 10 stars.

AI Analysis

Shell

AI Summary

A self-hosted service that turns audio recordings into text using advanced speech recognition, keeping everything private on your machine.

How It Works

🗣️ Discover private speech-to-text

You hear about a simple way to turn your audio recordings into text without sharing them online.

🚀 Start your helper service

With one easy action on your computer, you bring your personal transcription service to life.

⏳ It prepares itself

Your service quietly gathers what it needs and warms up, ready for action in moments.

📤 Share an audio clip

You send a voice note, podcast, or meeting recording to your service.

✨ See the magic happen

In seconds, your spoken words appear as clear, readable text on screen.

🎉 Private transcription anytime

Now you can convert any audio to text securely on your own setup, whenever you need it.

AI-Generated Review

What is docker-whisper?

This Docker image runs a self-hosted speech-to-text server built on faster-whisper and exposes an OpenAI-compatible API at POST /v1/audio/transcriptions. Apps that already call OpenAI's Whisper endpoint can switch over by changing a single environment variable, OPENAI_BASE_URL. The server supports all Whisper models from tiny to large-v3-turbo, returns JSON, SRT, or VTT, and accepts common audio formats via ffmpeg. It is built on Python 3.12 slim with multi-arch support for amd64 and arm64, and it runs offline after the initial image download from Docker Hub or Quay, so audio data stays on your server.
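To make the endpoint concrete, here is a minimal sketch of a client request using only the Python standard library. It builds, but does not send, a multipart POST to /v1/audio/transcriptions. The address http://localhost:9000/v1 is a hypothetical placeholder, and the form fields file, model, and response_format are taken from OpenAI's public transcription API on the assumption that this server accepts the same names.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, audio_bytes, filename,
                                model="whisper-1", response_format="json"):
    """Build (but do not send) a multipart POST to <base_url>/audio/transcriptions.

    base_url plays the role of OPENAI_BASE_URL and is expected to end in /v1.
    The default model name is OpenAI's; which names this server accepts is an
    assumption, not something the review confirms.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields, named as in OpenAI's transcription API.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio payload, sent as an opaque byte stream.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + audio_bytes + b'\r\n'
    )
    # Closing boundary marks the end of the multipart body.
    parts.append(f'--{boundary}--\r\n'.encode())
    return urllib.request.Request(
        base_url.rstrip('/') + '/audio/transcriptions',
        data=b''.join(parts),
        method='POST',
        headers={'Content-Type': f'multipart/form-data; boundary={boundary}'},
    )


# Hypothetical address; use whatever host and port your container publishes.
req = build_transcription_request(
    "http://localhost:9000/v1", b"fake-audio-bytes", "clip.wav",
    response_format="srt",
)
print(req.get_method(), req.full_url)
```

With a container running, passing req to urllib.request.urlopen would send it; the response body would then be the transcript in the requested format.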

Why is it gaining traction?

It stands out as a drop-in OpenAI replacement with no vendor lock-in and no API fees, and GitHub Actions workflows automate image builds and publishing. Features like model pre-caching, air-gapped mode via WHISPER_LOCAL_ONLY, and a whisper_manage CLI run through docker exec make deployment with docker-compose or docker run straightforward, even on low-spec hardware like a Raspberry Pi. Devs also like the Swagger docs at /docs and the reverse proxy guides for quick HTTPS setup.
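As a rough illustration of the docker run path with air-gapped mode enabled, the sketch below assembles the command line in Python rather than executing it. The image name comes from the Docker Hub link on this page and WHISPER_LOCAL_ONLY from the review; the value "true", the port 9000, and the container name are assumptions for illustration, so check the project's README for the real ones.

```python
import shlex


def docker_run_command(image, host_port, container_port,
                       local_only=False, name="whisper"):
    """Return the argv list for a detached `docker run` of the server image."""
    argv = ["docker", "run", "-d", "--name", name,
            "-p", f"{host_port}:{container_port}"]
    if local_only:
        # Variable name is from the review; the value "true" is an assumption.
        argv += ["-e", "WHISPER_LOCAL_ONLY=true"]
    argv.append(image)
    return argv


# Port 9000 is a hypothetical placeholder, not a documented default.
cmd = docker_run_command("hwdsl2/whisper-server", 9000, 9000, local_only=True)
print(shlex.join(cmd))
```

Building the argument list first keeps the command easy to inspect before handing it to subprocess.run or pasting it into a shell.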

Who should use this?

Backend engineers building voice pipelines or transcription services who want to stop paying OpenAI and stop sending it their audio. Self-hosters, including those on air-gapped servers, who need speech-to-text with no internet dependency. Hardware tinkerers on arm64 deploying local AI stacks alongside tools like LiteLLM for a full voice AI setup.

Verdict

Grab it if you need private STT: solid docs, an MIT license, and GitHub Actions CI make it look production-ready, although 10 stars signal an early-stage project. Test on non-critical workloads first; with adoption this low, watch for edge cases when removing or updating images.

Stars

Forks

1,630

Followers

Base stars: 10 stars

Bonus: AI verified quality (100%)

Account age: 4,641 days

Repo age: 4 days

License: NOASSERTION

Updated: Apr 12, 2026