ivan-digital/qwen3-asr-swift

AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML

AI Summary

A Swift tool that converts spoken audio into text using Alibaba's Qwen3-ASR recognition model, designed to run efficiently on Apple Silicon devices and supporting 52 languages.

How It Works

1. 🔍 Discover speech-to-text magic

You hear about a free tool that turns any audio recording into written text, understanding 52 languages right on your Apple Mac or iPhone.

2. 📱 Get it ready on your Mac

Follow the setup guide to install the tool on your Apple computer; no special skills needed.

3. 🌍 Pick your voice model

Choose a smaller or more powerful model variant depending on how much noise and which dialects you need it to handle.

4. 🎤 Choose your audio

Grab any sound file, like a voice memo, interview, or podcast clip, from your files.

5. 🎧 Listen and transcribe

Hit go, and the tool listens to your audio, runs the recognition model quickly, and outputs the words as readable text (a code sketch of this flow follows the list).

6. ✅ Review the results

The full transcription appears within seconds, ready to copy, edit, or share.
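As a rough illustration of steps 2 through 6, here is a minimal Swift sketch of what that flow might look like. The module, type, and method names (`Qwen3ASR`, `load`, `transcribe`) are assumptions chosen for readability, not the library's confirmed API, so check the repo's README for the real calls.

```swift
import Foundation
import Qwen3ASR   // hypothetical module name; the actual product may differ

// Pick a model variant (step 3); weights are fetched automatically on first use.
let model = try await Qwen3ASR.load(variant: .base)

// Point at any local audio file (step 4): a voice memo, interview, or podcast clip.
let audioURL = URL(fileURLWithPath: "interview.wav")

// Transcribe and print the result (steps 5 and 6).
let result = try await model.transcribe(audioURL, language: "en")
print(result.text)
```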

AI-Generated Review

What is qwen3-asr-swift?

This Swift repo delivers a pure Swift implementation of Alibaba's Qwen3-ASR speech-to-text model, optimized for Apple Silicon using MLX Swift. It transcribes audio in 52 languages (30 major languages plus 22 Chinese dialects) with strong noise robustness, processing a 10-second clip in ~0.6 s on M-series chips via a simple API or the bundled CLI. Import it via Swift Package Manager into your Apple app for on-device ASR, with no Python runtime or cloud dependency needed.
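For context, here is a Package.swift sketch of what the Swift Package Manager integration could look like; the repository URL, version, and product name below are assumptions based on the repo and owner names, so confirm them against the README before using.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "VoiceNotesApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Assumed URL and version; check the repo's README for the real coordinates.
        .package(url: "https://github.com/ivan-digital/qwen3-asr-swift", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "VoiceNotesApp",
            dependencies: [
                // Product name is an assumption based on the repo name.
                .product(name: "Qwen3ASR", package: "qwen3-asr-swift")
            ]
        )
    ]
)
```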

Why is it gaining traction?

It crushes Whisper in noisy environments (17% vs 63% word error rate) at half the size, with a real-time factor under 0.07 for a real-time feel on Macs and iPhones. Native Swift means seamless GitHub Actions CI, no FFI hassles, and streaming transcription support. Devs dig the auto-downloading models and cache control for quick prototypes.
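Below is a hedged sketch of how the streaming transcription and model-cache control mentioned above might be wired up; the session API, callback shape, and `cacheDirectory` parameter are assumptions for illustration, not the library's documented interface.

```swift
import AVFoundation
import Qwen3ASR   // hypothetical module name

// Assumed option for redirecting the auto-downloaded model cache.
let cacheDir = URL(fileURLWithPath: "/tmp/qwen3-models", isDirectory: true)
let model = try await Qwen3ASR.load(variant: .base, cacheDirectory: cacheDir)

// Assumed chunk-based streaming session: push audio buffers as they arrive
// and receive partial transcripts back.
let session = model.makeStreamingSession(language: "en")
session.onPartialResult = { partial in
    print("partial:", partial.text)
}

// Placeholder buffers; in a real app these would come from an AVAudioEngine input tap.
let capturedBuffers: [AVAudioPCMBuffer] = []
for buffer in capturedBuffers {
    try await session.append(buffer)
}
let final = try await session.finish()
print("final:", final.text)
```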

Who should use this?

iOS/macOS devs building voice-note, meeting-transcription, or podcast apps on Apple hardware. Swift projects needing local ASR without server costs, especially for multilingual or noisy-audio use cases like field recordings. Teams using GitHub Copilot with Swift for rapid voice UI integration.

Verdict

Grab it for on-device Apple ASR experiments: a solid README, a CLI (`qwen3-asr-cli audio.wav`), and tests make it dev-ready despite its early-stage star count. Early maturity means watch for streaming polish, but it's a smart pure-Swift reference for on-device ML.
