fikrikarim

On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.

922 stars · 70 · 100% credibility
Found Apr 06, 2026 at 743 stars.
AI Summary

Parlor is an open-source application for real-time voice and vision conversations with an AI that runs entirely on local hardware, such as Apple Silicon Macs or Linux machines with a GPU.

How It Works

1. 🔍 Discover Parlor

You hear about Parlor, a free tool for natural voice-and-camera chats with an AI, all running right on your own computer.

2. 💻 Get it on your machine

You download the tool to your Mac or Linux machine and set it up in a few easy steps.

3. 🚀 Launch the app

You start the app; the first time it runs, it automatically downloads everything it needs.

4. 🌐 Open in browser

A web page opens in your browser, where you grant access to your microphone and camera.

5. 🗣️ Talk and show

You speak freely or point your camera at things; it listens and watches hands-free, with no buttons to press.

6. 💬 AI chats back

The AI understands your speech, sees what you show it, and replies in a natural voice right away, even if you interrupt it.

🎉 Private conversations anytime

You now enjoy helpful, natural conversations, for language learning or just for fun, all private on your device.


Star Growth

The repo grew from 743 to 922 stars.
AI-Generated Review

What is parlor?

Parlor lets you have natural voice and vision conversations with an AI that runs entirely on your Mac or Linux machine—no cloud, no servers. Point your browser camera at objects, speak freely using hands-free voice detection, and get spoken responses with vision understanding, all in real-time via WebSocket to a local FastAPI server. Powered by Gemma 4 E2B for multimodal input and Kokoro for TTS, it solves the high cost of hosting voice AI by keeping everything on-device.
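The real-time loop described above (audio in from the browser, voice-activity gating, model inference, spoken audio back out) can be pictured as a small streaming pipeline. The sketch below is a toy illustration of that shape using only stub stages and made-up names; it is not Parlor's actual code, and the VAD threshold and "reply" stage are placeholders for the real Gemma and Kokoro components.

```python
import asyncio

# Toy sketch of a streaming voice pipeline: audio frames flow through a
# voice-activity gate, and gated speech is turned into a (stubbed) reply.
# All names and thresholds here are illustrative assumptions.

async def vad_gate(frames, speech_threshold=0.5):
    """Yield only frames whose (fake) energy passes a voice-activity threshold."""
    for energy, chunk in frames:
        if energy >= speech_threshold:
            yield chunk
        await asyncio.sleep(0)  # yield control so other stages could run

async def pipeline(frames):
    """Collect speech chunks, then produce a stub 'spoken' reply for the utterance."""
    heard = [chunk async for chunk in vad_gate(frames)]
    # Stand-in for the real model + TTS stages (Gemma for understanding, Kokoro for speech).
    return "reply-to:" + "+".join(heard)

frames = [(0.9, "hel"), (0.1, "..."), (0.8, "lo")]  # (energy, audio chunk) stubs
print(asyncio.run(pipeline(frames)))  # -> reply-to:hel+lo
```

In the real application this loop would sit behind a WebSocket endpoint on the local FastAPI server, with each stage streaming rather than batching.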

Why is it gaining traction?

It delivers barge-in interruptions, sentence-level streaming audio, and end-to-end latency under 3 s on Apple Silicon, making local multimodal chat feel responsive without a monster GPU like an RTX 5090. Developers like the zero-cost setup (git clone, uv sync, python server.py, then open localhost:8000) and the automatic model downloads, which enable instant multilingual practice, such as pointing the camera at objects while falling back to your native language.
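The "sentence-level streaming audio" mentioned above is the main latency trick: instead of waiting for the model's full response, each completed sentence is flushed to TTS as soon as it ends. A minimal sketch of that idea, with an assumed punctuation-based splitting rule and hypothetical function names (not Parlor's implementation):

```python
import re

# Flush each complete sentence from a token stream as soon as it ends,
# so TTS can start speaking before the full response is generated.
SENT_END = re.compile(r"[.!?]")

def stream_sentences(token_stream):
    """Yield complete sentences as tokens arrive; flush any remainder at the end."""
    buf = ""
    for tok in token_stream:
        buf += tok
        while (m := SENT_END.search(buf)):
            sentence, buf = buf[:m.end()], buf[m.end():]
            yield sentence.strip()
    if buf.strip():
        yield buf.strip()

tokens = ["Hello", " there", ". How", " can I", " help?", " Point your camera"]
print(list(stream_sentences(tokens)))
# -> ['Hello there.', 'How can I help?', 'Point your camera']
```

Each yielded sentence would be handed to the TTS engine immediately, which is what makes sub-3 s end-to-end latency plausible on local hardware.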

Who should use this?

Language teachers building on-device tutors for real-time speaking practice, indie devs prototyping private voice-vision apps, or hardware hackers testing local AI on laptops before phone ports. Ideal for anyone evaluating on-device real-time models where privacy trumps agentic power.

Verdict

Grab it for experiments if you have Apple Silicon or a GPU-equipped Linux box; a solid README makes it approachable, but its research-preview status means bugs are expected. A solid foundation for local multimodal demos; just temper expectations on polish.



Similar repos coming soon.