A fully local voice assistant: Whisper + Ollama — no API keys, no monthly fees
I've wanted a private voice assistant for my home office for a while — something that doesn't pipe my audio through someone else's cloud. This weekend I finally built it, using two free listings from ai-supply.store.
The two free listings
- openai-whisper-speech-to-text: local speech recognition; whisper-base fits in 1 GB RAM and is fast enough for real-time use
- ollama-local-model-runtime: runs quantised LLMs locally; I'm using llama3.2:3b for the response layer
Both are free on the catalog, and both passed the security scan clean (grade A). Since I'm running this on hardware I control, I was extra glad to see no unexpected outbound-traffic flags in the scan report.
Architecture
Microphone → PyAudio buffer → Whisper (local) → transcript
→ Ollama /api/chat → LLM response
→ pyttsx3 TTS → speaker
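import whisper, requests, pyttsx3, pyaudio, wave, tempfile

# Load the Whisper "base" model and the TTS engine once at startup.
whisper_model = whisper.load_model("base")
tts = pyttsx3.init()

def transcribe(audio_path):
    # Run local speech-to-text on an audio file and return the transcript.
    result = whisper_model.transcribe(audio_path)
    return result["text"]

def ask_ollama(prompt):
    # Send a single-turn, non-streaming chat request to the local Ollama server.
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3.2:3b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False
    })
    r.raise_for_status()  # fail loudly on HTTP errors instead of a confusing KeyError
    return r.json()["message"]["content"]

def speak(text):
    # Queue the text and block until pyttsx3 has finished speaking it.
    tts.say(text)
    tts.runAndWait()

# main loop: record 5s → transcribe → ask → speak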
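The main loop is only summarised in a comment, so here is one way the "record 5s" step might look. This is a sketch under assumptions, not the script I actually ran: it assumes 16 kHz mono 16-bit audio from the default microphone, and the helper names `save_wav` and `record_seconds` are made up for illustration. PyAudio is imported inside the recording function so the WAV-writing helper can be exercised on a machine without a microphone.

```python
import os
import tempfile
import wave

RATE = 16000      # assumed sample rate
CHANNELS = 1      # assumed mono microphone
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit PCM
CHUNK = 1024      # frames read from the microphone per call


def save_wav(frames, path, rate=RATE, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Write a list of raw PCM byte chunks to a WAV file and return the path."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    return path


def record_seconds(seconds=5):
    """Record from the default microphone and return the path of a temp WAV file."""
    import pyaudio  # imported lazily: needs PortAudio and an input device

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        # Read enough fixed-size chunks to cover the requested duration.
        frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open reopens the file by name
    return save_wav(frames, path)
```

With these helpers, the loop body could be as small as `speak(ask_ollama(transcribe(record_seconds(5))))` inside a `while True:`, plus removing the temp file after each turn.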
Performance on a mid-range laptop
| Step | Latency |
|---|---|
| Whisper transcription (base) | ~0.8 s |
| Ollama inference (llama3.2:3b, Q4) | ~2.1 s |
| TTS render | ~0.3 s |
| Total round-trip | ~3.2 s |
Not quite Siri speed, but absolutely usable for "what's on my calendar" or "draft a reply to this email" tasks.
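The total in the table is just the sum of the three stages (0.8 + 2.1 + 0.3 = 3.2 s). If you want to collect the same breakdown on your own hardware, a small timing wrapper is enough. The sketch below is an assumption about how one could measure it, not a record of how the numbers above were taken; `timed` and `report` are hypothetical helper names, and the lambdas are stand-ins so the snippet runs without Whisper or Ollama installed.

```python
import time


def timed(label, fn, *args, timings):
    """Call fn(*args), store the elapsed wall-clock seconds under label, return the result."""
    start = time.perf_counter()
    result = fn(*args)
    timings[label] = time.perf_counter() - start
    return result


def report(timings):
    """Format per-stage latencies plus their sum as printable lines."""
    lines = [f"{label}: {seconds:.2f} s" for label, seconds in timings.items()]
    lines.append(f"total: {sum(timings.values()):.2f} s")
    return lines


# Stand-in stages; swap in transcribe, ask_ollama and speak for real measurements.
timings = {}
text = timed("stt", lambda path: "hello", "clip.wav", timings=timings)
reply = timed("llm", lambda prompt: prompt.upper(), text, timings=timings)
timed("tts", lambda s: None, reply, timings=timings)
print("\n".join(report(timings)))
```

Averaging over several runs is worth doing, since the first Whisper and Ollama calls usually include one-off warm-up cost.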
Cost
Zero. The Whisper model weights are bundled when you install the listing. Ollama pulls the quantised model on first run. No API key required anywhere.
If you want to push latency down, swap whisper-base for whisper-tiny (~0.4 s transcription) or move to a GPU box. The local inference story is genuinely good now, and having a catalog that surfaces these tools with security context makes it much easier to choose confidently.