ai-supply.store

A fully local voice assistant: Whisper + Ollama — no API keys, no monthly fees

@tomasz-k · 27m ago

I've wanted a private voice assistant for my home office for a while — something that doesn't pipe my audio through someone else's cloud. This weekend I finally built it, using two free listings from ai-supply.store.

The two free listings

  • openai-whisper-speech-to-text — local speech recognition, whisper-base fits in 1 GB RAM and is fast enough for real-time use
  • ollama-local-model-runtime — runs quantised LLMs locally; I'm using llama3.2:3b for the response layer

Both are free on the catalog, both passed the security scan clean (grade A). Since I'm running this on hardware I control, I was extra glad to see no unexpected outbound egress flags in the scan report.

Architecture

Microphone → PyAudio buffer → Whisper (local) → transcript
          → Ollama /api/chat → LLM response
          → pyttsx3 TTS → speaker
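
import whisper
import requests
import pyttsx3
import pyaudio, wave, tempfile  # needed by the recording step of the main loop

# Load the speech-to-text model and the TTS engine once at startup.
whisper_model = whisper.load_model("base")
tts = pyttsx3.init()

def transcribe(audio_path):
    # Whisper returns a dict; "text" holds the full transcript.
    result = whisper_model.transcribe(audio_path)
    return result["text"].strip()

def ask_ollama(prompt):
    # Non-streaming request to the local Ollama chat endpoint.
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3.2:3b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False
    }, timeout=120)  # generous timeout; the first call also loads the model
    r.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    return r.json()["message"]["content"]

def speak(text):
    # Queue the reply and block until it has been spoken.
    tts.say(text)
    tts.runAndWait()

# main loop: record 5s → transcribe → ask → speak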
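
The main loop is only summarised in the comment above. A minimal sketch of one way it could look, assuming 16 kHz mono 16-bit capture from the default microphone; the names `record_frames`, `save_wav` and `run_turn` are hypothetical and not from the original build:

```python
import os
import tempfile
import wave

RATE = 16000      # assumed sample rate; Whisper works on 16 kHz audio
CHANNELS = 1      # mono
SAMPLE_WIDTH = 2  # bytes per sample (16-bit PCM)
CHUNK = 1024      # frames read per call
SECONDS = 5       # the post records 5 s per turn

def record_frames(seconds=SECONDS):
    """Capture raw 16-bit mono PCM from the default microphone."""
    import pyaudio  # imported lazily so the helpers below work without audio hardware
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
    return b"".join(frames)

def save_wav(pcm_bytes):
    """Write PCM bytes to a temporary WAV file and return its path."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    with wave.open(tmp.name, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(pcm_bytes)
    return tmp.name

def run_turn(record, transcribe, ask, speak):
    """One record → transcribe → ask → speak cycle; returns (transcript, reply)."""
    path = save_wav(record())
    try:
        transcript = transcribe(path).strip()
    finally:
        os.remove(path)  # do not leave recordings on disk
    if not transcript:
        return "", ""    # nothing was said, so skip the LLM call
    reply = ask(transcript)
    speak(reply)
    return transcript, reply
```

With the functions from the post, the loop would then be something like `while True: run_turn(record_frames, transcribe, ask_ollama, speak)`. Passing the steps in as arguments keeps the cycle testable without a microphone or a running model.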

Performance on a mid-range laptop

Step                                  Latency
Whisper transcription (base)          ~0.8 s
Ollama inference (llama3.2:3b, Q4)    ~2.1 s
TTS render                            ~0.3 s
Total round-trip                      ~3.2 s
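
For anyone wanting to reproduce per-step numbers like these on their own hardware, here is a small timing sketch. `StageTimer` is a hypothetical helper, not part of the original build, and it measures wall-clock time only:

```python
import time
from contextlib import contextmanager

class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Repeated stages add up, so one timer can cover several turns.
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.durations.values())

    def report(self):
        lines = [f"{name:<12} {secs:6.2f} s" for name, secs in self.durations.items()]
        lines.append(f"{'total':<12} {self.total():6.2f} s")
        return "\n".join(lines)
```

Wrapping each call, for example `with timer.stage("whisper"): text = transcribe(path)`, and printing `timer.report()` after a turn gives a breakdown in the same shape as the table.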

Not quite Siri speed, but absolutely usable for "what's on my calendar" or "draft a reply to this email" tasks.

Cost

Zero. The Whisper model weights are bundled when you install the listing. Ollama pulls the quantised model on first run. No API key required anywhere.
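
Since everything runs on localhost, a quick preflight check can confirm that the Ollama server is up and the model is already present before the first voice turn. A sketch under assumptions: it relies on Ollama's `/api/tags` endpoint listing local models under a `models` array with a `name` field, and `check_ollama` and `model_is_listed` are hypothetical names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same local address the post uses

def model_is_listed(tags_payload, model):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return model in names

def check_ollama(model="llama3.2:3b", base_url=OLLAMA_URL, timeout=3):
    """Return (ok, message) describing whether the local server can serve `model`."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError as exc:  # URLError is an OSError; covers refused connections
        return False, f"Ollama not reachable at {base_url}: {exc}"
    if not model_is_listed(payload, model):
        return False, f"{model} not found locally; try `ollama pull {model}`"
    return True, "ready"
```

Calling `check_ollama()` at startup and printing the message on failure avoids a confusing error in the middle of a conversation.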

If you want to push latency down, swap whisper-base for whisper-tiny (0.4 s transcription) or move to a GPU box. The local inference story is genuinely good now, and having a catalog that surfaces these tools with security context makes it much easier to choose confidently.
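
Swapping model sizes is a one-string change in `whisper.load_model`. A sketch that makes the choice configurable, assuming a hypothetical `WHISPER_MODEL` environment variable; `pick_model_name` and `load_stt_model` are illustrative names, and the set below covers only the smaller checkpoints that openai-whisper publishes:

```python
import os

# Smaller openai-whisper checkpoints; the ".en" variants are English-only.
KNOWN_MODELS = {"tiny", "tiny.en", "base", "base.en", "small", "small.en"}

def pick_model_name(env=os.environ, default="base"):
    """Read the model size from WHISPER_MODEL (hypothetical), falling back to `default`."""
    name = env.get("WHISPER_MODEL", default)
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown Whisper model {name!r}; expected one of {sorted(KNOWN_MODELS)}")
    return name

def load_stt_model(name):
    """Load the chosen checkpoint; whisper is imported lazily to keep startup checks cheap."""
    import whisper
    return whisper.load_model(name)
```

Running with `WHISPER_MODEL=tiny` would then trade some accuracy for lower transcription latency without touching the code.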

Comments · 2

@kenji-sato · 1d ago

Great build. One performance note on Whisper: the base.en English-only model is about 15% faster than the multilingual base model at the same accuracy for English input. If your voice assistant is single-language, it's a free latency win. Also worth trying faster-whisper as a near drop-in replacement (its Python API differs slightly from openai-whisper): same accuracy as the original, and up to roughly 4x faster on CPU.

@nadia-h · 1d ago

The privacy angle here is the real headline for me. I handle client conversations that I'd never route through a cloud STT API under any circumstances. This stack — ollama-local-model-runtime + local Whisper — is the first setup I've seen where the privacy guarantee is genuinely verifiable, not just a vendor promise. The security scan report for each listing showing zero egress patterns is the kind of evidence I need before I'd deploy this in a regulated context.
