# Voice transcription
Send Han AI a voice note, an audio file, or a video note in Telegram and it will transcribe the audio before responding. Local whisper.cpp is preferred when installed; OpenAI’s Whisper API is the fallback.
## What it does
Detects inbound voice, audio, or video-note messages, transcribes them, and routes the resulting text through the same path as a typed message.
| Field | Value |
|---|---|
| Preferred engine | whisper.cpp (local) |
| Fallback engine | OpenAI Whisper API |
| Inbound formats | Telegram voice, audio, video_note |
| History | Transcripts append to the same conversation log as text |
## When Han AI uses it
Every time you send an audio or video message. There is no command to invoke — it runs automatically.
## Examples
- A walking voice note dictating three follow-ups becomes three structured tasks.
- A recorded site walk-through is transcribed and filed.
- A meeting recording can be summarised and indexed into vector memory.
## Limits
- Until whisper.cpp is built on a given VPS, transcription runs through the OpenAI fallback path, which costs tokens. TODO: confirm whisper.cpp install rollout per tenant.
- Heavy background noise degrades accuracy.
- Non-English audio works (Whisper is multilingual) but accuracy varies by language and accent.
## Why this stack
The local path is free and private. The OpenAI fallback covers tenants where the local build is not yet installed — the system stays usable on day one.
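A minimal sketch of that engine choice: prefer a local whisper.cpp binary when one is on PATH, otherwise fall back to the hosted API. The binary name `whisper-cli` and the `api_transcribe` callback are assumptions for illustration, not the project's actual names; the `-f` and `--no-timestamps` flags are standard whisper.cpp CLI options.

```python
import shutil
import subprocess

def transcribe(audio_path: str, api_transcribe,
               binary_name: str = "whisper-cli") -> str:
    """Prefer local whisper.cpp; fall back to the hosted Whisper API."""
    binary = shutil.which(binary_name)
    if binary:
        # Local path: free and private; whisper.cpp prints the
        # transcript to stdout.
        result = subprocess.run(
            [binary, "-f", audio_path, "--no-timestamps"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    # Fallback path: hosted Whisper API; costs tokens.
    return api_transcribe(audio_path)
```

The fallback keeps a fresh tenant usable on day one, and installing the local build later changes the cost profile without changing any calling code.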