# Voice transcription
Send Han AI a voice note, an audio file, or a video note in Telegram and it will transcribe the audio before responding. Local whisper.cpp is preferred when installed; OpenAI’s Whisper API is the fallback.
## What it does
Detects inbound voice, audio, or video-note messages, transcribes them, and routes the resulting text through the same path as a typed message.
| Field | Value |
|---|---|
| Preferred engine | whisper.cpp (local) |
| Fallback engine | OpenAI Whisper API |
| Inbound formats | Telegram voice, audio, video_note |
| History | Transcripts append to the same conversation log as text |
## When Han AI uses it
Every time you send an audio or video message. There is no command to invoke — it runs automatically.
## Examples
- A walking voice note dictating three follow-ups becomes three structured tasks.
- A recorded site walk-through is transcribed and filed.
- A meeting recording can be summarised and indexed into vector memory.
## Limits
- Until whisper.cpp is built on a given VPS, transcription runs through the OpenAI fallback path, which costs tokens. TODO: confirm whisper.cpp install rollout per tenant.
- Heavy background noise degrades accuracy.
- Non-English audio works (Whisper is multilingual) but accuracy varies by language and accent.
## Why this stack
The local path is free and private. The OpenAI fallback covers tenants where the local build is not yet installed — the system stays usable on day one.
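A minimal sketch of that engine choice: prefer a local whisper.cpp binary when one is on PATH, otherwise fall back to the hosted API. The binary name `whisper-cli` and the `api_transcribe` callback are assumptions for illustration, not the project's actual names; the `-f` and `--no-timestamps` flags are standard whisper.cpp CLI options.

```python
import shutil
import subprocess

def transcribe(audio_path: str, api_transcribe,
               binary_name: str = "whisper-cli") -> str:
    """Prefer local whisper.cpp; fall back to the hosted Whisper API."""
    binary = shutil.which(binary_name)
    if binary:
        # Local path: free and private; whisper.cpp prints the
        # transcript to stdout.
        result = subprocess.run(
            [binary, "-f", audio_path, "--no-timestamps"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    # Fallback path: hosted Whisper API; costs tokens.
    return api_transcribe(audio_path)
```

The fallback keeps a fresh tenant usable on day one, and installing the local build later changes the cost profile without changing any calling code.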