Skip to content

Voice messages

Voice notes, audio files, and video notes are all handled. The audio is transcribed on your VPS, then the resulting text flows through the same path as a typed message.

How to use it

Open the Telegram chat with your bot and record a voice note. Send. Han AI replies with text as if you had typed.

The same applies to forwarded audio files and Telegram round video notes. For video notes only the audio is transcribed; visual content is not analysed.

How transcription works

Step	What happens
1	The audio file arrives via the Telegram bot.
2	Your VPS picks the best available transcriber — local `whisper.cpp` if installed, otherwise the OpenAI Whisper API path.
3	The transcript is appended to the same conversation history as a normal text turn, at `/var/hanai/state/coo-history.jsonl`.
4	Han AI reads the transcript and replies.

Local versus API

Path	When
Local `whisper.cpp`	Preferred when installed. Lower latency, no per-minute cost, audio never leaves your VPS.
OpenAI Whisper API	Default fallback. Used when local install is absent.

If you send a high volume of voice notes, ask your operator to install whisper.cpp at /opt/whisper.cpp/. It’s cheaper and keeps the audio on your hardware.

Languages

Telegram audio in any language Whisper supports will transcribe. Bot replies follow the language norm of the conversation so far.

What doesn’t work yet

Replies as voice. Han AI replies in text. Generating speech back is not in scope today.
Real-time streaming. Send a complete voice note rather than expecting partial responses while you speak.

Next

Live mode — what happens after the transcript reaches the COO.
Where your data lives — where transcripts are stored.