Receive Real Time Audio
To start receiving real-time audio streams, include your websocket URL in create_bot.real_time_media.websocket_audio_destination_url. This URL must have the wss:// prefix.
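As an illustration, the relevant part of a Create Bot request body might look like the following sketch. The meeting_url and websocket hostname are placeholders, not real values:

```python
import json

# Sketch of a Create Bot request body that enables real-time audio
# streaming. The meeting_url and destination URL are placeholders.
payload = {
    "meeting_url": "https://zoom.us/j/1234567890",
    "real_time_media": {
        # Must use the wss:// prefix
        "websocket_audio_destination_url": "wss://example.com/audio",
    },
}

body = json.dumps(payload)
print(body)
```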
Real Time Audio Protocol (Separate Streams)
Separate audio streams per participant are only available on the Zoom Native Bot and the Microsoft Teams Native Bot.
The first message on websocket connection will be:
{"protocol_version": 1, "bot_id": "YOUR-BOT-ID-HERE", "separate_streams": true}
Subsequent websocket messages will be binary, formatted as follows:
- The first 32 bits are a little-endian unsigned integer representing the "participant_id".
- The remaining data in the websocket packet is S16LE format audio, sampled at 16000Hz, mono.
The following is sample code to decode these messages:
import asyncio
import websockets

async def echo(websocket):
    async for message in websocket:
        if isinstance(message, str):
            # The first message is the JSON metadata message
            print(message)
        else:
            # First 4 bytes: little-endian participant ID
            stream_id = int.from_bytes(message[0:4], byteorder='little')
            # Remaining bytes: S16LE mono 16kHz audio
            with open(f'output/{stream_id}-output.raw', 'ab') as f:
                f.write(message[4:])
            print("wrote message")

async def main():
    async with websockets.serve(echo, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
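The .raw files written above are headerless PCM. A small sketch using Python's standard wave module can wrap such a file into a playable WAV; the file paths here are placeholders:

```python
import wave

def raw_to_wav(raw_path: str, wav_path: str) -> None:
    """Wrap headerless S16LE mono 16kHz PCM in a WAV container."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit (2-byte) samples
        w.setframerate(16000)  # 16 kHz sample rate
        w.writeframes(pcm)
```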
Real Time Audio Protocol (Combined Streams)
Combined audio streams are available on the Zoom Web Bot, Microsoft Teams Web Bot, Google Meet Bot, and Webex Bot.
The first message on websocket connection will be:
{
  "protocol_version": 1,
  "bot_id": "YOUR-BOT-ID-HERE",
  "recording_id": "...",
  "separate_streams": false,
  "offset": 0.0
}
The offset is the offset (in seconds) relative to the in_call_recording event on the bot.
Subsequent websocket messages will be binary, formatted as follows:
- All data in the websocket packet is S16LE format audio, sampled at 16000Hz, mono.
import asyncio
import websockets

async def echo(websocket):
    async for message in websocket:
        if isinstance(message, str):
            # The first message is the JSON metadata message
            print(message)
        else:
            # S16LE mono 16kHz audio
            with open('output/output.raw', 'ab') as f:
                f.write(message)
            print("wrote message")

async def main():
    async with websockets.serve(echo, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
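Because the combined stream is continuous S16LE mono audio at 16 kHz, you can derive a timestamp for each incoming packet from the offset in the initial message plus the number of bytes received so far. A minimal sketch; the offset values below are illustrative:

```python
SAMPLE_RATE = 16000   # Hz
BYTES_PER_SAMPLE = 2  # S16LE mono

def packet_timestamp(offset: float, bytes_received: int) -> float:
    """Seconds, relative to the in_call_recording event, at which the
    next packet's audio begins, given bytes received so far."""
    return offset + bytes_received / (SAMPLE_RATE * BYTES_PER_SAMPLE)
```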
Diarization using call events
When receiving audio streams, you can utilize Call Event Webhooks to receive real-time speaker changes. You can also receive these messages through a websocket connection by specifying real_time_media.websocket_speaker_timeline_destination_url when calling Create Bot.
Websocket example:
{"user_id": 16778240, "name": "John Doe", "timestamp": 18.76719}
The timestamp is the offset (in seconds) relative to the in_call_recording event for the bot.
Webhook example:
{
"event": "bot.active_speaker_notify",
"data": {
"participant_id": 16778240,
"created_at": "2024-04-08T20:29:44.001399994Z",
"relative_ts": 5.865013889,
"bot_id": "2a06cd2f-b126-4eee-9d48-eebdb3195187"
}
}
The relative_ts is the offset (in seconds) relative to the in_call_recording event for the bot.
Regardless of which method you use to receive call events, these can be used to determine the participant ID for a stream of audio packets until the next speaker change event.
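For example, if you keep the speaker-change events sorted by timestamp, each audio packet can be attributed to the most recent event at or before its timestamp. A minimal sketch; the event values mirror the websocket example above:

```python
import bisect

# Speaker-change events, sorted by timestamp as they arrive.
# The second event is an invented example for illustration.
events = [
    {"user_id": 16778240, "name": "John Doe", "timestamp": 0.0},
    {"user_id": 16778241, "name": "Jane Roe", "timestamp": 18.76719},
]

def active_speaker(events, t: float):
    """Return the speaker event in effect at time t, or None before
    the first event."""
    timestamps = [e["timestamp"] for e in events]
    i = bisect.bisect_right(timestamps, t) - 1
    return events[i] if i >= 0 else None
```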
You can then use the meeting_participants field on the bot (e.g. bot 9e77800d-ead9-4615-85fb-b71a045c7850) to map the ID to a participant name and attribute the words to the speaker:
// GET https://api.recall.ai/api/v1/bot/9e77800d-ead9-4615-85fb-b71a045c7850/
{
  "meeting_participants": [
    {
      "id": 100,
      "name": "John Doe",
      "events": [],
      "is_host": true,
      "platform": "unknown",
      "extra_data": null
    }
  ],
  ...
}
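A simple lookup table built from that response can then translate participant IDs into names. A sketch, assuming the response shape shown above:

```python
def participant_names(bot_response: dict) -> dict:
    """Map each participant's id to their display name."""
    return {p["id"]: p["name"] for p in bot_response["meeting_participants"]}

# Example using the response shape from the documentation above.
names = participant_names({
    "meeting_participants": [
        {"id": 100, "name": "John Doe", "is_host": True},
    ]
})
```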
FAQ
Do muted participants produce audio?
No, muted participants do not produce any audio.
If a participant is unmuted but silent, you will receive empty audio packets.
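Since unmuted-but-silent participants still produce packets, you may want to detect and skip silence. A sketch that treats an S16LE payload as silent when every sample's magnitude is at or below a threshold:

```python
import struct

def is_silent(pcm: bytes, threshold: int = 0) -> bool:
    """True if every S16LE sample's magnitude is <= threshold."""
    n = len(pcm) // 2
    # '<' forces little-endian, 'h' is a signed 16-bit sample
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return all(abs(s) <= threshold for s in samples)
```

A nonzero threshold lets you also skip near-silent background noise rather than only exact digital silence.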
Will bots receive audio from other bots?
Since bots are participants, if there are other bots in a call, your bot will receive audio from them like any other participant.
However, because bots are muted by default, your bot will not receive audio packets from another bot unless that bot is actively outputting audio.