📘
To start receiving real-time audio streams, include your websocket URL in create_bot.real_time_media.websocket_audio_destination_url.
This URL should have the wss:// prefix.
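
As a rough illustration of the configuration described above, the following sketch builds a Create Bot request body with the audio destination set. Only real_time_media.websocket_audio_destination_url comes from this page; the meeting_url field name, the helper name build_create_bot_payload, and both example URLs are assumptions made for illustration.

```python
import json


def build_create_bot_payload(meeting_url: str, audio_ws_url: str) -> dict:
    """Build a request body that enables real-time audio streaming."""
    # The destination should be a secure websocket URL (wss:// prefix).
    if not audio_ws_url.startswith("wss://"):
        raise ValueError("websocket_audio_destination_url should start with wss://")
    return {
        "meeting_url": meeting_url,  # assumed field name, not taken from this page
        "real_time_media": {
            "websocket_audio_destination_url": audio_ws_url,
        },
    }


payload = build_create_bot_payload(
    "https://example.com/meeting/123",  # hypothetical meeting URL
    "wss://example.com/audio",          # hypothetical websocket endpoint
)
print(json.dumps(payload, indent=2))
```

Validating the prefix before sending the request catches a plain ws:// URL early, rather than after the bot has already joined the call.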

Real Time Audio Protocol (Separate Streams)

📘
Separate audio streams per participant are only available on the Zoom Native Bot and the Microsoft Teams Native Bot.

The first message sent on the websocket connection will be:
{"protocol_version": 1, "bot_id": "YOUR-BOT-ID-HERE", "separate_streams": true}
Subsequent websocket messages are binary, in the following format:

The first 32 bits are a little-endian unsigned integer representing the "participant_id".
The remaining data in the websocket packet is audio in S16LE format (signed 16-bit little-endian PCM), mono, sampled at 16000 Hz.

The following is sample code to decode these messages:

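import asyncio
import os

import websockets


async def handle_connection(websocket):
    # Make sure the output directory exists before writing to it.
    os.makedirs('output', exist_ok=True)
    async for message in websocket:
        if isinstance(message, str):
            # Text frame: the initial JSON message describing the stream.
            print(message)
        else:
            # Binary frame: 4-byte little-endian participant ID, then S16LE audio.
            stream_id = int.from_bytes(message[0:4], byteorder='little')
            # Append the audio to a per-participant raw PCM file.
            with open(f'output/{stream_id}-output.raw', 'ab') as f:
                f.write(message[4:])
                print("wrote message")


async def main():
    # Listen on all interfaces, port 8765.
    async with websockets.serve(handle_connection, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())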
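
The sample server above writes headerless .raw files, which most audio players will not open directly. A minimal sketch of wrapping such a file in a WAV container with Python's standard wave module is shown below. The helper name raw_to_wav is hypothetical, and the sketch assumes a little-endian host, since the wave module expects samples in native byte order.

```python
import wave

SAMPLE_RATE_HZ = 16000   # from the protocol description
BYTES_PER_SAMPLE = 2     # S16LE: signed 16-bit samples
CHANNELS = 1             # mono


def raw_to_wav(raw_path: str, wav_path: str) -> int:
    """Wrap raw S16LE mono PCM in a WAV header; return the number of frames."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    # Drop a trailing partial sample, if any, so the data is frame-aligned.
    pcm = pcm[: len(pcm) - len(pcm) % BYTES_PER_SAMPLE]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BYTES_PER_SAMPLE)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(pcm)
    return len(pcm) // BYTES_PER_SAMPLE
```

For example, raw_to_wav("output/16778240-output.raw", "output/16778240.wav") would convert one participant's file, where the participant ID in the file name is only illustrative.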

Real Time Audio Protocol (Combined Streams)

📘
Combined audio streams are available on the Zoom Web Bot, Microsoft Teams Web Bot, Google Meet Bot, and Webex Bot.

The first message sent on the websocket connection will be:
{"protocol_version": 1, "bot_id": "YOUR-BOT-ID-HERE", "separate_streams": false}

{
  "protocol_version": 1,
  "bot_id": "...",
  "recording_id": "...",
  "separate_streams": false,
  "offset": 0.0
}

offset is the time offset (in seconds) relative to the in_call_recording event on the bot.

Subsequent websocket messages are binary, in the following format:

All data in the websocket packet is audio in S16LE format (signed 16-bit little-endian PCM), mono, sampled at 16000 Hz.

The following is sample code to receive these messages:

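import asyncio
import os

import websockets


async def handle_connection(websocket):
    # Make sure the output directory exists before writing to it.
    os.makedirs('output', exist_ok=True)
    async for message in websocket:
        if isinstance(message, str):
            # Text frame: the initial JSON message describing the stream.
            print(message)
        else:
            # Binary frame: the whole payload is S16LE audio.
            with open('output/output.raw', 'ab') as f:
                f.write(message)
                print("wrote message")


async def main():
    # Listen on all interfaces, port 8765.
    async with websockets.serve(handle_connection, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())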
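
Because the combined stream is fixed-rate audio (16000 samples per second, 2 bytes per sample) and the first message carries an offset, the position of each packet relative to in_call_recording can be estimated by counting bytes. The sketch below illustrates the arithmetic. The class name AudioClock is hypothetical, and the approach assumes packets arrive in order with no gaps in the audio, which this page does not guarantee.

```python
import json

SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2  # S16LE, mono
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


class AudioClock:
    """Estimate where each combined-stream packet falls on the recording timeline."""

    def __init__(self) -> None:
        self.offset = 0.0
        self.bytes_received = 0

    def on_text(self, message: str) -> None:
        # The first (text) message may carry the starting offset in seconds.
        header = json.loads(message)
        self.offset = float(header.get("offset", 0.0))

    def on_binary(self, message: bytes) -> tuple[float, float]:
        # Assumes continuous audio: elapsed time = bytes received / bytes per second.
        start = self.offset + self.bytes_received / BYTES_PER_SECOND
        self.bytes_received += len(message)
        end = self.offset + self.bytes_received / BYTES_PER_SECOND
        return start, end
```

The (start, end) pair returned for each packet is in the same time base as the speaker change timestamps described below, so the two can be compared directly.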

Diarization using call events

When receiving audio streams, you can use Call Event Webhooks to receive real-time speaker changes. You can also receive these events over a websocket connection by specifying real_time_media.websocket_speaker_timeline_destination_url when calling Create Bot.

Websocket example:

{"user_id": 16778240, "name": "John Doe", "timestamp": 18.76719}

timestamp is the offset (in seconds) relative to the in_call_recording event for the bot.

Webhook example:

{
    "event": "bot.active_speaker_notify",
    "data": {
        "participant_id": 16778240,
        "created_at": "2024-04-08T20:29:44.001399994Z",
        "relative_ts": 5.865013889,
        "bot_id": "2a06cd2f-b126-4eee-9d48-eebdb3195187"
    }
}

relative_ts is the offset (in seconds) relative to the in_call_recording event for the bot.

Regardless of which method you use to receive call events, each event identifies the participant who has started speaking. Audio packets received from that point on can be attributed to that participant ID until the next speaker change event.
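
One way to apply this rule is to keep the speaker change events sorted by time and look up the most recent one at or before a given audio timestamp. The following is a minimal sketch of that lookup. SpeakerTimeline and the two event_from_* helpers are hypothetical names, and the sketch assumes that user_id in the websocket message and participant_id in the webhook refer to the same identifier.

```python
from bisect import bisect_right
from typing import Optional


def event_from_websocket(msg: dict) -> tuple[float, int]:
    # Websocket speaker timeline message: timestamp plus user_id.
    return float(msg["timestamp"]), int(msg["user_id"])


def event_from_webhook(body: dict) -> tuple[float, int]:
    # bot.active_speaker_notify webhook: relative_ts plus participant_id.
    data = body["data"]
    return float(data["relative_ts"]), int(data["participant_id"])


class SpeakerTimeline:
    """Sorted list of speaker changes, queried by audio timestamp."""

    def __init__(self) -> None:
        self._times: list[float] = []
        self._participants: list[int] = []

    def add_event(self, timestamp: float, participant_id: int) -> None:
        # Insert in time order so late-arriving events are still handled.
        i = bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._participants.insert(i, participant_id)

    def speaker_at(self, t: float) -> Optional[int]:
        # The active speaker is the latest change at or before t, if any.
        i = bisect_right(self._times, t) - 1
        return self._participants[i] if i >= 0 else None
```

Audio that precedes the first speaker change event has no known speaker, so speaker_at returns None for it rather than guessing.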

You can then use the meeting_participants field on the bot (9e77800d-ead9-4615-85fb-b71a045c7850 in this example) to map the ID to a participant name and attribute the words to the speaker:

// GET https://api.recall.ai/api/v1/bot/9e77800d-ead9-4615-85fb-b71a045c7850/
{
  "meeting_participants": [
    {
      "id": 100,
      "name": "John Doe",
      "events": [],
      "is_host": true,
      "platform": "unknown",
      "extra_data": null
    }
  ],
  ...
}
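
The ID-to-name mapping can be built from that response with a small dictionary comprehension. This is only a sketch: the function name participant_names is hypothetical, and the input is assumed to be the already-parsed JSON body of the bot response shown above.

```python
def participant_names(bot: dict) -> dict[int, str]:
    """Map participant IDs to names using the bot's meeting_participants list."""
    return {p["id"]: p["name"] for p in bot.get("meeting_participants", [])}


# Example input mirroring the response shown above (other fields omitted).
bot = {
    "meeting_participants": [
        {"id": 100, "name": "John Doe", "is_host": True},
    ]
}
names = participant_names(bot)
print(names.get(100, "Unknown speaker"))
```

Using names.get(participant_id, "Unknown speaker") keeps the lookup safe when a speaker change event refers to a participant that is not in the list.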

FAQ

Do muted participants produce audio?

No, muted participants do not produce any audio.

If a participant is unmuted but silent, you will receive empty audio packets.

Will bots receive audio from other bots?

Bots are participants, so if there are other bots in a call, your bot will receive audio from them as it would from any other participant.

However, bots are muted by default, so your bot will not receive audio packets from another bot unless that bot is outputting audio.