Network Format

Network Protocol

Knowledge of the network format is not necessary to work with Dissonance in most cases. This documentation is only required if you want to interact with Dissonance over the network from your own non-Unity code, for example when writing a Dissonance server in another language.

The Dissonance network system manages three main pieces of data:

Who is in the session
Who is listening to which rooms
Voice packets

This document gives an overview of how the Dissonance network system manages this data. To see the exact packet format, look at PacketWriter.cs and PacketReader.cs in the Dissonance package. These structs have a method for writing/reading each packet type.

Glossary

Peer

Every machine in the session is a peer. This includes both the server and the clients.

Client

A peer which is recording and playing voice.

Server

A peer which manages the organisation of the session and relays voice to clients.

Host

A peer which is both a server and a client.

Dedicated Server

A server which is not a client (i.e. no audio recording or playback).

Reliable

Some packets are sent reliably. This means that the packets will arrive at their destination in the order they were sent and that no packets are lost. This is used for all non-voice packets.

Unreliable

Some packets are sent unreliably. This means that the packets may be lost in transport or arrive in a different order. This is always used for voice packets.

Frame

Audio is recorded, processed and played back in frames. A frame is a buffer of 10-40ms of audio. Every frame is packed into a single network packet.

Room

A room is a type of channel which requires the listener to explicitly subscribe to the room to hear any audio sent to that room. Rooms have a name (a string) but on the network rooms are generally referred to by a 16 bit ID which is calculated by the ToRoomId(string name) method.

Packet Header

All packets contain a header which is used to check that the packet is valid.

Every Dissonance packet begins with the 16 bit magic number 0x8BC7. This is read from the start of the packet and if it's incorrect the packet is immediately discarded. If something goes wrong and non-Dissonance packets are sent to Dissonance, this prevents them from being decoded.

The next 8 bits are the packet type, which tells Dissonance how the contents of the packet should be decoded. The values used for this are defined in MessageTypes.cs.

After that all packets (except HandshakeRequest) have a 32 bit session number. This is a unique number randomly generated by the server when it starts a new session. If the session number does not match, the packet is immediately discarded. If something goes wrong and packets from one Dissonance session are sent to another Dissonance session, this prevents them from being decoded.
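For a non-Unity implementation, the header checks described above could be sketched roughly as follows. This is only an illustration: the big-endian byte order and the numeric value of `HANDSHAKE_REQUEST` are assumptions made for the example, and `parse_header` is a hypothetical helper. The authoritative layout is in PacketReader.cs and the real type values are in MessageTypes.cs.

```python
import struct
from typing import Optional, Tuple

MAGIC = 0x8BC7  # 16 bit magic number from the packet header description

# Hypothetical value for illustration only; the real one is in MessageTypes.cs.
HANDSHAKE_REQUEST = 0x01


def parse_header(packet: bytes, expected_session: int) -> Optional[Tuple[int, bytes]]:
    """Return (packet_type, payload), or None if the packet should be discarded.

    Assumes big-endian fields; check PacketReader.cs for the actual byte order.
    """
    if len(packet) < 3:
        return None  # too short to hold the magic number and packet type
    magic, packet_type = struct.unpack_from(">HB", packet, 0)
    if magic != MAGIC:
        return None  # not a Dissonance packet
    offset = 3
    if packet_type != HANDSHAKE_REQUEST:
        # Every other packet type carries a 32 bit session number.
        if len(packet) < offset + 4:
            return None
        (session,) = struct.unpack_from(">I", packet, offset)
        offset += 4
        if session != expected_session:
            return None  # packet belongs to a different session
    return packet_type, packet[offset:]
```

Returning `None` for every failure keeps the "discard immediately" behaviour in one place, so a caller only has to handle packets that passed all three checks.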

Kicking From A Session

If a packet with an incorrect session number is received by the server it will send back an ErrorWrongSession packet to the client which contains the session number being used by the server. If the client is not using this session number it will disconnect and reconnect to the server.
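The exchange above might be modelled like this. The class and function names are hypothetical and only the decision logic is shown, not the wire encoding of ErrorWrongSession.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorWrongSession:
    # Session number currently in use by the server.
    server_session: int


def server_check_session(server_session: int, packet_session: int) -> Optional[ErrorWrongSession]:
    """Server side: build the error reply if the packet's session number is wrong."""
    if packet_session != server_session:
        return ErrorWrongSession(server_session)
    return None


def client_should_reconnect(client_session: int, error: ErrorWrongSession) -> bool:
    """Client side: reconnect only if the client really is on another session."""
    return client_session != error.server_session
```

The client-side comparison means a stale error that arrives after the client has already rejoined the correct session does not trigger another reconnect.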

Joining A Session

1. A new client sends a HandshakeRequest message to the server. This tells the server the codec settings in use by this client as well as its name.
2. The server replies with a HandshakeResponse message. This sends the complete state of the server to the client:
    The session ID. A unique value prepended to all packets.
    The client ID. A unique 16 bit ID for this client.
    Client list. A list of all other clients in the session (name, codec setting, unique ID).
    Room list. A list of the room names which at least one client is currently listening to.
    Listeners list. A list of clients and the rooms which they are currently listening to.
3. The client replies with a ClientState message. This tells the server the complete state of the client:
    Name
    Client ID
    Codec Settings
    Rooms list. A list of the rooms this client is currently listening to.

Info

The HandshakeResponse message contains data about all clients currently in the session. In a very large session this can cause a problem with oversize packets. It is valid for the server to send some/none of the client data in the initial HandshakeResponse packet and instead to send it in individual ClientState messages immediately after the HandshakeResponse.
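The join sequence could be sketched on the server side as below. All class, field and method names are hypothetical, codec settings are reduced to an opaque string, and the sketch assumes the server registers a client as soon as it sees the HandshakeRequest and allocates the lowest free 16 bit ID. Neither of those details is specified in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClientInfo:
    client_id: int
    name: str
    codec_settings: str  # opaque placeholder; the real settings are structured
    rooms: Set[str] = field(default_factory=set)


@dataclass
class HandshakeResponse:
    session_id: int
    client_id: int
    clients: List[ClientInfo]
    room_names: List[str]
    listeners: Dict[int, List[str]]


class SessionServer:
    def __init__(self, session_id: int) -> None:
        self.session_id = session_id
        self.clients: Dict[int, ClientInfo] = {}

    def on_handshake_request(self, name: str, codec_settings: str) -> HandshakeResponse:
        # Snapshot the existing clients before adding the newcomer.
        others = list(self.clients.values())
        # Lowest free 16 bit ID (raises StopIteration if every ID is taken).
        client_id = next(i for i in range(0x10000) if i not in self.clients)
        self.clients[client_id] = ClientInfo(client_id, name, codec_settings)
        return HandshakeResponse(
            session_id=self.session_id,
            client_id=client_id,
            clients=others,
            room_names=sorted({room for c in others for room in c.rooms}),
            listeners={c.client_id: sorted(c.rooms) for c in others},
        )

    def on_client_state(self, client_id: int, name: str, codec_settings: str, rooms: Set[str]) -> None:
        # ClientState carries the complete state, so it replaces the old entry.
        self.clients[client_id] = ClientInfo(client_id, name, codec_settings, set(rooms))
```

A server that wants to avoid oversize packets could return an empty `clients` list here and follow up with one ClientState message per existing client, as the note above allows.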

Joining Or Leaving A Room

The server maintains a list of which rooms every client is currently listening to. Sending a complete ClientState message every time a client joins or leaves a room would be wasteful, so a DeltaClientState message is sent instead. This contains:

Flag indicating joining or leaving
Client ID
Room name
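Applying such a delta to a local room table is a small operation. The sketch below uses a hypothetical `DeltaClientState` dataclass with the three fields listed above and does not reflect the wire encoding.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class DeltaClientState:
    joined: bool      # True when joining, False when leaving
    client_id: int
    room_name: str


def apply_delta(listeners: Dict[int, Set[str]], delta: DeltaClientState) -> None:
    """Update the client -> rooms table in place."""
    rooms = listeners.setdefault(delta.client_id, set())
    if delta.joined:
        rooms.add(delta.room_name)
    else:
        # discard() tolerates a leave for a room the client was never in.
        rooms.discard(delta.room_name)
```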

Replicating Data

The update messages ClientState and DeltaClientState are sent from clients to the server, which updates its internal state. The server also broadcasts these messages out to all clients, which update their own state. This means that every client has exactly the same list of who is listening to which rooms.
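The replication pattern can be illustrated with an in-memory model in which network delivery is replaced by direct method calls. The `Replica` and `Server` classes and the tuple form of an update are hypothetical simplifications.

```python
from typing import Dict, List, Set, Tuple

# (joined, client_id, room_name): a simplified stand-in for DeltaClientState.
Update = Tuple[bool, int, str]


class Replica:
    """State held by every peer: which client listens to which rooms."""

    def __init__(self) -> None:
        self.listeners: Dict[int, Set[str]] = {}

    def apply(self, update: Update) -> None:
        joined, client_id, room = update
        rooms = self.listeners.setdefault(client_id, set())
        if joined:
            rooms.add(room)
        else:
            rooms.discard(room)


class Server(Replica):
    def __init__(self) -> None:
        super().__init__()
        self.clients: List[Replica] = []

    def receive(self, update: Update) -> None:
        # Update the server's own state, then broadcast to every client.
        self.apply(update)
        for client in self.clients:
            client.apply(update)
```

Because every replica applies the same updates in the same order (these messages are sent reliably), all replicas end up with identical tables.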

Peer To Peer

It's possible for peers to communicate directly. When this is set up, the metadata messages are still sent to the server but voice packets are sent directly from one client to another.

To set this up a client sends a HandshakeP2P message to every peer which it knows how to directly contact. The HandshakeP2P message contains the ID of the sending client. When a client receives a HandshakeP2P message from another client it can take note of the connection which that message came through, send back a HandshakeP2P message in response over that connection, and now the two peers can communicate directly.
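One way to track these direct connections is sketched below. The `P2PTable` class is hypothetical, and the rule that a peer does not reply again over a connection it has already recorded is an assumption added here so that the exchange terminates. This document does not say how Dissonance avoids replying forever.

```python
from typing import Dict, Hashable, List, Tuple

# (connection, sender_client_id): a stand-in for an outgoing HandshakeP2P message.
Outgoing = Tuple[Hashable, int]


class P2PTable:
    def __init__(self, local_id: int) -> None:
        self.local_id = local_id
        self.connections: Dict[int, Hashable] = {}  # peer client ID -> connection

    def initiate(self, connection: Hashable) -> Outgoing:
        """HandshakeP2P to a peer this client knows how to contact directly."""
        return (connection, self.local_id)

    def on_handshake_p2p(self, sender_id: int, connection: Hashable) -> List[Outgoing]:
        """Record the connection the message arrived on and reply if it is new."""
        already_known = self.connections.get(sender_id) == connection
        self.connections[sender_id] = connection
        if already_known:
            return []  # assumption: do not answer a handshake we already hold
        return [(connection, self.local_id)]
```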

Voice Packets

Each client records audio, preprocesses it to improve audio quality, encodes it (using Opus) and then sends the packet. The client decides who to send the packet to based on its knowledge of who is listening to what. The client sends the voice packet via P2P to as many clients as possible. The remaining packets are relayed via the server.
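The routing decision for a voice packet sent to a room might look like the following sketch. Both helper functions are hypothetical, and excluding the sender from its own recipient list is an assumption.

```python
from typing import Dict, List, Set, Tuple


def listeners_of(room: str, listeners: Dict[int, Set[str]], sender_id: int) -> Set[int]:
    """Clients subscribed to the room, excluding the sender (an assumption)."""
    return {cid for cid, rooms in listeners.items() if room in rooms and cid != sender_id}


def split_recipients(recipients: Set[int], p2p_peers: Set[int]) -> Tuple[List[int], List[int]]:
    """Split recipients into those reachable directly and those needing a relay."""
    direct = sorted(recipients & p2p_peers)
    relayed = sorted(recipients - p2p_peers)
    return direct, relayed
```

The `direct` list would receive the VoiceData packet over each P2P connection, while the `relayed` list would become the destination list of a single server relay packet.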

The VoiceData packet contains:

Sender Client ID
8 bit bitfield of packet flags
Sequence number. A number which can be used to put packets into the correct order.
A list of channels which this voice is addressed to. For each channel:
    16 bit channel bitfield
    16 bit channel ID
A frame of encoded audio

Bitfield

The 8 bit bitfield contains:

The MSB is always set to 1
The remaining 7 bits contain a wrapping counter which increments every time the "channel session" changes. The "channel session" changes whenever all sending channels are closed (i.e. there is an interruption in the voice stream).
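Packing and unpacking this byte takes only a few bit operations. The function names below are hypothetical. The layout (MSB set, 7 bit wrapping counter) follows the description above.

```python
def pack_flags(channel_session: int) -> int:
    """Build the 8 bit flags byte: MSB set, low 7 bits hold the counter."""
    return 0x80 | (channel_session & 0x7F)


def unpack_flags(flags: int) -> int:
    """Return the 7 bit channel session counter from a flags byte."""
    if not flags & 0x80:
        raise ValueError("MSB of the flags byte must be set")
    return flags & 0x7F


def next_channel_session(channel_session: int) -> int:
    """Advance the counter, wrapping from 127 back to 0."""
    return (channel_session + 1) & 0x7F
```

A receiver can compare the counter of consecutive packets from the same sender: a change indicates that the previous voice stream ended and a new one has started.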

Relayed Packets

When packets cannot be sent directly with P2P they can be relayed via the server. The ServerRelayReliable and ServerRelayUnreliable packets are used for this purpose. These packets contain a list of destination client IDs and then an array of bytes.

When the server receives one of these packets it sends the array of bytes out to all of the listed clients. The server will discard attempts to relay HandshakeP2P packets.
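The server-side relay step could be sketched as follows. This assumes the relayed byte array is itself a complete Dissonance packet, so that its type byte sits immediately after the 16 bit magic number. The value of `HANDSHAKE_P2P` is hypothetical (the real one is in MessageTypes.cs), and `send` stands in for whatever transport the server uses.

```python
from typing import Callable, Iterable

# Hypothetical value for illustration only; the real one is in MessageTypes.cs.
HANDSHAKE_P2P = 0x0A


def relay(destinations: Iterable[int], payload: bytes, send: Callable[[int, bytes], None]) -> int:
    """Forward payload to every destination client; return how many were sent."""
    # Assumption: payload is a full packet, so byte 2 is its packet type.
    if len(payload) >= 3 and payload[2] == HANDSHAKE_P2P:
        return 0  # relaying HandshakeP2P packets is not allowed
    count = 0
    for client_id in destinations:
        send(client_id, payload)
        count += 1
    return count
```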

Text Packets

Text packets can be sent through the Dissonance session. Unlike voice, they are always relayed via the server. The TextData packet contains:

Recipient type. This indicates if the packet is targeted at a player or a room.
Sender ID. Client ID of the sender.
Recipient ID. This is either a room ID or a client ID, depending upon the recipient type.
A string of UTF-8 encoded text.
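A server resolving the recipients of a text packet might follow the sketch below. The `TextData` dataclass, the `RecipientType` enum and the `text_targets` helper are hypothetical, and rooms are keyed by their 16 bit ID because that is what the packet carries. Whether a sender who is also listening to the room receives their own message is not specified here, so this sketch does not filter them out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class RecipientType(Enum):
    PLAYER = "player"
    ROOM = "room"


@dataclass(frozen=True)
class TextData:
    recipient_type: RecipientType
    sender_id: int
    recipient_id: int  # room ID or client ID, depending on recipient_type
    text: str

    def encoded_text(self) -> bytes:
        return self.text.encode("utf-8")


def text_targets(packet: TextData, room_listeners: Dict[int, Set[int]], clients: Set[int]) -> Set[int]:
    """Return the client IDs the server should forward this packet to."""
    if packet.recipient_type is RecipientType.ROOM:
        return set(room_listeners.get(packet.recipient_id, set()))
    # Player-targeted: deliver only if that client is actually in the session.
    return {packet.recipient_id} & clients
```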

Leaving A Session

When a client leaves the session the server sends a RemoveClient message out to all clients. This simply contains the ID of the client which is leaving the session.
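On receipt of RemoveClient, a peer would typically drop everything it knows about that client. The helper below is a hypothetical sketch. Forgetting the P2P connection as well is an assumption, since this document only describes the message itself.

```python
from typing import Dict, Set


def on_remove_client(
    client_id: int,
    clients: Dict[int, str],
    listeners: Dict[int, Set[str]],
    p2p_peers: Set[int],
) -> None:
    """Forget a client that has left the session (all tables updated in place)."""
    clients.pop(client_id, None)     # name table
    listeners.pop(client_id, None)   # rooms the client was listening to
    p2p_peers.discard(client_id)     # assumption: drop any direct connection too
```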