Add spec for MSC2746 (#1511)
* Change version field to a string And add the notes on how the version field works. * Add spec requiring tracks to be within streams. * Put streams spec in its own section * Add 'invitee' field * Add party_id * Remember how JSON works * Add m.call.select_answer * Update examples * Add select_answer to call flow example diagram * Add m.call.reject * Make party_id required in other events * Add possible ways for client to handle an invite * Convert hangup & reject events to YAML So we can have a bulleted list in the description for the values of 'reason'. * Add new reason codes to hangup & reject * Add m.call.negotiate * Add other sections * Revert changes to package lock * Typos * Fix type of other version fields, fix anchor. * Add newsfragment * Fix reason in hangup/reject * Change tense Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Tense, typos & grammar Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Linkify Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Remove unnecessary parts from link Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Capitalise Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Fix hangup reasons * Clarify who can answer Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Linkify Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Remove reference to 'this MSC'. * Move common VoIP fields into a call event type. * Move common voip events to the content, not the actual event * Remove reason from reject event I confused myself, but it's not in the MSC and it shouldn't be. * Failure to YAML * Fix number of room members allowed when sending voip events. Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Add 'added in' version Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Another added-in Co-authored-by: Hubert Chathi <hubertc@matrix.org> * Add missing comma --------- Co-authored-by: Hubert Chathi <hubertc@matrix.org> Co-authored-by: Travis Ralston <travisr@matrix.org>
This commit is contained in:
parent
17ebdf7c86
commit
4fd9f39966
17 changed files with 426 additions and 69 deletions
|
@ -6,11 +6,128 @@ This module outlines how two users in a room can set up a Voice over IP
|
|||
WebRTC 1.0 standard. Call signalling is achieved by sending [message
|
||||
events](#events) to the room. In this version of the spec, only two-party
|
||||
communication is supported (e.g. between two peers, or between a peer
|
||||
and a multi-point conferencing unit). This means that clients MUST only
|
||||
send call events to rooms with exactly two participants.
|
||||
and a multi-point conferencing unit). Calls can take place in rooms with
|
||||
multiple members, but only two devices can take part in the call.
|
||||
|
||||
All VoIP events have a `version` field. This is used to determine whether
|
||||
devices support this new version of the protocol. For example, clients can use
|
||||
this field to know whether to expect an `m.call.select_answer` event from their
|
||||
opponent. If clients see events with `version` other than `0` or `"1"`
|
||||
(including, for example, the numeric value `1`), they should treat these the
|
||||
same as if they had `version` == `"1"`.
|
||||
|
||||
Note that this implies any and all future versions of VoIP events should be
|
||||
backwards-compatible. If it does become necessary to introduce a non
|
||||
backwards-compatible VoIP spec, the intention would be for it to simply use a
|
||||
separate set of event types.
|
||||
|
||||
#### Party Identifiers
|
||||
Whenever a client first participates in a new call, it generates a `party_id` for itself to use for the
|
||||
duration of the call. This needs to be long enough that the chance of a collision between multiple devices
|
||||
both generating an answer at the same time generating the same party ID is vanishingly small: 8 uppercase +
|
||||
lowercase alphanumeric characters is recommended. Parties in the call are identified by the tuple of
|
||||
`(user_id, party_id)`.
|
||||
|
||||
The client adds a `party_id` field containing this ID to the top-level of the content of all VoIP events
|
||||
it sends on the call, including `m.call.invite`. Clients use this to identify remote echo of their own
|
||||
events: since a user may call themselves, they cannot simply ignore events from their own user. This
|
||||
field also identifies different answers sent by different clients to an invite, and matches `m.call.candidates`
|
||||
events to their respective answer/invite.
|
||||
|
||||
A client implementation may choose to use the device ID used in end-to-end cryptography for this purpose,
|
||||
or it may choose, for example, to use a different one for each call to avoid leaking information on which
|
||||
devices were used in a call (in an unencrypted room) or if a single device (ie. access token) were used to
|
||||
send signalling for more than one call party.
|
||||
|
||||
A grammar for `party_id` is defined [below](#grammar-for-voip-ids).
|
||||
|
||||
#### Politeness
|
||||
In line with [WebRTC perfect negotiation](https://w3c.github.io/webrtc-pc/#perfect-negotiation-example)
|
||||
there are rules to establish which party is polite in the process of renegotiation. The callee is
|
||||
always the polite party. In a glare situation, the politenes of a party is therefore determined by
|
||||
whether the inbound or outbound call is used: if a client discards its outbound call in favour of
|
||||
an inbound call, it becomes the polite party.
|
||||
|
||||
#### Call Event Liveness
|
||||
`m.call.invite` contains a `lifetime` field that indicates how long the offer is valid for. When
|
||||
a client receives an invite, it should use the event's `age` field in the sync response plus the
|
||||
time since it received the event from the homeserver to determine whether the invite is still valid.
|
||||
The use of the `age` field ensures that incorrect clocks on client devices don't break calls.
|
||||
|
||||
If the invite is still valid *and will remain valid for long enough for the user to accept the call*,
|
||||
it should signal an incoming call. The amount of time allowed for the user to accept the call may
|
||||
vary between clients. For example, it may be longer on a locked mobile device than on an unlocked
|
||||
desktop device.
|
||||
|
||||
The client should only signal an incoming call in a given room once it has completed processing the
|
||||
entire sync response and, for encrypted rooms, attempted to decrypt all encrypted events in the
|
||||
sync response for that room. This ensures that if the sync response contains subsequent events that
|
||||
indicate the call has been hung up, rejected, or answered elsewhere, the client does not signal it.
|
||||
|
||||
If on startup, after processing locally stored events, the client determines that there is an invite
|
||||
that is still valid, it should still signal it but only after it has completed a sync from the homeserver.
|
||||
|
||||
The minimal recommended lifetime is 90 seconds - this should give the user enough time to actually pick
|
||||
up the call.
|
||||
|
||||
#### ICE Candidate Batching
|
||||
Clients should aim to send a small number of candidate events, with guidelines:
|
||||
* ICE candidates which can be discovered immediately or almost immediately in the invite/answer
|
||||
event itself (eg. host candidates). If server reflexive or relay candidates can be gathered in
|
||||
a sufficiently short period of time, these should be sent here too. A delay of around 200ms is
|
||||
suggested as a starting point.
|
||||
* The client should then allow some time for further candidates to be gathered in order to batch them,
|
||||
rather than sending each candidate as it arrives. A starting point of 2 seconds after sending the
|
||||
invite or 500ms after sending the answer is suggested as a starting point (since a delay is natural
|
||||
anyway after the invite whilst the client waits for the user to accept it).
|
||||
|
||||
#### End-of-candidates
|
||||
An ICE candidate whose value is the empty string means that no more ICE candidates will
|
||||
be sent. Clients must send such a candidate in an `m.call.candidates` message.
|
||||
The WebRTC spec requires browsers to generate such a candidate, however note that at time of writing,
|
||||
not all browsers do (Chrome does not, but does generate an `icegatheringstatechange` event). The
|
||||
client should send any remaining candidates once candidate generation finishes, ignoring timeouts above.
|
||||
This allows bridges to batch the candidates together when bridging to protocols that don't support
|
||||
trickle ICE.
|
||||
|
||||
#### DTMF
|
||||
Matrix clients can send DTMF as specified by WebRTC. The WebRTC standard as of August
|
||||
2020 does not support receiving DTMF but a Matrix client can receive and interpret the DTMF sent
|
||||
in the RTP payload.
|
||||
|
||||
#### Grammar for VoIP IDs
|
||||
`call_id`s and `party_id` are explicitly defined to be between 1 and 255 characters long, consisting
|
||||
of the characters `[0-9a-zA-Z._~-]`.
|
||||
|
||||
(Note that this matches the grammar of 'opaque IDs' from
|
||||
[MSC1597](https://github.com/matrix-org/matrix-spec-proposals/blob/rav/proposals/id_grammar/proposals/1597-id-grammar.md#opaque-ids),
|
||||
and that of the `id` property of the
|
||||
[`m.login.sso` flow schema](#definition-mloginsso-flow-schema).)
|
||||
|
||||
#### Behaviour on Room Leave
|
||||
If the client sees the user it is in a call with leave the room, the client should treat this
|
||||
as a hangup event for any calls that are in progress. No specific requirement is given for the
|
||||
situation where a client has sent an invite and the invitee leaves the room, but the client may
|
||||
wish to treat it as a rejection if there are no more users in the room who could answer the call
|
||||
(eg. the user is now alone or the `invitee` field was set on the invite).
|
||||
|
||||
The same behaviour applies when a client is looking at historic calls.
|
||||
|
||||
#### Supported Codecs
|
||||
The Matrix spec does not mandate particular audio or video codecs, but instead defers to the
|
||||
WebRTC spec. A compliant Matrix VoIP client will behave in the same way as a supported 'browser'
|
||||
in terms of what codecs it supports and what variants thereof. The latest WebRTC specification
|
||||
applies, so clients should keep up to date with new versions of the WebRTC specification whether
|
||||
or not there have been any changes to the Matrix spec.
|
||||
|
||||
#### Events
|
||||
|
||||
##### Common Fields
|
||||
|
||||
{{% event-fields event_type="call_event" %}}
|
||||
|
||||
##### Events
|
||||
|
||||
{{% event-group group_name="m.call" %}}
|
||||
|
||||
#### Client behaviour
|
||||
|
@ -25,6 +142,7 @@ A call is set up with message events exchanged as follows:
|
|||
[..candidates..] -------->
|
||||
[Answers call]
|
||||
<--------------- m.call.answer
|
||||
m.call.select_answer ----------->
|
||||
[Call is active and ongoing]
|
||||
<--------------- m.call.hangup
|
||||
```
|
||||
|
@ -42,6 +160,43 @@ Or a rejected call:
|
|||
|
||||
Calls are negotiated according to the WebRTC specification.
|
||||
|
||||
In response to an incoming invite, a client may do one of several things:
|
||||
* Attempt to accept the call by sending an `m.call.answer`.
|
||||
* Actively reject the call everywhere: send an `m.call.reject` as per above, which will stop the call from
|
||||
ringing on all the user's devices and the caller's client will inform them that the user has
|
||||
rejected their call.
|
||||
* Ignore the call: send no events, but stop alerting the user about the call. The user's other
|
||||
devices will continue to ring, and the caller's device will continue to indicate that the call
|
||||
is ringing, and will time the call out in the normal way if no other device responds.
|
||||
|
||||
##### Streams
|
||||
|
||||
Clients are expected to send one stream with one track of kind `audio` (creating a
|
||||
voice call). They can optionally send a second track in the same stream of kind
|
||||
`video` (creating a video call).
|
||||
|
||||
Clients implementing this specification use the first stream and will ignore
|
||||
any streamless tracks. Note that in the JavaScript WebRTC API, this means
|
||||
`addTrack()` must be passed two parameters: a track and a stream, not just a
|
||||
track, and in a video call the stream must be the same for both audio and video
|
||||
track.
|
||||
|
||||
A client may send other streams and tracks but the behaviour of the other party
|
||||
with respect to presenting such streams and tracks is undefined.
|
||||
|
||||
##### Invitees
|
||||
The `invitee` field should be added whenever the call is intended for one
|
||||
specific user, and should be set to the Matrix user ID of that user. Invites
|
||||
without an `invitee` field are defined to be intended for any member of the
|
||||
room other than the sender of the event.
|
||||
|
||||
Clients should consider an incoming call if they see a non-expired invite event where the `invitee` field is either
|
||||
absent or equal to their user's Matrix ID, however they should evaluate whether or not to ring based on their
|
||||
user's trust relationship with the callers and/or where the call was placed. As a starting point, it is
|
||||
suggested that clients ignore call invites from users in public rooms. It is strongly recommended that
|
||||
when clients do not ring for an incoming call invite, they still display the call invite in the room and
|
||||
annotate that it was ignored.
|
||||
|
||||
##### Glare
|
||||
|
||||
"Glare" is a problem which occurs when two users call each other at
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue