Clarify that arbitrary unicode is allowed in user/room IDs and room aliases (#1506)

Signed-off-by: Tulir Asokan <tulir@maunium.net>
Co-authored-by: Travis Ralston <travisr@matrix.org>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Tulir Asokan 2025-01-22 12:33:34 +02:00 committed by GitHub
parent a1bdfaa167
commit cd6ae9e1a2
2 changed files with 20 additions and 3 deletions

@ -0,0 +1 @@
Clarify that arbitrary unicode is allowed in user/room IDs and room aliases.

@ -611,10 +611,18 @@ characters permitted in user ID localparts. There are currently active
users whose user IDs do not conform to the permitted character set, and
a number of rooms whose history includes events with a `sender` which
does not conform. In order to handle these rooms successfully, clients
and servers MUST accept user IDs with localparts from the expanded
character set:
and servers MUST accept user IDs with localparts consisting of any legal
non-surrogate Unicode code points except for `:` and `NUL` (U+0000), including other control
characters and the empty string.
extended_user_id_char = %x21-39 / %x3B-7E ; all ASCII printing chars except :
User IDs with localparts containing characters outside the range U+0021 to U+007E, or with
an empty localpart, are considered non-compliant. For current room versions, servers must
still accept events using such user IDs over federation; however they SHOULD NOT forward
such user IDs to clients when referenced outside the context of an event. For example,
device list updates from non-compliant user IDs would be dropped by the receiving server.
A future room version may prevent users using a historical character set
from participating. Use of the historical character set is *deprecated*.
##### Mapping from other character sets
@ -663,6 +671,11 @@ Room IDs are case-sensitive. They are not meant to be
human-readable. They are intended to be treated as fully opaque strings
by clients.
The localpart of a room ID (`opaque_id` above) may contain any valid
non-surrogate Unicode code points, including control characters, except `:` and `NUL`
(U+0000), but it is recommended to only include ASCII letters and
digits (`A-Z`, `a-z`, `0-9`) when generating them.
The length of a room ID, including the `!` sigil and the domain, MUST
NOT exceed 255 bytes.
@ -676,6 +689,9 @@ The `domain` of a room alias is the [server name](#server-name) of the
homeserver which created the alias. Other servers may contact this
homeserver to look up the alias.
The localpart of a room alias may contain any valid non-surrogate Unicode codepoints
except `:` and `NUL`.
The length of a room alias, including the `#` sigil and the domain, MUST
NOT exceed 255 bytes.
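
The rules touched by this change can be illustrated with a minimal sketch. It is not part of the commit or of any Matrix library. The function names are hypothetical, and measuring the 255-byte limit over the UTF-8 encoding is an assumption, since the text above only says "bytes".

```python
def is_acceptable_localpart(localpart: str) -> bool:
    """Lenient check: any non-surrogate code point except ':' and NUL.

    The empty string and other control characters are accepted, matching
    the wording for historical user IDs, room IDs and room aliases.
    """
    for ch in localpart:
        cp = ord(ch)
        if ch == ":" or cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            return False
    return True


def is_non_compliant_localpart(localpart: str) -> bool:
    """True if a user ID localpart is empty or has a character outside U+0021..U+007E."""
    if localpart == "":
        return True
    return any(not (0x21 <= ord(ch) <= 0x7E) for ch in localpart)


def within_length_limit(identifier: str, limit: int = 255) -> bool:
    """Check the full identifier (sigil, localpart, ':' and domain) against the byte limit.

    Assumption: the limit is measured in UTF-8 bytes. 'surrogatepass' only
    keeps encode() from raising on lone surrogates; such input is rejected
    separately by is_acceptable_localpart().
    """
    return len(identifier.encode("utf-8", "surrogatepass")) <= limit


if __name__ == "__main__":
    for candidate in ["", "alice", "\u00e4lice", "a:b", "bell\x07"]:
        print(
            repr(candidate),
            is_acceptable_localpart(candidate),
            is_non_compliant_localpart(candidate),
        )
```

A server following the text above might use the lenient check when accepting events over federation, and the non-compliance check when deciding whether to forward a user ID to clients outside the context of an event.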