move matrix-generic documentation from synapse/docs into new matrix-doc project

This commit is contained in:
Matthew Hodgson 2014-10-09 20:30:40 +02:00
parent 06723d7465
commit 556e3f8a71
8 changed files with 3721 additions and 0 deletions

53
drafts/definitions.rst Normal file
View file

@ -0,0 +1,53 @@
Definitions
===========
# *Event* -- A JSON object that represents a piece of information to be
distributed to the the room. The object includes a payload and metadata,
including a `type` used to indicate what the payload is for and how to process
them. It also includes one or more references to previous events.
# *Event graph* -- Events and their references to previous events form a
directed acyclic graph. All events must be a descendant of the first event in a
room, except for a few special circumstances.
# *State event* -- A state event is an event that has a non-null string valued
`state_key` field. It may also include a `prev_state` key referencing exactly
one state event with the same type and state key, in the same event graph.
# *State tree* -- A state tree is a tree formed by a collection of state events
that have the same type and state key (all in the same event graph.
# *State resolution algorithm* -- An algorithm that takes a state tree as input
and selects a single leaf node.
# *Current state event* -- The leaf node of a given state tree that has been
selected by the state resolution algorithm.
# *Room state* / *state dictionary* / *current state* -- A mapping of the pair
(event type, state key) to the current state event for that pair.
# *Room* -- An event graph and its associated state dictionary. An event is in
the room if it is part of the event graph.
# *Topological ordering* -- The partial ordering that can be extracted from the
event graph due to it being a DAG.
(The state definitions are purposely slightly ill-defined, since if we allow
deleting events we might end up with multiple state trees for a given event
type and state key pair.)
Federation specific
-------------------
# *(Persistent data unit) PDU* -- An encoding of an event for distribution of
the server to server protocol.
# *(Ephemeral data unit) EDU* -- A piece of information that is sent between
servers and doesn't encode an event.
Client specific
---------------
# *Child events* -- Events that reference a single event in the same room
independently of the event graph.
# *Collapsed events* -- Events that have all child events that reference it
included in the JSON object.

79
drafts/human-id-rules.rst Normal file
View file

@ -0,0 +1,79 @@
This document outlines the format for human-readable IDs within matrix.
Overview
--------
UTF-8 is quickly becoming the standard character encoding set on the web. As
such, Matrix requires that all strings MUST be encoded as UTF-8. However,
using Unicode as the character set for human-readable IDs is troublesome. There
are many different characters which appear identical to each other, but would
identify different users. In addition, there are non-printable characters which
cannot be rendered by the end-user. This opens up a security vulnerability with
phishing/spoofing of IDs, commonly known as a homograph attack.
Web browers encountered this problem when International Domain Names were
introduced. A variety of checks were put in place in order to protect users. If
an address failed the check, the raw punycode would be displayed to disambiguate
the address. Similar checks are performed by home servers in Matrix. However,
Matrix does not use punycode representations, and so does not show raw punycode
on a failed check. Instead, home servers must outright reject these misleading
IDs.
Types of human-readable IDs
---------------------------
There are two main human-readable IDs in question:
- Room aliases
- User IDs
Room aliases look like ``#localpart:domain``. These aliases point to opaque
non human-readable room IDs. These pointers can change, so there is already an
issue present with the same ID pointing to a different destination at a later
date.
User IDs look like ``@localpart:domain``. These represent actual end-users, and
unlike room aliases, there is no layer of indirection. This presents a much
greater concern with homograph attacks.
Checks
------
- Similar to web browsers.
- blacklisted chars (e.g. non-printable characters)
- mix of language sets from 'preferred' language not allowed.
- Language sets from CLDR dataset.
- Treated in segments (localpart, domain)
- Additional restrictions for ease of processing IDs.
- Room alias localparts MUST NOT have ``#`` or ``:``.
- User ID localparts MUST NOT have ``@`` or ``:``.
Rejecting
---------
- Home servers MUST reject room aliases which do not pass the check, both on
GETs and PUTs.
- Home servers MUST reject user ID localparts which do not pass the check, both
on creation and on events.
- Any home server whose domain does not pass this check, MUST use their punycode
domain name instead of the IDN, to prevent other home servers rejecting you.
- Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing
due to homograph attacks, and failing due to including ``:`` s, etc)
- Error message MAY go into further information about which characters were
rejected and why.
- Error message SHOULD contain a ``failed_keys`` key which contains an array
of strings which represent the keys which failed the check e.g::
failed_keys: [ user_id, room_alias ]
Other considerations
--------------------
- Basic security: Informational key on the event attached by HS to say "unsafe
ID". Problem: clients can just ignore it, and since it will appear only very
rarely, easy to forget when implementing clients.
- Moderate security: Requires client handshake. Forces clients to implement
a check, else they cannot communicate with the misleading ID. However, this is
extra overhead in both client implementations and round-trips.
- High security: Outright rejection of the ID at the point of creation /
receiving event. Point of creation rejection is preferable to avoid the ID
entering the system in the first place. However, malicious HSes can just allow
the ID. Hence, other home servers must reject them if they see them in events.
Client never sees the problem ID, provided the HS is correctly implemented.
- High security decided; client doesn't need to worry about it, no additional
protocol complexity aside from rejection of an event.

View file

@ -0,0 +1,51 @@
State Resolution
================
This section describes why we need state resolution and how it works.
Motivation
-----------
We want to be able to associate some shared state with rooms, e.g. a room name
or members list. This is done by having a current state dictionary that maps
from the pair event type and state key to an event.
However, since the servers involved in the room are distributed we need to be
able to handle the case when two (or more) servers try and update the state at
the same time. This is done via the state resolution algorithm.
State Tree
------------
State events contain a reference to the state it is trying to replace. These
relations form a tree where the current state is one of the leaf nodes.
Note that state events are events, and so are part of the PDU graph. Thus we
can be sure that (modulo the internet being particularly broken) we will see
all state events eventually.
Algorithm requirements
----------------------
We want the algorithm to have the following properties:
- Since we aren't guaranteed what order we receive state events in, except that
we see parents before children, the state resolution algorithm must not depend
on the order and must always come to the same result.
- If we receive a state event whose parent is the current state, then the
algorithm will select it.
- The algorithm does not depend on internal state, ensuring all servers should
come to the same decision.
These three properties mean it is enough to keep track of the current state and
compare it with any new proposed state, rather than having to keep track of all
the leafs of the tree and recomputing across the entire state tree.
Current Implementation
----------------------
The current implementation works as follows: Upon receipt of a newly proposed
state change we first find the common ancestor. Then we take the maximum
across each branch of the users' power levels, if one is higher then it is
selected as the current state. Otherwise, we check if one chain is longer than
the other, if so we choose that one. If that also fails, then we concatenate
all the pdu ids and take a SHA1 hash and compare them to select a common
ancestor.