From c26ed86215f36fd4f2e6c8d11fc6544643a6f5d0 Mon Sep 17 00:00:00 2001 From: Hubert Chathi Date: Wed, 12 Oct 2016 23:20:55 -0400 Subject: [PATCH 01/33] s/vector/riot/ Replace references to Vector with Riot (when appropriate). --- supporting-docs/guides/2015-08-19-faq.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/supporting-docs/guides/2015-08-19-faq.md b/supporting-docs/guides/2015-08-19-faq.md index bc6ffb17..8e8c5528 100644 --- a/supporting-docs/guides/2015-08-19-faq.md +++ b/supporting-docs/guides/2015-08-19-faq.md @@ -368,7 +368,7 @@ letting the user interact with users and rooms anywhere within the Matrix federation.  Text and image messages are supported, and basic voice-only VoIP calling via WebRTC is supported in one-to-one rooms. (As of October 2015, experimental multi-way calling is also available -on Vector.im). +on Riot.im). ##### How do I connect my homeserver to the public Matrix network? @@ -563,28 +563,28 @@ Data is only shared between servers of participating users of a room. If all use ##### Where can I find a mobile app? -Vector is available for Android and iOS. +Riot is available for Android and iOS. The iOS version can be downloaded from the [Apple store](https://itunes.apple.com/us/app/vector.im/id1083446067). -The Android version can be downloaded from the [Google Play store](https://play.google.com/store/apps/details?id=im.vector.alpha) or [F-Droid](https://f-droid.org/repository/browse/?fdid=im.vector.alpha). If you are not sure which one to choose, install Vector from the [Google Play store](https://play.google.com/store/apps/details?id=im.vector.alpha). +The Android version can be downloaded from the [Google Play store](https://play.google.com/store/apps/details?id=im.vector.alpha) or [F-Droid](https://f-droid.org/repository/browse/?fdid=im.vector.alpha). If you are not sure which one to choose, install Riot from the [Google Play store](https://play.google.com/store/apps/details?id=im.vector.alpha). For the Android app, you can also install the latest development version built by [Jenkins](http://matrix.org/jenkins/job/VectorAndroidDevelop). Use it at your own risk and only if you know what you are doing. -##### I installed Vector via F-Droid, why is it draining my battery? +##### I installed Riot via F-Droid, why is it draining my battery? -The F-Droid release of Vector does not use [Google Cloud Messaging](https://developers.google.com/cloud-messaging/). This allows users that do not have or want Google Services installed to use Vector. +The F-Droid release of Riot does not use [Google Cloud Messaging](https://developers.google.com/cloud-messaging/). This allows users that do not have or want Google Services installed to use Riot. -The drawback is that Vector has to pull for new messages, which can drain your battery. To counter this, you can change the delay between polls in the settings. Higher delay means better battery life (but may delay receiving messages). You can also disable the background sync entirely (which means that you won't get any notifications at all). +The drawback is that Riot has to pull for new messages, which can drain your battery. To counter this, you can change the delay between polls in the settings. Higher delay means better battery life (but may delay receiving messages). You can also disable the background sync entirely (which means that you won't get any notifications at all). 
If you don't mind using Google Services, you might be better off installing the [Google Play store](https://play.google.com/store/apps/details?id=im.vector.alpha) version. ##### Where can I find a web app? -You can use [Vector.im](https://vector.im) - a glossy web client written on top of [matrix-react-sdk](https://github.com/matrix-org/matrix-react-sdk). +You can use [Riot.im](https://Riot.im) - a glossy web client written on top of [matrix-react-sdk](https://github.com/matrix-org/matrix-react-sdk). -You can also run Vector on your own server. It's a static web application, just download the [last release](https://github.com/vector-im/vector-web/) and unpack it. +You can also run Vector, the code that Riot.im uses, on your own server. It's a static web application, just download the [last release](https://github.com/vector-im/vector-web/) and unpack it. ##### Where can I find a desktop client? From 41f4661d1b73abee509aaab892b64b87fb2e6cd4 Mon Sep 17 00:00:00 2001 From: David Baker Date: Thu, 13 Oct 2016 15:14:29 +0100 Subject: [PATCH 02/33] Fix API path in pushrules examples --- specification/modules/push.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/specification/modules/push.rst b/specification/modules/push.rst index 58c816b3..9bb65b96 100644 --- a/specification/modules/push.rst +++ b/specification/modules/push.rst @@ -545,14 +545,14 @@ Examples To create a rule that suppresses notifications for the room with ID ``!dj234r78wl45Gh4D:matrix.org``:: - curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/api/%CLIENT_MAJOR_VERSION%/pushrules/global/room/%21dj234r78wl45Gh4D%3Amatrix.org?access_token=123456" -d \ + curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/%CLIENT_MAJOR_VERSION%/pushrules/global/room/%21dj234r78wl45Gh4D%3Amatrix.org?access_token=123456" -d \ '{ "actions" : ["dont_notify"] }' To suppress notifications for the user ``@spambot:matrix.org``:: - curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/api/%CLIENT_MAJOR_VERSION%/pushrules/global/sender/%40spambot%3Amatrix.org?access_token=123456" -d \ + curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/%CLIENT_MAJOR_VERSION%/pushrules/global/sender/%40spambot%3Amatrix.org?access_token=123456" -d \ '{ "actions" : ["dont_notify"] }' @@ -560,7 +560,7 @@ To suppress notifications for the user ``@spambot:matrix.org``:: To always notify for messages that contain the work 'cake' and set a specific sound (with a rule_id of ``SSByZWFsbHkgbGlrZSBjYWtl``):: - curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/api/%CLIENT_MAJOR_VERSION%/pushrules/global/content/SSByZWFsbHkgbGlrZSBjYWtl?access_token=123456" -d \ + curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/%CLIENT_MAJOR_VERSION%/pushrules/global/content/SSByZWFsbHkgbGlrZSBjYWtl?access_token=123456" -d \ '{ "pattern": "cake", "actions" : ["notify", {"set_sound":"cakealarm.wav"}] @@ -569,7 +569,7 @@ sound (with a rule_id of ``SSByZWFsbHkgbGlrZSBjYWtl``):: To add a rule suppressing notifications for messages starting with 'cake' but ending with 'lie', superseding the previous rule:: - curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/api/%CLIENT_MAJOR_VERSION%/pushrules/global/content/U3BvbmdlIGNha2UgaXMgYmVzdA?access_token=123456&before=SSByZWFsbHkgbGlrZSBjYWtl" -d \ + curl -X PUT -H "Content-Type: application/json" 
"https://example.com/_matrix/client/%CLIENT_MAJOR_VERSION%/pushrules/global/content/U3BvbmdlIGNha2UgaXMgYmVzdA?access_token=123456&before=SSByZWFsbHkgbGlrZSBjYWtl" -d \ '{ "pattern": "cake*lie", "actions" : ["notify"] @@ -579,7 +579,7 @@ To add a custom sound for notifications messages containing the word 'beer' in any rooms with 10 members or fewer (with greater importance than the room, sender and content rules):: - curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/api/%CLIENT_MAJOR_VERSION%/pushrules/global/override/U2VlIHlvdSBpbiBUaGUgRHVrZQ?access_token=123456" -d \ + curl -X PUT -H "Content-Type: application/json" "https://example.com/_matrix/client/%CLIENT_MAJOR_VERSION%/pushrules/global/override/U2VlIHlvdSBpbiBUaGUgRHVrZQ?access_token=123456" -d \ '{ "conditions": [ {"kind": "event_match", "key": "content.body", "pattern": "beer" }, From 3dd0fcabb364645cd0477a9444b88771480157ab Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 17:11:18 +0100 Subject: [PATCH 03/33] Render the body of response objects with inheritance --- .../matrix_templates/templates/http-api.tmpl | 6 + templating/matrix_templates/units.py | 203 ++++++++++-------- 2 files changed, 118 insertions(+), 91 deletions(-) diff --git a/templating/matrix_templates/templates/http-api.tmpl b/templating/matrix_templates/templates/http-api.tmpl index 7496ea72..ef796f4e 100644 --- a/templating/matrix_templates/templates/http-api.tmpl +++ b/templating/matrix_templates/templates/http-api.tmpl @@ -27,6 +27,12 @@ Request format: `No parameters` {% endif %} +{% if endpoint.res_headers|length > 0 -%} +Response headers: + +{{ tables.paramtable(endpoint.res_headers) }} +{% endif -%} + {% if endpoint.res_tables|length > 0 -%} Response format: diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index f362f63d..1eb465a3 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -152,18 +152,13 @@ def get_json_schema_object_fields(obj, enforce_title=False): del props[key_name] # Sometimes you just want to specify that a thing is an object without - # doing all the keys. Allow people to do that if they set a 'title'. - if not props and obj.get("title"): + # doing all the keys. + if not props: return [{ - "title": obj["title"], + "title": obj.get("title"), "no-table": True }] - if not props: - raise Exception( - "Object %s has no properties and no title" % obj - ) - required_keys = set(obj.get("required", [])) obj_title = obj.get("title") @@ -279,11 +274,7 @@ def process_prop(key_name, prop, required): "tables": tables, } - -def get_tables_for_schema(schema): - schema = inherit_parents(schema) - tables = get_json_schema_object_fields(schema) - +def deduplicate_tables(tables): # the result may contain duplicates, if objects are referred to more than # once. Filter them out. 
# @@ -305,6 +296,64 @@ def get_tables_for_schema(schema): return filtered +def get_tables_for_schema(schema): + schema = inherit_parents(schema) + tables = get_json_schema_object_fields(schema) + return deduplicate_tables(tables) + +def get_tables_for_response(api, schema): + schema = inherit_parents(schema) + resp_type = schema.get("type") + + if resp_type is None: + raise KeyError("Response definition for api '%s' missing 'type' field" + % (api)) + + resp_title = schema.get("title", "") + resp_description = schema.get("description", "") + + logger.debug("Found a 200 response for this API; type %s" % resp_type) + + if resp_type == "object": + tables = get_json_schema_object_fields( + schema, + enforce_title=False, + ) + + else: + nested_items = [] + if resp_type == "array": + items = inherit_parents(schema["items"]) + if items["type"] == "object": + nested_items = get_json_schema_object_fields( + items, + enforce_title=True, + ) + value_id = nested_items[0]["title"] + resp_type = "[%s]" % value_id + else: + raise Exception("Unsupported array response type [%s] for %s" % + (items["type"], api)) + + tables = [{ + "title": resp_title, + "rows": [{ + "key": "", + "type": resp_type, + "desc": resp_description, + }] + }] + nested_items + + res = deduplicate_tables(tables) + + if len(res) == 0: + logger.warn( + "This API appears to have no response table. Are you " + + "sure this API returns no parameters?" + ) + + return res + def get_example_for_schema(schema): """Returns a python object representing a suitable example for this object""" if 'example' in schema: @@ -349,13 +398,14 @@ class MatrixUnits(Units): "rate_limited": 429 in single_api.get("responses", {}), "req_param_by_loc": {}, "req_body_tables": [], + "res_headers": [], "res_tables": [], "responses": [], "example": { "req": "", } } - self.log(" ------- Endpoint: %s %s ------- " % (method, path)) + logger.info(" ------- Endpoint: %s %s ------- " % (method, path)) for param in single_api.get("parameters", []): param_loc = param["in"] if param_loc == "body": @@ -398,6 +448,24 @@ class MatrixUnits(Units): "example": example, }) + # add response params if this API has any. + if good_response: + if "schema" in good_response: + endpoint["res_tables"] = get_tables_for_response( + "%s %s" % (method, path), + good_response["schema"] + ) + if "headers" in good_response: + headers = [] + for (header_name, header) in good_response["headers"].iteritems(): + headers.append({ + "key": header_name, + "type": header["type"], + "desc": header["description"], + }) + endpoint["res_headers"] = headers + + # calculate the example request path_template = api.get("basePath", "").rstrip("/") + path qps = [] body = "" @@ -406,7 +474,7 @@ class MatrixUnits(Units): example = get_example_for_param(param) if not example: - self.log( + logger.warn( "The parameter %s is missing an example." % param["name"]) continue @@ -437,65 +505,6 @@ class MatrixUnits(Units): method.upper(), path_template, query_string ) - # add response params if this API has any. 
- if good_response: - self.log("Found a 200 response for this API") - res_type = Units.prop(good_response, "schema/type") - res_name = Units.prop(good_response, "schema/name") - if res_type and res_type not in ["object", "array"]: - # response is a raw string or something like that - good_table = { - "title": None, - "rows": [{ - "key": "<" + res_type + ">" if not res_name else res_name, - "type": res_type, - "desc": res.get("description", ""), - "req_str": "" - }] - } - if good_response.get("headers"): - for (header_name, header) in good_response.get("headers").iteritems(): - good_table["rows"].append({ - "key": header_name, - "type": "Header<" + header["type"] + ">", - "desc": header["description"], - "req_str": "" - }) - endpoint["res_tables"].append(good_table) - elif res_type and Units.prop(good_response, "schema/properties"): - # response is an object: - schema = good_response["schema"] - res_tables = get_tables_for_schema(schema) - endpoint["res_tables"].extend(res_tables) - elif res_type and Units.prop(good_response, "schema/items"): - # response is an array: - # FIXME: Doesn't recurse at all. - schema = good_response["schema"] - array_type = Units.prop(schema, "items/type") - if Units.prop(schema, "items/allOf"): - array_type = ( - Units.prop(schema, "items/title") - ) - endpoint["res_tables"].append({ - "title": schema.get("title", ""), - "rows": [{ - "key": "N/A", - "type": ("[%s]" % array_type), - "desc": schema.get("description", ""), - "req_str": "" - }] - }) - - for response_table in endpoint["res_tables"]: - self.log("Response: %s" % response_table["title"]) - for r in response_table["rows"]: - self.log("Row: %s" % r) - if len(endpoint["res_tables"]) == 0: - self.log( - "This API appears to have no response table. Are you " + - "sure this API returns no parameters?" - ) - endpoints.append(endpoint) return { @@ -512,22 +521,34 @@ class MatrixUnits(Units): :param dict endpoint_data dictionary of endpoint data to be updated """ try: - req_body_tables = get_tables_for_schema(param["schema"]) + schema = inherit_parents(param["schema"]) + if schema["type"] != "object": + logger.warn( + "Unsupported body type %s for %s %s", schema["type"], + endpoint_data["method"], endpoint_data["path"] + ) + return + + req_body_tables = get_tables_for_schema(schema) + + if req_body_tables == []: + # no fields defined for the body. 
+ return + + # put the top-level parameters into 'req_param_by_loc', and the others + # into 'req_body_tables' + body_params = endpoint_data['req_param_by_loc'].setdefault("JSON body",[]) + body_params.extend(req_body_tables[0]["rows"]) + + body_tables = req_body_tables[1:] + endpoint_data['req_body_tables'].extend(body_tables) + except Exception, e: - logger.warning("Error decoding body of API endpoint %s %s" % - (endpoint_data["method"], endpoint_data["path"]), - exc_info=1) - return - - # put the top-level parameters into 'req_param_by_loc', and the others - # into 'req_body_tables' - body_params = endpoint_data['req_param_by_loc'].setdefault("JSON body",[]) - body_params.extend(req_body_tables[0]["rows"]) - - body_tables = req_body_tables[1:] - # TODO: remove this when PR #255 has landed - body_tables = (t for t in body_tables if not t.get('no-table')) - endpoint_data['req_body_tables'].extend(body_tables) + e2 = Exception( + "Error decoding body of API endpoint %s %s: %s" % + (endpoint_data["method"], endpoint_data["path"], e) + ) + raise e2, None, sys.exc_info()[2] def load_swagger_apis(self): @@ -536,7 +557,7 @@ class MatrixUnits(Units): for filename in os.listdir(path): if not filename.endswith(".yaml"): continue - self.log("Reading swagger API: %s" % filename) + logger.info("Reading swagger API: %s" % filename) filepath = os.path.join(path, filename) with open(filepath, "r") as f: # strip .yaml @@ -653,7 +674,7 @@ class MatrixUnits(Units): return schemata def read_event_schema(self, filepath): - self.log("Reading %s" % filepath) + logger.info("Reading %s" % filepath) with open(filepath, "r") as f: json_schema = yaml.load(f) From 93894ebbbe9a55675a89e0a3625d23661b1a5079 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 17:23:11 +0100 Subject: [PATCH 04/33] Fix spurious "None" in non-room events Events like m.direct and m.tag don't inherit from either Message event or State event, and were getting a "None" where there should have been a type. 
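
As a rough sketch of the intended behaviour (a simplified model, not the
template code itself), the renderer now only assigns a ``typeof`` label when
the schema's first ``allOf`` entry refers to one of the shared base event
definitions::

    # Simplified illustration; the real lookup lives in
    # templating/matrix_templates/units.py and uses the ROOM_EVENT and
    # STATE_EVENT path constants defined there.
    base_defs = {
        "core-event-schema/room_event.yaml": "Message Event",
        "core-event-schema/state_event.yaml": "State Event",
    }

    def typeof_label(json_schema):
        all_of = json_schema.get("allOf")
        if isinstance(all_of, list) and all_of:
            first_ref = all_of[0].get("$ref")
            if first_ref in base_defs:
                return base_defs[first_ref]
        # Events such as m.direct and m.tag extend neither base event,
        # so they get an empty label rather than a spurious "None".
        return ""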
--- templating/matrix_templates/templates/events.tmpl | 4 ++++ templating/matrix_templates/units.py | 10 ++++------ 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/templating/matrix_templates/templates/events.tmpl b/templating/matrix_templates/templates/events.tmpl index f7a8263e..95edff57 100644 --- a/templating/matrix_templates/templates/events.tmpl +++ b/templating/matrix_templates/templates/events.tmpl @@ -2,9 +2,13 @@ ``{{event.type}}`` {{(4 + event.type | length) * title_kind}} + +{% if (event.typeof | length) %} *{{event.typeof}}* {{event.typeof_info}} +{% endif -%} + {{event.desc | wrap(80)}} {% for table in event.content_fields %} {{"``"+table.title+"``" if table.title else "" }} diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 1eb465a3..7fbaee45 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -680,7 +680,7 @@ class MatrixUnits(Units): json_schema = yaml.load(f) schema = { - "typeof": None, + "typeof": "", "typeof_info": "", "type": None, "title": None, @@ -703,11 +703,9 @@ class MatrixUnits(Units): STATE_EVENT: "State Event" } if type(json_schema.get("allOf")) == list: - schema["typeof"] = base_defs.get( - json_schema["allOf"][0].get("$ref") - ) - elif json_schema.get("title"): - schema["typeof"] = json_schema["title"] + firstRef = json_schema["allOf"][0]["$ref"] + if firstRef in base_defs: + schema["typeof"] = base_defs[firstRef] json_schema = resolve_references(filepath, json_schema) From 22777970da45ca2d9d9c0172437114d64cf524a6 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 17:56:53 +0100 Subject: [PATCH 05/33] Fix speculator link the link to the 'latest version' was broken --- specification/client_server_api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specification/client_server_api.rst b/specification/client_server_api.rst index 2645b1d8..a5194a1d 100644 --- a/specification/client_server_api.rst +++ b/specification/client_server_api.rst @@ -45,7 +45,7 @@ Other versions of this specification The following other versions are also available, in reverse chronological order: -- `HEAD `_: Includes all changes since the latest versioned release. +- `HEAD `_: Includes all changes since the latest versioned release. - `r0.2.0 `_ - `r0.1.0 `_ - `r0.0.1 `_ From cfbee938b0285c68395c2d475ddc9a2831303372 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 18:06:43 +0100 Subject: [PATCH 06/33] changelog: Fix a couple of punctuations --- changelogs/client_server.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/changelogs/client_server.rst b/changelogs/client_server.rst index fc539dc0..a22f3b7f 100644 --- a/changelogs/client_server.rst +++ b/changelogs/client_server.rst @@ -7,11 +7,11 @@ `underride` to `override`. This works with all known clients which support push rules, but any other clients implementing the push rules API should be aware of this change. This - makes it simple to mute rooms correctly in the API. + makes it simple to mute rooms correctly in the API (`#373 `_). - - Remove ``/tokenrefresh`` from the API. + - Remove ``/tokenrefresh`` from the API (`#395 `_). - - Remove requirement that tokens used in token-based login be macaroons. + - Remove requirement that tokens used in token-based login be macaroons (`#395 `_). 
- Changes to the API which will be backwards-compatible for clients: @@ -25,10 +25,10 @@ - Add top-level ``account_data`` key to the responses to ``GET /sync`` and ``GET /initialSync`` (`#380 `_). - - Add ``is_direct`` flag to |/createRoom|_ and invite member event. - Add 'Direct Messaging' module. + - Add ``is_direct`` flag to ``POST /createRoom`` and invite member event. + Add 'Direct Messaging' module (`#389 `_). - - Add ``contains_url`` option to ``RoomEventFilter``. + - Add ``contains_url`` option to ``RoomEventFilter`` (`#390 `_). - Add ``filter`` optional query param to ``/messages`` (`#390 `_). From c66a83c9ff47a5a13af7115e4d4b1c2026420131 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 22:10:14 +0100 Subject: [PATCH 07/33] Order props in the spec the same as the API Use an OrderedDict when reading the api docs so that properties defined in the API are rendered in the same order in the spec. --- templating/matrix_templates/units.py | 31 ++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 7fbaee45..42beccc5 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -21,6 +21,7 @@ For the actual conversion of data -> RST (including templates), see the sections file instead. """ from batesian.units import Units +from collections import OrderedDict import logging import inspect import json @@ -48,6 +49,20 @@ STATE_EVENT = "core-event-schema/state_event.yaml" logger = logging.getLogger(__name__) +# a yaml Loader which loads mappings into OrderedDicts instead of regular +# dicts, so that we preserve the ordering of properties from the api files. +# +# with thanks to http://stackoverflow.com/a/21912744/637864 +class OrderedLoader(yaml.Loader): + pass +def construct_mapping(loader, node): + loader.flatten_mapping(node) + pairs = loader.construct_pairs(node) + return OrderedDict(pairs) +OrderedLoader.add_constructor( + yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, + construct_mapping) + def resolve_references(path, schema): if isinstance(schema, dict): # do $ref first @@ -55,11 +70,11 @@ def resolve_references(path, schema): value = schema['$ref'] path = os.path.join(os.path.dirname(path), value) with open(path) as f: - ref = yaml.load(f) + ref = yaml.load(f, OrderedLoader) result = resolve_references(path, ref) del schema['$ref'] else: - result = {} + result = OrderedDict() for key, value in schema.items(): result[key] = resolve_references(path, value) @@ -194,12 +209,9 @@ def process_prop(key_name, prop, required): value_type = None desc = prop.get("description", "") - prop_type = prop.get('type') + prop_type = prop['type'] tables = [] - if prop_type is None: - raise KeyError("Property '%s' of object '%s' missing 'type' field" - % (key_name, obj)) logger.debug("%s is a %s", key_name, prop_type) if prop_type == "object": @@ -381,6 +393,7 @@ def get_example_for_param(param): return json.dumps(get_example_for_schema(param['schema']), indent=2) + class MatrixUnits(Units): def _load_swagger_meta(self, api, group_name): endpoints = [] @@ -563,7 +576,7 @@ class MatrixUnits(Units): # strip .yaml group_name = filename[:-5].replace("-", "_") group_name = "%s_%s" % (group_name, suffix) - api = yaml.load(f.read()) + api = yaml.load(f.read(), OrderedLoader) api = resolve_references(filepath, api) api["__meta"] = self._load_swagger_meta( api, group_name @@ -584,7 +597,7 @@ class MatrixUnits(Units): filepath = 
os.path.join(root, filename) with open(filepath) as f: try: - event_info = yaml.load(f) + event_info = yaml.load(f, OrderedLoader) except Exception as e: raise ValueError( "Error reading file %r" % (filepath,), e @@ -677,7 +690,7 @@ class MatrixUnits(Units): logger.info("Reading %s" % filepath) with open(filepath, "r") as f: - json_schema = yaml.load(f) + json_schema = yaml.load(f, OrderedLoader) schema = { "typeof": "", From 0db7eed69d6dbc23bfc2d3e21aab64416997b7d9 Mon Sep 17 00:00:00 2001 From: Hubert Chathi Date: Thu, 13 Oct 2016 18:00:44 -0400 Subject: [PATCH 08/33] add information about Perspectives add some information about how Perspectives works, link to their web site, and fix capitalization to match how the Perspectives Project capitalizes their name --- supporting-docs/guides/2015-08-19-faq.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/supporting-docs/guides/2015-08-19-faq.md b/supporting-docs/guides/2015-08-19-faq.md index 8e8c5528..9a9bbb73 100644 --- a/supporting-docs/guides/2015-08-19-faq.md +++ b/supporting-docs/guides/2015-08-19-faq.md @@ -492,12 +492,16 @@ Yes. Matrix is just a spec, so implementations of the spec are very welcome! It ##### How secure is this? -Server-server traffic is mandatorily TLS from the outset. Server-client traffic mandates transport layer encryption other than for tinkering. Servers maintain a public/private key pair, and sign the integrity of all messages in the context of the historical conversation, preventing tampering. Server keys are distributed using a PERSPECTIVES-style system. +Server-server traffic is mandatorily TLS from the outset. Server-client traffic mandates transport layer encryption other than for tinkering. Servers maintain a public/private key pair, and sign the integrity of all messages in the context of the historical conversation, preventing tampering. Server keys are distributed using a [Perspectives](https://perspectives-project.org/)-style system. End-to-end encryption is coming shortly to clients for both 1:1 and group chats to protect user data stored on servers, using the [Olm](https://matrix.org/git/olm) cryptographic ratchet implementation. As of October 2015 this is blocked on implementing the necessary key distribution and fingerprint management. Privacy of metadata is not currently protected from server administrators - a malicious homeserver administrator can see who is talking to who and when, but not what is being said (once E2E encryption is enabled). See [this presentation from Jardin Entropique](http://matrix.org/~matthew/2015-06-26%20Matrix%20Jardin%20Entropique.pdf) for a more comprehensive discussion of privacy in Matrix. +##### What is Perspectives? + +Rather than relying on Certificate Authorities (CAs) as in traditional SSL, a [Perspectives](https://perspectives-project.org/)-style system uses a more decentralized model for verifying keys. Perspectives uses notary servers to verify that the same key is seen across the network, making a man-in-the-middle attack much harder since an attacker must insert itself into multiple places. For federation in Matrix, each Home Server acts as a notary. When one Home Server connects to another Home Server that uses a key that it doesn't recognize, it contacts other Home Servers to ensure that they all see the same key from that Home Server. + ##### Why HTTP? Doesn't HTTP suck? HTTP is indeed not the most efficient transport, but it is ubiquitous, very well understood and has numerous implementations on almost every platform and language. 
It also has a simple upgrade path to HTTP/2, which is relatively bandwidth and round-trip efficient. From bfb65d8ceb80cee7fbcf36fdb5e9efad99d6b93d Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 23:53:25 +0100 Subject: [PATCH 09/33] Serve the API docs with continuserv --- scripts/continuserv/main.go | 71 ++++++++++++++++++++++++++----------- 1 file changed, 51 insertions(+), 20 deletions(-) diff --git a/scripts/continuserv/main.go b/scripts/continuserv/main.go index 59a4cce2..b489e06a 100644 --- a/scripts/continuserv/main.go +++ b/scripts/continuserv/main.go @@ -84,10 +84,14 @@ func watchFS(ch chan struct{}, w *fsnotify.Watcher) { } func makeWalker(base string, w *fsnotify.Watcher) filepath.WalkFunc { - return func(path string, _ os.FileInfo, err error) error { + return func(path string, i os.FileInfo, err error) error { if err != nil { log.Fatalf("Error walking: %v", err) } + if !i.IsDir() { + // we set watches on directories, not files + return nil + } rel, err := filepath.Rel(base, path) if err != nil { @@ -129,20 +133,26 @@ func serve(w http.ResponseWriter, req *http.Request) { wg.Wait() wgMu.Unlock() - file := req.URL.Path - if file[0] == '/' { - file = file[1:] - } - if file == "" { - file = "index.html" - } m := toServe.Load().(bytesOrErr) if m.err != nil { w.Header().Set("Content-Type", "text/plain") w.Write([]byte(m.err.Error())) return } - b, ok := m.bytes[file] + + ok := true + var b []byte + + file := req.URL.Path + if file[0] == '/' { + file = file[1:] + } + b, ok = m.bytes[file] + + if ok && file == "api-docs.json" { + w.Header().Set("Access-Control-Allow-Origin", "*") + } + if ok { w.Header().Set("Content-Type", "text/html") w.Write([]byte(b)) @@ -153,18 +163,23 @@ func serve(w http.ResponseWriter, req *http.Request) { w.Write([]byte("Not found")) } -func populateOnce(dir string) { - defer wg.Done() - mu.Lock() - defer mu.Unlock() +func generate(dir string) (map[string][]byte, error) { cmd := exec.Command("python", "gendoc.py") cmd.Dir = path.Join(dir, "scripts") var b bytes.Buffer cmd.Stderr = &b err := cmd.Run() if err != nil { - toServe.Store(bytesOrErr{nil, fmt.Errorf("error generating spec: %v\nOutput from gendoc:\n%v", err, b.String())}) - return + return nil, fmt.Errorf("error generating spec: %v\nOutput from gendoc:\n%v", err, b.String()) + } + + // cheekily dump the swagger docs into the gen directory so that it is + // easy to serve + cmd = exec.Command("python", "dump-swagger.py", "gen/api-docs.json") + cmd.Dir = path.Join(dir, "scripts") + cmd.Stderr = &b + if err := cmd.Run(); err != nil { + return nil, fmt.Errorf("error generating api docs: %v\nOutput from dump-swagger:\n%v", err, b.String()) } files := make(map[string][]byte) @@ -190,12 +205,28 @@ func populateOnce(dir string) { return nil } - err = filepath.Walk(base, walker) - if err != nil { - toServe.Store(bytesOrErr{nil, fmt.Errorf("error reading spec: %v", err)}) - return + if err := filepath.Walk(base, walker); err != nil { + return nil, fmt.Errorf("error reading spec: %v", err) } - toServe.Store(bytesOrErr{files, nil}) + + // load the special index + indexpath := path.Join(dir, "scripts", "continuserv", "index.html") + bytes, err := ioutil.ReadFile(indexpath) + if err != nil { + return nil, fmt.Errorf("error reading index: %v", err) + } + files[""] = bytes + + return files, nil +} + +func populateOnce(dir string) { + defer wg.Done() + mu.Lock() + defer mu.Unlock() + + files, err := generate(dir) + toServe.Store(bytesOrErr{files, err}) } func doPopulate(ch chan struct{}, dir 
string) { From d16385a74fa3c763c45e472e58e208002c80eeb2 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Thu, 13 Oct 2016 22:47:19 +0100 Subject: [PATCH 10/33] More ordering fixes We were breaking the ordering of objects defined by allOf reference --- templating/matrix_templates/units.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 42beccc5..146f1439 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -106,7 +106,7 @@ def inherit_parents(obj): for key in ('properties', 'additionalProperties', 'patternProperties'): if p.get(key): - result.setdefault(key, {}).update(p[key]) + result.setdefault(key, OrderedDict()).update(p[key]) return result @@ -368,11 +368,12 @@ def get_tables_for_response(api, schema): def get_example_for_schema(schema): """Returns a python object representing a suitable example for this object""" + schema = inherit_parents(schema) if 'example' in schema: example = schema['example'] return example if 'properties' in schema: - res = {} + res = OrderedDict() for prop_name, prop in schema['properties'].iteritems(): logger.debug("Parsing property %r" % prop_name) prop_example = get_example_for_schema(prop) From af84ca09a0554593436a716d14ff7dad968efbbd Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 11:16:17 +0100 Subject: [PATCH 11/33] Better support for examples in responses Walk the response schema to generate examples. --- templating/matrix_templates/units.py | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 146f1439..ed18d972 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -384,16 +384,33 @@ def get_example_for_schema(schema): return schema.get('type', '??') def get_example_for_param(param): + """Returns a stringified example for a parameter""" if 'x-example' in param: return param['x-example'] schema = param.get('schema') if not schema: return None - if 'example' in schema: - return schema['example'] - return json.dumps(get_example_for_schema(param['schema']), - indent=2) + return json.dumps(get_example_for_schema(schema), indent=2) +def get_example_for_response(response): + """Returns a stringified example for a response""" + exampleobj = None + if 'examples' in response: + exampleobj = response["examples"].get("application/json") + # the openapi spec suggests that examples in the 'examples' section should + # be formatted as raw objects rather than json-formatted strings, but we + # have lots of the latter in our spec, which work with the swagger UI, + # so grandfather them in. 
+ if isinstance(exampleobj, basestring): + return exampleobj + + if exampleobj is None: + schema = response.get('schema') + if schema: + exampleobj = get_example_for_schema(schema) + if exampleobj is None: + return None + return json.dumps(exampleobj, indent=2) class MatrixUnits(Units): def _load_swagger_meta(self, api, group_name): @@ -455,7 +472,7 @@ class MatrixUnits(Units): if not good_response and code == 200: good_response = res description = res.get("description", "") - example = res.get("examples", {}).get("application/json", "") + example = get_example_for_response(res) endpoint["responses"].append({ "code": code, "description": description, From b12b38d680f69f3d04bbf979e939ecf89ca29db3 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 11:47:34 +0100 Subject: [PATCH 12/33] regrandfather json-formatted example params --- templating/matrix_templates/units.py | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index ed18d972..5205186e 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -390,7 +390,18 @@ def get_example_for_param(param): schema = param.get('schema') if not schema: return None - return json.dumps(get_example_for_schema(schema), indent=2) + + # allow examples for the top-level object to be in formatted json + exampleobj = None + if 'example' in schema: + exampleobj = schema['example'] + if isinstance(exampleobj, basestring): + return exampleobj + + if exampleobj is None: + exampleobj = get_example_for_schema(schema) + + return json.dumps(exampleobj, indent=2) def get_example_for_response(response): """Returns a stringified example for a response""" From 57611ec523056a20982523ab69df0a4e6da1747b Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 12:20:00 +0100 Subject: [PATCH 13/33] More example formatting improvements Generate more realistic example integers, and do some sanity checks on other examples. 
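
The underlying approach can be sketched as follows (a condensed model of the
behaviour, not the exact code in ``units.py``): walk the schema recursively,
preferring any explicit ``example``, and otherwise synthesise a value from the
declared type::

    from collections import OrderedDict

    def example_for_schema(schema):
        # An explicit example always wins.
        if "example" in schema:
            return schema["example"]
        prop_type = schema.get("type")
        if prop_type == "object":
            # Build the example object property by property.
            return OrderedDict(
                (name, example_for_schema(prop))
                for name, prop in schema.get("properties", {}).items()
            )
        if prop_type == "array":
            # A single-element list based on the item schema.
            return [example_for_schema(schema["items"])]
        if prop_type == "integer":
            return 0
        # Fall back to the type name itself for strings and anything else.
        return prop_type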
--- templating/matrix_templates/units.py | 40 +++++++++++++++++++++------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 5205186e..6305dc91 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -372,16 +372,31 @@ def get_example_for_schema(schema): if 'example' in schema: example = schema['example'] return example - if 'properties' in schema: + + proptype = schema['type'] + + if proptype == 'object': + if 'properties' not in schema: + raise Exception('"object" property has neither properties nor example') res = OrderedDict() for prop_name, prop in schema['properties'].iteritems(): logger.debug("Parsing property %r" % prop_name) prop_example = get_example_for_schema(prop) res[prop_name] = prop_example return res - if 'items' in schema: + + if proptype == 'array': + if 'items' not in schema: + raise Exception('"array" property has neither items nor example') return [get_example_for_schema(schema['items'])] - return schema.get('type', '??') + + if proptype == 'integer': + return 0 + + if proptype == 'string': + return proptype + + raise Exception("Don't know to make an example %s" % proptype) def get_example_for_param(param): """Returns a stringified example for a parameter""" @@ -418,9 +433,14 @@ def get_example_for_response(response): if exampleobj is None: schema = response.get('schema') if schema: + if schema['type'] == 'file': + # no example for 'file' responses + return None exampleobj = get_example_for_schema(schema) + if exampleobj is None: return None + return json.dumps(exampleobj, indent=2) class MatrixUnits(Units): @@ -512,29 +532,31 @@ class MatrixUnits(Units): qps = [] body = "" for param in single_api.get("parameters", []): + paramname = param.get("name") try: example = get_example_for_param(param) if not example: logger.warn( - "The parameter %s is missing an example." 
% - param["name"]) + "The parameter %s is missing an example.", + paramname + ) continue if param["in"] == "path": path_template = path_template.replace( - "{%s}" % param["name"], urllib.quote(example) + "{%s}" % paramname, urllib.quote(example) ) elif param["in"] == "body": body = example elif param["in"] == "query": if type(example) == list: for value in example: - qps.append((param["name"], value)) + qps.append((paramname, value)) else: - qps.append((param["name"], example)) + qps.append((paramname, example)) except Exception, e: - raise Exception("Error handling parameter %s" % param["name"], + raise Exception("Error handling parameter %s" % paramname, e) query_string = "" if len(qps) == 0 else "?"+urllib.urlencode(qps) From dfbe4164907867db6273617561f17564c5da55ef Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 15:26:12 +0100 Subject: [PATCH 14/33] Better types for additionalProps recurse down the definitions for additionalProps, so that the types are better --- event-schemas/schema/m.direct | 2 + templating/matrix_templates/units.py | 223 +++++++++------------------ 2 files changed, 79 insertions(+), 146 deletions(-) diff --git a/event-schemas/schema/m.direct b/event-schemas/schema/m.direct index b8a9cfc2..8cbbf38f 100644 --- a/event-schemas/schema/m.direct +++ b/event-schemas/schema/m.direct @@ -12,6 +12,8 @@ properties: additionalProperties: type: array title: User ID + items: + type: string type: object type: enum: diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 6305dc91..66d5758d 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -119,42 +119,27 @@ def get_json_schema_object_fields(obj, enforce_title=False): "get_json_schema_object_fields: Object %s isn't an object." % obj ) - logger.debug("Processing object with title '%s'", obj.get("title")) + obj_title = obj.get("title") - if enforce_title and not obj.get("title"): + logger.debug("Processing object with title '%s'", obj_title) + + if enforce_title and not obj_title: # Force a default titile of "NO_TITLE" to make it obvious in the # specification output which parts of the schema are missing a title - obj["title"] = 'NO_TITLE' + obj_title = 'NO_TITLE' additionalProps = obj.get("additionalProperties") props = obj.get("properties") if additionalProps and not props: # not "really" an object, just a KV store - additionalProps = inherit_parents(additionalProps) - - logger.debug("%s is a pseudo-object", obj.get("title")) + logger.debug("%s is a pseudo-object", obj_title) key_type = additionalProps.get("x-pattern", "string") - - value_type = additionalProps["type"] - if value_type == "object": - nested_objects = get_json_schema_object_fields( - additionalProps, - enforce_title=True, - ) - value_type = nested_objects[0]["title"] - tables = [x for x in nested_objects if not x.get("no-table")] - else: - key_type = "string" - tables = [] - - tables = [{ - "title": "{%s: %s}" % (key_type, value_type), - "no-table": True - }]+tables - - logger.debug("%s done: returning %s", obj.get("title"), tables) - return tables + res = process_data_type(additionalProps) + return { + "type": "{%s: %s}" % (key_type, res["type"]), + "tables": res["tables"], + } if not props: props = obj.get("patternProperties") @@ -169,14 +154,13 @@ def get_json_schema_object_fields(obj, enforce_title=False): # Sometimes you just want to specify that a thing is an object without # doing all the keys. 
if not props: - return [{ - "title": obj.get("title"), - "no-table": True - }] + return { + "type": obj_title, + "tables": [], + } required_keys = set(obj.get("required", [])) - obj_title = obj.get("title") first_table_rows = [] tables = [] @@ -184,9 +168,14 @@ def get_json_schema_object_fields(obj, enforce_title=False): try: logger.debug("Processing property %s.%s", obj_title, key_name) required = key_name in required_keys - res = process_prop(key_name, props[key_name], required) + res = process_data_type(props[key_name], required) - first_table_rows.append(res["row"]) + first_table_rows.append({ + "key": key_name, + "type": res["type"], + "required": required, + "desc": res["desc"], + }) tables.extend(res["tables"]) logger.debug("Done property %s" % key_name) @@ -202,87 +191,64 @@ def get_json_schema_object_fields(obj, enforce_title=False): "rows": first_table_rows, }) - return tables + return { + "type": obj_title, + "tables": tables, + } -def process_prop(key_name, prop, required): + +# process a data type definition. returns a dictionary with the keys: +# type: stringified type name +# desc: description +# enum_desc: description of permissible enum fields +# is_object: true if the data type is an object +# tables: list of additional table definitions +def process_data_type(prop, required=False, enforce_title=True): prop = inherit_parents(prop) - value_type = None - desc = prop.get("description", "") prop_type = prop['type'] tables = [] - - logger.debug("%s is a %s", key_name, prop_type) + enum_desc = None + is_object = False if prop_type == "object": - nested_objects = get_json_schema_object_fields( + res = get_json_schema_object_fields( prop, - enforce_title=True, + enforce_title=enforce_title, ) - value_type = nested_objects[0]["title"] - value_id = value_type + prop_type = res["type"] + tables = res["tables"] + is_object = True - tables += [x for x in nested_objects if not x.get("no-table")] elif prop_type == "array": - items = inherit_parents(prop["items"]) - # if the items of the array are objects then recurse - if items["type"] == "object": - nested_objects = get_json_schema_object_fields( - items, - enforce_title=True, + nested = process_data_type(prop["items"]) + prop_type = "[%s]" % nested["type"] + tables = nested["tables"] + enum_desc = nested["enum_desc"] + + if prop.get("enum"): + if len(prop["enum"]) > 1: + prop_type = "enum" + enum_desc = ( + "One of: %s" % json.dumps(prop["enum"]) ) - value_id = nested_objects[0]["title"] - value_type = "[%s]" % value_id - tables += nested_objects else: - value_type = items["type"] - if isinstance(value_type, list): - value_type = " or ".join(value_type) - value_id = value_type - value_type = "[%s]" % value_type - array_enums = items.get("enum") - if array_enums: - if len(array_enums) > 1: - value_type = "[enum]" - desc += ( - " One of: %s" % json.dumps(array_enums) - ) - else: - desc += ( - " Must be '%s'." % array_enums[0] - ) - else: - value_type = prop_type - value_id = prop_type - if prop.get("enum"): - if len(prop["enum"]) > 1: - value_type = "enum" - if desc: - desc += " " - desc += ( - "One of: %s" % json.dumps(prop["enum"]) - ) - else: - if desc: - desc += " " - desc += ( - "Must be '%s'." % prop["enum"][0] - ) - if isinstance(value_type, list): - value_type = " or ".join(value_type) + enum_desc = ( + "Must be '%s'." 
% prop["enum"][0] + ) + + if isinstance(prop_type, list): + prop_type = " or ".join(prop_type) - if required: - desc = "**Required.** " + desc + rq = "**Required.**" if required else None + desc = " ".join(x for x in [rq, prop.get("description"), enum_desc] if x) return { - "row": { - "key": key_name, - "type": value_type, - "id": value_id, - "required": required, - "desc": desc, - }, + "type": prop_type, + "desc": desc, + "enum_desc": enum_desc, + "is_object": is_object, "tables": tables, } @@ -309,62 +275,28 @@ def deduplicate_tables(tables): return filtered def get_tables_for_schema(schema): - schema = inherit_parents(schema) - tables = get_json_schema_object_fields(schema) - return deduplicate_tables(tables) + pv = process_data_type(schema, enforce_title=False) + return deduplicate_tables(pv["tables"]) -def get_tables_for_response(api, schema): - schema = inherit_parents(schema) - resp_type = schema.get("type") - - if resp_type is None: - raise KeyError("Response definition for api '%s' missing 'type' field" - % (api)) - - resp_title = schema.get("title", "") - resp_description = schema.get("description", "") - - logger.debug("Found a 200 response for this API; type %s" % resp_type) - - if resp_type == "object": - tables = get_json_schema_object_fields( - schema, - enforce_title=False, - ) - - else: - nested_items = [] - if resp_type == "array": - items = inherit_parents(schema["items"]) - if items["type"] == "object": - nested_items = get_json_schema_object_fields( - items, - enforce_title=True, - ) - value_id = nested_items[0]["title"] - resp_type = "[%s]" % value_id - else: - raise Exception("Unsupported array response type [%s] for %s" % - (items["type"], api)) +def get_tables_for_response(schema): + pv = process_data_type(schema, enforce_title=False) + tables = deduplicate_tables(pv["tables"]) + # make up the first table, with just the 'body' row in, unless the response + # is an object, in which case there's little point in having one. + if not pv["is_object"]: tables = [{ - "title": resp_title, + "title": None, "rows": [{ "key": "", - "type": resp_type, - "desc": resp_description, + "type": pv["type"], + "desc": pv["desc"], }] - }] + nested_items + }] + tables - res = deduplicate_tables(tables) + logger.debug("response: %r" % tables) - if len(res) == 0: - logger.warn( - "This API appears to have no response table. Are you " + - "sure this API returns no parameters?" 
- ) - - return res + return tables def get_example_for_schema(schema): """Returns a python object representing a suitable example for this object""" @@ -514,7 +446,6 @@ class MatrixUnits(Units): if good_response: if "schema" in good_response: endpoint["res_tables"] = get_tables_for_response( - "%s %s" % (method, path), good_response["schema"] ) if "headers" in good_response: From 33191e5555fff9a40fcec9c2feef16f40a943847 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 15:26:59 +0100 Subject: [PATCH 15/33] Better examples --- templating/matrix_templates/units.py | 107 +++++++++++++-------------- 1 file changed, 51 insertions(+), 56 deletions(-) diff --git a/templating/matrix_templates/units.py b/templating/matrix_templates/units.py index 66d5758d..051b38d7 100644 --- a/templating/matrix_templates/units.py +++ b/templating/matrix_templates/units.py @@ -400,33 +400,60 @@ class MatrixUnits(Units): } } logger.info(" ------- Endpoint: %s %s ------- " % (method, path)) - for param in single_api.get("parameters", []): - param_loc = param["in"] - if param_loc == "body": - self._handle_body_param(param, endpoint) - continue + path_template = api.get("basePath", "").rstrip("/") + path + example_query_params = [] + example_body = "" + + for param in single_api.get("parameters", []): + # even body params should have names, otherwise the active docs don't work. param_name = param["name"] - # description - desc = param.get("description", "") - if param.get("required"): - desc = "**Required.** " + desc + try: + param_loc = param["in"] - # assign value expected for this param - val_type = param.get("type") # integer/string + if param_loc == "body": + self._handle_body_param(param, endpoint) + example_body = get_example_for_param(param) + continue - if param.get("enum"): - val_type = "enum" - desc += ( - " One of: %s" % json.dumps(param.get("enum")) - ) + # description + desc = param.get("description", "") + if param.get("required"): + desc = "**Required.** " + desc - endpoint["req_param_by_loc"].setdefault(param_loc, []).append({ - "key": param_name, - "type": val_type, - "desc": desc - }) + # assign value expected for this param + val_type = param.get("type") # integer/string + + if param.get("enum"): + val_type = "enum" + desc += ( + " One of: %s" % json.dumps(param.get("enum")) + ) + + endpoint["req_param_by_loc"].setdefault(param_loc, []).append({ + "key": param_name, + "type": val_type, + "desc": desc + }) + + example = get_example_for_param(param) + if example is None: + continue + + if param_loc == "path": + path_template = path_template.replace( + "{%s}" % param_name, urllib.quote(example) + ) + elif param_loc == "query": + if type(example) == list: + for value in example: + example_query_params.append((param_name, value)) + else: + example_query_params.append((param_name, example)) + + except Exception, e: + raise Exception("Error handling parameter %s" % param_name, e) # endfor[param] good_response = None @@ -458,42 +485,10 @@ class MatrixUnits(Units): }) endpoint["res_headers"] = headers - # calculate the example request - path_template = api.get("basePath", "").rstrip("/") + path - qps = [] - body = "" - for param in single_api.get("parameters", []): - paramname = param.get("name") - try: - example = get_example_for_param(param) - - if not example: - logger.warn( - "The parameter %s is missing an example.", - paramname - ) - continue - - if param["in"] == "path": - path_template = path_template.replace( - "{%s}" % paramname, urllib.quote(example) - ) - elif param["in"] 
== "body": - body = example - elif param["in"] == "query": - if type(example) == list: - for value in example: - qps.append((paramname, value)) - else: - qps.append((paramname, example)) - except Exception, e: - raise Exception("Error handling parameter %s" % paramname, - e) - - query_string = "" if len(qps) == 0 else "?"+urllib.urlencode(qps) - if body: + query_string = "" if len(example_query_params) == 0 else "?"+urllib.urlencode(example_query_params) + if example_body: endpoint["example"]["req"] = "%s %s%s HTTP/1.1\nContent-Type: application/json\n\n%s" % ( - method.upper(), path_template, query_string, body + method.upper(), path_template, query_string, example_body ) else: endpoint["example"]["req"] = "%s %s%s HTTP/1.1\n\n" % ( From d41438605d10ccd23e5b3cbe0eafe358d846886d Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 17:56:26 +0100 Subject: [PATCH 16/33] Use matrix.org for swagger UI In the swagger UI, default to matrix.org rather than localhost, to make the UI more useful. --- scripts/dump-swagger.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/dump-swagger.py b/scripts/dump-swagger.py index ab534ce0..3b40bdb5 100755 --- a/scripts/dump-swagger.py +++ b/scripts/dump-swagger.py @@ -63,7 +63,8 @@ output = { "basePath": "/", "consumes": ["application/json"], "produces": ["application/json"], - "host": "localhost:8008", + "host": "matrix.org:8448", + "schemes": ["https"], "info": { "title": "Matrix Client-Server API", "version": release_label, From 2ec43a59484e2a3cc75d0919d57f4968195985d4 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 14 Oct 2016 17:57:07 +0100 Subject: [PATCH 17/33] Add continuserv index Oops, forgot this when adding support for the API docs to continuserv in bfb65d8. --- scripts/continuserv/index.html | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 scripts/continuserv/index.html diff --git a/scripts/continuserv/index.html b/scripts/continuserv/index.html new file mode 100644 index 00000000..f698c5b3 --- /dev/null +++ b/scripts/continuserv/index.html @@ -0,0 +1,15 @@ + + + + + From b6c59c137ac6aa51b6e5d1ceda04532b3a2a8b9a Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Mon, 17 Oct 2016 00:37:19 +0100 Subject: [PATCH 18/33] Add an entry to the FAQ about disabling e2e --- supporting-docs/guides/2015-08-19-faq.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/supporting-docs/guides/2015-08-19-faq.md b/supporting-docs/guides/2015-08-19-faq.md index 8e8c5528..b3ab172e 100644 --- a/supporting-docs/guides/2015-08-19-faq.md +++ b/supporting-docs/guides/2015-08-19-faq.md @@ -590,6 +590,28 @@ You can also run Vector, the code that Riot.im uses, on your own server. It's a There are several, but they don't have all the features that synapse has. Check the list of clients on [matrix.org](http://matrix.org/docs/projects/try-matrix-now.html#clients). +##### Why can't end-to-end encryption be turned off? + +When encryption is enabled in a room, a flag is set in the room state, so that +all clients know to encrypt any messages they send. The room state stores +information about the room like the topic, the avatar, and the membership list. + +Imagine if encryption could be turned off the same way as it is turned +on. Anyone with admin rights in the room could clear the flag and then messages +would start being transmitted unencrypted. 
It would be very easy for a user to +miss the change in configuration, and accidentally send a sensitive message +without encryption. + +Worse yet, anyone with sysadmin access to a server could also clear the flag +(remember that the main reason for using e2e encryption is that we don't trust +the sysadmins), and could then easily read any sensitive content which was +sent. + +The solution we have taken for now is to make clients ignore any requests to +disable encryption. We might experiment with ways to improve this in the future +- for instance, by alerting the user next time they try to send a message in +the room if encryption has been disabled. + | ### QUESTIONS TO BE ANSWERED! From e77dc0bd4c60e6d9628c1a66e4b321d0df77b87d Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 16:01:01 +0100 Subject: [PATCH 19/33] Add E2E implementation guide --- .../guides/2016-10-18-e2e_implementation.rst | 658 ++++++++++++++++++ 1 file changed, 658 insertions(+) create mode 100644 supporting-docs/guides/2016-10-18-e2e_implementation.rst diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst new file mode 100644 index 00000000..79ff5109 --- /dev/null +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -0,0 +1,658 @@ +Implementing End-to-End Encryption in Matrix clients +==================================================== + +This guide is intended for authors of Matrix clients who wish to add +support for end-to-end encryption. It is highly recommended that readers +be familiar with the Matrix protocol and the use of access tokens before +proceeding. + +The libolm library +------------------ + +End-to-end encryption in Matrix is based on the Olm and Megolm +cryptographic ratchets. The recommended starting point for any client +authors is with the `libolm `__ library, +which contains implementations of all of the cryptographic primitives +required. The library itself is written in C/C++, but is architected in +a way which makes it easy to write wrappers for higher-level languages. + +Devices +------- + +We have a particular meaning for “device”. As a user, I might have +several devices (a desktop client, some web browsers, an Android device, +an iPhone, etc). When I first use a client, it should register itself as +a new device. If I log out and log in again as a different user, the +client must register as a new device. Critically, the client must create +a new set of keys (see below) for each “device”. + +The longevity of devices will depend on the client. In the web client, +we create a new device every single time you log in. In a mobile client, +it might be acceptable to reuse the device if a login session expires, +**provided** the user is the same. **Never** share keys between +different users. + +Devices are identified by their ``device_id`` (which is unique within +the scope of a given user). By default, the ``/login`` and ``/register`` +endpoints will auto-generate a ``device_id`` and return it in the +response; a client is also free to generate its own ``device_id`` or, as +above, reuse a device, in which case the client should pass the +``device_id`` in the request body. + +The lifetime of devices and ``access_token``\ s (technically: chains of +``refresh_token``\ s and ``access_token``\ s), are closely related. In +the simple case where a new device is created each time you log in, +there is a one-to-one mapping between a ``device_id`` and an +``access_token`` chain. 
If a client reuses a ``device_id`` when logging +in, there will be several ``access_token`` chains associated with a +given ``device_id`` - but still, we would expect only one of these to be +active at once (though we do not currently enforce that in Synapse). + +Keys used in End-to-End encryption +---------------------------------- + +There are a number of keys involved in encrypted communication: a +summary of them follows. + +Ed25519 fingerprint key pair +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Ed25519 is a public-key cryptographic system for signing messages. In +Matrix, each device has an Ed25519 key pair which serves to identify +that device. The private part of the key pair should never leave the +device, but the public part is published to the Matrix network. + +Curve25519 identity key pair +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Curve25519 is a public-key cryptographic system which can be used to +establish a shared secret. In Matrix, each device has a long-lived +Curve25519 identity key which is used to establish Olm sessions with +that device. Again, the private key should never leave the device, but +the public part is signed with the Ed25519 fingerprint key and published +to the network. + +Theoretically we should rotate the Curve25519 identity key from time to +time, but we haven't implemented this yet. + +Curve25519 one-time keys +~~~~~~~~~~~~~~~~~~~~~~~~ + +As well as the identity key, each device creates a number of Curve25519 +key pairs which are also used to establish Olm sessions, but can only be +used once. Once again, the private part remains on the device. + +At startup, Alice creates a number of one-time key pairs, and publishes +them to her homeserver. If Bob wants to establish an Olm session with +Alice, he needs to claim one of Alice’s one-time keys, and creates a new +one of his own. Those two keys, along with Alice’s and Bob’s identity +keys, are used in establishing an Olm session between Alice and Bob. + +Megolm encryption keys +~~~~~~~~~~~~~~~~~~~~~~ + +The Megolm key is used to encrypt group messages (in fact it is used to +derive an AES-256 key, and an HMAC-SHA-256 key). It is initialised with +random data. Each time a message is sent, a hash calculation is done on +the Megolm key to derive the key for the next message. It is therefore +possible to share the current state of the Megolm key with a user, +allowing them to decrypt future messages but not past messages. + +Ed25519 Megolm signing key pair +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a sender creates a Megolm session, he also creates another Ed25519 +signing key pair. This is used to sign messages sent via that Megolm +session, to authenticate the sender. Once again, the private part of the +key remains on the device. The public part is shared with other devices +in the room alongside the encryption key. + +Creating and registering device keys +------------------------------------ + +This process only happens once, when a device first starts. + +It must create the Ed25519 fingerprint key pair and the Curve25519 +identity key pair. This is done by calling ``olm_create_account`` in +libolm. The (base64-encoded) keys are retrieved by calling +``olm_account_identity_keys``. The account should be stored for future +use. + +It should then publish these keys to the homeserver. To do this, it +should construct a JSON object as follows: + +.. 
code:: json + + { + "algorithms": ["m.olm.v1.curve25519-aes-sha2", "m.megolm.v1.aes-sha2"], + "device_id": "", + "keys": { + "curve25519:": "", + "ed25519:": "" + }, + "user_id: " + } + +The object should be formatted as `Canonical +JSON `__, +then signed with ``olm_account_sign``; the signature should be added to +the JSON as ``signatures..ed25519:``. + +The signed JSON is then uploaded via +``POST /_matrix/client/unstable/keys/upload``. + +Creating and registering one-time keys +-------------------------------------- + +At first start, and at regular intervals +thereafter\ [#]_, the client should check how +many one-time keys the homeserver has stored for it, and, if necessary, +generate and upload some more. + +.. [#] Every 10 minutes is suggested. + +The number of one-time keys currently stored is returned by +``POST /_matrix/client/unstable/keys/upload``. (Post an empty JSON object +``{}`` if you don’t want to upload the device keys.) + +The maximum number of active keys supported by libolm is returned by +``olm_account_max_number_of_one_time_keys``. The client should try to +maintain about half this number on the homeserver. + +To generate new one-time keys: + +* Call ``olm_account_generate_one_time_keys`` to generate new keys + +* Call ``olm_account_one_time_keys`` to retrieve the unpublished keys. This + returns a JSON-formatted object with the single property ``curve25519``, + which is itself an object mapping key id to base64-encoded Curve25519 + key. For example: + + .. code:: json + + { + "curve25519": { + "AAAAAA": "wo76WcYtb0Vk/pBOdmduiGJ0wIEjW4IBMbbQn7aSnTo", + "AAAAAB": "LRvjo46L1X2vx69sS9QNFD29HWulxrmW11Up5AfAjgU" + } + } + +* Construct a JSON object as follows: + + .. code:: json + + { + "one_time_keys": { + "curve25519:": "", + ... + } + } + +* Upload the object via ``POST /_matrix/client/unstable/keys/upload``. (Unlike + the device keys, the one-time keys are **not** signed. + +* Call ``olm_account_mark_keys_as_published`` to tell the olm library not to + return the same keys from a future call to ``olm_account_one_time_keys``\. + +Configuring a room to use encryption +------------------------------------ + +To enable encryption in a room, a client should send a state event of +type ``m.room.encryption``, and content ``{ "algorithm": +"m.megolm.v1.aes-sha2" }``. + +Handling an ``m.room.encryption`` state event +--------------------------------------------- + +When a client receives an ``m.room.encryption`` event as above, it +should set a flag to indicate that messages sent in the room should be +encrypted. + +This flag should **not** be cleared if a later ``m.room.encryption`` +event changes the configuration. This is to avoid a situation where a +MITM can simply ask participants to disable encryption. In short: once +encryption is enabled in a room, it can never be disabled. + +Handling an ``m.room.encrypted`` event +-------------------------------------- + +Encrypted events have a type of ``m.room.encrypted``. They have a +content property ``algorithm`` which gives the encryption algorithm in +use, as well as other properties specific to the algorithm. + +The encrypted payload is a JSON object with the properties ``type`` +(giving the decrypted event type), and ``content`` (giving the decrypted +content). Depending on the algorithm in use, the payload may contain +additional keys. 
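+
+For instance, a decrypted payload for a simple text message might look
+something like the following (a purely illustrative example; the exact set of
+properties depends on the algorithm in use, as described below):
+
+.. code:: json
+
+   {
+     "type": "m.room.message",
+     "content": {
+       "msgtype": "m.text",
+       "body": "Hello world"
+     }
+   }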
+ +There are currently two defined algorithms: + +``m.olm.v1.curve25519-aes-sha2`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Encrypted events using this algorithm should have a ``sender_key`` and a +``ciphertext`` property. + +The ``sender_key`` property of the event content gives the Curve25519 +identity key of the sender. Clients should maintain a list of known Olm +sessions for each device they speak to; it is recommended to index them +by Curve25519 identity key. + +Olm messages are encrypted separately for each recipient device. +``ciphertext`` is an object mapping from the Curve25519 identity key for +the recipient device. The receiving client should, of course, look for +its own identity key in this object. (If it isn't listed, the message +wasn't sent for it, and the client can't decrypt it; it should show an +error instead, or similar). + +This should result in an object with the properties ``type`` and +``body``. Messages of type '0' are 'prekey' messages which are used to +establish a new Olm session between two devices; type '1' are normal +messages which are used once a message has been received on the session. + +When a message (of either type) is received, a client should first +attempt to decrypt it with each of the known sessions for that sender. +There are two steps to this: + +- If (and only if) ``type==0``, the client should call + ``olm_matches_inbound_session`` with the session and ``body``. This + returns a flag indicating whether the message was encrypted using + that session. + +- The client calls ``olm_decrypt``, with the session, ``type``, and + ``body``. If this is successful, it returns the plaintext of the + event. + +If the client was unable to decrypt the message using any known sessions +(or if there are no known sessions yet), **and** the message had type 0, +**and** ``olm_matches_inbound_session`` wasn't true for any existing +sessions, then the client can try establishing a new session. This is +done as follows: + +- Call ``olm_create_inbound_session_from`` using the olm account, and + the ``sender_key`` and ``body`` of the message. + +- If the session was established successfully: + + - call ``olm_remove_one_time_keys`` to ensure that the same + one-time-key cannot be reused. + + - Call ``olm_decrypt`` with the new session + + - Store the session for future use + +At the end of this, the client will hopefully have successfully +decrypted the payload. + +As well as the ``type`` and ``content`` properties, the payload should +contain a ``keys`` property, which should be an object with a property +ed25519. The client should check that the value of this property matches +the sender's fingerprint key when `marking the event as verified`_ [#]_. + +.. [#] This prevents an attacker publishing someone else's curve25519 keys as + their own and subsequently claiming to have sent messages which they didn't + (see + https://github.com/vector-im/vector-web/issues/2215#issuecomment-247630155). + + +``m.megolm.v1.aes-sha2`` +~~~~~~~~~~~~~~~~~~~~~~~~ + +Encrypted events using this algorithm should have ``sender_key``, +``session_id`` and ``ciphertext`` content properties. If the +``room_id``, ``sender_key`` and ``session_id`` correspond to a known +Megolm session (see `below`__), the ciphertext can be +decrypted by passing the ciphertext into ``olm_group_decrypt``. + +__ `m.room_key`_ + +The client should check that the sender's fingerprint key matches the +``keys.ed25519`` property of the event which established the Megolm session +when `marking the event as verified`_. 
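+
+As a rough illustration, a client keeping its inbound Megolm sessions in a
+dictionary might decrypt an event as follows. This is only a sketch: the
+``sessions`` store and the ``session.decrypt`` wrapper (assumed to call
+``olm_group_decrypt`` and return the plaintext) are inventions of the example
+rather than part of libolm's API.
+
+.. code:: python
+
+   import json
+
+   # Known inbound Megolm sessions, indexed as described above.
+   sessions = {}
+
+   def decrypt_megolm_event(event):
+       content = event["content"]
+       key = (event["room_id"], content["sender_key"], content["session_id"])
+       session = sessions.get(key)
+       if session is None:
+           raise KeyError("no known Megolm session for this event")
+       # The wrapper is assumed to return the decrypted payload as a string.
+       plaintext = session.decrypt(content["ciphertext"])
+       return json.loads(plaintext)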
+ +.. _`m.room_key`: + +Handling an ``m.room_key`` event +-------------------------------- + +These events contain key data to allow decryption of other messages. +They are sent to specific devices, so they appear in the ``to_device`` +section of the response to ``GET /_matrix/client/r0/sync``. They will +also be encrypted, so will need decrypting as above before they can be +seen. + +The event content will contain an 'algorithm' property, indicating the +encryption algorithm the key data is to be used for. Currently, this +will always be ``m.megolm.v1.aes-sha2``. + +Room key events for Megolm will also have ``room_id``, ``session_id``, and +``session_key`` keys. They are used to establish a Megolm session. The +``room_id`` identifies which room the session will be used in. The ``room_id``, +together with the ``sender_key`` of the ``room_key`` event before it was +decrypted, and the ``session_id``, uniquely identify a Megolm session. If they +do not represent a known session, the client should start a new inbound Megolm +session by calling ``olm_init_inbound_group_session`` with the ``session_key``. + +The client should remember the value of the keys property of the payload +of the encrypted ``m.room_key`` event and store it with the inbound +session. This is used as above when marking the event as verified. + +.. _`download the device list`: + +Downloading the device list for users in the room +------------------------------------------------- + +Before an encrypted message can be sent, it is necessary to retrieve the +list of devices for each user in the room. This can be done proactively, +or deferred until the first message is sent. The information is also +required to allow users to `verify or block devices`__. + +__ `blocking`_ + +The client should build a JSON query object as follows: + +.. code:: json + + { + "": {}, + ... + } + +Each member in the room should be included in the query. This is then +sent via ``POST /_matrix/client/unstable/keys/query.`` + +The result includes, for each listed user id, a map from device ID to an +object containing information on the device, as follows: + +.. code:: json + + { + "algorithms": [...], + "device_id": "", + "keys": { + "curve25519:": "", + "ed25519:": "" + }, + "signatures": { + "": { + "ed25519:": "" + }, + }, + "unsigned": { + "device_display_name": "" + }, + "user_id: " + } + +The client should first check the signature on this object. To do this, +it should remove the ``signatures`` and ``unsigned`` properties, format +the remainder as Canonical JSON, and pass the result into +``olm_ed25519_verify``, using the Ed25519 key for the ``key`` parameter, +and the corresponding signature for the ``signature`` parameter. If the +signature check fails, no further processing should be done on the +device. + +The client should check if the ``user_id``/``device_ie`` correspond to a device +it had seen previously. If it did, the client **must** check that the Ed25519 +key hasn't changed. Again, if it has changed, no further processing should be +done on the device. + +Otherwise the client stores the information about this device. + +Sending an encrypted event +-------------------------- + +When sending a message in a room `configured to use +encryption`__, a client first checks to see if it has +an active outbound Megolm session. If not, it first `creates one +as per below`__. + +__ `Configuring a room to use encryption`_ +__ `Starting a Megolm session`_ + +It then builds an encryption payload as follows: + +.. 
code:: json + + { + "type": "", + "content": "", + "room_id": "" + } + +and calls ``olm_group_encrypt`` to encrypt the payload. This is then packaged +into event content as follows: + +.. code:: json + + { + "algorithm": "m.megolm.v1.aes-sha2", + "sender_key": "", + "ciphertext": "", + "session_id": "", + "device_id": "" + } + +Finally, the encrypted event is sent to the room with ``POST +/_matrix/client/r0/rooms//send/m.room.encrypted/``. + +Starting a Megolm session +~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a message is first sent in an encrypted room, the client should +start a new outbound Megolm session. This should **not** be done +proactively, to avoid proliferation of unnecessary Megolm sessions. + +To create the session, the client should call +``olm_init_outbound_group_session``, and store the details of the +outbound session for future use. + +The client should then call ``olm_outbound_group_session_id`` to get the +unique ID of the new session, and ``olm_outbound_group_session_key`` to +retrieve the current ratchet key and index. It should store these +details as an inbound session, just as it would when `receiving them via +an m.room_key event`__. + +__ `m.room_key`_ + +The client must then share the keys for this session with each device in the +room. It must therefore `download the device list`_ if it hasn't already done +so, and for each device in the room which has not been `blocked`__, the client +should: + +__ `blocking`_ + +* Build a content object as follows: + + .. code:: json + + { + "algorithm": "m.megolm.v1.aes-sha2", + "room_id": "", + "session_id": "", + "session_key": "" + } + +- Encrypt the content as an ``m.room_key`` event using Olm, as below. + +Once all of the key-sharing event contents have been assembled, the +events should be sent to the corresponding devices via +``PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/``. + +Encrypting an event with Olm +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Olm is not used for encrypting room events, as it requires a separate +copy of the ciphertext for each device, and because the receiving device +can only decrypt received messages once. However, it is used for +encrypting key-sharing events for Megolm. + +When encrypting an event using Olm, the client should: + +- Build an encryption payload as follows: + + .. code:: json + + { + "type": "", + "content": "", + "sender_device": "", + "keys": { + "ed25519": "" + } + } + +- Check if it has an existing Olm session; if it does not, `start a new + one`__. If it has several (as may happen due to + races when establishing sessions), it should use the one with the + first session_id when sorted by their ASCII codepoints (ie, 'A' + would be before 'Z', which would be before 'a'). + + __ `Starting an Olm session`_ + +- Encrypt the payload by calling ``olm_encrypt``. + +- Package the payload into event content as follows: + + .. code:: json + + { + "algorithm": "m.olm.v1.curve25519-aes-sha2", + "sender_key": "", + "ciphertext": "" + } + +Starting an Olm session +~~~~~~~~~~~~~~~~~~~~~~~ + +To start a new Olm session with another device, a client must first +claim one of the other device's one-time keys. To do this, it should +create a query object as follows: + +.. code:: json + + { + "": { + "": "curve25519", + ... + }, + ... + } + +and send this via ``POST /_matrix/client/unstable/keys/claim``. Claims +for multiple devices should be aggregated into a single request. + +This will return a result as follows: + +.. code:: json + + { + "": { + "": { + "curve25519:": "" + }, + ... 
+ }, + ... + } + +The client should then pass this key, along with the Curve25519 Identity +key for the remote device, into ``olm_create_outbound_session``. + +Handling membership changes +--------------------------- + +The client should monitor rooms which are configured to use encryption for +membership changes. + +When a member leaves a room, the client should invalidate any active outbound +Megolm session, to ensure that a new session is used next time the user sends a +message. + +When a new member joins a room, the client should first `download the device +list`_ for the new member, if it doesn't already have it. + +After giving the user an opportunity to `block`__ any suspicious devices, the +client should share the keys for the outbound Megolm session with all the new +member's devices. This is done in the same way as `creating a new session`__, +except that there is no need to start a new Megolm session: due to the design +of the Megolm ratchet, the new user will only be able to decrypt messages +starting from the current state. The recommended method is to maintain a list +of members who are waiting for the session keys, and share them when the user +next sends a message. + +__ `blocking`_ +__ `Starting a Megolm session`_ + +Sending New Device announcements +-------------------------------- + +When a user logs in on a new device, it is necessary to make sure that +other devices in any rooms with encryption enabled are aware of the new +device. This is done as follows. + +Once the initial call to the ``/sync`` API completes, the client should +iterate through each room where encryption is enabled. For each user +(including the client's own user), it should build a content object as +follows: + +.. code:: json + + { + "device_id": "", + "rooms": ["", "", ... ] + } + +Once all of these have been constructed, they should be sent to all of the +relevant user's devices (using the wildcard ``*`` in place of the +``device_id``) via ``PUT +/_matrix/client/unstable/sendToDevice/m.new_device/.`` + +Handling an m.new_device event +------------------------------- + +As with ``m.room_key`` events, these will appear in the ``to_device`` +section of the ``/sync`` response. + +The client should `download the device list`_ of the sender, to get the details +of the new device. + +The event content will contain a ``rooms`` property, as well as the +``device_id`` of the new device. For each room in the list, the client +should check if encryption is enabled, and if the sender of the event is +a member of that room. If so, the client should share the keys for the +outbound Megolm session with the new device, in the same way as +`handling a new user in the room`__. + +__ `Handling membership changes`_ + +.. _`blocking`: + +Blocking / Verifying devices +---------------------------- + +It should be possible for a user to mark each device belonging to +another user as 'Blocked' or 'Verified'. + +When a user chooses to block a device, this means that no further +encrypted messages should be shared with that device. In short, it +should be excluded when sharing room keys when `starting a new Megolm +session <#_p5d1esx6gkrc>`__. Any active outbound Megolm sessions whose +keys have been shared with the device should also be invalidated so that +no further messages are sent over them. + +Verifying a device involves ensuring that the device belongs to the +claimed user. 
Currently this must be done by showing the user the +Ed25519 fingerprint key for the device, and prompting the user to verify +out-of-band that it matches the key shown on the other user's device. + +.. _`marking the event as verified`: + +Marking events as 'verified' +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Once a device has been verified, it is possible to verify that events +have been sent from a particular device. See the section on `Handling an +m.room.encrypted event`_ for notes on how to do this +for each algorithm. Events sent from a verified device can be decorated +in the UI to show that they have been sent from a verified device. From 21888b554285c1dd58d6288d34a5f5cf67fdfd7b Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 16:05:17 +0100 Subject: [PATCH 20/33] e2e guide: formatting tweaks --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 79ff5109..c89d558c 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -6,6 +6,8 @@ support for end-to-end encryption. It is highly recommended that readers be familiar with the Matrix protocol and the use of access tokens before proceeding. +.. contents:: + The libolm library ------------------ @@ -608,8 +610,8 @@ relevant user's devices (using the wildcard ``*`` in place of the ``device_id``) via ``PUT /_matrix/client/unstable/sendToDevice/m.new_device/.`` -Handling an m.new_device event -------------------------------- +Handling an ``m.new_device`` event +---------------------------------- As with ``m.room_key`` events, these will appear in the ``to_device`` section of the ``/sync`` response. From 703b782ea1aa9281d5bd5b2d22b7b33e9512a87f Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 16:06:22 +0100 Subject: [PATCH 21/33] e2e guide: remove refs to refresh tokens refresh tokens are d34d --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index c89d558c..4f323630 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -41,12 +41,11 @@ response; a client is also free to generate its own ``device_id`` or, as above, reuse a device, in which case the client should pass the ``device_id`` in the request body. -The lifetime of devices and ``access_token``\ s (technically: chains of -``refresh_token``\ s and ``access_token``\ s), are closely related. In +The lifetime of devices and ``access_token``\ s are closely related. In the simple case where a new device is created each time you log in, there is a one-to-one mapping between a ``device_id`` and an -``access_token`` chain. If a client reuses a ``device_id`` when logging -in, there will be several ``access_token`` chains associated with a +``access_token``. If a client reuses a ``device_id`` when logging +in, there will be several ``access_token``\ s associated with a given ``device_id`` - but still, we would expect only one of these to be active at once (though we do not currently enforce that in Synapse). 
From 05ca311be37d1e49c55a19c3dd8ac3dcb242d073 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 18:03:20 +0100 Subject: [PATCH 22/33] Put the E2E guide under guides --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 79ff5109..60a5e15b 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -1,3 +1,9 @@ +--- +layout: post +title: End-to-End Encryption implementation guide +categories: guides +--- + Implementing End-to-End Encryption in Matrix clients ==================================================== From 157e51fbc9d64d999db8a423a49cedd5aaa9e22f Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 20:36:36 +0100 Subject: [PATCH 23/33] E2E impl guide: check ids in device query Update the E2E impl guide to note that the user_id and device_id returned from a device query need to be checked. --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 7ea0fd72..bc66f21a 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -399,13 +399,19 @@ and the corresponding signature for the ``signature`` parameter. If the signature check fails, no further processing should be done on the device. -The client should check if the ``user_id``/``device_ie`` correspond to a device +The client must also check that the ``user_id`` and ``device_id`` fields in the +object match those in the top-level map [#]_. + +The client should check if the ``user_id``/``device_id`` correspond to a device it had seen previously. If it did, the client **must** check that the Ed25519 key hasn't changed. Again, if it has changed, no further processing should be done on the device. Otherwise the client stores the information about this device. +.. [#] This prevents a malicious or compromised homeserver replacing the keys + for the device with those of another. + Sending an encrypted event -------------------------- From 4368134970b341c54ad85ed2bfda43f0c40a5953 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Tue, 18 Oct 2016 20:41:16 +0100 Subject: [PATCH 24/33] Remove spurious backslashes --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 7ea0fd72..3a5c0026 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -377,8 +377,8 @@ object containing information on the device, as follows: "algorithms": [...], "device_id": "", "keys": { - "curve25519:": "", - "ed25519:": "" + "curve25519:": "", + "ed25519:": "" }, "signatures": { "": { From 657525d0f4d6a0734c9e5474265d64519cc5a6b4 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Wed, 19 Oct 2016 17:06:52 +0100 Subject: [PATCH 25/33] E2E impl guide: Document unknown key-share mitigations Document the fields to be added to Olm and the checks to be done to mitigate the unknown key-share attacks. 
--- .../guides/2016-10-18-e2e_implementation.rst | 36 ++++++++++++++----- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 3a5c0026..83f4229d 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -292,15 +292,30 @@ At the end of this, the client will hopefully have successfully decrypted the payload. As well as the ``type`` and ``content`` properties, the payload should -contain a ``keys`` property, which should be an object with a property -ed25519. The client should check that the value of this property matches -the sender's fingerprint key when `marking the event as verified`_ [#]_. +contain a number of other properties. Each of these should be checked as +follows [#]_. -.. [#] This prevents an attacker publishing someone else's curve25519 keys as - their own and subsequently claiming to have sent messages which they didn't - (see - https://github.com/vector-im/vector-web/issues/2215#issuecomment-247630155). +``sender`` + The user ID of the sender. The client should check that this matches the + ``sender`` in the event. +``recipient`` + The user ID of the recipient. The client should check that this matches the + local user ID. + +``keys`` + an object with a property ``ed25519``, The client should check that the + value of this property matches the sender's fingerprint key when `marking + the event as verified`_\ . + +``recipient_keys`` + + an object with a property ``ed25519``. The client should check that the + value of this property matches its own fingerprint key. + +.. [#] These tests prevent an attacker publishing someone else's curve25519 + keys as their own and subsequently claiming to have sent messages which they + didn't. ``m.megolm.v1.aes-sha2`` ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -503,10 +518,15 @@ When encrypting an event using Olm, the client should: { "type": "", "content": "", + "sender": "", "sender_device": "", "keys": { "ed25519": "" - } + }, + "recipient": "", + "recipient_keys": { + "ed25519": "" + }, } - Check if it has an existing Olm session; if it does not, `start a new From c576a72673367267a8a7c14e87ab75cdf86885fc Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Fri, 21 Oct 2016 13:40:25 +0100 Subject: [PATCH 26/33] E2E impl guide: Add details on rotating megolm sessions --- .../guides/2016-10-18-e2e_implementation.rst | 39 ++++++++++++++++--- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 3a5c0026..ad78e8c9 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -208,6 +208,9 @@ To enable encryption in a room, a client should send a state event of type ``m.room.encryption``, and content ``{ "algorithm": "m.megolm.v1.aes-sha2" }``. +.. |m.room.encryption| replace:: ``m.room.encryption`` +.. _`m.room.encryption`: + Handling an ``m.room.encryption`` state event --------------------------------------------- @@ -220,6 +223,14 @@ event changes the configuration. This is to avoid a situation where a MITM can simply ask participants to disable encryption. In short: once encryption is enabled in a room, it can never be disabled. 
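+
+A minimal sketch of this behaviour (the ``room_flags`` store here is an
+assumption of the example, not part of any API) is simply never to clear the
+flag once it has been set:
+
+.. code:: python
+
+   def on_room_encryption_event(room_flags, room_id):
+       # Deliberately one-way: the flag is set when an m.room.encryption
+       # event is seen, and is never cleared by later events.
+       room_flags[room_id] = True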
+The event should contain an ``algorithm`` property which defines which +encryption algorithm should be used for encryption. Currently only +``m.megolm.v1-aes-sha2`` is permitted here. + +The event may also include other settings for how messages sent in the room +should be encrypted (for example, ``rotation_period_ms`` to define how often +the session should be replaced). + Handling an ``m.room.encrypted`` event -------------------------------------- @@ -409,15 +420,16 @@ Otherwise the client stores the information about this device. Sending an encrypted event -------------------------- -When sending a message in a room `configured to use -encryption`__, a client first checks to see if it has -an active outbound Megolm session. If not, it first `creates one -as per below`__. +When sending a message in a room `configured to use encryption`__, a client +first checks to see if it has an active outbound Megolm session. If not, it +first `creates one as per below`__. If an outbound session exists, it should +check if it is time to `rotate`__ it, and create a new one if so. __ `Configuring a room to use encryption`_ __ `Starting a Megolm session`_ +__ `Rotating Megolm sessions`_ -It then builds an encryption payload as follows: +The client then builds an encryption payload as follows: .. code:: json @@ -486,6 +498,23 @@ Once all of the key-sharing event contents have been assembled, the events should be sent to the corresponding devices via ``PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/``. +Rotating Megolm sessions +~~~~~~~~~~~~~~~~~~~~~~~~ + +Megolm sessions may not be reused indefinitely. + +The number of messages which can be sent before a session should be rotated is +given by the ``rotation_period_msgs`` property of the |m.room.encryption|_ +event, or ``100`` if that property isn't present. + +Similarly, the maximum age of a megolm session is given, in milliseconds, by +the ``rotation_period_ms`` property of the ``m.room.encryption`` +event. ``604800000`` (a week) is the recommended default here. + +Once either the message limit or time limit have been reached, the client +should start a new session before sending any more messages. + + Encrypting an event with Olm ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 8641ef299e74f7535fe31add91efe7bb56c21f2b Mon Sep 17 00:00:00 2001 From: Mark Haines Date: Fri, 21 Oct 2016 15:25:33 +0100 Subject: [PATCH 27/33] Document the requirement that clients track the message_index --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 30876df9..164888ab 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -326,6 +326,11 @@ Encrypted events using this algorithm should have ``sender_key``, Megolm session (see `below`__), the ciphertext can be decrypted by passing the ciphertext into ``olm_group_decrypt``. +In order to avoid replay attacks a client should remember the megolm +``message_index`` of each event they decrypt for each session. If the client +decrypts an event with the same ``message_index`` as one that it has already +decrypted using that session then it should fail decryption. 
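+
+A simple way to do this (sketch only; the storage layout is an assumption of
+the example) is to keep a set of message indices already seen for each
+session:
+
+.. code:: python
+
+   # Maps (room_id, sender_key, session_id) to the set of message indices
+   # already decrypted with that session.
+   seen_indices = {}
+
+   def check_replay(room_id, sender_key, session_id, message_index):
+       seen = seen_indices.setdefault((room_id, sender_key, session_id), set())
+       if message_index in seen:
+           raise ValueError("replay detected: message_index already used")
+       seen.add(message_index)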
+ __ `m.room_key`_ The client should check that the sender's fingerprint key matches the From 6a5b66d2d8d802a770b6b7b4e44abd63d2d3d842 Mon Sep 17 00:00:00 2001 From: Mark Haines Date: Fri, 21 Oct 2016 15:48:44 +0100 Subject: [PATCH 28/33] Document the where the client gets the message index from --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 164888ab..a215e6d2 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -327,9 +327,10 @@ Megolm session (see `below`__), the ciphertext can be decrypted by passing the ciphertext into ``olm_group_decrypt``. In order to avoid replay attacks a client should remember the megolm -``message_index`` of each event they decrypt for each session. If the client -decrypts an event with the same ``message_index`` as one that it has already -decrypted using that session then it should fail decryption. +``message_index`` returned by ``olm_group_decrypt`` of each event they decrypt +for each session. If the client decrypts an event with the same +``message_index`` as one that it has already decrypted using that session then +it should fail decryption. __ `m.room_key`_ From cbf94c88c2f823bbfbe385045ec92fa001e6e588 Mon Sep 17 00:00:00 2001 From: Mark Haines Date: Fri, 21 Oct 2016 15:50:54 +0100 Subject: [PATCH 29/33] Move the __ to where it should be --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index a215e6d2..02174c2f 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -326,13 +326,13 @@ Encrypted events using this algorithm should have ``sender_key``, Megolm session (see `below`__), the ciphertext can be decrypted by passing the ciphertext into ``olm_group_decrypt``. +__ `m.room_key`_ + In order to avoid replay attacks a client should remember the megolm ``message_index`` returned by ``olm_group_decrypt`` of each event they decrypt for each session. If the client decrypts an event with the same -``message_index`` as one that it has already decrypted using that session then -it should fail decryption. - -__ `m.room_key`_ +``message_index`` as one that it has already received using that session then +it should treat the message as invalid. The client should check that the sender's fingerprint key matches the ``keys.ed25519`` property of the event which established the Megolm session From f0f6ea0cb3073d34d06e1f158cfb95b30f9d6bbb Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Mon, 24 Oct 2016 13:52:34 +0100 Subject: [PATCH 30/33] E2e impl guide: sign one-time keys We now sign our one-time keys. 
--- .../guides/2016-10-18-e2e_implementation.rst | 91 +++++++++++++++---- 1 file changed, 71 insertions(+), 20 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index 017838c4..a018a410 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -132,18 +132,18 @@ should construct a JSON object as follows: { "algorithms": ["m.olm.v1.curve25519-aes-sha2", "m.megolm.v1.aes-sha2"], - "device_id": "", + "device_id": "", "keys": { - "curve25519:": "", - "ed25519:": "" + "curve25519:": "", + "ed25519:": "" }, - "user_id: " + "user_id: " } The object should be formatted as `Canonical JSON `__, then signed with ``olm_account_sign``; the signature should be added to -the JSON as ``signatures..ed25519:``. +the JSON as ``signatures..ed25519:``. The signed JSON is then uploaded via ``POST /_matrix/client/unstable/keys/upload``. @@ -168,7 +168,7 @@ maintain about half this number on the homeserver. To generate new one-time keys: -* Call ``olm_account_generate_one_time_keys`` to generate new keys +* Call ``olm_account_generate_one_time_keys`` to generate new keys. * Call ``olm_account_one_time_keys`` to retrieve the unpublished keys. This returns a JSON-formatted object with the single property ``curve25519``, @@ -184,22 +184,60 @@ To generate new one-time keys: } } -* Construct a JSON object as follows: +* Each key should be signed with the account key. To do this: + + * Construct a JSON object as follows: + + .. code:: json + + { + "key": "" + } + + * Call ``olm_account_sign`` to calculate the signature. + + * Add the signature should be added to the JSON as + ``signatures..ed25519:``. + + * The complete key object should now look like: + + .. code:: json + + { + "key": "wo76WcYtb0Vk/pBOdmduiGJ0wIEjW4IBMbbQn7aSnTo", + "signatures": { + "@alice:example.com": { + "ed25519:JLAFKJWSCS": "dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA" + } + } + } + + +* Aggregate all the signed one-time keys into a single JSON object as follows: .. code:: json { "one_time_keys": { - "curve25519:": "", + "signed_curve25519:": { + "key": "", + "signatures": { + "": { + "ed25519:": "" + } + } + }, + "signed_curve25519:": { + ... + }, ... } } -* Upload the object via ``POST /_matrix/client/unstable/keys/upload``. (Unlike - the device keys, the one-time keys are **not** signed. +* Upload the object via ``POST /_matrix/client/unstable/keys/upload``. * Call ``olm_account_mark_keys_as_published`` to tell the olm library not to - return the same keys from a future call to ``olm_account_one_time_keys``\. + return the same keys from a future call to ``olm_account_one_time_keys``. Configuring a room to use encryption ------------------------------------ @@ -407,20 +445,20 @@ object containing information on the device, as follows: { "algorithms": [...], - "device_id": "", + "device_id": "", "keys": { - "curve25519:": "", - "ed25519:": "" + "curve25519:": "", + "ed25519:": "" }, "signatures": { "": { - "ed25519:": "" + "ed25519:": "" }, }, "unsigned": { "device_display_name": "" }, - "user_id: " + "user_id: " } The client should first check the signature on this object. To do this, @@ -601,7 +639,7 @@ create a query object as follows: { "": { - "": "curve25519", + "": "signed_curve25519", ... }, ... 
@@ -617,15 +655,28 @@ This will return a result as follows: { "": { "": { - "curve25519:": "" + "signed_curve25519:": { + "key": "", + "signatures": { + "": { + "ed25519:": "" + } + } + }, }, ... }, ... } -The client should then pass this key, along with the Curve25519 Identity -key for the remote device, into ``olm_create_outbound_session``. +The client should first check the signatures on the signed key objects. As with +checking the signatures on the device keys, it should remove the ``signatures`` +property, format the remainder as Canonical JSON, and pass the result into +``olm_ed25519_verify``, using the Ed25519 device key for the ``key`` parameter. + +Provided the key object passes verification, the client should then pass the +key, along with the Curve25519 Identity key for the remote device, into +``olm_create_outbound_session``. Handling membership changes --------------------------- From e53e3ab01a8ab06184ecbce506bf5e920d661edf Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Mon, 24 Oct 2016 14:59:47 +0100 Subject: [PATCH 31/33] remove `unsigned` prop for verifying --- supporting-docs/guides/2016-10-18-e2e_implementation.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/supporting-docs/guides/2016-10-18-e2e_implementation.rst b/supporting-docs/guides/2016-10-18-e2e_implementation.rst index a018a410..a754f7c6 100644 --- a/supporting-docs/guides/2016-10-18-e2e_implementation.rst +++ b/supporting-docs/guides/2016-10-18-e2e_implementation.rst @@ -671,8 +671,9 @@ This will return a result as follows: The client should first check the signatures on the signed key objects. As with checking the signatures on the device keys, it should remove the ``signatures`` -property, format the remainder as Canonical JSON, and pass the result into -``olm_ed25519_verify``, using the Ed25519 device key for the ``key`` parameter. +and (if present) ``unsigned`` properties, format the remainder as Canonical +JSON, and pass the result into ``olm_ed25519_verify``, using the Ed25519 device +key for the ``key`` parameter. Provided the key object passes verification, the client should then pass the key, along with the Curve25519 Identity key for the remote device, into From a5e12814efb7a687f0546450f1310aab69d424e8 Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Mon, 31 Oct 2016 12:01:37 +0000 Subject: [PATCH 32/33] Split appendices up Split appendices into multiple files --- specification/appendices.rst | 286 +--------------------- specification/appendices/test_vectors.rst | 171 +++++++++++++ specification/appendices/threat_model.rst | 140 +++++++++++ specification/targets.yaml | 2 + 4 files changed, 315 insertions(+), 284 deletions(-) create mode 100644 specification/appendices/test_vectors.rst create mode 100644 specification/appendices/threat_model.rst diff --git a/specification/appendices.rst b/specification/appendices.rst index c57c9fb0..4a106a3a 100644 --- a/specification/appendices.rst +++ b/specification/appendices.rst @@ -15,287 +15,5 @@ Appendices ========== -Security Threat Model ----------------------- - -Denial of Service -~~~~~~~~~~~~~~~~~ - -The attacker could attempt to prevent delivery of messages to or from the -victim in order to: - -* Disrupt service or marketing campaign of a commercial competitor. -* Censor a discussion or censor a participant in a discussion. -* Perform general vandalism. - -Threat: Resource Exhaustion -+++++++++++++++++++++++++++ - -An attacker could cause the victims server to exhaust a particular resource -(e.g. 
open TCP connections, CPU, memory, disk storage) - -Threat: Unrecoverable Consistency Violations -++++++++++++++++++++++++++++++++++++++++++++ - -An attacker could send messages which created an unrecoverable "split-brain" -state in the cluster such that the victim's servers could no longer derive a -consistent view of the chatroom state. - -Threat: Bad History -+++++++++++++++++++ - -An attacker could convince the victim to accept invalid messages which the -victim would then include in their view of the chatroom history. Other servers -in the chatroom would reject the invalid messages and potentially reject the -victims messages as well since they depended on the invalid messages. - -.. TODO-spec - Track trustworthiness of HS or users based on if they try to pretend they - haven't seen recent events, and fake a splitbrain... --M - -Threat: Block Network Traffic -+++++++++++++++++++++++++++++ - -An attacker could try to firewall traffic between the victim's server and some -or all of the other servers in the chatroom. - -Threat: High Volume of Messages -+++++++++++++++++++++++++++++++ - -An attacker could send large volumes of messages to a chatroom with the victim -making the chatroom unusable. - -Threat: Banning users without necessary authorisation -+++++++++++++++++++++++++++++++++++++++++++++++++++++ - -An attacker could attempt to ban a user from a chatroom with the necessary -authorisation. - -Spoofing -~~~~~~~~ - -An attacker could try to send a message claiming to be from the victim without -the victim having sent the message in order to: - -* Impersonate the victim while performing illicit activity. -* Obtain privileges of the victim. - -Threat: Altering Message Contents -+++++++++++++++++++++++++++++++++ - -An attacker could try to alter the contents of an existing message from the -victim. - -Threat: Fake Message "origin" Field -+++++++++++++++++++++++++++++++++++ - -An attacker could try to send a new message purporting to be from the victim -with a phony "origin" field. - -Spamming -~~~~~~~~ - -The attacker could try to send a high volume of solicited or unsolicited -messages to the victim in order to: - -* Find victims for scams. -* Market unwanted products. - -Threat: Unsolicited Messages -++++++++++++++++++++++++++++ - -An attacker could try to send messages to victims who do not wish to receive -them. - -Threat: Abusive Messages -++++++++++++++++++++++++ - -An attacker could send abusive or threatening messages to the victim - -Spying -~~~~~~ - -The attacker could try to access message contents or metadata for messages sent -by the victim or to the victim that were not intended to reach the attacker in -order to: - -* Gain sensitive personal or commercial information. -* Impersonate the victim using credentials contained in the messages. - (e.g. password reset messages) -* Discover who the victim was talking to and when. - -Threat: Disclosure during Transmission -++++++++++++++++++++++++++++++++++++++ - -An attacker could try to expose the message contents or metadata during -transmission between the servers. - -Threat: Disclosure to Servers Outside Chatroom -++++++++++++++++++++++++++++++++++++++++++++++ - -An attacker could try to convince servers within a chatroom to send messages to -a server it controls that was not authorised to be within the chatroom. 
- -Threat: Disclosure to Servers Within Chatroom -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -An attacker could take control of a server within a chatroom to expose message -contents or metadata for messages in that room. - - -Cryptographic Test Vectors --------------------------- - -To assist in the development of compatible implementations, the following test -values may be useful for verifying the cryptographic event signing code. - -Signing Key -~~~~~~~~~~~ - -The following test vectors all use the 32-byte value given by the following -Base64-encoded string as the seed for generating the ``ed25519`` signing key: - -.. code:: - - SIGNING_KEY_SEED = decode_base64( - "YJDBA9Xnr2sVqXD9Vj7XVUnmFZcZrlw8Md7kMW+3XA1" - ) - -In each case, the server name and key ID are as follows: - -.. code:: - - SERVER_NAME = "domain" - - KEY_ID = "ed25519:1" - -JSON Signing -~~~~~~~~~~~~ - -Given an empty JSON object: - -.. code:: json - - {} - -The JSON signing algorithm should emit the following signed data: - -.. code:: json - - { - "signatures": { - "domain": { - "ed25519:1": "K8280/U9SSy9IVtjBuVeLr+HpOB4BQFWbg+UZaADMtTdGYI7Geitb76LTrr5QV/7Xg4ahLwYGYZzuHGZKM5ZAQ" - } - } - } - -Given the following JSON object with data values in it: - -.. code:: json - - { - "one": 1, - "two": "Two" - } - -The JSON signing algorithm should emit the following signed JSON: - -.. code:: json - - { - "one": 1, - "signatures": { - "domain": { - "ed25519:1": "KqmLSbO39/Bzb0QIYE82zqLwsA+PDzYIpIRA2sRQ4sL53+sN6/fpNSoqE7BP7vBZhG6kYdD13EIMJpvhJI+6Bw" - } - }, - "two": "Two" - } - -Event Signing -~~~~~~~~~~~~~ - -Given the following minimally-sized event: - -.. code:: json - - { - "event_id": "$0:domain", - "origin": "domain", - "origin_server_ts": 1000000, - "signatures": {}, - "type": "X", - "unsigned": { - "age_ts": 1000000 - } - } - -The event signing algorithm should emit the following signed event: - -.. code:: json - - { - "event_id": "$0:domain", - "hashes": { - "sha256": "6tJjLpXtggfke8UxFhAKg82QVkJzvKOVOOSjUDK4ZSI" - }, - "origin": "domain", - "origin_server_ts": 1000000, - "signatures": { - "domain": { - "ed25519:1": "2Wptgo4CwmLo/Y8B8qinxApKaCkBG2fjTWB7AbP5Uy+aIbygsSdLOFzvdDjww8zUVKCmI02eP9xtyJxc/cLiBA" - } - }, - "type": "X", - "unsigned": { - "age_ts": 1000000 - } - } - -Given the following event containing redactable content: - -.. code:: json - - { - "content": { - "body": "Here is the message content", - }, - "event_id": "$0:domain", - "origin": "domain", - "origin_server_ts": 1000000, - "type": "m.room.message", - "room_id": "!r:domain", - "sender": "@u:domain", - "signatures": {}, - "unsigned": { - "age_ts": 1000000 - } - } - -The event signing algorithm should emit the following signed event: - -.. code:: json - - { - "content": { - "body": "Here is the message content", - }, - "event_id": "$0:domain", - "hashes": { - "sha256": "onLKD1bGljeBWQhWZ1kaP9SorVmRQNdN5aM2JYU2n/g" - }, - "origin": "domain", - "origin_server_ts": 1000000, - "type": "m.room.message", - "room_id": "!r:domain", - "sender": "@u:domain", - "signatures": { - "domain": { - "ed25519:1": "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA" - } - }, - "unsigned": { - "age_ts": 1000000 - } - } +.. contents:: Table of Contents +.. sectnum:: diff --git a/specification/appendices/test_vectors.rst b/specification/appendices/test_vectors.rst new file mode 100644 index 00000000..e2b8fb58 --- /dev/null +++ b/specification/appendices/test_vectors.rst @@ -0,0 +1,171 @@ +.. Copyright 2015 OpenMarket Ltd +.. +.. 
Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + + +Cryptographic Test Vectors +-------------------------- + +To assist in the development of compatible implementations, the following test +values may be useful for verifying the cryptographic event signing code. + +Signing Key +~~~~~~~~~~~ + +The following test vectors all use the 32-byte value given by the following +Base64-encoded string as the seed for generating the ``ed25519`` signing key: + +.. code:: + + SIGNING_KEY_SEED = decode_base64( + "YJDBA9Xnr2sVqXD9Vj7XVUnmFZcZrlw8Md7kMW+3XA1" + ) + +In each case, the server name and key ID are as follows: + +.. code:: + + SERVER_NAME = "domain" + + KEY_ID = "ed25519:1" + +JSON Signing +~~~~~~~~~~~~ + +Given an empty JSON object: + +.. code:: json + + {} + +The JSON signing algorithm should emit the following signed data: + +.. code:: json + + { + "signatures": { + "domain": { + "ed25519:1": "K8280/U9SSy9IVtjBuVeLr+HpOB4BQFWbg+UZaADMtTdGYI7Geitb76LTrr5QV/7Xg4ahLwYGYZzuHGZKM5ZAQ" + } + } + } + +Given the following JSON object with data values in it: + +.. code:: json + + { + "one": 1, + "two": "Two" + } + +The JSON signing algorithm should emit the following signed JSON: + +.. code:: json + + { + "one": 1, + "signatures": { + "domain": { + "ed25519:1": "KqmLSbO39/Bzb0QIYE82zqLwsA+PDzYIpIRA2sRQ4sL53+sN6/fpNSoqE7BP7vBZhG6kYdD13EIMJpvhJI+6Bw" + } + }, + "two": "Two" + } + +Event Signing +~~~~~~~~~~~~~ + +Given the following minimally-sized event: + +.. code:: json + + { + "event_id": "$0:domain", + "origin": "domain", + "origin_server_ts": 1000000, + "signatures": {}, + "type": "X", + "unsigned": { + "age_ts": 1000000 + } + } + +The event signing algorithm should emit the following signed event: + +.. code:: json + + { + "event_id": "$0:domain", + "hashes": { + "sha256": "6tJjLpXtggfke8UxFhAKg82QVkJzvKOVOOSjUDK4ZSI" + }, + "origin": "domain", + "origin_server_ts": 1000000, + "signatures": { + "domain": { + "ed25519:1": "2Wptgo4CwmLo/Y8B8qinxApKaCkBG2fjTWB7AbP5Uy+aIbygsSdLOFzvdDjww8zUVKCmI02eP9xtyJxc/cLiBA" + } + }, + "type": "X", + "unsigned": { + "age_ts": 1000000 + } + } + +Given the following event containing redactable content: + +.. code:: json + + { + "content": { + "body": "Here is the message content", + }, + "event_id": "$0:domain", + "origin": "domain", + "origin_server_ts": 1000000, + "type": "m.room.message", + "room_id": "!r:domain", + "sender": "@u:domain", + "signatures": {}, + "unsigned": { + "age_ts": 1000000 + } + } + +The event signing algorithm should emit the following signed event: + +.. 
code:: json + + { + "content": { + "body": "Here is the message content", + }, + "event_id": "$0:domain", + "hashes": { + "sha256": "onLKD1bGljeBWQhWZ1kaP9SorVmRQNdN5aM2JYU2n/g" + }, + "origin": "domain", + "origin_server_ts": 1000000, + "type": "m.room.message", + "room_id": "!r:domain", + "sender": "@u:domain", + "signatures": { + "domain": { + "ed25519:1": "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA" + } + }, + "unsigned": { + "age_ts": 1000000 + } + } diff --git a/specification/appendices/threat_model.rst b/specification/appendices/threat_model.rst new file mode 100644 index 00000000..0dea62e0 --- /dev/null +++ b/specification/appendices/threat_model.rst @@ -0,0 +1,140 @@ +.. Copyright 2015 OpenMarket Ltd +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +Security Threat Model +---------------------- + +Denial of Service +~~~~~~~~~~~~~~~~~ + +The attacker could attempt to prevent delivery of messages to or from the +victim in order to: + +* Disrupt service or marketing campaign of a commercial competitor. +* Censor a discussion or censor a participant in a discussion. +* Perform general vandalism. + +Threat: Resource Exhaustion ++++++++++++++++++++++++++++ + +An attacker could cause the victims server to exhaust a particular resource +(e.g. open TCP connections, CPU, memory, disk storage) + +Threat: Unrecoverable Consistency Violations +++++++++++++++++++++++++++++++++++++++++++++ + +An attacker could send messages which created an unrecoverable "split-brain" +state in the cluster such that the victim's servers could no longer derive a +consistent view of the chatroom state. + +Threat: Bad History ++++++++++++++++++++ + +An attacker could convince the victim to accept invalid messages which the +victim would then include in their view of the chatroom history. Other servers +in the chatroom would reject the invalid messages and potentially reject the +victims messages as well since they depended on the invalid messages. + +.. TODO-spec + Track trustworthiness of HS or users based on if they try to pretend they + haven't seen recent events, and fake a splitbrain... --M + +Threat: Block Network Traffic ++++++++++++++++++++++++++++++ + +An attacker could try to firewall traffic between the victim's server and some +or all of the other servers in the chatroom. + +Threat: High Volume of Messages ++++++++++++++++++++++++++++++++ + +An attacker could send large volumes of messages to a chatroom with the victim +making the chatroom unusable. + +Threat: Banning users without necessary authorisation ++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +An attacker could attempt to ban a user from a chatroom with the necessary +authorisation. + +Spoofing +~~~~~~~~ + +An attacker could try to send a message claiming to be from the victim without +the victim having sent the message in order to: + +* Impersonate the victim while performing illicit activity. +* Obtain privileges of the victim. 
+ +Threat: Altering Message Contents ++++++++++++++++++++++++++++++++++ + +An attacker could try to alter the contents of an existing message from the +victim. + +Threat: Fake Message "origin" Field +++++++++++++++++++++++++++++++++++ + +An attacker could try to send a new message purporting to be from the victim +with a phony "origin" field. + +Spamming +~~~~~~~~ + +The attacker could try to send a high volume of solicited or unsolicited +messages to the victim in order to: + +* Find victims for scams. +* Market unwanted products. + +Threat: Unsolicited Messages +++++++++++++++++++++++++++++ + +An attacker could try to send messages to victims who do not wish to receive +them. + +Threat: Abusive Messages +++++++++++++++++++++++++ + +An attacker could send abusive or threatening messages to the victim. + +Spying +~~~~~~ + +The attacker could try to access message contents or metadata for messages sent +by the victim or to the victim that were not intended to reach the attacker in +order to: + +* Gain sensitive personal or commercial information. +* Impersonate the victim using credentials contained in the messages. + (e.g. password reset messages) +* Discover who the victim was talking to and when. + +Threat: Disclosure during Transmission +++++++++++++++++++++++++++++++++++++++ + +An attacker could try to expose the message contents or metadata during +transmission between the servers. + +Threat: Disclosure to Servers Outside Chatroom +++++++++++++++++++++++++++++++++++++++++++++++ + +An attacker could try to convince servers within a chatroom to send messages to +a server it controls that was not authorised to be within the chatroom. + +Threat: Disclosure to Servers Within Chatroom ++++++++++++++++++++++++++++++++++++++++++++++ + +An attacker could take control of a server within a chatroom to expose message +contents or metadata for messages in that room. diff --git a/specification/targets.yaml b/specification/targets.yaml index a157366f..841c9d61 --- a/specification/targets.yaml +++ b/specification/targets.yaml @@ -33,6 +33,8 @@ targets: appendices: files: - appendices.rst + - appendices/threat_model.rst + - appendices/test_vectors.rst groups: # reusable blobs of files when prefixed with 'group:' modules: - modules/instant_messaging.rst From 3ee75af06fc4429fd5c1214b53bbe744842ec3ca Mon Sep 17 00:00:00 2001 From: Richard van der Hoff Date: Mon, 31 Oct 2016 12:36:47 +0000 Subject: [PATCH 33/33] Move 'Signing JSON' to appendices Canonical JSON and JSON signing in general are common to the C-S spec. Move them to the appendices instead of the S-S spec. --- specification/appendices/signing_json.rst | 167 ++++++++++++ specification/event_signing.rst | 306 ---------------------- specification/server_server_api.rst | 146 +++++++++++ specification/targets.yaml | 2 +- 4 files changed, 314 insertions(+), 307 deletions(-) create mode 100644 specification/appendices/signing_json.rst delete mode 100644 specification/event_signing.rst diff --git a/specification/appendices/signing_json.rst b/specification/appendices/signing_json.rst new file mode 100644 index 00000000..5536af5e --- /dev/null +++ b/specification/appendices/signing_json.rst @@ -0,0 +1,167 @@ +.. Copyright 2016 OpenMarket Ltd +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +..
Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +Signing JSON +------------ + +Various points in the Matrix specification require JSON objects to be +cryptographically signed. This requires us to encode the JSON as a binary +string. Unfortunately the same JSON can be encoded in different ways by +changing how much white space is used or by changing the order of keys within +objects. + +Signing an object therefore requires it to be encoded as a sequence of bytes +using `Canonical JSON`_, computing the signature for that sequence and then +adding the signature to the original JSON object. + +Canonical JSON +~~~~~~~~~~~~~~ + +We define the canonical JSON encoding for a value to be the shortest UTF-8 JSON +encoding with dictionary keys lexicographically sorted by unicode codepoint. +Numbers in the JSON must be integers in the range ``[-(2**53)+1, (2**53)-1]``. + +We pick UTF-8 as the encoding as it should be available to all platforms and +JSON received from the network is likely to be already encoded using UTF-8. +We sort the keys to give a consistent ordering. We force integers to be in the +range where they can be accurately represented using IEEE double precision +floating point numbers since a number of JSON libraries represent all numbers +using this representation. + +.. code:: python + + import json + + def canonical_json(value): + return json.dumps( + value, + # Encode code-points outside of ASCII as UTF-8 rather than \u escapes + ensure_ascii=False, + # Remove unnecessary white space. + separators=(',',':'), + # Sort the keys of dictionaries. + sort_keys=True, + # Encode the resulting unicode as UTF-8 bytes. + ).encode("UTF-8") + +Grammar ++++++++ + +Adapted from the grammar in http://tools.ietf.org/html/rfc7159 removing +insignificant whitespace, fractions, exponents and redundant character escapes. + +.. code:: + + value = false / null / true / object / array / number / string + false = %x66.61.6c.73.65 + null = %x6e.75.6c.6c + true = %x74.72.75.65 + object = %x7B [ member *( %x2C member ) ] %x7D + member = string %x3A value + array = %x5B [ value *( %x2C value ) ] %x5D + number = [ %x2D ] int + int = %x30 / ( %x31-39 *digit ) + digit = %x30-39 + string = %x22 *char %x22 + char = unescaped / %x5C escaped + unescaped = %x20-21 / %x23-5B / %x5D-10FFFF + escaped = %x22 ; " quotation mark U+0022 + / %x5C ; \ reverse solidus U+005C + / %x62 ; b backspace U+0008 + / %x66 ; f form feed U+000C + / %x6E ; n line feed U+000A + / %x72 ; r carriage return U+000D + / %x74 ; t tab U+0009 + / %x75.30.30.30 (%x30-37 / %x62 / %x65-66) ; u000X + / %x75.30.30.31 (%x30-39 / %x61-66) ; u001X + +Signing Details +~~~~~~~~~~~~~~~ + +JSON is signed by encoding the JSON object without ``signatures`` or keys grouped +as ``unsigned``, using the canonical encoding described above. The JSON bytes are then signed using the +signature algorithm and the signature is encoded using base64 with the padding +stripped. The resulting base64 signature is added to an object under the +*signing key identifier* which is added to the ``signatures`` object under the +name of the entity signing it which is added back to the original JSON object +along with the ``unsigned`` object.
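+
+A minimal, non-normative sketch of the unpadded base64 encoding used for
+signatures, assuming Python's standard ``base64`` module (the ``encode_base64``
+helper is referenced by the pseudocode below; ``decode_base64`` is shown here
+only as its illustrative counterpart), might look like:
+
+.. code:: python
+
+    import base64
+
+    def encode_base64(data):
+        # Standard base64, with the trailing '=' padding stripped.
+        return base64.b64encode(data).decode("ascii").rstrip("=")
+
+    def decode_base64(string):
+        # Restore the padding removed by encode_base64 before decoding.
+        return base64.b64decode(string + "=" * (-len(string) % 4))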
+ +The *signing key identifier* is the concatenation of the *signing algorithm* +and a *key identifier*. The *signing algorithm* identifies the algorithm used +to sign the JSON. The currently supported value for *signing algorithm* is +``ed25519`` as implemented by NACL (http://nacl.cr.yp.to/). The *key identifier* +is used to distinguish between different signing keys used by the same entity. + +The ``unsigned`` object and the ``signatures`` object are not covered by the +signature. Therefore intermediate entities can add unsigned data such as +timestamps and additional signatures. + + +.. code:: json + + { + "name": "example.org", + "signing_keys": { + "ed25519:1": "XSl0kuyvrXNj6A+7/tkrB9sxSbRi08Of5uRhxOqZtEQ" + }, + "unsigned": { + "age_ts": 922834800000 + }, + "signatures": { + "example.org": { + "ed25519:1": "s76RUgajp8w172am0zQb/iPTHsRnb4SkrzGoeCOSFfcBY2V/1c8QfrmdXHpvnc2jK5BD1WiJIxiMW95fMjK7Bw" + } + } + } + +.. code:: python + + def sign_json(json_object, signing_key, signing_name): + signatures = json_object.pop("signatures", {}) + unsigned = json_object.pop("unsigned", None) + + signed = signing_key.sign(encode_canonical_json(json_object)) + signature_base64 = encode_base64(signed.signature) + + key_id = "%s:%s" % (signing_key.alg, signing_key.version) + signatures.setdefault(signing_name, {})[key_id] = signature_base64 + + json_object["signatures"] = signatures + if unsigned is not None: + json_object["unsigned"] = unsigned + + return json_object + +Checking for a Signature +~~~~~~~~~~~~~~~~~~~~~~~~ + +To check if an entity has signed a JSON object an implementation does the +following: + +1. Checks if the ``signatures`` member of the object contains an entry with + the name of the entity. If the entry is missing then the check fails. +2. Removes any *signing key identifiers* from the entry with algorithms it + doesn't understand. If there are no *signing key identifiers* left then the + check fails. +3. Looks up *verification keys* for the remaining *signing key identifiers* + either from a local cache or by consulting a trusted key server. If it + cannot find a *verification key* then the check fails. +4. Decodes the base64 encoded signature bytes. If base64 decoding fails then + the check fails. +5. Removes the ``signatures`` and ``unsigned`` members of the object. +6. Encodes the remainder of the JSON object using the `Canonical JSON`_ + encoding. +7. Checks the signature bytes against the encoded object using the + *verification key*. If this fails then the check fails. Otherwise the check + succeeds. diff --git a/specification/event_signing.rst b/specification/event_signing.rst deleted file mode 100644 index 8b8a703d..00000000 --- a/specification/event_signing.rst +++ /dev/null @@ -1,306 +0,0 @@ -.. Copyright 2016 OpenMarket Ltd -.. -.. Licensed under the Apache License, Version 2.0 (the "License"); -.. you may not use this file except in compliance with the License. -.. You may obtain a copy of the License at -.. -.. http://www.apache.org/licenses/LICENSE-2.0 -.. -.. Unless required by applicable law or agreed to in writing, software -.. distributed under the License is distributed on an "AS IS" BASIS, -.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -.. See the License for the specific language governing permissions and -.. limitations under the License. - -Signing Events --------------- - -Canonical JSON -~~~~~~~~~~~~~~ - -Matrix events are represented using JSON objects. 
If we want to sign JSON -events we need to encode the JSON as a binary string. Unfortunately the same -JSON can be encoded in different ways by changing how much white space is used -or by changing the order of keys within objects. Therefore we have to define an -encoding which can be reproduced byte for byte by any JSON library. - -We define the canonical JSON encoding for a value to be the shortest UTF-8 JSON -encoding with dictionary keys lexicographically sorted by unicode codepoint. -Numbers in the JSON must be integers in the range [-(2**53)+1, (2**53)-1]. - -We pick UTF-8 as the encoding as it should be available to all platforms and -JSON received from the network is likely to be already encoded using UTF-8. -We sort the keys to give a consistent ordering. We force integers to be in the -range where they can be accurately represented using IEEE double precision -floating point numbers since a number of JSON libraries represent all numbers -using this representation. - -.. code:: python - - import json - - def canonical_json(value): - return json.dumps( - value, - # Encode code-points outside of ASCII as UTF-8 rather than \u escapes - ensure_ascii=False, - # Remove unnecessary white space. - separators=(',',':'), - # Sort the keys of dictionaries. - sort_keys=True, - # Encode the resulting unicode as UTF-8 bytes. - ).encode("UTF-8") - -Grammar -+++++++ - -Adapted from the grammar in http://tools.ietf.org/html/rfc7159 removing -insignificant whitespace, fractions, exponents and redundant character escapes - -.. code:: - - value = false / null / true / object / array / number / string - false = %x66.61.6c.73.65 - null = %x6e.75.6c.6c - true = %x74.72.75.65 - object = %x7B [ member *( %x2C member ) ] %7D - member = string %x3A value - array = %x5B [ value *( %x2C value ) ] %5B - number = [ %x2D ] int - int = %x30 / ( %x31-39 *digit ) - digit = %x30-39 - string = %x22 *char %x22 - char = unescaped / %x5C escaped - unescaped = %x20-21 / %x23-5B / %x5D-10FFFF - escaped = %x22 ; " quotation mark U+0022 - / %x5C ; \ reverse solidus U+005C - / %x62 ; b backspace U+0008 - / %x66 ; f form feed U+000C - / %x6E ; n line feed U+000A - / %x72 ; r carriage return U+000D - / %x74 ; t tab U+0009 - / %x75.30.30.30 (%x30-37 / %x62 / %x65-66) ; u000X - / %x75.30.30.31 (%x30-39 / %x61-66) ; u001X - -Signing JSON -~~~~~~~~~~~~ - -We can now sign a JSON object by encoding it as a sequence of bytes, computing -the signature for that sequence and then adding the signature to the original -JSON object. - -Signing Details -+++++++++++++++ - -JSON is signed by encoding the JSON object without ``signatures`` or keys grouped -as ``unsigned``, using the canonical encoding described above. The JSON bytes are then signed using the -signature algorithm and the signature is encoded using base64 with the padding -stripped. The resulting base64 signature is added to an object under the -*signing key identifier* which is added to the ``signatures`` object under the -name of the server signing it which is added back to the original JSON object -along with the ``unsigned`` object. - -The *signing key identifier* is the concatenation of the *signing algorithm* -and a *key version*. The *signing algorithm* identifies the algorithm used to -sign the JSON. The currently support value for *signing algorithm* is -``ed25519`` as implemented by NACL (http://nacl.cr.yp.to/). The *key version* -is used to distinguish between different signing keys used by the same entity. 
- -The ``unsigned`` object and the ``signatures`` object are not covered by the -signature. Therefore intermediate servers can add unsigned data such as timestamps -and additional signatures. - - -.. code:: json - - { - "name": "example.org", - "signing_keys": { - "ed25519:1": "XSl0kuyvrXNj6A+7/tkrB9sxSbRi08Of5uRhxOqZtEQ" - }, - "unsigned": { - "age_ts": 922834800000 - }, - "signatures": { - "example.org": { - "ed25519:1": "s76RUgajp8w172am0zQb/iPTHsRnb4SkrzGoeCOSFfcBY2V/1c8QfrmdXHpvnc2jK5BD1WiJIxiMW95fMjK7Bw" - } - } - } - -.. code:: python - - def sign_json(json_object, signing_key, signing_name): - signatures = json_object.pop("signatures", {}) - unsigned = json_object.pop("unsigned", None) - - signed = signing_key.sign(encode_canonical_json(json_object)) - signature_base64 = encode_base64(signed.signature) - - key_id = "%s:%s" % (signing_key.alg, signing_key.version) - signatures.setdefault(signing_name, {})[key_id] = signature_base64 - - json_object["signatures"] = signatures - if unsigned is not None: - json_object["unsigned"] = unsigned - - return json_object - -Checking for a Signature -++++++++++++++++++++++++ - -To check if an entity has signed a JSON object a server does the following - -1. Checks if the ``signatures`` object contains an entry with the name of the - entity. If the entry is missing then the check fails. -2. Removes any *signing key identifiers* from the entry with algorithms it - doesn't understand. If there are no *signing key identifiers* left then the - check fails. -3. Looks up *verification keys* for the remaining *signing key identifiers* - either from a local cache or by consulting a trusted key server. If it - cannot find a *verification key* then the check fails. -4. Decodes the base64 encoded signature bytes. If base64 decoding fails then - the check fails. -5. Checks the signature bytes using the *verification key*. If this fails then - the check fails. Otherwise the check succeeds. - -Signing Events -~~~~~~~~~~~~~~ - -Signing events is a more complicated process since servers can choose to redact -non-essential parts of an event. Before signing the event it is encoded as -Canonical JSON and hashed using SHA-256. The resulting hash is then stored -in the event JSON in a ``hash`` object under a ``sha256`` key. - -.. code:: python - - def hash_event(event_json_object): - - # Keys under "unsigned" can be modified by other servers. - # They are useful for conveying information like the age of an - # event that will change in transit. - # Since they can be modifed we need to exclude them from the hash. - unsigned = event_json_object.pop("unsigned", None) - - # Signatures will depend on the current value of the "hashes" key. - # We cannot add new hashes without invalidating existing signatures. - signatures = event_json_object.pop("signatures", None) - - # The "hashes" key might contain multiple algorithms if we decide to - # migrate away from SHA-2. We don't want to include an existing hash - # output in our hash so we exclude the "hashes" dict from the hash. - hashes = event_json_object.pop("hashes", {}) - - # Encode the JSON using a canonical encoding so that we get the same - # bytes on every server for the same JSON object. - event_json_bytes = encode_canonical_json(event_json_bytes) - - # Add the base64 encoded bytes of the hash to the "hashes" dict. - hashes["sha256"] = encode_base64(sha256(event_json_bytes).digest()) - - # Add the "hashes" dict back the event JSON under a "hashes" key. 
- event_json_object["hashes"] = hashes - if unsigned is not None: - event_json_object["unsigned"] = unsigned - return event_json_object - -The event is then stripped of all non-essential keys both at the top level and -within the ``content`` object. Any top-level keys not in the following list -MUST be removed: - -.. code:: - - auth_events - depth - event_id - hashes - membership - origin - origin_server_ts - prev_events - prev_state - room_id - sender - signatures - state_key - type - -A new ``content`` object is constructed for the resulting event that contains -only the essential keys of the original ``content`` object. If the original -event lacked a ``content`` object at all, a new empty JSON object is created -for it. - -The keys that are considered essential for the ``content`` object depend on the -the ``type`` of the event. These are: - -.. code:: - - type is "m.room.aliases": - aliases - - type is "m.room.create": - creator - - type is "m.room.history_visibility": - history_visibility - - type is "m.room.join_rules": - join_rule - - type is "m.room.member": - membership - - type is "m.room.power_levels": - ban - events - events_default - kick - redact - state_default - users - users_default - -The resulting stripped object with the new ``content`` object and the original -``hashes`` key is then signed using the JSON signing algorithm outlined below: - -.. code:: python - - def sign_event(event_json_object, name, key): - - # Make sure the event has a "hashes" key. - if "hashes" not in event_json_object: - event_json_object = hash_event(event_json_object) - - # Strip all the keys that would be removed if the event was redacted. - # The hashes are not stripped and cover all the keys in the event. - # This means that we can tell if any of the non-essential keys are - # modified or removed. - stripped_json_object = strip_non_essential_keys(event_json_object) - - # Sign the stripped JSON object. The signature only covers the - # essential keys and the hashes. This means that we can check the - # signature even if the event is redacted. - signed_json_object = sign_json(stripped_json_object) - - # Copy the signatures from the stripped event to the original event. - event_json_object["signatures"] = signed_json_oject["signatures"] - return event_json_object - -Servers can then transmit the entire event or the event with the non-essential -keys removed. If the entire event is present, receiving servers can then check -the event by computing the SHA-256 of the event, excluding the ``hash`` object. -If the keys have been redacted, then the ``hash`` object is included when -calculating the SHA-256 instead. - -New hash functions can be introduced by adding additional keys to the ``hash`` -object. Since the ``hash`` object cannot be redacted a server shouldn't allow -too many hashes to be listed, otherwise a server might embed illict data within -the ``hash`` object. For similar reasons a server shouldn't allow hash values -that are too long. - -.. 
TODO - [[TODO(markjh): We might want to specify a maximum number of keys for the - ``hash`` and we might want to specify the maximum output size of a hash]] - [[TODO(markjh) We might want to allow the server to omit the output of well - known hash functions like SHA-256 when none of the keys have been redacted]] - diff --git a/specification/server_server_api.rst b/specification/server_server_api.rst index 5d7f2b17..1a036373 --- a/specification/server_server_api.rst +++ b/specification/server_server_api.rst @@ -998,3 +998,149 @@ the following EDU:: messages: The messages to send. A map from user ID, to a map from device ID to message body. The device ID may also be *, meaning all known devices for the user. + + +Signing Events +-------------- + +Signing events is complicated by the fact that servers can choose to redact +non-essential parts of an event. + +Before signing the event, the ``unsigned`` and ``signatures`` members are +removed; the event is encoded as `Canonical JSON`_, and then hashed using SHA-256. The +resulting hash is then stored in the event JSON in a ``hashes`` object under a +``sha256`` key. + +.. code:: python + + def hash_event(event_json_object): + + # Keys under "unsigned" can be modified by other servers. + # They are useful for conveying information like the age of an + # event that will change in transit. + # Since they can be modified we need to exclude them from the hash. + unsigned = event_json_object.pop("unsigned", None) + + # Signatures will depend on the current value of the "hashes" key. + # We cannot add new hashes without invalidating existing signatures. + signatures = event_json_object.pop("signatures", None) + + # The "hashes" key might contain multiple algorithms if we decide to + # migrate away from SHA-2. We don't want to include an existing hash + # output in our hash so we exclude the "hashes" dict from the hash. + hashes = event_json_object.pop("hashes", {}) + + # Encode the JSON using a canonical encoding so that we get the same + # bytes on every server for the same JSON object. + event_json_bytes = encode_canonical_json(event_json_object) + + # Add the base64 encoded bytes of the hash to the "hashes" dict. + hashes["sha256"] = encode_base64(sha256(event_json_bytes).digest()) + + # Add the "hashes" dict back to the event JSON under a "hashes" key. + event_json_object["hashes"] = hashes + if unsigned is not None: + event_json_object["unsigned"] = unsigned + return event_json_object + +The event is then stripped of all non-essential keys both at the top level and +within the ``content`` object. Any top-level keys not in the following list +MUST be removed: + +.. code:: + + auth_events + depth + event_id + hashes + membership + origin + origin_server_ts + prev_events + prev_state + room_id + sender + signatures + state_key + type + +A new ``content`` object is constructed for the resulting event that contains +only the essential keys of the original ``content`` object. If the original +event lacked a ``content`` object at all, a new empty JSON object is created +for it. + +The keys that are considered essential for the ``content`` object depend on the +``type`` of the event. These are: + +..
code:: + + type is "m.room.aliases": + aliases + + type is "m.room.create": + creator + + type is "m.room.history_visibility": + history_visibility + + type is "m.room.join_rules": + join_rule + + type is "m.room.member": + membership + + type is "m.room.power_levels": + ban + events + events_default + kick + redact + state_default + users + users_default + +The resulting stripped object with the new ``content`` object and the original +``hashes`` key is then signed using the JSON signing algorithm outlined below: + +.. code:: python + + def sign_event(event_json_object, name, key): + + # Make sure the event has a "hashes" key. + if "hashes" not in event_json_object: + event_json_object = hash_event(event_json_object) + + # Strip all the keys that would be removed if the event was redacted. + # The hashes are not stripped and cover all the keys in the event. + # This means that we can tell if any of the non-essential keys are + # modified or removed. + stripped_json_object = strip_non_essential_keys(event_json_object) + + # Sign the stripped JSON object. The signature only covers the + # essential keys and the hashes. This means that we can check the + # signature even if the event is redacted. + signed_json_object = sign_json(stripped_json_object, key, name) + + # Copy the signatures from the stripped event to the original event. + event_json_object["signatures"] = signed_json_object["signatures"] + return event_json_object + +Servers can then transmit the entire event or the event with the non-essential +keys removed. If the entire event is present, receiving servers can then check +the event by computing the SHA-256 of the event, excluding the ``hashes`` object. +If the keys have been redacted, then the ``hashes`` object is included when +calculating the SHA-256 instead. + +New hash functions can be introduced by adding additional keys to the ``hashes`` +object. Since the ``hashes`` object cannot be redacted, a server shouldn't allow +too many hashes to be listed, otherwise a server might embed illicit data within +the ``hashes`` object. For similar reasons a server shouldn't allow hash values +that are too long. + +.. TODO + [[TODO(markjh): We might want to specify a maximum number of keys for the + ``hash`` and we might want to specify the maximum output size of a hash]] + [[TODO(markjh) We might want to allow the server to omit the output of well + known hash functions like SHA-256 when none of the keys have been redacted]] + +.. _`Canonical JSON`: ../appendices.html#canonical-json diff --git a/specification/targets.yaml b/specification/targets.yaml index 841c9d61..b90201d1 --- a/specification/targets.yaml +++ b/specification/targets.yaml @@ -20,7 +20,6 @@ targets: server_server: files: - server_server_api.rst - - { 1: event_signing.rst } version_label: "%SERVER_RELEASE_LABEL%" identity_service: files: @@ -33,6 +32,7 @@ appendices: files: - appendices.rst + - appendices/signing_json.rst - appendices/threat_model.rst - appendices/test_vectors.rst groups: # reusable blobs of files when prefixed with 'group:'