From d1efd488b7c3d62472db063a86b7084403cc4ab8 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Fri, 30 Aug 2019 14:43:17 +0100 Subject: [PATCH 1/9] Proposal for mandating lowercasing when processing e-mail address localparts --- proposals/xxxx-email-lowercase.md | 65 +++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 proposals/xxxx-email-lowercase.md diff --git a/proposals/xxxx-email-lowercase.md b/proposals/xxxx-email-lowercase.md new file mode 100644 index 00000000..cef94fa8 --- /dev/null +++ b/proposals/xxxx-email-lowercase.md @@ -0,0 +1,65 @@ +# Proposal for mandating lowercasing when processing e-mail address localparts + +[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that +localparts in e-mail addresses must be processed with the original case +preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types) +doesn't mandate anything about processing e-mail addresses, other than the fact +that the domain part must be converted to lowercase, as domain names are case +insensitive. + +On the other hand, most major e-mail providers nowadays process the localparts +of e-mail addresses as case insensitive. Therefore, most users expect localparts +to be treated case insensitively, and get confused when it's not. Some users, +for example, get confused over the fact that registering a 3PID association for +`john.doe@example.com` doesn't mean that the association is valid for +`John.Doe@example.com`, and don't expect to be expected to remember the exact +case they used to initially register the association (and sometimes get locked +out of their account because of that). So far we've seen that confusion occur +and lead to troubles of various degrees over several deployments of Synapse and +Sydent. 
+ +## Proposal + +This proposal suggests changing the specification of the e-mail 3PID type in +[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types) +to mandate that any e-mail address must be entirely converted to lowercase +before any processing, instead of only its domain. + +## Other considered solutions + +A first look at this issue concluded that there was no need to add such a +mention to the spec, and that it can be considered as an implementation detail. +However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes +this: because hashing functions are case sensitive, we need both clients and +identity servers to follow the same policy regarding case sensitivity. + +## Tradeoffs + +Implementing this MSC in identity servers would require the databases of +existing identity servers to be updated in a large part to convert the email +addresses of existing associations to lowercase, in order to avoid conflicts. +However, most of this update can usually be done by a single database query (or +a background job running at startup), so the UX improvement outweights this +trouble. + +## Potential issues + +Some users might already have two different accounts associated with the same +e-mail address but with different cases. This appears to happen in a small +number of cases, however, and can be dealt by the identity server's maintainer. + +For example, with Sydent, the process of dealing with such cases could look +like: + +1. list all MXIDs associated with a variant of the email address, and the + timestamp of that association +2. delete all associations except for the most recent one [0] +3. 
inform the user of the deletion by sending them an email notice to the email + address + +## Footnotes + +[0]: This is specific to Sydent because of a bug it has where v1 lookups are +already processed case insensitively, which means it will return the most recent +association for any case of the given email address, therefore keeping only this +association won't change the result of v1 lookups. From 60354f8cf90753667d797dd97a348550b5c121db Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Fri, 30 Aug 2019 14:47:02 +0100 Subject: [PATCH 2/9] MSC number --- proposals/{xxxx-email-lowercase.md => 2265-email-lowercase.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename proposals/{xxxx-email-lowercase.md => 2265-email-lowercase.md} (100%) diff --git a/proposals/xxxx-email-lowercase.md b/proposals/2265-email-lowercase.md similarity index 100% rename from proposals/xxxx-email-lowercase.md rename to proposals/2265-email-lowercase.md From 524ec52f73fded7b43a0ae44b1c9b2e4175319c2 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Mon, 2 Sep 2019 13:41:07 +0100 Subject: [PATCH 3/9] Wording Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2265-email-lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index cef94fa8..0e862272 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -12,7 +12,7 @@ of e-mail addresses as case insensitive. Therefore, most users expect localparts to be treated case insensitively, and get confused when it's not. 
Some users, for example, get confused over the fact that registering a 3PID association for `john.doe@example.com` doesn't mean that the association is valid for -`John.Doe@example.com`, and don't expect to be expected to remember the exact +`John.Doe@example.com`, and don't expect to have to remember the exact case they used to initially register the association (and sometimes get locked out of their account because of that). So far we've seen that confusion occur and lead to troubles of various degrees over several deployments of Synapse and From 552f71a9f9cb088f501c0b899704908e71ba0dd9 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Mon, 2 Sep 2019 13:41:23 +0100 Subject: [PATCH 4/9] Wording Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2265-email-lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 0e862272..50bad406 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -28,7 +28,7 @@ before any processing, instead of only its domain. ## Other considered solutions A first look at this issue concluded that there was no need to add such a -mention to the spec, and that it can be considered as an implementation detail. +mention to the spec, and that it can be considered an implementation detail. However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes this: because hashing functions are case sensitive, we need both clients and identity servers to follow the same policy regarding case sensitivity. 
From bddadfeb184b6546a2e74782d2786bfc168075ee Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Mon, 2 Sep 2019 13:41:33 +0100 Subject: [PATCH 5/9] Typo Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2265-email-lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 50bad406..166de458 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -39,7 +39,7 @@ Implementing this MSC in identity servers would require the databases of existing identity servers to be updated in a large part to convert the email addresses of existing associations to lowercase, in order to avoid conflicts. However, most of this update can usually be done by a single database query (or -a background job running at startup), so the UX improvement outweights this +a background job running at startup), so the UX improvement outweighs this trouble. ## Potential issues From 997360995ccc5d3d46b2d8983601afc10a0d47ac Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Mon, 2 Sep 2019 13:41:49 +0100 Subject: [PATCH 6/9] Wording Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2265-email-lowercase.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 166de458..3bc3c445 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -46,7 +46,7 @@ trouble. Some users might already have two different accounts associated with the same e-mail address but with different cases. This appears to happen in a small -number of cases, however, and can be dealt by the identity server's maintainer. +number of cases, however, and can be dealt with by the identity server's maintainer. 
For example, with Sydent, the process of dealing with such cases could look like: From 520c76a1cb8e8669ac624be3c36cc606aeb4cba6 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Mon, 2 Sep 2019 13:56:01 +0100 Subject: [PATCH 7/9] Spell out that the proposal also concerns homeservers --- proposals/2265-email-lowercase.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 3bc3c445..5698a8c2 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -35,18 +35,19 @@ identity servers to follow the same policy regarding case sensitivity. ## Tradeoffs -Implementing this MSC in identity servers would require the databases of -existing identity servers to be updated in a large part to convert the email -addresses of existing associations to lowercase, in order to avoid conflicts. -However, most of this update can usually be done by a single database query (or -a background job running at startup), so the UX improvement outweighs this -trouble. +Implementing this MSC in identity servers and homeservers might require the +databases of existing instances to be updated in a large part to convert the +email addresses of existing associations to lowercase, in order to avoid +conflicts. However, most of this update can usually be done by a single database +query (or a background job running at startup), so the UX improvement outweighs +this trouble. ## Potential issues Some users might already have two different accounts associated with the same e-mail address but with different cases. This appears to happen in a small -number of cases, however, and can be dealt with by the identity server's maintainer. +number of cases, however, and can be dealt with by the identity server's or the +homeserver's maintainer. 
For example, with Sydent, the process of dealing with such cases could look like: From 6b0a8505ec4ef375d7b8dad0baeb01d12061bd09 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Thu, 19 Sep 2019 17:34:25 +0100 Subject: [PATCH 8/9] Propose case folding instead of lowercasing --- proposals/2265-email-lowercase.md | 51 ++++++++++++++++++++++++++----- 1 file changed, 43 insertions(+), 8 deletions(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 5698a8c2..935e6f2c 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -1,4 +1,4 @@ -# Proposal for mandating lowercasing when processing e-mail address localparts +# Proposal for mandating case folding when processing e-mail address localparts [RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that localparts in e-mail addresses must be processed with the original case @@ -22,8 +22,13 @@ Sydent. This proposal suggests changing the specification of the e-mail 3PID type in [the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types) -to mandate that any e-mail address must be entirely converted to lowercase -before any processing, instead of only its domain. +to mandate that, before any processing, e-mail address localparts must go +through a full case folding based on [the unicode mapping +file](https://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt), on top of +having their domain lowercased. + +This means that `Strauß@Example.com` must be considered as being the same e-mail +address as `strauss@example.com`. ## Other considered solutions @@ -33,17 +38,24 @@ However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes this: because hashing functions are case sensitive, we need both clients and identity servers to follow the same policy regarding case sensitivity. 
+An initial version of this proposal proposed to mandate lowercasing e-mail +addresses instead of case folding them, however it was pointed out that this +solution might not be the best and most future-proof one. + +Unicode normalisation was also looked at but judged unnecessary. + ## Tradeoffs Implementing this MSC in identity servers and homeservers might require the -databases of existing instances to be updated in a large part to convert the -email addresses of existing associations to lowercase, in order to avoid -conflicts. However, most of this update can usually be done by a single database -query (or a background job running at startup), so the UX improvement outweighs -this trouble. +databases of existing instances to be updated in a large part to case fold the +email addresses of existing associations, in order to avoid conflicts. However, +most of this update can usually be done by a background job running at startup, +so the UX improvement outweighs this trouble. ## Potential issues +### Conflicts with existing associations + Some users might already have two different accounts associated with the same e-mail address but with different cases. This appears to happen in a small number of cases, however, and can be dealt with by the identity server's or the @@ -58,6 +70,29 @@ like: 3. inform the user of the deletion by sending them an email notice to the email address +### Storing and querying + +Most database engines don't support case folding, therefore querying all +e-mail addresses matching a case folded e-mail address might not be trivial, +e.g. an identity server querying all associations for `strauss@example.com` when +processing a `/lookup` request would be expected to also get associations for +`Strauß@Example.com`. + +To address this issue, implementation maintainers are strongly encouraged to +make e-mail addresses go through a full case folding before storing them. 
+ +### Implementing case folding + +The need for case folding in services on the Internet doesn't seem to be very +large currently (probably due to its young age), therefore there seem to be only +a few third-party implementation libraries out there. However, +[Go](https://godoc.org/golang.org/x/text/cases#Fold), [Python +2](https://docs.python.org/2/library/stringprep.html#stringprep.map_table_b3) +and [Python 3](https://docs.python.org/3/library/stdtypes.html#str.casefold) +support it natively, and [a third-party JavaScript +implementation](https://github.com/ar-nelson/foldcase) exists which, although +young, seems to be working. + ## Footnotes [0]: This is specific to Sydent because of a bug it has where v1 lookups are From 2e2f1c1174e4487d79b23f2cb9604f62066dd0c6 Mon Sep 17 00:00:00 2001 From: Brendan Abolivier Date: Tue, 4 Feb 2020 16:50:50 +0000 Subject: [PATCH 9/9] Extend the scope of casefolding to the whole address --- proposals/2265-email-lowercase.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md index 935e6f2c..5a1db682 100644 --- a/proposals/2265-email-lowercase.md +++ b/proposals/2265-email-lowercase.md @@ -1,4 +1,4 @@ -# Proposal for mandating case folding when processing e-mail address localparts +# Proposal for mandating case folding when processing e-mail addresses [RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that localparts in e-mail addresses must be processed with the original case @@ -22,8 +22,8 @@ Sydent. 
This proposal suggests changing the specification of the e-mail 3PID type in [the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types) -to mandate that, before any processing, e-mail address localparts must go -through a full case folding based on [the unicode mapping +to mandate that, before any processing, e-mail addresses must go through a full +case folding based on [the Unicode mapping file](https://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt), on top of having their domain lowercased.
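
As an illustration of the folding rule this last revision describes, here is a minimal Python 3 sketch. It relies on `str.casefold`, which the proposal itself links to. The helper name `casefold_email` is hypothetical and not part of the spec, Synapse or Sydent, and the sketch assumes that the Unicode data bundled with the interpreter is an acceptable stand-in for the `CaseFolding.txt` version referenced in the proposal.

```python
def casefold_email(address: str) -> str:
    """Case fold a whole e-mail address (hypothetical helper, not spec text).

    str.casefold() applies Unicode full case folding, so it maps "ß" to "ss"
    in addition to lowercasing ASCII letters.
    """
    return address.casefold()


# The example pair from the proposal folds to the same string.
folded = casefold_email("Strauß@Example.com")
same_as_plain = folded == casefold_email("strauss@example.com")

# Plain lowercasing keeps "ß" as is, so the two spellings would not match.
lowered = "Strauß@Example.com".lower()
lowercase_matches = lowered == "strauss@example.com"
```

Folding before storing an address, and again before hashing or looking one up, would keep both sides of a comparison in the same form, which is what the "Storing and querying" section encourages.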