diff --git a/proposals/2265-email-lowercase.md b/proposals/2265-email-lowercase.md
new file mode 100644
index 00000000..5a1db682
--- /dev/null
+++ b/proposals/2265-email-lowercase.md
@@ -0,0 +1,101 @@
+# Proposal for mandating case folding when processing e-mail addresses
+
+[RFC822](https://tools.ietf.org/html/rfc822#section-3.4.7) mandates that
+localparts in e-mail addresses must be processed with the original case
+preserved. [The Matrix spec](https://matrix.org/docs/spec/appendices#pid-types)
+doesn't mandate anything about processing e-mail addresses, other than the fact
+that the domain part must be converted to lowercase, as domain names are case
+insensitive.
+
+On the other hand, most major e-mail providers nowadays process the localparts
+of e-mail addresses as case insensitive. Therefore, most users expect localparts
+to be treated case insensitively, and get confused when they are not. Some users,
+for example, get confused over the fact that registering a 3PID association for
+`john.doe@example.com` doesn't mean that the association is valid for
+`John.Doe@example.com`, and don't expect to have to remember the exact
+case they used to initially register the association (and sometimes get locked
+out of their account because of that). So far we've seen that confusion occur
+and lead to trouble of varying degrees across several deployments of Synapse and
+Sydent.
+
+## Proposal
+
+This proposal suggests changing the specification of the e-mail 3PID type in
+[the Matrix spec appendices](https://matrix.org/docs/spec/appendices#pid-types)
+to mandate that, before any processing, e-mail addresses must go through a full
+case folding based on [the Unicode mapping
+file](https://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt), on top of
+having their domain lowercased.
+
+This means that `Strauß@Example.com` must be considered as being the same e-mail
+address as `strauss@example.com`.
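As an illustration only, the folding described above can be sketched in Python 3 with `str.casefold()`. The helper name `canonicalise_email` is hypothetical and not part of any Matrix implementation, and the sketch assumes that the Unicode data bundled with the interpreter agrees with the referenced mapping file for the addresses involved.

```python
def canonicalise_email(address: str) -> str:
    """Return the case-folded form of an e-mail address.

    Hypothetical helper for illustration; not taken from Synapse or Sydent.
    """
    # str.casefold() applies Unicode full case folding, which maps "ß" to "ss".
    # Folding the whole address also lowercases the domain part.
    return address.casefold()


folded = canonicalise_email("Strauß@Example.com")
lowered = "Strauß@Example.com".lower()

# Case folding merges the two spellings from the proposal...
assert folded == "strauss@example.com"
# ...whereas plain lowercasing keeps "ß" and so does not.
assert lowered == "strauß@example.com"
```

The contrast with `str.lower()` shows why lowercasing alone would not treat the two addresses as the same.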
+
+## Other considered solutions
+
+A first look at this issue concluded that there was no need to add such a
+mention to the spec, and that it can be considered an implementation detail.
+However, [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) changes
+this: because hashing functions are case sensitive, we need both clients and
+identity servers to follow the same policy regarding case sensitivity.
+
+An initial version of this proposal mandated lowercasing e-mail addresses
+instead of case folding them. However, it was pointed out that this solution
+might not be the best or most future-proof one.
+
+Unicode normalisation was also looked at but judged unnecessary.
+
+## Tradeoffs
+
+Implementing this MSC in identity servers and homeservers might require a large
+part of the databases of existing instances to be updated to case fold the
+email addresses of existing associations, in order to avoid conflicts. However,
+most of this update can usually be done by a background job running at startup,
+so the UX improvement outweighs this trouble.
+
+## Potential issues
+
+### Conflicts with existing associations
+
+Some users might already have two different accounts associated with the same
+e-mail address but with different cases. This appears to happen in a small
+number of cases, however, and can be dealt with by the identity server's or the
+homeserver's maintainer.
+
+For example, with Sydent, the process of dealing with such cases could look
+like:
+
+1. list all MXIDs associated with a variant of the email address, and the
+   timestamp of that association
+2. delete all associations except for the most recent one [0]
+3. inform the user of the deletion by sending them an email notice to the email
+   address
+
+### Storing and querying
+
+Most database engines don't support case folding, so querying all e-mail
+addresses matching a case folded e-mail address might not be trivial. For
+example, an identity server querying all associations for `strauss@example.com`
+when processing a `/lookup` request would be expected to also get associations
+for `Strauß@Example.com`.
+
+To address this issue, implementation maintainers are strongly encouraged to
+make e-mail addresses go through a full case folding before storing them.
+
+### Implementing case folding
+
+The need for case folding in services on the Internet doesn't seem to be very
+large currently (probably due to its young age), therefore there seem to be only
+a few third-party implementation libraries out there. However,
+[Go](https://godoc.org/golang.org/x/text/cases#Fold), [Python
+2](https://docs.python.org/2/library/stringprep.html#stringprep.map_table_b3)
+and [Python 3](https://docs.python.org/3/library/stdtypes.html#str.casefold)
+all support it natively, and [a third-party JavaScript
+implementation](https://github.com/ar-nelson/foldcase) exists which, although
+young, seems to be working.
+
+## Footnotes
+
+[0]: This is specific to Sydent because of a bug it has where v1 lookups are
+already processed case insensitively, which means it will return the most recent
+association for any case of the given email address. Keeping only this
+association therefore won't change the result of v1 lookups.
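The Sydent clean-up steps listed under "Conflicts with existing associations" (group the variants, keep the most recent association, notify about the rest) could be sketched roughly as follows. This is a minimal in-memory sketch, not Sydent's actual code: the `(address, mxid, timestamp)` tuple shape and the `dedupe_associations` name are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# (address, mxid, timestamp): a hypothetical in-memory shape, not Sydent's schema.
Association = Tuple[str, str, int]


def dedupe_associations(
    associations: Iterable[Association],
) -> Tuple[List[Association], List[Association]]:
    """Split associations into (kept, deleted).

    For each case-folded address only the most recent association is kept;
    the kept rows are returned with the address already case folded.
    """
    latest: Dict[str, Association] = {}
    deleted: List[Association] = []
    for assoc in associations:
        address, _mxid, ts = assoc
        key = address.casefold()  # group all case variants together
        current = latest.get(key)
        if current is None:
            latest[key] = assoc
        elif ts > current[2]:
            # The new row is more recent: the previous one gets deleted.
            deleted.append(current)
            latest[key] = assoc
        else:
            deleted.append(assoc)
    kept = [(key, mxid, ts) for key, (_addr, mxid, ts) in latest.items()]
    return kept, deleted


rows = [
    ("John.Doe@example.com", "@a:example.com", 100),
    ("john.doe@example.com", "@b:example.com", 200),
    ("Strauß@Example.com", "@c:example.com", 50),
]
kept, deleted = dedupe_associations(rows)
```

The `deleted` list holds the associations whose owners would need to be notified by e-mail, while `kept` holds the rows to store under their case-folded address.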