Unable to Complete the Call as Dialed
Tim Bray: I’m not sure whether this free-TLD idea is a good or bad thing in the big picture
When I was a young’un, telephone area codes in North America had a zero or a one as the middle digit, and no exchange code within those area codes had a zero or a one in that position. This let telephone switching equipment tell whether the number you were dialing was local or long distance without requiring a one to be dialed first. Eventually, phone numbers became scarce, and the rule was ditched.
This meant that PBX equipment in a number of locations was unable to call these new numbers and had to be replaced.
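A minimal Python sketch of the old rule as described above. The function name `classify_dialed` is hypothetical, and the logic is deliberately simplified (it ignores operator codes, special numbers and everything else a real switch handled); it only looks at the middle digit of the first three digits dialed.

```python
def classify_dialed(digits: str) -> str:
    """Classify a dialed number using only the old middle-digit rule.

    Simplified assumption: area codes had 0 or 1 as their middle digit and
    exchange codes never did, so three digits were enough to decide.
    """
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError("need at least three digits")
    if digits[1] in "01":
        return "long distance"  # first three digits look like an area code
    return "local"  # first three digits look like a local exchange


print(classify_dialed("2125551234"))  # long distance
print(classify_dialed("5551234"))     # local
```

Once area codes with any other middle digit appeared, a test like this would treat them as local exchanges, which is roughly the kind of assumption that left the older PBX equipment unable to place those calls.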
The modern equivalent of this may be email addresses. Consider the fun that will occur when existing software is presented with email addresses that contain non-Latin characters.
“Consider the fun that will occur when existing software is presented with email addresses that contain non-Latin characters.”
It would probably make an excellent spam filter.
Posted by Snizz at
We’ve had internationalized domain names for a while now, so you could already have non-Latin characters in email addresses. You could already have 日本語.jp, for example; now they’ll be able to have 日本語.日本 (or something like that; I don’t actually speak, read or write Japanese).
Posted by Pierre Phaneuf at
Many mail clients already work just fine with IDNs; this has been tested for many years. Having IDNs as TLDs may tickle bugs in a few poorly written mail clients and servers, but not many. Specifically, if a mail server can handle mail from sam@example.info, it will probably handle mail from sam@éxample.éxample as well (which is just sam@xn--xample-9ua.xn--xample-9ua).
Where it gets much dicier is sám@example.com. The IETF work on that idea can be found in the EAI (Email Address Internationalization) working group.
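To make the distinction concrete, here is a small Python sketch using the standard library’s `idna` codec, which implements the older IDNA 2003 rules. The helper names `to_ascii_address` and `local_part_is_ascii` are hypothetical. Converting the domain part yields the `xn--` form quoted above; the local part is left untouched, so a non-ASCII local part such as `sám` has no ASCII fallback.

```python
def to_ascii_address(address: str) -> str:
    """Convert only the domain part of an email address to its ASCII form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @")
    # The built-in "idna" codec applies IDNA 2003 ToASCII to each dot-separated label.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def local_part_is_ascii(address: str) -> bool:
    """IDNA has nothing to say about the local part; just report whether it is ASCII."""
    return address.rpartition("@")[0].isascii()


print(to_ascii_address("sam@éxample.éxample"))  # sam@xn--xample-9ua.xn--xample-9ua
print(local_part_is_ascii("sám@example.com"))   # False
```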
Posted by Paul Hoffman at
Clarifications and comments:
1) While one can construct IRIs using the existing TLDs, doing so is relatively uncommon. By contrast, any TLD that consists of anything other than two to four ASCII “word” characters is likely to be problematic.
2) While I would expect the overwhelming majority of widely deployed software that deals directly with SMTP to be able to handle IDNA-encoded IRIs, it is nowhere near as clear to me that the typical PHP application into which people enter email addresses in human-readable form will cope as well.
3) Humorously, the spam comment cuts both ways. If TLDs are expanded, there may be a short period during which email addresses that fall outside regular expressions such as this one are relatively safe from email-harvesting crawlers.
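A sketch of the kind of check point 3 alludes to, assuming a “typical” validator that requires a TLD of two to four ASCII letters. The pattern `NAIVE_EMAIL` and the helper `naive_accepts` are hypothetical, not taken from any particular application.

```python
import re

# Hypothetical validator: the TLD must be two to four ASCII letters.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")


def naive_accepts(address: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return NAIVE_EMAIL.fullmatch(address) is not None


print(naive_accepts("sam@example.info"))                   # True
print(naive_accepts("sam@xn--xample-9ua.xn--xample-9ua"))  # False
```

Both the raw and the Punycode forms of an IDN TLD fail this check, as does any ASCII TLD longer than four letters.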
Posted by Sam Ruby at
We already have this problem; even plus-addresses (foo+bar@example.com) trip up some regex filters, never mind all the wacky variants that RFC 822 actually allows. (The full regexp is pretty funny.)
I believe it was the BBC’s awful comment system that let me enter a plus-address at registration but couldn’t send mail to that kind of address, leaving me with an address that was ‘registered’ but couldn’t be confirmed, because the confirmation email never escaped their event horizon.
Fortunately, most mail servers aren’t half as bad at dealing with this as most webapps, and it’s a smaller ecosystem to fix.
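A sketch of how a plus-address gets rejected, assuming a hypothetical over-strict validator that only allows ASCII word characters, dots and hyphens in the local part. RFC 822 permits `+` in the local part, so this check turns away valid addresses. `STRICT`, `LENIENT` and the helpers are illustrative names only.

```python
import re

# Hypothetical over-strict pattern: "+" is missing from the local-part class.
STRICT = re.compile(r"[\w.-]+@[\w-]+(\.[\w-]+)+", re.ASCII)
# Same pattern with "+" added to the local-part class.
LENIENT = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+", re.ASCII)


def strict_accepts(address: str) -> bool:
    return STRICT.fullmatch(address) is not None


def lenient_accepts(address: str) -> bool:
    return LENIENT.fullmatch(address) is not None


print(strict_accepts("foo+bar@example.com"))   # False
print(lenient_accepts("foo+bar@example.com"))  # True
```

Adding `+` fixes this one case, but RFC 822 allows far more (quoted local parts, for instance), which is why patterns that try to cover the whole grammar get so long.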
Posted by Baz at
MTAs implement RFC 821 (and may implement RFC 2821).
Quoting RFC 2821, Section 2.3.5:
A domain (or domain name) consists of one or more dot-separated
components. These components ("labels" in DNS terminology [22]) are
restricted for SMTP purposes to consist of a sequence of letters,
digits, and hyphens drawn from the ASCII character set [1]. Domain
names are used as names of hosts and of other entities in the domain
name hierarchy. For example, a domain may refer to an alias (label
of a CNAME RR) or the label of Mail eXchanger records to be used to
deliver mail instead of representing a host name. See [22] and
section 5 of this specification.
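The rule quoted above amounts to a letters-digits-hyphens (LDH) check on each label. A simplified Python sketch; `is_smtp_domain` is a hypothetical name, and it does not enforce label length limits or other DNS rules.

```python
import re

# One label: a non-empty run of ASCII letters, digits and hyphens.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")


def is_smtp_domain(domain: str) -> bool:
    # Every dot-separated label must pass; an empty label (e.g. "a..b") fails.
    return all(LDH_LABEL.fullmatch(label) for label in domain.split("."))


print(is_smtp_domain("éxample.éxample"))                # False
print(is_smtp_domain("xn--xample-9ua.xn--xample-9ua"))  # True
```

The raw IDN fails the check while its `xn--` form passes, so unmodified SMTP software can only carry IDNs in their Punycode form.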
Allowing character sets beyond 7-bit ASCII in (E)SMTP would require a full-fledged upgrade of every SMTP server.
Another issue with punycoding domain names is that you still need to be able to determine the original character set in order to look up the correct MX record.
Regarding Paul Hoffman’s comment, today’s situation is exactly the reverse: I can create an email address with non-ASCII characters in the local part (before the @), and they will pass through unchanged as long as they are quoted appropriately. Non-ASCII characters in the domain part are not likely to be well supported, if at all.
Many poorly written systems can’t even deal with TLDs that have more than 3 letters, like .info.
Posted by Fazal Majid at