IDN (Internationalized Domain Names) lets you use the characters of all human scripts, including ideograms, in your domain names. The DNS protocol itself has long allowed non-ASCII characters, but DNS usage restricted the set of acceptable characters to a subset of US-ASCII.
To allow more characters, thus enabling you to have domain names properly written in your language, the IETF decided not to change the DNS protocol but rather to ask applications to transform IDN into US-ASCII.
The repertoire of characters used by IDNA is Unicode[1].
The current IETF standard is represented by four RFCs:
RFC 3490, "Internationalizing Domain Names in Applications (IDNA)", sets the base protocol. (RFC means Request For Comments; the RFCs are available on the IETF server.)
As its name says, all the work has to be done by the applications. On the wire and in the zone file, you will find only US-ASCII.
RFC 3454, "Preparation of Internationalized Strings ("stringprep")", and RFC 3491, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", describe the steps to take when receiving a domain name. IDNA applications have to canonicalize names, that is, bring them to a common ("canonical") form, before testing for uniqueness.
For instance, in German, "maße" and "masse" will be the same name, after nameprep canonicalization. In French, "CAFÉ" and "café" will be the same (but not "CAFE", which is not a proper spelling).
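The two examples above can be checked with a short sketch. It assumes Python, whose built-in "idna" codec follows RFC 3490 and applies nameprep before encoding; the helper name canonical is only illustrative, and the final lowercasing is added here because the codec leaves pure-ASCII labels untouched.

```python
def canonical(name: str) -> str:
    """Return the ASCII form of a domain name after nameprep.

    The "idna" codec runs nameprep on non-ASCII labels; ASCII labels are
    passed through unchanged, so we lowercase the result ourselves.
    """
    return name.encode("idna").decode("ascii").lower()


# German: the sharp s is mapped to "ss", so both spellings collide.
print(canonical("maße") == canonical("masse"))

# French: case folding also applies to accented letters.
print(canonical("CAFÉ") == canonical("café"))

# But an unaccented E is a different character, so this is another name.
print(canonical("CAFE") == canonical("café"))
```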
RFC 3492, "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", specifies the encoding used. Unicode names are transformed into US-ASCII names (ACE: ASCII Compatible Encoding) which start with the common prefix "xn--". For instance, stéphane.org becomes xn--stphane-cya.org.
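As a minimal sketch of this transformation, again assuming Python's standard "idna" and "punycode" codecs (the function names to_ace and to_unicode are illustrative, not part of any standard):

```python
def to_ace(name: str) -> str:
    """Unicode domain name -> ACE form (nameprep, Punycode, "xn--" prefix)."""
    return name.encode("idna").decode("ascii")


def to_unicode(ace_name: str) -> str:
    """ACE form -> Unicode domain name."""
    return ace_name.encode("ascii").decode("idna")


print(to_ace("stéphane.org"))              # xn--stphane-cya.org
print(to_unicode("xn--stphane-cya.org"))   # stéphane.org

# The raw Punycode of a single label, without the ACE prefix:
print("stéphane".encode("punycode"))       # b'stphane-cya'
```

Only the label that contains non-ASCII characters is encoded; "org" is left as it is.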
You can try these transformations online at EUREG, IBM or Josefsson.
For a registry, if you want to register IDN, you will have to address some policy issues. For instance:
Do you accept that two variants (names that are more or less the same according to the rules of a given language, but different according to RFC 3490) be registered by different registrants?
Do you accept all the characters of Unicode or just a subset which fits your local language(s)?
These policy issues are discussed in the idn-reg-policy mailing list.
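To make the second question concrete, here is a sketch of a registry-side check that accepts only a subset of Unicode. The ALLOWED set is purely hypothetical (a real registry would define its own table), and nameprep is an undocumented helper found in CPython's encodings.idna module, used here on the assumption that it is available.

```python
from encodings.idna import nameprep  # CPython helper implementing RFC 3491

# Hypothetical policy: ASCII letters, digits, hyphen and some accented
# letters used in French. This table is an assumption for illustration.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | set("àâæçéèêëîïôœùûüÿ")


def rejected_characters(label: str) -> set:
    """Return the characters of a label that the policy does not allow.

    The label is canonicalized with nameprep first, so "CAFÉ" is checked
    as "café".
    """
    return set(nameprep(label)) - ALLOWED


print(rejected_characters("café"))    # empty set: the label is acceptable
print(rejected_characters("mañana"))  # contains "ñ", which is not in the table
```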
You can register IDN with no tools at all if you just store the ACE strings. But if you want to perform nameprep and Punycode yourself, or if you want to implement bundles (the set of all names that are simple variants of the registered name), you will need to write some code.
GNU libidn is a free software implementation of IDN: nothing to write, just use it.
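Bundles, on the other hand, depend on registry policy, so they usually need local code. The following is only a sketch of one possible approach, written in Python with its standard "idna" codec rather than with libidn. The VARIANTS table and the function names are hypothetical; a real table would come from the rules of the language(s) the registry serves.

```python
from itertools import product

# Hypothetical variant table: for each character, the characters that the
# registry treats as equivalent to it (including itself).
VARIANTS = {
    "e": ["e", "é"],
    "é": ["e", "é"],
}


def bundle(label: str) -> set:
    """Return every variant of a label according to VARIANTS."""
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def bundle_ace(label: str) -> list:
    """Return the bundle as sorted ACE strings, ready to store or reserve."""
    return sorted(name.encode("idna").decode("ascii") for name in bundle(label))


print(bundle_ace("café"))  # ['cafe', 'xn--caf-dma']
```

The size of a bundle grows multiplicatively with the number of variant characters in the label, so a real implementation would probably want to cap it.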
For every question about generic NIC, please ask info@generic-nic.net.
(last rebuild by WML 2.0.11 (19-Aug-2006): Monday 10 November 2008)