Domain Name Character Rules: What Is and Is Not Allowed
Domain names look simple — letters and dots. The actual rules have edge cases that matter when you are registering programmatically, validating user input, or building anything that accepts domain names as data.
Standard gTLD Rules (RFC 1123)
For most gTLDs (.com, .net, .org, and most new gTLDs), the rules are:
- Characters allowed: letters (a-z), digits (0-9), hyphens (-)
- Length: 1 to 63 characters per label (the part between dots)
- Case: case-insensitive. Example.com and EXAMPLE.COM are the same domain
- Hyphens: allowed in the middle, not at the start or end of a label
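- Special rule: a label may not have hyphens in both the third and fourth positions unless it is a valid ACE (ASCII-Compatible Encoding) label for an IDN, which always starts with "xn--"
So "my-product.com" is valid. "-myproduct.com" is not. A double hyphen is allowed in the middle of a label, as in "product--page.com", but not in the third and fourth positions: "my--product.com" is rejected because those positions are reserved for ACE labels such as "xn--".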
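The per-label rules can be checked one label at a time. Here is a minimal JavaScript sketch covering length, allowed characters, and hyphen placement. It does not implement the special "xn--" rule above, because verifying an ACE label requires actually decoding it. The function name isValidLdhLabel is hypothetical.

```javascript
// Check one DNS label against the letters-digits-hyphens (LDH) rules.
// Scope: length, allowed characters, and hyphen placement only.
// It does not decode or verify "xn--" (ACE) labels.
function isValidLdhLabel(label) {
  // 1 to 63 characters per label
  if (label.length < 1 || label.length > 63) return false;
  // Only letters, digits and "-" (case does not matter in DNS)
  if (!/^[A-Za-z0-9-]+$/.test(label)) return false;
  // No hyphen at the start or end of the label
  if (label.startsWith("-") || label.endsWith("-")) return false;
  return true;
}

console.log(isValidLdhLabel("my-product"));   // true
console.log(isValidLdhLabel("-myproduct"));   // false
console.log(isValidLdhLabel("my_product"));   // false
console.log(isValidLdhLabel("a".repeat(64))); // false
```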
What Is Not Allowed
- Spaces: not allowed anywhere
- Underscores: not allowed in hostnames under the hostname rules (RFC 952/1123). The DNS protocol itself accepts them, which is why underscore-prefixed labels work for service and verification records like _dmarc or _acme-challenge, but registries generally reject them in registrable names
- Special characters: @, !, #, %, $, and all other symbols are not allowed in domain names
- Leading or trailing hyphens in any label
- Empty labels (double dots): "my..domain.com" is invalid
- Total length over 253 characters (sum of all labels and dots)
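The whole-name rules in this list (no empty labels, no spaces or symbols, no edge hyphens, at most 253 characters) can be combined into one check. This is a sketch under two assumptions: a single trailing dot (the fully qualified form, "example.com.") is accepted and ignored, and IDNs have already been converted to their "xn--" form. The names LABEL_RE and isValidDomainName are hypothetical. It accepts single-label names such as "localhost"; whether that is desirable depends on your use case.

```javascript
// Each label: starts and ends with a letter or digit, hyphens only inside,
// 1 to 63 characters in total (1 + up to 61 + 1).
const LABEL_RE = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

function isValidDomainName(name) {
  // Drop one trailing dot, if present (fully qualified form)
  const trimmed = name.endsWith(".") ? name.slice(0, -1) : name;
  // Total length: at most 253 characters, labels plus dots
  if (trimmed.length === 0 || trimmed.length > 253) return false;
  // "my..domain.com" or a leading dot produces an empty label, which LABEL_RE rejects
  return trimmed.split(".").every((label) => LABEL_RE.test(label));
}

console.log(isValidDomainName("my-product.com")); // true
console.log(isValidDomainName("example.com."));   // true
console.log(isValidDomainName("my..domain.com")); // false
console.log(isValidDomainName("my domain.com"));  // false
```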
IDN: Internationalised Domain Names
IDNs allow non-ASCII characters — Arabic, Chinese, Cyrillic, Japanese, and others. These are encoded using Punycode, which converts the Unicode characters into an ASCII-compatible form.
The visible form "münchen.de" becomes the Punycode form "xn--mnchen-3ya.de" at the DNS level. Both representations refer to the same domain.
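To see the two forms side by side, here is a minimal sketch using Node.js's built-in url.domainToASCII and url.domainToUnicode, which apply IDNA processing. It assumes a reasonably recent Node.js; other runtimes would need a Punycode/IDNA library instead.

```javascript
// Convert between the Unicode form and the Punycode (ACE, "xn--") form
// using Node.js's built-in url module.
const { domainToASCII, domainToUnicode } = require("node:url");

const ascii = domainToASCII("münchen.de");
const unicode = domainToUnicode("xn--mnchen-3ya.de");

console.log(ascii);   // xn--mnchen-3ya.de
console.log(unicode); // münchen.de
```

Both functions return an empty string for input they cannot convert, so check for that before using the result.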
Not all TLDs support IDN registrations. The registry must have IDN policies and tables specifying which Unicode characters are permitted. Most major gTLDs and many ccTLDs support IDNs, but support varies widely for new gTLDs.
Registry-Specific Restrictions
Beyond the basic RFC rules, individual registries add their own restrictions:
Reserved labels. Registries typically reserve certain labels. In new gTLDs, ICANN's registry agreement requires labels such as "www" and "whois" to be withheld at the second level, so you cannot register "www" under those TLDs. Many registries also reserve words like "mail," "ftp," "localhost," and single-character labels.
Minimum length. Some TLDs require at least 3 characters in the second-level label, while others, such as .io, accept two-character registrations. For .com, the DNS rules allow a single-character label, but single-character .com names are reserved by the registry, so new registrations effectively need at least 2 characters.
Blocked categories. Some registry policies block registrations that match patterns: phone numbers, reserved words from ICANN policy, certain keyword categories. The exact list is registry-specific.
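Because these restrictions differ per registry, they are best kept as data rather than hard-coded rules. The sketch below shows one way to structure that; the reserved list, the 3-character minimum, and the phone-number pattern are illustrative placeholders taken loosely from the examples above, not any real registry's policy. The names policy and checkRegistryPolicy are hypothetical.

```javascript
// Hypothetical per-registry policy. All values are placeholders; real rules
// come from each registry's published policies.
const policy = {
  reserved: new Set(["www", "mail", "ftp", "localhost"]),
  minLength: 3,
  blockedPatterns: [/^\d{7,15}$/], // strings that look like phone numbers
};

function checkRegistryPolicy(label, rules) {
  const normalized = label.toLowerCase(); // labels are case-insensitive
  if (rules.reserved.has(normalized)) return "reserved";
  if (normalized.length < rules.minLength) return "too short";
  if (rules.blockedPatterns.some((re) => re.test(normalized))) return "blocked pattern";
  return "ok";
}

console.log(checkRegistryPolicy("WWW", policy));        // reserved
console.log(checkRegistryPolicy("ab", policy));         // too short
console.log(checkRegistryPolicy("5551234567", policy)); // blocked pattern
console.log(checkRegistryPolicy("my-product", policy)); // ok
```

Run this after the syntax checks: a label can be syntactically valid and still be refused by the registry.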
Validating Domain Names in Code
A simple regex for basic domain validation (not comprehensive):
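```javascript
// Basic check only: no IDN support, no 253-character total limit; needs regex lookbehind support.
const domainRegex = /^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-))*\.[a-zA-Z]{2,}$/;
```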
This does not handle IDNs. For IDN support, you need a library that handles Punycode conversion and Unicode validation.
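One way to stretch the regex to IDN input is to convert the name to its ASCII form first and add the total-length check the regex lacks. This is a sketch assuming a recent Node.js, where url.domainToASCII is built in; the function name isValidDomainBasic is hypothetical. It still rejects IDN TLDs such as "xn--p1ai" (.рф), because the final [a-zA-Z]{2,} group allows letters only.

```javascript
// Run the regex on the ASCII (Punycode) form of the name.
const { domainToASCII } = require("node:url");

const domainRegex = /^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-))*\.[a-zA-Z]{2,}$/;

function isValidDomainBasic(input) {
  // domainToASCII returns "" for input it cannot convert
  const ascii = domainToASCII(input);
  // The regex does not cap the total length, so check 253 separately
  return ascii !== "" && ascii.length <= 253 && domainRegex.test(ascii);
}

console.log(isValidDomainBasic("my-product.com")); // true
console.log(isValidDomainBasic("münchen.de"));     // true
console.log(isValidDomainBasic("-myproduct.com")); // false
```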
For production use, prefer a dedicated validation library:
- Node.js: is-valid-domain package
- Python: validators library
- Go: the standard library's net package validates names only internally and exports no validator; golang.org/x/net/idna provides IDNA conversion with validation
Regex-based domain validation is fragile. Edge cases — single-label domains, extremely long TLDs, IDN labels — break simple patterns. Use a library with a maintained test suite.
RDAP and Character Encoding
RDAP uses JSON responses with UTF-8 encoding. A domain object carries its name in the ldhName member in ASCII form (A-labels, i.e. Punycode for IDNs), and an IDN may also carry a unicodeName member with the Unicode form. This is relevant if you are parsing RDAP responses programmatically: an IDN returned from an RDAP query may contain characters outside the ASCII range, and the same domain can appear in both forms in one response.
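Because an RDAP domain object can carry the name in more than one form, it helps to normalise to a single form before comparing or storing it. This sketch reads the ldhName and unicodeName members defined for domain objects in the RDAP JSON format (RFC 9083) and prefers the ASCII form; the sample response is a made-up, trimmed example, and the function name rdapAsciiName is hypothetical.

```javascript
// Get a comparable ASCII name out of an RDAP domain object.
const { domainToASCII } = require("node:url");

// Made-up, trimmed RDAP domain response for illustration
const sampleResponse = `{
  "objectClassName": "domain",
  "ldhName": "xn--mnchen-3ya.de",
  "unicodeName": "münchen.de"
}`;

function rdapAsciiName(domainObject) {
  if (typeof domainObject.ldhName === "string") {
    // DNS names are case-insensitive; lowercase for comparison
    return domainObject.ldhName.toLowerCase();
  }
  if (typeof domainObject.unicodeName === "string") {
    // Fall back to converting the Unicode form; "" means conversion failed
    const ascii = domainToASCII(domainObject.unicodeName);
    return ascii === "" ? null : ascii;
  }
  return null;
}

const domain = JSON.parse(sampleResponse);
console.log(rdapAsciiName(domain)); // xn--mnchen-3ya.de
console.log(domain.unicodeName);    // münchen.de
```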