- adraffy
 
ENSIP-15: ENS Name Normalization Standard
Abstract
This ENSIP standardizes the Ethereum Name Service (ENS) name normalization process outlined in ENSIP-1 § Name Syntax.
Motivation
- Since ENSIP-1 (originally EIP-137) was finalized in 2016, Unicode has evolved from version 8.0.0 to 15.0.0 and incorporated many new characters, including complex emoji sequences.
 - ENSIP-1 does not state the version of Unicode.
 - ENSIP-1 implies but does not state an explicit flavor of IDNA processing.
 - UTS-46 is insufficient to normalize emoji sequences. Correct emoji processing is only possible with UTS-51.
 - Validation tests are needed to ensure implementation compliance.
- The success of ENS has encouraged spoofing via the following techniques:
  - Insertion of zero-width characters.
  - Using names which normalize differently between algorithms.
  - Using names which appear differently between applications and devices.
  - Substitution of confusable (look-alike) characters.
  - Mixing incompatible scripts.
 
 
Specification
- Unicode version 15.1.0
  - Normalization is a living specification and should use the latest stable version of Unicode.
- spec.json contains all necessary data for normalization.
- nf.json contains all necessary data for Unicode Normalization Forms NFC and NFD.
Definitions
- Terms in bold throughout this document correspond with components of spec.json.
- A string is a sequence of Unicode codepoints.
  - Example: "abc" is 61 62 63
- A Unicode emoji is a single entity composed of one or more codepoints:
  - An Emoji Sequence is the preferred form of an emoji, resulting from input that is tokenized into an Emoji token.
    - Example: 💩︎︎ [1F4A9] → Emoji[1F4A9 FE0F]
      - 1F4A9 FE0F is the Emoji Sequence.
  - spec.json contains the complete list of valid Emoji Sequences.
    - Derivation defines which emoji are normalizable.
    - Not all Unicode emoji are valid.
      - ‼ [203C] double exclamation mark → error: Disallowed character
      - 🈁 [1F201] Japanese “here” button → Text["ココ"]
  - An Emoji Sequence may contain characters that are disallowed:
    - 👩❤️👨 [1F469 200D 2764 FE0F 200D 1F468] couple with heart: woman, man (contains ZWJ)
    - #️⃣ [23 FE0F 20E3] keycap: # (contains 23 (#))
    - 🏴 [1F3F4 E0067 E0062 E0065 E006E E0067 E007F] (contains E00XX)
  - An Emoji Sequence may contain other emoji:
    - Example: ❤️ [2764 FE0F] red heart is a substring of ❤️🔥 [2764 FE0F 200D 1F525] heart on fire
  - Single-codepoint emoji may have various presentation styles on input:
    - Default: ❤ [2764]
    - Text: ❤︎ [2764 FE0E]
    - Emoji: ❤️ [2764 FE0F]
  - However, these all tokenize to the same Emoji Sequence.
  - All Emoji Sequences have explicit emoji-presentation.
  - The convention of ignoring presentation is difficult to change because:
    - Presentation characters (FE0F and FE0E) are Ignored
    - ENSIP-1 did not treat emoji differently from text
    - Registration hashes are immutable
  - Beautification can be used to restore emoji-presentation in normalized names.
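The bracketed hex notation used throughout this document (for example "abc" is 61 62 63) can be reproduced with a small helper. This is only an illustrative sketch; the function names `to_hex_codepoints` and `from_hex_codepoints` are hypothetical and not part of the specification.

```python
def to_hex_codepoints(s: str) -> str:
    """Render a string as space-separated uppercase hex codepoints."""
    return " ".join(f"{ord(ch):X}" for ch in s)


def from_hex_codepoints(text: str) -> str:
    """Inverse of to_hex_codepoints: build a string from hex codepoints."""
    return "".join(chr(int(part, 16)) for part in text.split())


print(to_hex_codepoints("abc"))            # 61 62 63
print(to_hex_codepoints("\u2764\uFE0F"))   # 2764 FE0F (emoji presentation)
print(to_hex_codepoints("\u2764\uFE0E"))   # 2764 FE0E (text presentation)
```

Writing the presentation selectors as escapes makes the otherwise invisible FE0E and FE0F codepoints explicit.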
 
Algorithm
- Normalization is the process of canonicalizing a name before hashing.
 - It is idempotent: applying normalization multiple times produces the same result.
 - For user convenience, leading and trailing whitespace should be trimmed before normalization, as all whitespace codepoints are disallowed. Inner characters should remain unmodified.
 - No string transformations (like case-folding) should be applied.
 
Normalize
- Tokenize: transform the label into Text and Emoji tokens.
  - If there are no tokens, the label cannot be normalized.
- Apply NFC to each Text token.
  - Example: Text["à"] → [61 300] → [E0] → Text["à"]
- Strip FE0F from each Emoji token.
- Validate: check if the tokens are valid and obtain the Label Type.
  - The Label Type and Restricted state may be presented to the user for additional security.
- Concatenate the tokens together.
  - Return the normalized label.
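As a rough sketch of the per-label steps above (excluding Tokenize and Validate), the following applies NFC to Text tokens, strips FE0F from Emoji tokens, and concatenates the result. The token representation `("text" | "emoji", [codepoints])` and the name `flatten_tokens` are assumptions made for illustration. It also uses Python's built-in `unicodedata`, whose Unicode version depends on the Python build and may differ from the version this specification targets; a conforming implementation would use the data in nf.json.

```python
import unicodedata

FE0F = 0xFE0F


def flatten_tokens(tokens):
    """tokens: list of (kind, codepoints) with kind "text" or "emoji" (assumed shape)."""
    if not tokens:
        raise ValueError("no tokens: the label cannot be normalized")
    out = []
    for kind, cps in tokens:
        if kind == "text":
            # Apply NFC to each Text token.
            composed = unicodedata.normalize("NFC", "".join(map(chr, cps)))
            out.extend(ord(ch) for ch in composed)
        else:
            # Strip FE0F from each Emoji token.
            out.extend(cp for cp in cps if cp != FE0F)
    return out


print([f"{cp:X}" for cp in flatten_tokens([("text", [0x61, 0x300])])])  # ['E0']
```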
 
 
Examples:
- "_$A" [5F 24 41] → "_$a" [5F 24 61] (ASCII)
- "E︎̃" [45 FE0E 303] → "ẽ" [1EBD] (Latin)
- "𓆏🐸" [1318F 1F438] → "𓆏🐸" [1318F 1F438] (Restricted: Egyp)
- "nı̇ck" [6E 131 307 63 6B] → error: Disallowed character
Tokenize
Convert a label into a list of Text and Emoji tokens, each with a payload of codepoints.  The complete list of character types and emoji sequences can be found in spec.json.
- Allocate an empty codepoint buffer.
- Find the longest Emoji Sequence that matches the remaining input.
  - Example: 👨🏻💻 [1F468 1F3FB 200D 1F4BB]
    - Match (1): 👨️ [1F468] man
    - Match (2): 👨🏻 [1F468 1F3FB] man: light skin tone
    - Match (4): 👨🏻💻 [1F468 1F3FB 200D 1F4BB] man technologist: light skin tone (longest match)
  - FE0F is optional from the input during matching.
    - Example: 👨❤️👨 [1F468 200D 2764 FE0F 200D 1F468]
      - Match: 1F468 200D 2764 FE0F 200D 1F468 (fully-qualified)
      - Match: 1F468 200D 2764 200D 1F468 (missing FE0F)
      - No match: 1F468 FE0F 200D 2764 FE0F 200D 1F468 (extra FE0F)
      - No match: 1F468 200D 2764 FE0F FE0F 200D 1F468 (has (2) FE0F)
  - This is equivalent to /^(emoji1|emoji2|...)/ where \uFE0F is replaced with \uFE0F? and * is replaced with \x2A.
- If an Emoji Sequence is found:
  - If the buffer is nonempty, emit a Text token, and clear the buffer.
  - Emit an Emoji token with the fully-qualified matching sequence.
  - Remove the matched sequence from the input.
- Otherwise:
  - Remove the leading codepoint from the input.
  - Determine the character type:
    - If Valid, append the codepoint to the buffer.
      - This set can be precomputed from the union of characters in all groups and their NFD decompositions.
    - If Mapped, append the corresponding mapped codepoint(s) to the buffer.
    - If Ignored, do nothing.
    - Otherwise, the label cannot be normalized.
- Repeat until all the input is consumed.
- If the buffer is nonempty, emit a final Text token with its contents.
- Return the list of emitted tokens.
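The tokenization steps above can be sketched as follows. The tables `VALID`, `MAPPED`, `IGNORED`, and `EMOJI` are tiny toy subsets chosen for illustration (real data comes from spec.json), and the function names and the `(kind, codepoints)` token shape are assumptions. Emoji matching is done by trying every sequence and keeping the one that consumes the most input, rather than by building the regular expression mentioned above.

```python
FE0F = 0xFE0F

# Toy tables (assumptions for illustration; real data comes from spec.json).
VALID = set(range(0x61, 0x7B))                  # a-z
MAPPED = {0x41: [0x61], 0x2122: [0x74, 0x6D]}   # A -> a, trade mark sign -> tm
IGNORED = {0xFE0E, 0xFE0F}
EMOJI = [
    [0x1F4A9, FE0F],
    [0x1F468, FE0F],
    [0x1F468, 0x1F3FB],
    [0x1F468, 0x1F3FB, 0x200D, 0x1F4BB],
]


def match_length(cps, pos, seq):
    """Number of input codepoints consumed by seq at pos, or 0 if no match.
    FE0F in seq may be missing from the input; an extra FE0F never matches."""
    i = pos
    for want in seq:
        if i < len(cps) and cps[i] == want:
            i += 1
        elif want != FE0F:
            return 0
    return i - pos


def tokenize(cps):
    tokens, buf, pos = [], [], 0
    while pos < len(cps):
        # Find the Emoji Sequence that consumes the most input.
        best_len, best_seq = 0, None
        for seq in EMOJI:
            n = match_length(cps, pos, seq)
            if n > best_len:
                best_len, best_seq = n, seq
        if best_seq is not None:
            if buf:
                tokens.append(("text", buf))
                buf = []
            tokens.append(("emoji", list(best_seq)))  # fully-qualified form
            pos += best_len
            continue
        cp = cps[pos]
        pos += 1
        if cp in VALID:
            buf.append(cp)
        elif cp in MAPPED:
            buf.extend(MAPPED[cp])
        elif cp in IGNORED:
            pass
        else:
            raise ValueError(f"disallowed character: {cp:X}")
    if buf:
        tokens.append(("text", buf))
    return tokens
```

With these toy tables, `tokenize([0x61, 0x2122, 0xFE0F])` yields a single Text token for "atm", because the trade mark sign is Mapped and the trailing FE0F is Ignored.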
 
 
Examples:
- "xyz👨🏻" [78 79 7A 1F468 1F3FB] → Text["xyz"] + Emoji["👨🏻"]
- "A💩︎︎b" [41 FE0E 1F4A9 FE0E FE0E 62] → Text["a"] + Emoji["💩️"] + Text["b"]
- "a™️" [61 2122 FE0F] → Text["atm"]
Validate
Given a list of Emoji and Text tokens, determine if the label is valid and return the Label Type.  If any assertion fails, the name cannot be normalized.
- If only Emoji tokens:
  - Return "Emoji"
- If a single Text token and every character is ASCII (00..7F):
  - 5F (_) LOW LINE can only occur at the start.
    - Must match /^_*[^_]*$/
    - Examples: "___" and "__abc" are valid, "abc__" and "_abc_" are invalid.
  - The 3rd and 4th characters must not both be 2D (-) HYPHEN-MINUS.
    - Must not match /^..--/
    - Examples: "ab-c" and "---a" are valid, "xn--" and "----" are invalid.
  - Return "ASCII"
    - The label is free of Fenced and Combining Mark characters, and not confusable.
- Concatenate all the tokens together.
  - 5F (_) LOW LINE can only occur at the start.
  - The first and last characters cannot be Fenced.
    - Examples: "a’s" and "a・a" are valid, "’85" and "joneses’" and "・a・" are invalid.
  - Fenced characters cannot be contiguous.
    - Examples: "a・a’s" is valid, "6’0’’" and "a・・a" are invalid.
- The first character of every Text token must not be a Combining Mark.
- Concatenate the Text tokens together.
- Find the first Group that contains every text character:
  - If no group is found, the label cannot be normalized.
- If the group is not CM Whitelisted:
  - Apply NFD to the concatenated text characters.
  - For every contiguous sequence of NSM characters:
    - Each character must be unique.
      - Example: "x̀̀" [78 300 300] has (2) grave accents.
    - The number of NSM characters cannot exceed Maximum NSM (4).
      - Example: "إؐؑؒؓؔ" [625 610 611 612 613 614] has (6) NSM.
- Wholes: check if text characters form a confusable.
- The label is valid.
  - Return the name of the group as the Label Type.
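The underscore, hyphen, and Fenced rules above lend themselves to small predicate functions. The sketch below is illustrative only: the function names are hypothetical, and `FENCED` is a three-character toy subset (the full set is the "fenced" entry of spec.json).

```python
import re

# Toy subset of Fenced characters (assumption; see "fenced" in spec.json):
# 2019 RIGHT SINGLE QUOTATION MARK, 30FB KATAKANA MIDDLE DOT, 2044 FRACTION SLASH.
FENCED = {0x2019, 0x30FB, 0x2044}


def underscore_only_at_start(label: str) -> bool:
    """5F (_) LOW LINE can only occur at the start: /^_*[^_]*$/."""
    return re.fullmatch(r"_*[^_]*", label) is not None


def no_hyphens_at_3_and_4(label: str) -> bool:
    """The 3rd and 4th characters must not both be hyphens: not /^..--/."""
    return re.match(r"..--", label) is None


def fenced_ok(label: str) -> bool:
    """Fenced characters cannot be first, last, or contiguous."""
    cps = [ord(ch) for ch in label]
    if not cps:
        return True
    if cps[0] in FENCED or cps[-1] in FENCED:
        return False
    return not any(a in FENCED and b in FENCED for a, b in zip(cps, cps[1:]))


print(underscore_only_at_start("__abc"), underscore_only_at_start("abc__"))  # True False
print(no_hyphens_at_3_and_4("---a"), no_hyphens_at_3_and_4("xn--"))          # True False
print(fenced_ok("a\u30FBa"), fenced_ok("a\u30FB\u30FBa"))                    # True False
```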
 
 
Examples:
- Emoji["💩️"] + Emoji["💩️"] → "Emoji"
- Text["abc$123"] → "ASCII"
- Emoji["🚀️"] + Text["à"] → "Latin"
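The NSM rule from Validate (applied when the group is not CM Whitelisted) can be sketched as below. This is an approximation under stated assumptions: it treats any character with general category Mn or Me as NSM, whereas the specification uses the explicit "nsm" set in spec.json, and it relies on Python's `unicodedata` for NFD instead of nf.json. The name `check_nsm` is hypothetical.

```python
import unicodedata

NSM_MAX = 4  # Maximum NSM, per the text above


def is_nsm(ch: str) -> bool:
    # Assumption: approximate the spec's NSM set by general category alone.
    return unicodedata.category(ch) in ("Mn", "Me")


def check_nsm(text: str) -> bool:
    """Return True if every contiguous NSM run is duplicate-free and short enough."""
    run = []
    for ch in unicodedata.normalize("NFD", text):
        if is_nsm(ch):
            if ch in run:
                return False          # each NSM in a run must be unique
            run.append(ch)
            if len(run) > NSM_MAX:
                return False          # run exceeds Maximum NSM
        else:
            run = []                  # a non-NSM character ends the run
    return True


print(check_nsm("x\u0300\u0300"))  # False: two grave accents
print(check_nsm("a\u0300"))        # True
```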
Wholes
A label is whole-script confusable if a similarly-looking valid label can be constructed using one alternative character from a different group.  The complete list of Whole Confusables can be found in spec.json.  Each Whole Confusable has a set of non-confusing characters ("valid") and a set of confusing characters ("confused") where each character may be the member of one or more groups.
Example: Whole Confusable for "g"
| Type | Code | Form | Character  | Latn | Hani | Japn | Kore | Armn | Cher | Lisu |
| :-: | -: | :-: | :- | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| valid | 67 | g | LATIN SMALL LETTER G | A | A | A | A | | | |
| confused | 581 | ց | ARMENIAN SMALL LETTER CO | | | | | B | | |
| confused | 13C0 | Ꮐ | CHEROKEE LETTER NAH | | | | | | C | |
| confused | 13F3 | Ᏻ | CHEROKEE LETTER YU | | | | | | C | |
| confused | A4D6 | ꓖ | LISU LETTER GA | | | | | | | D |
- Allocate an empty character buffer.
- Start with the set of ALL groups.
- For each unique character in the label:
  - If the character is Confused (a member of a Whole Confusable):
    - Retain groups with Whole Confusable characters excluding the Confusable Extent of the matching Confused character.
    - If no groups remain, the label is not confusable.
    - The Confusable Extent is the fully-connected graph formed from different groups with the same confusable and different confusables of the same group.
      - The mapping from Confused to Confusable Extent can be precomputed.
    - In the table above, Whole Confusable for "g", the rectangle formed by each capital letter is a Confusable Extent:
      - A is [g] ⊗ [Latin, Han, Japanese, Korean]
      - B is [ց] ⊗ [Armn]
      - C is [Ꮐ, Ᏻ] ⊗ [Cher]
      - D is [ꓖ] ⊗ [Lisu]
    - A Confusable Extent can span multiple characters and multiple groups. Consider the (incomplete) Whole Confusable for "o":
      - 6F (o) LATIN SMALL LETTER O → Latin, Han, Japanese, and Korean
      - 3007 (〇) IDEOGRAPHIC NUMBER ZERO → Han, Japanese, Korean, and Bopomofo
      - Confusable Extent is [o, 〇] ⊗ [Latin, Han, Japanese, Korean, Bopomofo]
  - If the character is Unique, the label is not confusable.
    - This set can be precomputed from characters that appear in exactly one group and are not Confused.
  - Otherwise:
    - Append the character to the buffer.
- If any Confused characters were found:
  - If there are no buffered characters, the label is confusable.
  - If any of the remaining groups contain all of the buffered characters, the label is confusable.
  - Example: "0х" [30 445]
    - 30 (0) DIGIT ZERO
      - Not Confused or Unique, add to buffer.
    - 445 (х) CYRILLIC SMALL LETTER HA
      - Confusable Extent is [х, 4B3 (ҳ) CYRILLIC SMALL LETTER HA WITH DESCENDER] ⊗ [Cyrillic]
      - Whole Confusable excluding the extent is [78 (x) LATIN SMALL LETTER X, ...] → [Latin, ...]
      - Remaining groups: ALL ∩ [Latin, ...] → [Latin, ...]
    - There was (1) buffered character:
      - Latin also contains 30 → "0x" [30 78]
    - The label is confusable.
- The label is not confusable.
 
A label composed of confusable characters isn't necessarily confusable.
- Example: "тӕ" [442 4D5]
  - 442 (т) CYRILLIC SMALL LETTER TE
    - Confusable Extent is [т] ⊗ [Cyrillic]
    - Whole Confusable excluding the extent is [3C4 (τ) GREEK SMALL LETTER TAU] → [Greek]
    - Remaining groups: ALL ∩ [Greek] → [Greek]
  - 4D5 (ӕ) CYRILLIC SMALL LIGATURE A IE
    - Confusable Extent is [ӕ] ⊗ [Cyrillic]
    - Whole Confusable excluding the extent is [E6 (æ) LATIN SMALL LETTER AE] → [Latin]
    - Remaining groups: [Greek] ∩ [Latin] → ∅
  - No groups remain so the label is not confusable.
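The Wholes procedure can be sketched with toy data that mirrors the two worked examples. Everything in the tables below is an assumption made for illustration: `GROUPS` holds only a handful of characters, and `CONFUSED_COMPLEMENT` stands in for the precomputed mapping from a Confused character to the groups of its Whole Confusable with its own Confusable Extent excluded. Real values come from spec.json.

```python
# Toy tables (assumptions for illustration only; real data comes from spec.json).
GROUPS = {
    "Latin":    {0x30, 0x78, 0xE6},            # 0, x, ae ligature
    "Greek":    {0x30, 0x3C4},                 # 0, tau
    "Cyrillic": {0x30, 0x442, 0x445, 0x4D5},   # 0, te, ha, a-ie ligature
}
# Confused character -> groups of its Whole Confusable, minus its own extent.
CONFUSED_COMPLEMENT = {
    0x445: {"Latin"},   # Cyrillic ha resembles Latin x
    0x442: {"Greek"},   # Cyrillic te resembles Greek tau
    0x4D5: {"Latin"},   # Cyrillic a-ie ligature resembles Latin ae
}
# Unique: characters in exactly one group that are not Confused.
UNIQUE = {
    cp
    for members in GROUPS.values()
    for cp in members
    if cp not in CONFUSED_COMPLEMENT
    and sum(cp in other for other in GROUPS.values()) == 1
}


def is_whole_confusable(label: str) -> bool:
    remaining = set(GROUPS)          # start with ALL groups
    buffer = []
    found_confused = False
    for cp in dict.fromkeys(ord(ch) for ch in label):  # unique, in order
        if cp in CONFUSED_COMPLEMENT:
            found_confused = True
            remaining &= CONFUSED_COMPLEMENT[cp]
            if not remaining:
                return False         # no groups remain
        elif cp in UNIQUE:
            return False
        else:
            buffer.append(cp)
    if found_confused:
        if not buffer:
            return True
        if any(all(cp in GROUPS[g] for cp in buffer) for g in remaining):
            return True
    return False


print(is_whole_confusable("0\u0445"))       # True: could be read as Latin "0x"
print(is_whole_confusable("\u0442\u04D5"))  # False: no single group remains
```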
 
 
Split
- Partition a name into labels, separated by 2E (.) FULL STOP, and return the resulting array.
  - Example: "abc.123.eth" → ["abc", "123", "eth"]
- The empty string is 0-labels: "" → []
Join
- Assemble an array of labels into a name, inserting 2E (.) FULL STOP between each label, and return the resulting string.
  - Example: ["abc", "123", "eth"] → "abc.123.eth"
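Split and Join are simple, but the empty-string case is easy to get wrong: in Python, `"".split(".")` returns `[""]`, not `[]`. A minimal sketch, with hypothetical function names:

```python
STOP = "\u002E"  # 2E (.) FULL STOP, the only label separator


def split_name(name: str) -> list:
    """Partition a name into labels; the empty string has zero labels."""
    return name.split(STOP) if name else []


def join_labels(labels: list) -> str:
    """Assemble labels into a name, inserting FULL STOP between labels."""
    return STOP.join(labels)


print(split_name("abc.123.eth"))             # ['abc', '123', 'eth']
print(split_name(""))                        # []
print(join_labels(["abc", "123", "eth"]))    # abc.123.eth
```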
 
Description of spec.json
- Groups ("groups"): groups of characters that can constitute a label
  - "name": ASCII name of the group (or abbreviation if Restricted)
    - Examples: Latin, Japanese, Egyp
  - Restricted ("restricted"): true if Excluded or Limited-Use script
    - Examples: Latin → false, Egyp → true
  - "primary": subset of characters that define the group
    - Examples: "a" → Latin, "あ" → Japanese, "𓀀" → Egyp
  - "secondary": subset of characters included with the group
    - Example: "0" → Common but mixable with Latin
  - CM Whitelist(ed) ("cm"): (optional) set of allowed compound sequences in NFC
    - Each compound sequence is a character followed by one or more Combining Marks.
      - Example: à̀̀ → E0 300 300
    - Currently, every group that is CM Whitelist has zero compound sequences.
    - CM Whitelisted is effectively true if [] otherwise false
- Ignored ("ignored"): characters that are ignored during normalization
  - Example: 34F (�) COMBINING GRAPHEME JOINER
- Mapped ("mapped"): characters that are mapped to a sequence of valid characters
  - Example: 41 (A) LATIN CAPITAL LETTER A → [61 (a) LATIN SMALL LETTER A]
  - Example: 2165 (Ⅵ) ROMAN NUMERAL SIX → [76 (v) LATIN SMALL LETTER V, 69 (i) LATIN SMALL LETTER I]
- Whole Confusable ("wholes"): groups of characters that look similar
  - "valid": subset of confusable characters that are allowed
    - Example: 34 (4) DIGIT FOUR
  - Confused ("confused"): subset of confusable characters that confuse
    - Example: 13CE (Ꮞ) CHEROKEE LETTER SE
- Fenced ("fenced"): characters that cannot be first, last, or contiguous
  - Example: 2044 (⁄) FRACTION SLASH
- Emoji Sequence(s) ("emoji"): valid emoji sequences
  - Example: 👨💻 [1F468 200D 1F4BB] man technologist
- Combining Marks / CM ("cm"): characters that are Combining Marks
- Non-spacing Marks / NSM ("nsm"): valid subset of CM with general category ("Mn" or "Me")
- Maximum NSM ("nsm_max"): maximum sequence length of unique NSM
- Should Escape ("escape"): characters that shouldn't be printed
- NFC Check ("nfc_check"): valid subset of characters that may require NFC
Description of nf.json
- "decomp": mapping from a composed character to a sequence of (partially)-decomposed characters
  - UnicodeData.txt where Decomposition_Mapping exists and does not have a formatting tag
- "exclusions": set of characters for which the "decomp" mapping is not applied when forming a composition
- "ranks": sets of characters with increasing Canonical_Combining_Class
  - UnicodeData.txt grouped by Canonical_Combining_Class
  - Class 0 is not included
- "qc": set of characters with property NFC_QC of value N or M
  - DerivedNormalizationProps.txt
  - NFC Check (from spec.json) is a subset of this set
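To illustrate how the two data files relate, the sketch below precomputes the Valid set mentioned under Tokenize: the union of characters in all groups plus their NFD decompositions. The `SPEC` dictionary is a toy excerpt whose layout (integer codepoints under "primary" and "secondary") is an assumption, since the exact encoding inside spec.json is not described here. Python's `unicodedata` is used in place of the "decomp" data from nf.json, and the name `build_valid` is hypothetical.

```python
import unicodedata

# Toy excerpt shaped like the "groups" entry (assumed layout, integer codepoints).
SPEC = {
    "groups": [
        {"name": "Latin", "primary": [0x61, 0xE0], "secondary": [0x30]},
        {"name": "Greek", "primary": [0x3B1, 0x3AC], "secondary": [0x30]},
    ]
}


def build_valid(spec) -> set:
    """Union of all group characters and their NFD decompositions."""
    valid = set()
    for group in spec["groups"]:
        valid.update(group["primary"])
        valid.update(group["secondary"])
    for cp in list(valid):
        # Assumption: stdlib NFD stands in for the "decomp" mapping of nf.json.
        valid.update(ord(ch) for ch in unicodedata.normalize("NFD", chr(cp)))
    return valid


valid = build_valid(SPEC)
print(0x300 in valid)  # True: 300 comes from the decomposition of E0
print(0x41 in valid)   # False: capital A is Mapped, not Valid
```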
Derivation
- IDNA 2003
  - UseSTD3ASCIIRules is true
  - VerifyDnsLength is false
  - Transitional_Processing is false
  - The following deviations are valid:
    - DF (ß) LATIN SMALL LETTER SHARP S
    - 3C2 (ς) GREEK SMALL LETTER FINAL SIGMA
  - CheckHyphens is false (WHATWG URL Spec § 3.3)
  - CheckBidi is false
  - ContextJ:
    - 200C (�) ZERO WIDTH NON-JOINER (ZWNJ) is disallowed everywhere.
    - 200D (�) ZERO WIDTH JOINER (ZWJ) is only allowed in emoji sequences.
  - ContextO:
    - B7 (·) MIDDLE DOT is disallowed.
    - 375 (͵) GREEK LOWER NUMERAL SIGN is disallowed.
    - 5F3 (׳) HEBREW PUNCTUATION GERESH and 5F4 (״) HEBREW PUNCTUATION GERSHAYIM are Greek.
    - 30FB (・) KATAKANA MIDDLE DOT is Fenced and Han, Japanese, Korean, and Bopomofo.
    - Some Extended Arabic Numerals are mapped:
      - 6F0 (۰) → 660 (٠) ARABIC-INDIC DIGIT ZERO
      - 6F1 (۱) → 661 (١) ARABIC-INDIC DIGIT ONE
      - 6F2 (۲) → 662 (٢) ARABIC-INDIC DIGIT TWO
      - 6F3 (۳) → 663 (٣) ARABIC-INDIC DIGIT THREE
      - 6F7 (۷) → 667 (٧) ARABIC-INDIC DIGIT SEVEN
      - 6F8 (۸) → 668 (٨) ARABIC-INDIC DIGIT EIGHT
      - 6F9 (۹) → 669 (٩) ARABIC-INDIC DIGIT NINE
- Punycode is not decoded.
- The following ASCII characters are valid:
  - 24 ($) DOLLAR SIGN
  - 5F (_) LOW LINE with restrictions
- The only label separator is 2E (.) FULL STOP
  - No character maps to this character.
  - This simplifies name detection in unstructured text.
  - The following alternatives are disallowed:
    - 3002 (。) IDEOGRAPHIC FULL STOP
    - FF0E (.) FULLWIDTH FULL STOP
    - FF61 (。) HALFWIDTH IDEOGRAPHIC FULL STOP
- Many characters are disallowed for various reasons:
  - Nearly all punctuation characters are disallowed.
    - Example: 589 (։) ARMENIAN FULL STOP
  - All parentheses and brackets are disallowed.
    - Example: 2997 (⦗) LEFT BLACK TORTOISE SHELL BRACKET
  - Nearly all vocalization annotations are disallowed.
    - Example: 294 (ʔ) LATIN LETTER GLOTTAL STOP
  - Obsolete, deprecated, and ancient characters are disallowed.
    - Example: 463 (ѣ) CYRILLIC SMALL LETTER YAT
  - Combining, modifying, reversed, flipped, turned, and partial variations are disallowed.
    - Example: 218A (↊) TURNED DIGIT TWO
  - When multiple weights of the same character exist, the variant closest to "heavy" is selected and the rest disallowed.
    - Example: 🞡🞢🞣🞤✚🞥🞦🞧 → 271A (✚) HEAVY GREEK CROSS
    - This occasionally selects an emoji.
      - Example: ✔️ or 2714 (✔︎) HEAVY CHECK MARK is selected instead of 2713 (✓) CHECK MARK
  - Many visually confusable characters are disallowed.
    - Example: 131 (ı) LATIN SMALL LETTER DOTLESS I
  - Many ligatures, n-graphs, and n-grams are disallowed.
    - Example: A74F (ꝏ) LATIN SMALL LETTER OO
  - Many esoteric characters are disallowed.
    - Example: 2376 (⍶) APL FUNCTIONAL SYMBOL ALPHA UNDERBAR
- Many hyphen-like characters are mapped to 2D (-) HYPHEN-MINUS:
  - 2010 (‐) HYPHEN
  - 2011 (‑) NON-BREAKING HYPHEN
  - 2012 (‒) FIGURE DASH
  - 2013 (–) EN DASH
  - 2014 (—) EM DASH
  - 2015 (―) HORIZONTAL BAR
  - 2043 (⁃) HYPHEN BULLET
  - 2212 (−) MINUS SIGN
  - 23AF (⎯) HORIZONTAL LINE EXTENSION
  - 23E4 (⏤) STRAIGHTNESS
  - FE58 (﹘) SMALL EM DASH
  - 2E3A (⸺) TWO-EM DASH → "--"
  - 2E3B (⸻) THREE-EM DASH → "---"
- Characters are assigned to Groups according to Unicode Script_Extensions.
- Groups may contain multiple scripts:
  - Only Latin, Greek, Cyrillic, Han, Japanese, and Korean have access to Common characters.
  - Latin, Greek, Cyrillic, Han, Japanese, Korean, and Bopomofo only permit specific Combining Mark sequences.
  - Han, Japanese, and Korean have access to a-z.
  - Restricted groups are always single-script.
  - Unicode augmented script sets
- Scripts Braille, Linear A, Linear B, and Signwriting are disallowed.
- 27 (') APOSTROPHE is mapped to 2019 (’) RIGHT SINGLE QUOTATION MARK for convenience.
- Ethereum symbol (39E (Ξ) GREEK CAPITAL LETTER XI) is case-folded and Common.
- Emoji:
  - All emoji are fully-qualified.
  - Digits (0-9) are not emoji.
  - Emoji mapped to non-emoji by IDNA cannot be used as emoji.
  - Emoji disallowed by IDNA with default text-presentation are disabled:
    - 203C (‼️) double exclamation mark
    - 2049 (⁉️) exclamation question mark
  - Remaining emoji characters are marked as disallowed (for text processing).
  - All RGI_Emoji_ZWJ_Sequence are enabled.
  - All Emoji_Keycap_Sequence are enabled.
  - All RGI_Emoji_Tag_Sequence are enabled.
  - All RGI_Emoji_Modifier_Sequence are enabled.
  - All RGI_Emoji_Flag_Sequence are enabled.
  - Basic_Emoji of the form [X FE0F] are enabled.
  - Emoji with default emoji-presentation are enabled as [X FE0F].
  - Remaining single-character emoji are enabled as [X FE0F] (explicit emoji-presentation).
  - All singular Skin-color Modifiers are disabled.
  - All singular Regional Indicators are disabled.
  - Blacklisted emoji are disabled.
  - Whitelisted emoji are enabled.
- Confusables:
  - Nearly all Unicode Confusables
  - Emoji are not confusable.
  - ASCII confusables are case-folded.
    - Example: 61 (a) LATIN SMALL LETTER A confuses with 13AA (Ꭺ) CHEROKEE LETTER GO
 
 
Backwards Compatibility
- 99% of names are still valid.
 - Preserves as much Unicode IDNA and WHATWG URL compatibility as possible.
 - Only valid emoji sequences are permitted.
 
Security Considerations
- Unicode presentation may vary between applications and devices.
  - Unicode text is ultimately subject to font-styling and display context.
  - Unsupported characters (�) may appear unremarkable.
  - Normalized single-character emoji sequences do not retain their explicit emoji-presentation and may display with text or emoji presentation styling.
    - ❤︎: text-presentation and default-color
    - ❤︎: text-presentation and green-color
    - ❤️: emoji-presentation and green-color
  - Unsupported emoji sequences with ZWJ may appear indistinguishable from those without ZWJ.
    - 💩💩 [1F4A9 1F4A9]
    - 💩💩 [1F4A9 200D 1F4A9] → error: Disallowed character
- Names composed of labels with varying bidi properties may appear differently depending on context.
  - Normalization does not enforce single-directional names.
  - Names may be composed of labels of different directions but normalized labels are never bidirectional.
    - [LTR].[RTL] bahrain.مصر
    - [LTR+RTL] bahrainمصر → error: Illegal mixture: Latin + Arabic
- Not all normalized names are visually unambiguous.
- This ENSIP only addresses single-character confusables.
  - There exist confusable multi-character sequences:
    - "ஶ்ரீ" [BB6 BCD BB0 BC0]
    - "ஸ்ரீ" [BB8 BCD BB0 BC0]
  - There exist confusable emoji sequences:
    - 🚴 [1F6B4] and 🚴🏻 [1F6B4 1F3FB]
    - 🇺🇸 [1F1FA 1F1F8] and 🇺🇲 [1F1FA 1F1F2]
    - ♥ [2665] BLACK HEART SUIT and ❤ [2764] HEAVY BLACK HEART
 
Copyright
Copyright and related rights waived via CC0.
Appendix: Reference Specifications
- EIP-137: Ethereum Domain Name Service
 - ENSIP-1: ENS
 - UAX-15: Normalization Forms
 - UAX-24: Script Property
 - UAX-29: Text Segmentation
 - UAX-31: Identifier and Pattern Syntax
 - UTS-39: Security Mechanisms
 - UAX-44: Character Database
 - UTS-46: IDNA Compatibility Processing
 - UTS-51: Emoji
 - RFC-3492: Punycode
 - RFC-5891: IDNA: Protocol
 - RFC-5892: The Unicode Code Points and IDNA
 - Unicode CLDR
 - WHATWG URL: IDNA
 
Appendix: Additional Resources
- Supported Groups
 - Supported Emoji
 - Additional Disallowed Characters
 - Ignored Characters
 - Should Escape Characters
 
Appendix: Validation Tests
A list of validation tests is provided with the following interpretation:
- Already Normalized: {name: "a"} → normalize("a") is "a"
- Need Normalization: {name: "A", norm: "a"} → normalize("A") is "a"
- Expect Error: {name: "@", error: true} → normalize("@") throws
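The interpretation above maps naturally onto a small test harness. The sketch below is illustrative: `run_validation_tests` is a hypothetical name, the tests are assumed to be already parsed into dictionaries, and `toy_normalize` is a stand-in that only lowercases alphanumeric input. It is not ENSIP-15 normalization and exists only so the harness can be exercised.

```python
def run_validation_tests(tests, normalize):
    """Return a list of (name, reason) for every test that fails."""
    failures = []
    for test in tests:
        name = test["name"]
        try:
            result = normalize(name)
        except Exception:
            if not test.get("error"):
                failures.append((name, "unexpected error"))
            continue
        if test.get("error"):
            failures.append((name, "expected an error"))
        elif result != test.get("norm", name):
            # Without "norm", the name is expected to be already normalized.
            failures.append((name, f"got {result!r}"))
    return failures


def toy_normalize(name):
    # Stand-in for demonstration only (NOT ENSIP-15 normalization).
    if not name.isalnum():
        raise ValueError("disallowed character")
    return name.lower()


tests = [
    {"name": "a"},
    {"name": "A", "norm": "a"},
    {"name": "@", "error": True},
]
print(run_validation_tests(tests, toy_normalize))  # []
```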
Annex: Beautification
Follow the Normalize algorithm, except:
- Do not strip FE0F from Emoji tokens.
- Replace 3BE (ξ) GREEK SMALL LETTER XI with 39E (Ξ) GREEK CAPITAL LETTER XI if the label isn't Greek.
- Example: normalize("‐Ξ1️⃣") [2010 39E 31 FE0F 20E3] is "-ξ1⃣" [2D 3BE 31 20E3]
- Example: beautify("-ξ1⃣") [2D 3BE 31 20E3] is "-Ξ1️⃣" [2D 39E 31 FE0F 20E3]
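The two beautification rules can be sketched on top of an already tokenized and validated label. The assumptions here are that Text tokens are already in NFC, that Emoji tokens carry their fully-qualified sequence, and that the Label Type is supplied by the caller. The name `beautify_tokens`, the `(kind, codepoints)` token shape, and the "Latin" label type used in the usage line are hypothetical.

```python
XI_SMALL = 0x3BE    # GREEK SMALL LETTER XI
XI_CAPITAL = 0x39E  # GREEK CAPITAL LETTER XI


def beautify_tokens(tokens, label_type):
    """tokens: list of (kind, codepoints); label_type: result of Validate."""
    out = []
    for kind, cps in tokens:
        if kind == "emoji":
            out.extend(cps)  # keep FE0F: emoji stay fully-qualified
        elif label_type != "Greek":
            out.extend(XI_CAPITAL if cp == XI_SMALL else cp for cp in cps)
        else:
            out.extend(cps)  # Greek labels keep the small xi
    return out


tokens = [("text", [0x2D, 0x3BE]), ("emoji", [0x31, 0xFE0F, 0x20E3])]
print(" ".join(f"{cp:X}" for cp in beautify_tokens(tokens, "Latin")))
# 2D 39E 31 FE0F 20E3
```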