You can get the Unicode code point of a character that lies in the BMP (Basic Multilingual Plane) by using the `String.prototype.codePointAt()` method, for example, like so:

    // ES6+
    const codePoint = '©'.codePointAt(0);
    const hexCodePoint = codePoint.toString(16);

    console.log(codePoint); // 169
    console.log(hexCodePoint); // a9

You can verify the result by using `String.fromCodePoint()`, for example, in the following way:

    // using decimal code point
    String.fromCodePoint(169); // '©'

    // using hex code point
    String.fromCodePoint(0xa9); // '©'

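Code points are conventionally written in `U+` notation with at least four uppercase hex digits. The following is a minimal sketch of a formatter for that notation. The name `toUPlus` is a hypothetical helper introduced here for illustration, and `padStart()` assumes an ES2017+ environment:

```javascript
// Hypothetical helper (not a built-in): formats the code point found at
// the start of a string in U+XXXX notation.
function toUPlus(char) {
  const codePoint = char.codePointAt(0);

  if (codePoint === undefined) {
    return undefined; // empty string: there is no element at index 0
  }

  // Pad to at least four hex digits, as is conventional for U+ notation.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUPlus('©')); // U+00A9
console.log(toUPlus('A')); // U+0041
```

Only the first code point of the argument is formatted, so a longer string would need to be iterated first.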
This also works for characters that are beyond the BMP (Basic Multilingual Plane). For example, consider a character that is encoded as a surrogate pair:

    // ES6+
    const codePoint = '😍'.codePointAt(0);
    const hexCodePoint = codePoint.toString(16);

    console.log(codePoint); // 128525
    console.log(hexCodePoint); // 1f60d

You can verify the result by using `String.fromCodePoint()`, like so:

    // using decimal code point
    String.fromCodePoint(128525); // '😍'

    // using hex code point
    String.fromCodePoint(0x1f60d); // '😍'

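To see why `codePointAt()` is the right tool here, it may help to compare it with the older `charCodeAt()`, which only ever returns a single UTF-16 code unit. The sketch below also recombines the two surrogates by hand using the standard UTF-16 decoding arithmetic. The variable names are illustrative only:

```javascript
const emoji = '\u{1F60D}'; // the same character as '😍', written as an escape

console.log(emoji.length); // 2 (two UTF-16 code units)
console.log(emoji.charCodeAt(0)); // 55357 (0xd83d, the high surrogate only)
console.log(emoji.codePointAt(0)); // 128525 (0x1f60d, the full code point)

// Recombining the surrogates manually gives the same value.
const high = emoji.charCodeAt(0);
const low = emoji.charCodeAt(1);
const combined = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;

console.log(combined === emoji.codePointAt(0)); // true
```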
This works because when you call the `String.prototype.codePointAt()` method on a character composed of a UTF-16 high and low surrogate (i.e. a surrogate pair), the value returned depends on the position you pass to the method:
Argument | Return Value |
---|---|
Position of high surrogate (e.g. `codePointAt(0)`) | Code point of the surrogate pair |
Position of low surrogate (e.g. `codePointAt(1)`) | Code point of the low surrogate only |
Position having no element | undefined |
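The three rows of the table can be checked directly. This is a small sketch that uses the same emoji as above, written as an escape sequence so that the underlying code units are explicit:

```javascript
const emoji = '\u{1F60D}'; // '😍', stored as the surrogate pair 0xd83d 0xde0d

console.log(emoji.codePointAt(0)); // 128525 (code point of the whole pair)
console.log(emoji.codePointAt(1)); // 56845 (0xde0d, the low surrogate only)
console.log(emoji.codePointAt(2)); // undefined (no element at that position)
```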
Furthermore, this even works with ZWJ (zero-width joiner) sequences. However, you need to loop over the elements of the ZWJ sequence to get the code point of each one. You can do so by using a `for...of` loop, or by spreading the string into an array and calling `Array.prototype.forEach()` on it (or anything else that iterates a string by code point rather than by UTF-16 code unit), and using `codePointAt(0)` to get the code point of each element.
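As one alternative to a `for...of` loop, `Array.from()` also uses the string iterator and accepts a mapping function. The sketch below assumes the "woman technologist" emoji (U+1F469 U+200D U+1F4BB) as sample input, with the joiner written as an escape so that it stays visible in the source:

```javascript
// Sample ZWJ sequence: woman (U+1F469) + ZWJ (U+200D) + laptop (U+1F4BB).
const sequence = '\u{1F469}\u200D\u{1F4BB}';

// Array.from() iterates by code point, so each element is one code point.
const codePoints = Array.from(sequence, (element) => element.codePointAt(0));
console.log(codePoints); // [128105, 8205, 128187]

// Indexing by position would walk UTF-16 code units instead.
console.log(sequence.length); // 5 (code units)
console.log(codePoints.length); // 3 (code points)
```

Calling `Array.prototype.forEach()` directly on a string's indices would visit all five code units, which is why the string has to be converted with the iterator first.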
Consider, for example, the ZWJ emoji "👨‍👩‍👧‍👦" ("family: man, woman, girl, boy"), which is a combination of "👨 👩 👧 👦" joined by zero-width joiners (i.e. U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466). You can get the code points for each element in this ZWJ sequence in the following way:

    // ES6+
    const codePoints = [];
    const hexCodePoints = [];

    // the zero-width joiners are written as \u200D so they stay visible
    for (const element of '👨\u200D👩\u200D👧\u200D👦') {
      const codePoint = element.codePointAt(0);
      codePoints.push(codePoint);
      hexCodePoints.push(codePoint.toString(16));
    }

    console.log(codePoints); // [128104, 8205, 128105, 8205, 128103, 8205, 128102]
    console.log(hexCodePoints); // ['1f468', '200d', '1f469', '200d', '1f467', '200d', '1f466']

You can verify the result by using `String.fromCodePoint()`, like so:

    // using decimal code points
    String.fromCodePoint(128104, 8205, 128105, 8205, 128103, 8205, 128102); // '👨‍👩‍👧‍👦'

    // using hex code points
    String.fromCodePoint(0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f467, 0x200d, 0x1f466); // '👨‍👩‍👧‍👦'

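If the code points are held as hex strings, as in the `hexCodePoints` array above, they need to be parsed before being passed to `String.fromCodePoint()`. The following is a rough sketch of that round trip. `fromHexCodePoints` is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: rebuilds a string from an array of hex code points.
function fromHexCodePoints(hexCodePoints) {
  // An explicit arrow function avoids passing the array index as the radix.
  const codePoints = hexCodePoints.map((hex) => parseInt(hex, 16));
  return String.fromCodePoint(...codePoints);
}

const hexValues = ['1f468', '200d', '1f469', '200d', '1f467', '200d', '1f466'];
const family = fromHexCodePoints(hexValues);

console.log(family.length); // 11 (UTF-16 code units)
console.log([...family].length); // 7 (code points)
```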
This post was published by Daniyal Hamid. Daniyal currently works as the Head of Engineering in Germany and has 20+ years of experience in software engineering, design and marketing.