There is no single answer to "how long is this string". Counting bytes, UTF-16 code units, Unicode code points, and user-perceived grapheme clusters gives four different numbers, and which one you want depends on whether you're sizing a database column, validating a tweet, building a progress bar, or counting emoji.
This calculator shows all four counts simultaneously, with worked examples for tricky cases like compound emoji and combining diacritics. The grapheme cluster count uses Intl.Segmenter where supported and falls back to a regex approximation otherwise, which is close enough for ad-hoc work.
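As a rough illustration of the four counts, here is a minimal sketch. The helper name `countAll` and the fallback regex are assumptions made for this example, not the calculator's actual code; the regex only groups combining marks and zero-width-joiner sequences, so it will miscount some clusters.

```javascript
// Hypothetical helper: count a string four ways.
function countAll(str) {
  // UTF-8 bytes: TextEncoder always encodes to UTF-8.
  const utf8Bytes = new TextEncoder().encode(str).length;

  // UTF-16 code units: this is what String.prototype.length reports.
  const utf16Units = str.length;

  // Code points: the string iterator steps by code point, not code unit.
  const codePoints = [...str].length;

  // Grapheme clusters: prefer Intl.Segmenter, else a rough regex.
  let graphemes;
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    graphemes = [...segmenter.segment(str)].length;
  } else {
    // Approximation (assumed for this sketch): a non-mark code point,
    // its trailing combining marks, and any ZWJ-joined continuations.
    const rough = /\P{M}\p{M}*(?:\u200D\P{M}\p{M}*)*/gu;
    graphemes = (str.match(rough) || []).length;
  }

  return { utf8Bytes, utf16Units, codePoints, graphemes };
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
console.log(countAll("e\u0301"));
// utf8Bytes: 3, utf16Units: 2, codePoints: 2, graphemes: 1
```

Creating one `Intl.Segmenter` and reusing it across calls would be cheaper if many strings are counted; it is built inside the function here only to keep the sketch short.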
How long an emoji is depends on what you mean by "character". An emoji that is a single code point outside the BMP (Basic Multilingual Plane) takes 2 UTF-16 code units (.length = 2) and 4 UTF-8 bytes. Compound emoji like 👩‍💻 are zero-width joiner (ZWJ) sequences: 👩‍💻 is woman + ZWJ + laptop, so it has a UTF-16 length of 5 but a grapheme count of 1.
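To see where those numbers come from, the sketch below dumps the raw UTF-16 code units. The helper names `utf16UnitsHex` and `utf8ByteLength` are made up for this example; the emoji are written as escapes so the zero-width joiner stays visible.

```javascript
// Hypothetical helper: list a string's UTF-16 code units as 4-digit hex.
function utf16UnitsHex(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units;
}

// Hypothetical helper: size of the UTF-8 encoding in bytes.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

const grinningFace = "\u{1F600}";                     // single code point outside the BMP
const womanTechnologist = "\u{1F469}\u200D\u{1F4BB}"; // woman + ZWJ + laptop

console.log(utf16UnitsHex(grinningFace).join(" "));      // d83d de00
console.log(utf8ByteLength(grinningFace));               // 4
console.log(utf16UnitsHex(womanTechnologist).join(" ")); // d83d dc69 200d d83d dcbb
console.log(utf8ByteLength(womanTechnologist));          // 11
```

Each emoji outside the BMP becomes a surrogate pair (two code units), while the joiner U+200D sits inside the BMP and takes one, which is how the compound emoji reaches a length of 5.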
A code point is a single Unicode value (like U+1F600 😀). A user-perceived character (a "grapheme cluster") may be made of several code points joined together: the family emoji 👨‍👩‍👧‍👦 is one grapheme but seven code points (four people and three zero-width joiners).
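One way to check the seven code points is to list them. This is a small sketch; `listCodePoints` is a hypothetical helper name, and the family emoji is spelled out with escapes.

```javascript
// Hypothetical helper: format each code point in U+XXXX notation.
// Array.from iterates a string by code point, so surrogate pairs stay together.
function listCodePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(listCodePoints(family).join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
console.log(listCodePoints(family).length); // 7
```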
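UTF-8 and UTF-16 are different encodings of the same code points. UTF-8 uses 1–4 bytes per code point and is efficient for ASCII, where every character takes 1 byte. UTF-16 uses 2 or 4 bytes per code point and is efficient for CJK, where most characters take 2 bytes instead of the 3 that UTF-8 needs. For "café" (with a precomposed é), UTF-8 is 5 bytes and UTF-16 is 8 bytes.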
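The byte figures can be reproduced with a short sketch. `encodedSizes` is a hypothetical helper; it assumes UTF-16 without a byte order mark, so the UTF-16 size is simply two bytes per code unit.

```javascript
// Hypothetical helper: encoded size of a string in both encodings.
function encodedSizes(str) {
  return {
    utf8: new TextEncoder().encode(str).length, // TextEncoder emits UTF-8
    utf16: str.length * 2,                      // 2 bytes per code unit, no BOM
  };
}

console.log(encodedSizes("caf\u00E9"));          // precomposed é: utf8 5, utf16 8
console.log(encodedSizes("cafe\u0301"));         // e + combining accent: utf8 6, utf16 10
console.log(encodedSizes("\u65E5\u672C\u8A9E")); // 日本語: utf8 9, utf16 6
```

The second line shows why normalization matters when comparing sizes: the same visible word is one byte longer in UTF-8 when the accent is stored as a separate combining code point.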
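How much text fits in a VARCHAR(n) column depends on the database and the column's character set. PostgreSQL VARCHAR(n) counts characters (code points), not bytes; MySQL VARCHAR(n) with utf8mb4 also counts code points, but the row size limit is in bytes, and each utf8mb4 character can take up to 4 of them. Use the UTF-8 byte count for "will it fit in a fixed-size field?" questions.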
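For byte-limited fields, a check along these lines may help. Both function names are hypothetical, and the truncation only guarantees that no code point is cut in half; it can still split a grapheme cluster such as a ZWJ emoji sequence.

```javascript
// Hypothetical helper: does the UTF-8 encoding fit in maxBytes?
function fitsInUtf8Bytes(str, maxBytes) {
  return new TextEncoder().encode(str).length <= maxBytes;
}

// Hypothetical helper: keep the longest prefix whose UTF-8 encoding
// fits in maxBytes, never cutting a code point in half.
function truncateToUtf8Bytes(str, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let out = "";
  for (const ch of str) { // for...of iterates by code point
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

console.log(fitsInUtf8Bytes("caf\u00E9", 4));     // false: café needs 5 bytes
console.log(truncateToUtf8Bytes("caf\u00E9", 4)); // caf
```

Slicing the encoded byte array directly would be faster, but it can leave a partial multi-byte sequence at the end, which is why this sketch walks code points instead.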