
String Length online

Get the exact character and byte length of any string in multiple encodings

by Chunky Munster

How to Use the String Length Calculator

  1. Paste or type your text into the input field.
  2. Read the four counts: JS length, code points, UTF-8 bytes, graphemes.
  3. Try a compound emoji to see how the four counts differ.
  4. Use the UTF-8 byte count when sizing byte-limited storage, such as a fixed-size database field.

There is no single answer to "how long is this string". Counting bytes, UTF-16 code units, Unicode code points, and user-perceived grapheme clusters gives four different numbers, and which one you want depends on whether you're sizing a database column, validating a tweet, building a progress bar, or counting emoji.

How the String Length Calculator Works

This calculator shows all four counts simultaneously, with worked examples for tricky cases like compound emoji and combining diacritics. The grapheme cluster count uses Intl.Segmenter where supported and falls back to a regex approximation otherwise, which is close enough for ad-hoc work.
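The four counts can be sketched in a few lines of JavaScript. This is a minimal illustration, not the calculator's actual source: the function names `measure`, `countGraphemes`, and `approxGraphemes` are hypothetical, and the fallback regex is only one possible approximation (a base code point plus combining marks, optionally chained through zero-width joiners).

```javascript
// Hypothetical fallback pattern: one base code point, any trailing combining
// marks, and any further pieces attached with a zero-width joiner (U+200D).
// This is an approximation, not a full Unicode grapheme segmentation.
const APPROX_GRAPHEME = /\P{M}\p{M}*(?:\u200D\P{M}\p{M}*)*/gu;

function approxGraphemes(str) {
  const matches = str.match(APPROX_GRAPHEME);
  return matches === null ? 0 : matches.length;
}

function countGraphemes(str) {
  // Prefer Intl.Segmenter where the runtime provides it.
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    let count = 0;
    for (const _segment of segmenter.segment(str)) count += 1;
    return count;
  }
  return approxGraphemes(str);
}

function measure(str) {
  return {
    utf16Units: str.length,                          // what JS .length reports
    codePoints: [...str].length,                     // string iteration is per code point
    utf8Bytes: new TextEncoder().encode(str).length, // encoded size in UTF-8
    graphemes: countGraphemes(str),                  // user-perceived characters
  };
}
```

For example, `measure("caf\u00E9")` reports 4 UTF-16 code units, 4 code points, 5 UTF-8 bytes, and 4 graphemes.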

Frequently Asked Questions

Why does my emoji count as 2, 4 or even 7 characters?

It depends on what you mean by "character". Most emoji lie outside the Basic Multilingual Plane (BMP), so each takes 2 UTF-16 code units (.length = 2) and 4 UTF-8 bytes. Compound emoji like 👩‍💻 are zero-width joiner sequences: this one is 3 code points with a UTF-16 length of 5, but a grapheme count of 1.
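To see where the 5 comes from, it can help to list each code point with the number of UTF-16 code units it occupies. The helper name `describeCodePoints` below is hypothetical and the snippet is only a sketch.

```javascript
// List every code point in a string with its UTF-16 code unit count.
function describeCodePoints(str) {
  return [...str].map((ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    utf16Units: ch.length, // 1 inside the BMP, 2 for a surrogate pair
  }));
}

// Woman (U+1F469) + zero-width joiner (U+200D) + laptop (U+1F4BB)
const technologist = "\u{1F469}\u200D\u{1F4BB}";
const parts = describeCodePoints(technologist);
const totalUnits = parts.reduce((sum, part) => sum + part.utf16Units, 0);

console.log(parts.map((p) => `${p.codePoint}: ${p.utf16Units}`).join(", "));
console.log(totalUnits === technologist.length);
```

The two emoji contribute 2 code units each and the joiner contributes 1, which adds up to the reported length of 5.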

What's the difference between code points and characters?

A code point is a Unicode value (like U+1F600 😀). A user-perceived character (a "grapheme cluster") may be made of several code points joined together: the family emoji 👨‍👩‍👧‍👦 is one grapheme but seven code points.
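The family emoji makes a compact check of this distinction. The sketch below assumes a runtime that provides Intl.Segmenter; `graphemeCount` is a hypothetical helper name.

```javascript
// Man, woman, girl, boy, joined by three zero-width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// Spreading a string iterates by code point: 4 emoji + 3 joiners.
const codePointCount = [...family].length;

// Assumes Intl.Segmenter exists in this runtime.
function graphemeCount(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

console.log(codePointCount, family.length);
```

Here `codePointCount` is 7 and `family.length` is 11, while a conforming segmenter treats the whole sequence as a single grapheme.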

Why are UTF-8 and UTF-16 byte counts different?

They're different encodings. UTF-8 uses 1 to 4 bytes per code point and is efficient for ASCII. UTF-16 uses 2 or 4 bytes per code point and is efficient for CJK, where most characters need 3 bytes in UTF-8 but only 2 in UTF-16. For "café", UTF-8 is 5 bytes and UTF-16 is 8 bytes.
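The per-code-point sizes follow directly from the code point ranges, so both byte counts can be computed without an encoder. The function names below are hypothetical, and the totals exclude any byte order mark.

```javascript
// Bytes needed to encode one code point in UTF-8 (1 to 4).
function utf8BytesFor(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // e.g. Latin letters with diacritics
  if (codePoint <= 0xffff) return 3; // rest of the BMP, including most CJK
  return 4;                          // supplementary planes, e.g. most emoji
}

// Bytes needed in UTF-16: 2 inside the BMP, 4 (a surrogate pair) above it.
function utf16BytesFor(codePoint) {
  return codePoint <= 0xffff ? 2 : 4;
}

function byteLengths(str) {
  let utf8 = 0;
  let utf16 = 0;
  for (const ch of str) { // iterates by code point
    const cp = ch.codePointAt(0);
    utf8 += utf8BytesFor(cp);
    utf16 += utf16BytesFor(cp);
  }
  return { utf8, utf16 };
}
```

`byteLengths("caf\u00E9")` gives 5 and 8 bytes, while the three-character CJK string `"\u65E5\u672C\u8A9E"` gives 9 bytes in UTF-8 and 6 in UTF-16.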

Which length should I use for a database VARCHAR limit?

It depends on the database and the column's character set. PostgreSQL VARCHAR(n) counts characters (code points); MySQL VARCHAR(n) with utf8mb4 also counts code points, but the row size limit is in bytes. Use the UTF-8 byte count for "will it fit in a fixed-size field?" questions.
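An application-side check can test both limits before a value reaches the database. This is a sketch only: `fitsColumn`, `maxChars`, and `maxBytes` are hypothetical names, and the limits themselves have to come from your own schema.

```javascript
// Hypothetical pre-insert check against a character limit (in code points)
// and a byte limit (in UTF-8). Either limit may be omitted.
function fitsColumn(value, { maxChars = Infinity, maxBytes = Infinity } = {}) {
  const chars = [...value].length;
  const bytes = new TextEncoder().encode(value).length;
  return { chars, bytes, fits: chars <= maxChars && bytes <= maxBytes };
}

// "café" is 4 code points but 5 UTF-8 bytes.
console.log(fitsColumn("caf\u00E9", { maxChars: 4 }).fits);
console.log(fitsColumn("caf\u00E9", { maxBytes: 4 }).fits);
```

The first call passes because the limit is counted in code points; the second fails because the same text needs 5 bytes.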

Explore the full suite of Text tools and 290+ other free utilities at Chunky Munster.