How big is a UTF-8 character?
Matthew Perez
Updated on April 02, 2026
How big is a UTF-8 character?
1 to 4 bytes
UTF-8 is based on 8-bit code units. Each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8.
How many characters can UTF-8 represent?
UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.
How many bytes does it take to store a UTF-8 character?
4 bytes
UTF-8 is based on 8-bit code units. Each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8. These code points are the same as those in ASCII CCSID 367.
Does UTF-8 only use 128 values?
UTF-8 uses 1-4 bytes per character: one byte for ascii characters (the first 128 unicode values are the same as ascii). But that only requires 7 bits.
Can UTF-8 handle Chinese characters?
2 Answers. UTF-8 and UTF-16 encode exactly the same set of characters. It’s not that UTF-8 doesn’t cover Chinese characters and UTF-16 does.
How big is a single character?
An ASCII character in 8-bit ASCII encoding is 8 bits (1 byte), though it can fit in 7 bits. An ISO-8895-1 character in ISO-8859-1 encoding is 8 bits (1 byte). A Unicode character in UTF-8 encoding is between 8 bits (1 byte) and 32 bits (4 bytes).
How many characters is 2 bytes?
1 byte size of 8 bits can hold a single 8 bit character, hence 2 bytes can hold two 8 bit characters.
Does Unicode always have 2 bytes?
Unicode does not mean 2 bytes. Unicode defines code points that can be stored in many different ways (UCS-2, UTF-8, UTF-7, etc.). Encodings vary in simplicity and efficiency. Unicode has more than 65,535 (16 bits) worth of characters.
Is Arabic a UTF-8?
UTF-8 can store the full Unicode range, so it’s fine to use for Arabic.