How Base64 Text Encoding Works
What is Base64 Encoding?
Base64 is a binary-to-text encoding scheme that converts binary data into an ASCII string format. It's commonly used to encode data that needs to be transmitted over media designed to handle text, such as email systems, URLs, and JSON/XML documents.
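A minimal round trip, sketched here with Python's standard-library `base64` module, shows the idea. Arbitrary bytes go in, a plain ASCII string comes out, and decoding restores the original bytes exactly. The sample bytes are arbitrary and chosen only for illustration.

```python
import base64

# Arbitrary binary data, including bytes that are not printable text.
raw = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

# b64encode returns ASCII-only bytes; .decode("ascii") turns them into a str.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # AP8QgH8=

# b64decode accepts the ASCII string and recovers the exact original bytes.
decoded = base64.b64decode(encoded)
assert decoded == raw
```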
How Does Base64 Encoding Work?
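Base64 encoding represents binary data using an alphabet of 64 ASCII characters (A-Z, a-z, 0-9, +, /). The encoding process follows these steps:
- Convert to Bytes: Text is first turned into bytes with a character encoding (such as ASCII or UTF-8); each byte is 8 bits
- Group into 6-bit Chunks: The bytes are taken 3 at a time (24 bits) and split into four 6-bit groups; a short final group is filled out with zero bits
- Map to Base64 Characters: Each 6-bit group has a value from 0 to 63 that indexes one of the 64 characters (A=0 to Z=25, a=26 to z=51, 0=52 to 9=61, +=62, /=63)
- Add Padding: If the input length is not a multiple of 3, one or two '=' characters are appended so the output length is a multiple of 4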
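The steps above can be sketched as a small hand-written encoder in Python. The function name `b64_encode` and the `ALPHABET` constant are illustrative, not part of any library; the loop at the end checks the result against the standard library's `base64.b64encode`.

```python
import base64

# The standard Base64 alphabet: index 0-63 maps to one character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes to Base64 by hand (illustrative, not optimized)."""
    out = []
    # Process the input 3 bytes (24 bits) at a time.
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack the chunk into a 24-bit integer, zero-filling missing bytes.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indices and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A 1-byte chunk yields 2 real characters, a 2-byte chunk 3, a full
        # chunk 4; the remaining positions become '=' padding.
        real = len(chunk) + 1
        out.extend(chars[:real] + ["="] * (4 - real))
    return "".join(out)


# Sanity check against the standard library on a few inputs.
for sample in (b"", b"H", b"He", b"Hel", b"Hello"):
    assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")

print(b64_encode(b"Hello"))  # SGVsbG8=
```

Working on an integer per 3-byte chunk keeps the bit shifting simple; a production encoder would use a lookup-table approach or just call `base64.b64encode`.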
Character Encodings: ASCII vs UTF-8 vs UTF-16
ASCII Encoding
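ASCII (American Standard Code for Information Interchange) uses 7 bits to represent 128 characters, including English letters, digits, and common symbols; in practice each character is stored in a single 8-bit byte. It's the most basic encoding and, together with UTF-8 (which encodes these characters identically), produces the smallest Base64 output for English text. It cannot represent accented letters, other scripts, or emoji.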
UTF-8 Encoding
UTF-8 is a variable-length encoding that uses 1-4 bytes per character. ASCII characters (code points 0-127) use just 1 byte with the same value as in ASCII, making UTF-8 efficient for English text while supporting all Unicode characters (emoji, international characters, etc.). It is the most common encoding for modern web applications.
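A short Python sketch makes the ASCII/UTF-8 relationship concrete. It assumes Python's built-in `"ascii"` and `"utf-8"` codecs and the standard `base64` module; the sample strings are arbitrary.

```python
import base64

# For pure-ASCII text, ASCII and UTF-8 produce identical bytes,
# so the Base64 output is identical as well.
text = "Hello"
assert text.encode("ascii") == text.encode("utf-8")
print(base64.b64encode(text.encode("utf-8")).decode("ascii"))  # SGVsbG8=

# UTF-8 spends more bytes on characters outside the ASCII range.
sizes = {ch: len(ch.encode("utf-8")) for ch in ("A", "é", "€", "😀")}
print(sizes)  # {'A': 1, 'é': 2, '€': 3, '😀': 4}

# ASCII cannot represent "é" at all, so encoding raises an error...
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII failed:", err.reason)  # ordinal not in range(128)

# ...while UTF-8 handles it, and the bytes Base64-encode as usual.
cafe_b64 = base64.b64encode("café".encode("utf-8")).decode("ascii")
print(cafe_b64)  # Y2Fmw6k=
```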
UTF-16 (Little Endian & Big Endian)
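UTF-16 uses 2 bytes for most characters and 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane, such as most emoji. Because each 16-bit unit spans two bytes, the byte order (endianness) determines which byte is stored first: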
- UTF-16 LE (Little Endian): Least significant byte first (common on Windows)
- UTF-16 BE (Big Endian): Most significant byte first (network byte order)
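UTF-16 produces larger Base64 output than UTF-8 for English text (2 bytes per character instead of 1), but it can be smaller for Chinese, Japanese, and Korean text, whose characters typically take 2 bytes in UTF-16 versus 3 in UTF-8.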
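The size trade-off can be checked with a small Python sketch. It assumes Python's `"utf-16-le"` and `"utf-16-be"` codec names, which (unlike plain `"utf-16"`) do not prepend a byte order mark; the two sample strings are arbitrary.

```python
import base64

# Record the byte count of each string under each encoding.
sizes = {}
for text in ("Hello", "你好"):
    for codec in ("utf-8", "utf-16-le", "utf-16-be"):
        data = text.encode(codec)
        sizes[(text, codec)] = len(data)
        b64 = base64.b64encode(data).decode("ascii")
        # data.hex(" ") shows the raw bytes, making the byte order visible.
        print(f"{text}  {codec:<9}  {len(data):2} bytes  {data.hex(' ')}  {b64}")
```

For "Hello" both UTF-16 variants take 10 bytes against UTF-8's 5 and differ only in byte order ("H" is stored as 48 00 in LE and 00 48 in BE). For "你好" UTF-16 needs 4 bytes while UTF-8 needs 6.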
Example: Encoding "Hello"
Original Text: Hello
Binary (ASCII): 01001000 01100101 01101100 01101100 01101111
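6-bit Groups: 010010 | 000110 | 010101 | 101100 | 011011 | 000110 | 1111 (zero-filled to 111100)
Index Values: 18 | 6 | 21 | 44 | 27 | 6 | 60
Base64 Output: SGVsbG8= (S G V s b G 8, plus one '=' because the 5 input bytes leave only 2 bytes in the last 3-byte group)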
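The rows of this example can be reproduced with a few lines of Python, which is a handy way to check the arithmetic for other inputs. This is an illustrative sketch; the variable names are arbitrary.

```python
text = "Hello"
data = text.encode("ascii")

# Each byte as 8 binary digits.
binary = " ".join(f"{b:08b}" for b in data)

# The same bit stream cut into 6-bit groups (the last one may be short).
bits = "".join(f"{b:08b}" for b in data)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# Zero-fill the short final group, then look up each index.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
indices = [int(g.ljust(6, "0"), 2) for g in groups]
chars = "".join(ALPHABET[i] for i in indices)

# Append '=' until the output length is a multiple of 4.
padding = "=" * (-len(chars) % 4)

print(binary)
print(" | ".join(groups))
print(indices)          # [18, 6, 21, 44, 27, 6, 60]
print(chars + padding)  # SGVsbG8=
```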
Common Use Cases
- Embedding images and files in HTML/CSS (data URIs)
- Sending binary data in JSON or XML
- Email attachments (MIME encoding)
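- Storing data in URLs or cookies (usually with the URL-safe variant, which replaces + and / with - and _)
- Authentication tokens (JWT uses the URL-safe variant; HTTP Basic Auth uses standard Base64)
- Encoding cryptographic keys and certificates (e.g., PEM files)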
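A few of these use cases can be sketched with Python's standard `base64` module. The 3-byte payload and the `alice:secret` credentials are hypothetical placeholders chosen for illustration; the payload happens to exercise both characters that differ between the standard and URL-safe alphabets.

```python
import base64

# Hypothetical payload standing in for real file or image bytes.
payload = b"\xfb\xff\xfe"

# Data URI: media type, ";base64,", then the standard Base64 text.
data_uri = "data:application/octet-stream;base64," + base64.b64encode(payload).decode("ascii")

# HTTP Basic Auth: "user:password" is Base64-encoded (encoded, not encrypted).
credentials = base64.b64encode(b"alice:secret").decode("ascii")
auth_header = "Basic " + credentials

# Standard vs URL-safe alphabets: '+' and '/' become '-' and '_'.
standard = base64.b64encode(payload).decode("ascii")
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")

print(data_uri)   # data:application/octet-stream;base64,+//+
print(auth_header)
print(standard, url_safe)  # +//+ -__-
```

Since Basic Auth credentials are only encoded, anyone who sees the header can decode them, which is why such headers are normally sent over HTTPS. JWTs additionally strip the trailing '=' padding from each URL-safe segment.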
Why Base64 Increases Size
Base64 encoding increases data size by approximately 33%. Each output character carries only 6 bits of data but occupies a full 8-bit byte, so every 3 bytes (24 bits) of input become 4 characters of output, a ratio of 4/3. With padding, n input bytes produce 4 × ceil(n / 3) characters, so very short inputs grow by proportionally more than 33%.
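The size rule can be checked numerically. This sketch uses a hypothetical helper, `b64_length`, that applies the padded-length formula with integer arithmetic, and compares it against the actual output length from Python's `base64.b64encode`.

```python
import base64


def b64_length(n: int) -> int:
    """Padded Base64 output length for n input bytes (hypothetical helper)."""
    # (n + 2) // 3 is ceil(n / 3) without floating point.
    return 4 * ((n + 2) // 3)


for n in (1, 2, 3, 10, 100, 1000):
    actual = len(base64.b64encode(b"\x00" * n))
    assert actual == b64_length(n)
    # The ratio starts high for tiny inputs and settles toward 4/3.
    print(n, actual, f"{actual / n:.2f}x")
```

A single input byte becomes 4 characters (4x), while 1000 bytes become 1336 characters (about 1.34x). MIME email bodies add a little more on top, since encoded lines are wrapped with line breaks.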