How Base64 Text Encoding Works


What is Base64 Encoding?

Base64 is a binary-to-text encoding scheme that converts binary data into an ASCII string format. It's commonly used to encode data that needs to be transmitted over media designed to handle text, such as email systems, URLs, and JSON/XML documents.

How Does Base64 Encoding Work?

Base64 encoding converts binary data into a set of 64 ASCII characters (A-Z, a-z, 0-9, +, /). The encoding process follows these steps:

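  1. Convert to Binary: The input is treated as a sequence of bytes, each written as 8 bits (text is first turned into bytes by a character encoding)
  2. Group into 6-bit Chunks: The bit stream is divided into 6-bit groups; if the final group is short, it is padded with zero bits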
  3. Map to Base64 Characters: Each 6-bit group (0-63) maps to one of 64 ASCII characters
  4. Add Padding: If needed, '=' characters are added to make the output length a multiple of 4
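
The four steps above can be sketched directly in Python. This is a minimal illustration that favors readability over speed; the names ALPHABET and encode_base64 are made up for this example, and the result is compared against the standard library's base64.b64encode as a sanity check.

```python
import base64

# The standard Base64 alphabet: index 0-63 maps to one character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Illustrative encoder that follows the four steps literally."""
    # Step 1: write every byte as 8 bits and join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Step 2: pad with zero bits so the length is a multiple of 6, then group.
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Step 3: map each 6-bit group (0-63) to its alphabet character.
    chars = [ALPHABET[int(group, 2)] for group in groups]
    # Step 4: add '=' so the output length is a multiple of 4.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode_base64(b"Hello"))  # SGVsbG8=
print(encode_base64(b"Hello") == base64.b64encode(b"Hello").decode("ascii"))  # True
```

In real code, base64.b64encode is the better choice; the hand-written version only exists to make each step visible.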

Character Encodings: ASCII vs UTF-8 vs UTF-16

ASCII Encoding

ASCII (American Standard Code for Information Interchange) uses 7 bits to represent 128 characters, including English letters, digits, and common symbols. In practice each ASCII character is stored in one 8-bit byte, so ASCII text produces exactly the same Base64 output as the same text encoded in UTF-8.

UTF-8 Encoding

UTF-8 is a variable-length encoding that uses 1-4 bytes per character. ASCII characters (0-127) use just 1 byte, making UTF-8 efficient for English text while supporting all Unicode characters (emojis, international characters, etc.). This is the most common encoding for modern web applications.
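
A short Python sketch makes the variable length visible. The sample characters are arbitrary choices for illustration, one for each possible byte count.

```python
# One sample character for each UTF-8 length (1 to 4 bytes).
samples = ["A", "é", "€", "😀"]

# Map each character to the number of bytes it occupies in UTF-8.
utf8_sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, size in utf8_sizes.items():
    # bytes.hex(" ") prints the bytes as space-separated hex (Python 3.8+).
    print(repr(ch), size, ch.encode("utf-8").hex(" "))
```

Because Base64 operates on bytes, the number of bytes per character is what determines the size of the encoded output.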

UTF-16 (Little Endian & Big Endian)

UTF-16 uses 2 or 4 bytes per character (one or two 16-bit code units). The byte order (endianness) determines how the two bytes of each code unit are stored:

  • UTF-16 LE (Little Endian): Least significant byte first (common on Windows)
  • UTF-16 BE (Big Endian): Most significant byte first (network byte order)

UTF-16 produces roughly twice as much Base64 output as UTF-8 for ASCII/English text, but it can be more compact for East Asian text, where most characters take 3 bytes in UTF-8 and only 2 bytes in UTF-16.
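
The byte order and the size trade-off can both be checked with a few lines of Python. The helper name base64_length and the sample strings are assumptions for this sketch. The codecs "utf-16-le" and "utf-16-be" are used instead of plain "utf-16" because the latter prepends a byte order mark in Python, which would skew the comparison.

```python
import base64

def base64_length(text: str, encoding: str) -> int:
    """Number of Base64 characters needed for text in the given encoding."""
    return len(base64.b64encode(text.encode(encoding)))

# The same character, stored with opposite byte orders.
print("A".encode("utf-16-le").hex(" "))  # 41 00
print("A".encode("utf-16-be").hex(" "))  # 00 41

# Compare Base64 output sizes for English and Japanese sample text.
for text in ["Hello", "こんにちは"]:
    for encoding in ["utf-8", "utf-16-le"]:
        print(text, encoding, base64_length(text, encoding))
```

With these samples, "Hello" needs 8 Base64 characters in UTF-8 and 16 in UTF-16, while the Japanese greeting needs 20 in UTF-8 and 16 in UTF-16.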

Example: Encoding "Hello"

Original Text: Hello

Binary (ASCII): 01001000 01100101 01101100 01101100 01101111

6-bit Groups: 010010 | 000110 | 010101 | 101100 | 011011 | 000110 | 111100 (the last group, 1111, is padded with two zero bits)

Base64 Output: SGVsbG8=
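
The intermediate values of this example can be reproduced in Python. This is only a trace of the worked example; the variable names are illustrative, and the final line uses the standard library to confirm the result.

```python
import base64

data = "Hello".encode("ascii")

# Binary: each byte written as 8 bits.
bits = "".join(f"{byte:08b}" for byte in data)

# 6-bit groups, with the short final group padded with zero bits.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
groups[-1] = groups[-1].ljust(6, "0")

# Each group read as a number from 0 to 63 (its index in the Base64 alphabet).
indices = [int(group, 2) for group in groups]

print(" | ".join(groups))
print(indices)  # [18, 6, 21, 44, 27, 6, 60]
print(base64.b64encode(data).decode("ascii"))  # SGVsbG8=
```

The seven indices map to the characters S, G, V, s, b, G, 8, and a single '=' brings the output length to 8, a multiple of 4.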

Common Use Cases

  • Embedding images and files in HTML/CSS (data URIs)
  • Sending binary data in JSON or XML
  • Email attachments (MIME encoding)
  • Storing complex data in URLs or cookies (usually with the URL-safe variant, which replaces + and / with - and _)
  • Authentication tokens (JWT, Basic Auth)
  • Encoding cryptographic keys and certificates
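
A few of these use cases can be sketched with Python's standard base64 module. The helper names basic_auth_header and data_uri, and the credentials "user" and "pass", are hypothetical values chosen for illustration only.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Auth header value from 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def data_uri(payload: bytes, mime_type: str) -> str:
    """Embed raw bytes in a data URI using Base64."""
    return f"data:{mime_type};base64," + base64.b64encode(payload).decode("ascii")

print(basic_auth_header("user", "pass"))   # Basic dXNlcjpwYXNz
print(data_uri(b"Hello", "text/plain"))    # data:text/plain;base64,SGVsbG8=

# The URL-safe alphabet swaps '+' and '/' for '-' and '_'.
raw = b"\xfb\xff"
print(base64.b64encode(raw).decode("ascii"))          # +/8=
print(base64.urlsafe_b64encode(raw).decode("ascii"))  # -_8=
```

Base64 is an encoding, not encryption: anyone who sees a Basic Auth header can decode the credentials, so it should only travel over a secure connection.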

Why Base64 Increases Size

Base64 encoding increases data size by approximately 33%. Each output character carries only 6 bits of data but occupies a full 8-bit byte, so every 3 bytes of input (24 bits) become 4 characters of output. Padding rounds the output up to a multiple of 4 characters, and line breaks added by some formats such as MIME increase the size slightly further.
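
The 3-to-4 ratio gives a simple formula for the padded output length: 4 characters for every started group of 3 input bytes. The function name base64_encoded_length is an assumption for this sketch, and the loop compares the formula with the standard library's actual output.

```python
import base64

def base64_encoded_length(n_bytes: int) -> int:
    """Padded Base64 length: 4 characters per started group of 3 bytes."""
    return 4 * ((n_bytes + 2) // 3)

for n in [1, 2, 3, 100, 1000]:
    actual = len(base64.b64encode(bytes(n)))  # bytes(n) is n zero bytes
    overhead = (actual - n) / n * 100
    print(n, base64_encoded_length(n), actual, f"{overhead:.0f}%")
```

For very small inputs the relative overhead is much higher than 33% because of padding (1 byte becomes 4 characters); it approaches 33% as the input grows.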