
James Allman | JA Technology Solutions LLC

Character Encoding Reference

Compare ASCII, Windows-1252, ISO-8859-1, and EBCDIC CP037 in a searchable 0–255 table. Look up any character or code point, see its byte representation in every encoding and UTF-8/16/32, and transcode between encodings. Decode legacy DBCS bytes (Shift-JIS, GBK, Big5, EUC-JP, EUC-KR).


Three-tab reference tool for character encoding work. The Reference table shows all 256 byte values across ASCII, Windows-1252 (CP1252), ISO-8859-1/Latin-1, and EBCDIC CP037 side by side, with control character names and a filter box. Character lookup accepts a character, a U+XXXX code point, or a decimal number. It returns the character's byte representation in every supported encoding, its UTF-8, UTF-16 BE, and UTF-32 BE byte sequences, and its HTML entity. Transcode converts text or hex bytes between any two encodings. DBCS decode uses the browser's built-in TextDecoder to convert Shift-JIS, GBK, Big5, EUC-JP, or EUC-KR hex bytes to Unicode text. All processing happens client-side.
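Here is a minimal Python sketch of roughly what the Transcode and DBCS decode steps do. It is not the tool's code. The tool runs in the browser and uses TextDecoder, while this sketch swaps in Python's standard codecs. The `CODECS` table and the helpers `parse_hex`, `transcode_hex`, and `decode_dbcs` are hypothetical. The mapping from the tool's encoding labels to Python codec names is an assumption.

```python
# Hypothetical mapping from the tool's encoding labels to Python codec names
# (an assumption for illustration, not the tool's actual configuration).
CODECS = {
    "ASCII": "ascii",
    "Windows-1252": "cp1252",
    "ISO-8859-1": "latin-1",
    "EBCDIC CP037": "cp037",
    "UTF-8": "utf-8",
    "Shift-JIS": "shift_jis",
    "GBK": "gbk",
    "Big5": "big5",
    "EUC-JP": "euc_jp",
    "EUC-KR": "euc_kr",
}

def parse_hex(text: str) -> bytes:
    """Parse hex such as 'C1 C2 C3' or 'c1c2c3'; whitespace is ignored."""
    return bytes.fromhex("".join(text.split()))

def transcode_hex(hex_input: str, source: str, target: str) -> str:
    """Decode bytes with `source`, re-encode with `target`, return spaced hex."""
    text = parse_hex(hex_input).decode(CODECS[source])    # strict: unmapped bytes raise
    return text.encode(CODECS[target]).hex(" ").upper()   # strict: unmappable chars raise

def decode_dbcs(hex_input: str, encoding: str) -> str:
    """Decode-only path for the multi-byte legacy encodings."""
    return parse_hex(hex_input).decode(CODECS[encoding])

print(transcode_hex("C1 C2 C3", "EBCDIC CP037", "ASCII"))   # 41 42 43
```

Decoding and encoding are both strict in this sketch. A byte with no mapping in the source encoding, or a character the target cannot represent, raises a `UnicodeError` instead of silently becoming a question mark.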

Before you use this output: EBCDIC support covers Code Page 037 (CP037) only. DBCS (Shift-JIS, GBK, Big5, EUC-JP, EUC-KR) is decode-only in this version; encoding back to DBCS is not supported. For production encoding conversion pipelines, get in touch.

Why Character Encoding Matters

Every text file and database field stores bytes, not characters. The encoding is the contract that says which byte value represents which character. When the system that writes the data and the system that reads it assume different encodings, the bytes arrive intact but the characters come out wrong. An accented letter becomes a question mark, a currency symbol becomes garbage, or a Windows document opens as a wall of mojibake. These errors are not random. Each encoding is a fixed, deterministic table, so with the right reference you can trace exactly what went wrong.
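You can see that determinism by reproducing a mismatch on purpose. This Python sketch writes text as UTF-8, reads it back as Windows-1252, and then reverses both steps to recover the original. The `mojibake` helper is hypothetical.

```python
def mojibake(text: str, written_as: str, read_as: str) -> str:
    """Encode with one codec and decode with another, simulating a mismatch."""
    return text.encode(written_as).decode(read_as)

garbled = mojibake("café", "utf-8", "cp1252")
print(garbled)                                   # cafÃ©

# The tables are deterministic, so reversing both steps recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)                                  # café
```

The repair only works when every garbled byte has a mapping in the wrong encoding. If a UTF-8 byte lands on one of Windows-1252's undefined positions, Python's strict cp1252 decoder raises `UnicodeDecodeError` instead of producing mojibake.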

ASCII, Code Pages, and Where They Diverge

ASCII covers bytes 0x00 through 0x7F, and every major ASCII-based encoding agrees on that range. Bytes 0x80 and above are where the disagreements start. ISO-8859-1 (Latin-1) maps each of those bytes to the Unicode code point with the same value (U+0080 through U+00FF). That gives 256 defined positions, although 0x80 through 0x9F are C1 control codes rather than printable characters. Windows-1252 takes a different path. It assigns 27 printable characters (the Euro sign, smart quotes, em dash, trademark sign, and similar) to the range 0x80 through 0x9F and leaves the remaining 5 positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined. A file labeled ISO-8859-1 that actually contains Windows-1252 bytes looks fine for most Latin characters. Wherever those 0x80–0x9F bytes appear, though, it shows invisible control codes, boxes, or question marks.
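The sketch below checks these claims with Python's built-in codecs. First, ASCII, Latin-1, and Windows-1252 agree on 0x00–0x7F. Second, in 0x80–0x9F, Latin-1 yields C1 control codes while Windows-1252 yields 27 printable characters and 5 gaps. The function name `compare_c1_range` is hypothetical.

```python
def compare_c1_range():
    """Decode 0x80-0x9F under Latin-1 and Windows-1252 and collect the differences."""
    latin1, cp1252, undefined = {}, {}, []
    for b in range(0x80, 0xA0):
        raw = bytes([b])
        latin1[b] = raw.decode("latin-1")        # always the C1 control U+0080..U+009F
        try:
            cp1252[b] = raw.decode("cp1252")     # a printable character
        except UnicodeDecodeError:
            undefined.append(b)                  # no mapping in Python's cp1252 codec
    return latin1, cp1252, undefined

# 0x00-0x7F: all three codecs decode every byte identically.
ascii_agree = all(
    bytes([b]).decode("ascii") == bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in range(0x80)
)

latin1, cp1252, undefined = compare_c1_range()
print(ascii_agree)                                 # True
print(len(cp1252), [hex(b) for b in undefined])    # 27 ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
print(repr(latin1[0x80]), cp1252[0x80])            # '\x80' €
```

Python's cp1252 codec treats the five gaps as undefined and raises an error. Decoders that follow the WHATWG Encoding Standard, as browsers do, instead pass them through as the matching C1 control codes. The count of 27 printable characters is the same either way.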

EBCDIC: IBM's Separate Track

IBM mainframes and IBM i systems use EBCDIC, which assigns completely different byte values to the same characters. The letter “A” is 0x41 in ASCII but 0xC1 in EBCDIC Code Page 037 (the US/Canada variant), and the space is 0x20 in ASCII but 0x40 in CP037. If you read an IBM i flat file in binary mode without converting it, every letter and digit appears as a high byte (0x81 and above). The reverse mismatch is just as bad. Interpreted as CP037, the ASCII printable range (0x20–0x7E) lands on control characters, punctuation, and accented letters, because CP037 places control codes at 0x00–0x3F and its unaccented letters and digits at 0x81 and above. The EBCDIC to ASCII Converter handles full file conversion. This reference tool lets you trace individual byte values and see exactly what each byte means in each encoding.
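To make the CP037 layout concrete, this Python sketch uses the standard library's `cp037` codec to show both directions of the mismatch. The helper `hex_bytes` is hypothetical.

```python
def hex_bytes(text: str, codec: str) -> str:
    """Return the encoded bytes of `text` as spaced uppercase hex."""
    return text.encode(codec).hex(" ").upper()

print(hex_bytes("A", "ascii"), hex_bytes("A", "cp037"))   # 41 C1
print(hex_bytes("Hello 42", "cp037"))                     # C8 85 93 93 96 40 F4 F2

# EBCDIC text read without conversion: every letter and digit is a high byte.
ebcdic = "Hello42".encode("cp037")
print(all(b >= 0x80 for b in ebcdic))                     # True

# ASCII bytes read as CP037: none of the results is an unaccented Latin letter.
misread = "HELLO".encode("ascii").decode("cp037")
print(any(c.isascii() and c.isalpha() for c in misread))  # False
```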

UTF-8, UTF-16, and UTF-32

Unicode assigns a code point to every character in every writing system, but a code point is not a byte sequence. UTF-8 encodes each code point as 1 to 4 bytes depending on its value and is backward-compatible with ASCII for the 0x00–0x7F range. UTF-16 uses 2 bytes for code points up to U+FFFF and 4 bytes (a surrogate pair) for anything above U+FFFF. UTF-32 always uses exactly 4 bytes. The Character lookup tab shows all three byte sequences for any character, which is useful when debugging protocol-level encoding mismatches or validating data pipeline output. For a deeper look at the EBCDIC side of this, the article EBCDIC and ASCII: A Practical Guide to Mainframe Data Conversion walks through the most common conversion scenarios.
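Python's built-in codecs can reproduce the kind of byte sequences the Character lookup tab reports. In this sketch the helper names are hypothetical. The HTML entity is always emitted as a numeric character reference, which may differ from the entity form the tool displays. `surrogate_pair` spells out the UTF-16 arithmetic for code points above U+FFFF: subtract 0x10000, then add the high 10 bits to 0xD800 and the low 10 bits to 0xDC00.

```python
def utf_sequences(ch: str) -> dict:
    """Return the code point and its UTF-8, UTF-16 BE, and UTF-32 BE bytes as hex."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)
    return {
        "code point": f"U+{cp:04X}",
        "UTF-8": ch.encode("utf-8").hex(" ").upper(),
        "UTF-16 BE": ch.encode("utf-16-be").hex(" ").upper(),
        "UTF-32 BE": ch.encode("utf-32-be").hex(" ").upper(),
        "HTML": f"&#x{cp:X};",   # numeric character reference
    }

def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("surrogate pairs only exist for U+10000..U+10FFFF")
    offset = cp - 0x10000                  # 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

seq = utf_sequences("€")
print(seq["UTF-8"], "|", seq["UTF-16 BE"], "|", seq["UTF-32 BE"])  # E2 82 AC | 20 AC | 00 00 20 AC

emoji = "\U0001F600"
print(utf_sequences(emoji)["UTF-16 BE"])               # D8 3D DE 00
print([hex(u) for u in surrogate_pair(ord(emoji))])    # ['0xd83d', '0xde00']
```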

Data Conversion Pipelines

This reference covers character-level inspection and simple transcoding. In production data migration from IBM i or mainframe systems, character encoding is one layer of a larger problem. That problem includes selecting the right encoding per field, decoding packed decimal and binary fields that must not be character-converted, handling field-level exceptions, and validating output before it enters the target system. I build integration and ETL pipelines that handle these concerns field by field. I also provide IBM i consulting for teams that need hands-on help navigating mainframe data formats.
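Packed decimal is a good example of a field that must not be character-converted. The Python sketch below assumes the common IBM packed-decimal (COMP-3) layout. Each byte holds two BCD digits, and the final nibble holds the sign: C for positive, D for negative, F for unsigned. A and E are also accepted as positive and B as negative. The function name and the `scale` parameter are hypothetical; in a real pipeline, `scale` would come from the field's record layout.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}   # F = unsigned, treated as positive
NEGATIVE_SIGNS = {0xB, 0xD}

def unpack_packed_decimal(field: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two BCD digits per byte, sign in the last nibble."""
    if not field:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for b in field:
        nibbles.extend((b >> 4, b & 0x0F))   # high nibble first
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid BCD digit in {field.hex()}")
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError(f"invalid sign nibble {sign:X} in {field.hex()}")
    value = int("".join(str(d) for d in digits))
    if sign in NEGATIVE_SIGNS:
        value = -value
    # Apply the implied decimal point from the record layout.
    return Decimal(value).scaleb(-scale)

amount = bytes.fromhex("12345D")                 # digits 12345, negative sign nibble
print(unpack_packed_decimal(amount, scale=2))    # -123.45

# Treating the same bytes as CP037 text and re-encoding them changes them.
as_text = amount.decode("cp037").encode("latin-1")
print(as_text == amount)                         # False
```

In the last lines, the text conversion rewrites the final byte, 0x5D (`)` in CP037), to 0x29 and silently destroys the number. That is why a pipeline has to know each field's type before it converts anything.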

All tools run entirely in your browser. Your data never leaves your machine. Need help? Ask James.