James Allman | JA Technology Solutions LLC
Packed Decimal and COMP-3: Reading Mainframe Numbers Correctly
The numeric fields in an IBM i or mainframe flat file look like garbage until you understand that two digits share every byte, the decimal point is nowhere in the file, and the sign lives in the last nibble. Get any of those three things wrong and your ETL produces numbers that are plausible enough to pass a casual inspection.
When you extract a flat file from an IBM i or mainframe system and open it in a text editor, the character fields look mostly readable. The numeric fields look like random binary noise: bytes with high bits set, values that shift as you scroll, nothing that resembles a price or a quantity. That is not a file transfer error. It is packed decimal, and it is working exactly as designed.
This article pairs with my earlier piece on EBCDIC and ASCII conversion, which covers the character encoding side of mainframe data. Packed decimal is the other half: the numeric encoding that ETL developers, integration engineers, and migration teams must handle separately, with different rules, after the character fields are converted. Get it wrong and your pipeline produces numbers that are off by powers of ten, have flipped signs, or are simply corrupted.
I have worked with IBM i and mainframe data for over 35 years. The numeric encoding mistakes I see are nearly always the same few, and they are almost always avoidable once you understand how the formats work.
What packed decimal is
Packed decimal encodes each decimal digit into a single nibble (4 bits, half a byte). Two digits fit per byte, which is why the format was created: storage was expensive on IBM System/360 hardware in the 1960s, and halving the bytes consumed by numeric data was worth the encoding complexity. The last nibble in the field is not a digit. It holds the sign: C (hex) or F (hex) for positive or unsigned, D (hex) for negative. The value +12345 fits in three bytes as hex 12 34 5C, where the trailing C marks it positive.
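A minimal Python sketch of that nibble decoding follows. The function name `unpack_comp3` is hypothetical. The sketch accepts only the C, D, and F sign nibbles described above. IBM hardware also treats A and E as positive and B as negative, and a production decoder may need to handle those.

```python
def unpack_comp3(data: bytes) -> int:
    """Decode a packed decimal (COMP-3) field into a signed integer.

    Sketch only: accepts sign nibbles C and F (positive/unsigned) and D (negative).
    """
    if not data:
        raise ValueError("empty packed field")
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high nibble
        nibbles.append(byte & 0x0F)    # low nibble
    *digits, sign = nibbles            # the last nibble is the sign, not a digit
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit nibble in {data.hex().upper()}")
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign == 0xD:
        return -value
    if sign in (0xC, 0xF):
        return value
    raise ValueError(f"unexpected sign nibble {sign:X}")

print(unpack_comp3(bytes.fromhex("12345C")))  # 12345
print(unpack_comp3(bytes.fromhex("12345D")))  # -12345
```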
The size formula is: bytes = ceil((digits + 1) / 2). A COBOL COMP-3 field declared as PIC S9(7) holds seven digits and needs 4 bytes, ceil(8/2). A PIC S9(9) field needs 5 bytes. The sign nibble always occupies the lower half of the last byte, so there is always at least one byte even for a one-digit field. On IBM i, the RPG equivalent declares a packed field with its digit count and decimal positions directly in the file's DDS or in the data structure definition.
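In code, the formula can be written with integer arithmetic so no floating-point ceiling is involved. The helper name `packed_length` is illustrative.

```python
def packed_length(digits: int) -> int:
    """Bytes occupied by a packed field: ceil((digits + 1) / 2)."""
    if digits < 1:
        raise ValueError("a packed field needs at least one digit")
    # (digits + 2) // 2 is the integer form of ceil((digits + 1) / 2).
    return (digits + 2) // 2

for d in (1, 5, 7, 8, 9):
    print(d, "digits ->", packed_length(d), "bytes")
```

An even digit count, such as PIC S9(8), leaves one spare high nibble in the first byte. That nibble is always zero, which is why 8 and 9 digits both need 5 bytes.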
In COBOL terminology this is COMP-3 or COMPUTATIONAL-3. IBM i RPG calls it PACKED or simply the default numeric type in DDS. The bit pattern is identical across both platforms. A five-digit COMP-3 field on a z/OS mainframe and a five-digit packed field on an IBM i will have the same three-byte layout for the same value.
The implied decimal point: scale is not in the file
The single most damaging thing to get wrong with packed decimal is scale. The decimal point is not stored anywhere in the bytes. Two fields can contain the identical bytes 0x12345C and represent entirely different values depending on the copybook or DDS definition: one field might be PIC S9(5) COMP-3 meaning the integer 12345, while another field might be PIC S9(3)V99 COMP-3 meaning the value 123.45, with two implied decimal places.
That V in the COBOL PICTURE clause denotes the implied decimal position. It takes up no storage; it tells the runtime where to assume the decimal point falls when performing arithmetic or formatting output. If you parse a five-digit packed field as an integer and it should have two decimal places, every monetary value in your output is one hundred times too large. If your downstream system accepts the data without a range check, the error propagates silently.
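The sketch below shows one way to apply the schema's scale without passing through a float: build a Python `Decimal` directly from the sign, the digit nibbles, and a negative exponent. The function name is hypothetical. The `scale` argument is an assumption the caller must take from the copybook or DDS, because the bytes cannot supply it.

```python
from decimal import Decimal

def unpack_comp3_scaled(data: bytes, scale: int) -> Decimal:
    """Decode a packed field and apply the implied decimal scale from the schema."""
    text = data.hex().upper()            # b"\x12\x34\x5c" -> "12345C"
    digits, sign = text[:-1], text[-1]   # the last nibble is the sign
    if not digits.isdigit() or sign not in "CDF":
        raise ValueError(f"not a valid packed field: {text}")
    # Decimal((sign, digits, exponent)) builds the exact value; no float involved.
    return Decimal((1 if sign == "D" else 0, tuple(int(d) for d in digits), -scale))

raw = bytes.fromhex("12345C")
print(unpack_comp3_scaled(raw, 0))  # 12345   (PIC S9(5) COMP-3)
print(unpack_comp3_scaled(raw, 2))  # 123.45  (PIC S9(3)V99 COMP-3)
```

The same three bytes produce two different values. The only difference is the scale argument, which mirrors the copybook declaration.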
Scale information lives only in the source schema: the COBOL copybook, the RPG data structure, or the IBM i DDS field definition. A file with no accompanying layout document is genuinely ambiguous for scale on every numeric field. Getting the scale right requires obtaining and reading the schema, not inspecting the data. The COBOL Copybook Explorer parses COBOL PIC clauses and shows the byte positions, digit counts, and implied decimal positions for every field in a record layout.
Zoned decimal: one digit per byte, sign in the last zone
Zoned decimal, called DISPLAY in COBOL and zoned in RPG, stores one digit per byte. The high nibble of each byte is the zone nibble, normally F (hex) in EBCDIC, and the low nibble holds the digit value 0 through 9. The value 12345 in EBCDIC zoned decimal is the bytes F1 F2 F3 F4 F5. So far this looks almost like the EBCDIC character encoding of the digit characters, which is why zoned decimal is sometimes described as looking like text.
The sign breaks that resemblance. The zone nibble of the last byte carries the sign using an overpunch convention inherited from punch card encoding: C for positive, D for negative, and F (the ordinary zone) for unsigned. A positive 12345 ends in 0xC5, which in EBCDIC corresponds to the letter E. A negative 12345 ends in 0xD5, which corresponds to the letter N. If you apply a naive EBCDIC-to-ASCII conversion to a signed zoned decimal field, a positive value ends in one letter and a negative value ends in a different letter. Either way, the resulting string is not a number.
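A zoned decoder follows the same shape as the packed one. It reads the low nibble of every byte as a digit and the zone of the last byte as the sign. The sketch below assumes the C, D, and F signs described above and F zones on all other bytes. The function name is hypothetical.

```python
def unpack_zoned(data: bytes) -> int:
    """Decode an EBCDIC zoned decimal field (one digit per byte) to an integer."""
    if not data:
        raise ValueError("empty zoned field")
    # Every byte except the last should carry the ordinary F zone.
    if any((b >> 4) != 0xF for b in data[:-1]):
        raise ValueError(f"unexpected zone in {data.hex().upper()}")
    value = 0
    for byte in data:
        digit = byte & 0x0F              # low nibble holds the digit
        if digit > 9:
            raise ValueError(f"invalid digit nibble in {data.hex().upper()}")
        value = value * 10 + digit
    sign = data[-1] >> 4                 # zone of the last byte is the sign
    if sign == 0xD:
        return -value
    if sign in (0xC, 0xF):
        return value
    raise ValueError(f"unexpected sign zone {sign:X}")

print(unpack_zoned(bytes.fromhex("F1F2F3F4C5")))  # 12345
print(unpack_zoned(bytes.fromhex("F1F2F3F4D5")))  # -12345
```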
After proper sign decode and stripping the zone nibbles, the result is the same integer you would get from a packed decimal field with the same digits. The choice between packed and zoned on a given field is a declaration in the schema, not a property of the data itself. Many IBM i files mix both: character fields in EBCDIC, key fields in packed decimal, display fields in zoned decimal, all in the same fixed-width record.
Why packed decimal exists for financial data
Storage cost drove the original design, but the format survived for a more important reason: packed decimal arithmetic is exact for decimal values. Binary floating-point formats like IEEE 754 cannot represent values such as 0.10 exactly. They approximate. For currency arithmetic, that approximation means that adding $0.10 one hundred times in binary floating point produces $9.999999... rather than $10.00. With mixed amounts, the size of the error also depends on the order of accumulation. IBM's hardware performs decimal arithmetic directly on packed fields, bypassing that class of error entirely.
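The effect is easy to reproduce in Python, whose standard `decimal` module performs exact decimal arithmetic in software. This sketch illustrates the representation problem only. It does not model IBM's hardware decimal instructions.

```python
from decimal import Decimal

float_total = 0.0
decimal_total = Decimal("0.00")
for _ in range(100):
    float_total += 0.10               # binary floating point: 0.10 is approximated
    decimal_total += Decimal("0.10")  # exact decimal value

print(float_total == 10.0)                # False
print(decimal_total == Decimal("10.00"))  # True
print(decimal_total)                      # 10.00
```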
This matters in ETL and migration work. When you extract a packed decimal value from an IBM i or mainframe system and convert it to a double-precision float in your target system, you are introducing a representation error that did not exist in the source. For reporting or analytics this is usually acceptable. For financial reconciliation, accounts payable, or any calculation where cents must balance to the penny, it is not. The right target type is an exact decimal: DECIMAL or NUMERIC in SQL, BigDecimal in Java, decimal in Python, or an integer count of the smallest currency unit.
The Packed Decimal / COMP-3 Converter decodes individual packed and zoned values in your browser, showing the nibble layout and the decoded numeric result. It is useful for verifying your parse logic against known test values before processing a full file.
How packed decimal fields sit inside a fixed-width record
A typical IBM i or mainframe record layout mixes character fields, packed fields, and zoned fields at known byte offsets, with no delimiters between them. A transaction record might have a 10-byte customer number in EBCDIC characters at bytes 1-10, a 4-byte packed date in CYYMMDD format at bytes 11-14, a 5-byte packed amount in dollars with two implied decimal places at bytes 15-19, and a 1-byte status code in EBCDIC at byte 20. None of those boundaries are visible in the raw bytes.
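A minimal sketch of slicing that example record follows. Several details are assumptions for illustration:

- The EBCDIC code page is cp037 (US/Canada). Your source system may use another.
- The sample bytes are invented test values.
- The function names are hypothetical.
- Python slices are zero-based, so bytes 1-10 in the layout become `record[0:10]`.

```python
from decimal import Decimal

def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Packed field to Decimal, applying the schema's implied scale."""
    text = data.hex().upper()
    digits, sign = text[:-1], text[-1]
    if not digits.isdigit() or sign not in "CDF":
        raise ValueError(f"not a valid packed field: {text}")
    return Decimal((1 if sign == "D" else 0, tuple(int(d) for d in digits), -scale))

def parse_transaction(record: bytes) -> dict:
    """Split the hypothetical 20-byte transaction record described above."""
    if len(record) != 20:
        raise ValueError(f"expected 20 bytes, got {len(record)}")
    return {
        "customer": record[0:10].decode("cp037"),          # bytes 1-10, EBCDIC text
        "date_cyymmdd": int(unpack_comp3(record[10:14])),  # bytes 11-14, packed, 7 digits
        "amount": unpack_comp3(record[14:19], scale=2),    # bytes 15-19, packed, 9 digits, 2 implied decimals
        "status": record[19:20].decode("cp037"),           # byte 20, EBCDIC text
    }

# Invented sample record: date 1240315 is CYYMMDD with C=1 meaning 20xx, i.e. 2024-03-15.
record = (
    "CUST000001".encode("cp037")
    + bytes.fromhex("1240315F")
    + bytes.fromhex("000123456C")
    + "A".encode("cp037")
)
print(parse_transaction(record))
```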
The byte counts for packed fields follow the formula above: a 7-digit packed field occupies 4 bytes, and a 9-digit packed field occupies 5 bytes. If you miscount any field's length, every field after it is shifted by the difference. The miscount is silent: the parser reads the wrong bytes, interprets them according to the wrong type, and produces a result that may look numerically plausible for some records while being entirely wrong.
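The sketch below derives offsets from a layout instead of computing them by hand, and it shows how one miscount moves every later field. The layout format is a hypothetical `(name, kind, size)` tuple. For `"char"` fields, size is in bytes. For `"packed"` fields, size is in digits.

```python
def packed_length(digits: int) -> int:
    return (digits + 2) // 2  # ceil((digits + 1) / 2)

def compute_offsets(fields):
    """Return (name, zero-based offset, byte length) for each field in order."""
    offset, layout = 0, []
    for name, kind, size in fields:
        length = packed_length(size) if kind == "packed" else size
        layout.append((name, offset, length))
        offset += length
    return layout

TRANSACTION = [("customer", "char", 10), ("date", "packed", 7),
               ("amount", "packed", 9), ("status", "char", 1)]
# Mistake: the 7-digit packed date entered as 7 bytes instead of 4.
MISCOUNTED = [("customer", "char", 10), ("date", "char", 7),
              ("amount", "packed", 9), ("status", "char", 1)]

print(compute_offsets(TRANSACTION))  # status at offset 19
print(compute_offsets(MISCOUNTED))   # amount and status both shifted by 3 bytes
```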
Reading the schema before writing any code is not optional. For COBOL systems, the copybook defines every field with its name, PIC clause, USAGE (COMP-3, DISPLAY, or other), and optionally REDEFINES clauses for variant records. For IBM i, the DDS source or SQL DDL specifies the field type, length, and decimal positions. The COBOL Copybook Explorer parses COBOL copybooks and displays the computed byte offset, byte length, digit count, and scale for every field, which removes the arithmetic from the parsing setup. For IBM i files, the Fixed-Width to CSV Converter applies a schema you supply to slice and label the fields.
Failure modes: what goes wrong
The most common mistake is applying EBCDIC-to-ASCII conversion to the entire record, including packed decimal fields. The conversion routine does not know which bytes are text and which are numeric. It maps every byte through the EBCDIC character table, which produces completely wrong values for packed bytes. The result looks like garbled text: characters with high-bit codes, unprintable bytes, symbols. Teams that hit this usually recognize it immediately because the data is obviously wrong. But if the record contains only a few packed fields, the character fields may look correct while the numeric fields are silently corrupted.
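The difference between converting the whole record and converting by field shows up in a few lines. This sketch assumes the cp037 code page and a hypothetical 7-byte record: a 4-byte name followed by a 3-byte packed amount. The inline packed decode skips validation for brevity.

```python
# Hypothetical record: "ACME" in EBCDIC, then +12345 as a 3-byte COMP-3 field.
record = "ACME".encode("cp037") + bytes.fromhex("12345C")

# Wrong: every byte goes through the character table, including the packed ones.
wrong = record.decode("cp037")
print(repr(wrong))  # the name survives; the amount is no longer digits

# Right: slice by the layout first, then convert each field by its declared type.
name = record[0:4].decode("cp037")
amount_hex = record[4:7].hex().upper()  # "12345C"
amount = int(amount_hex[:-1]) * (-1 if amount_hex[-1] == "D" else 1)
print(name, amount)  # ACME 12345
```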
A subtler failure: treating a zoned decimal field as text. In EBCDIC, the digit bytes F0 through F9 are the characters 0 through 9, so a positive zoned decimal number looks exactly like EBCDIC text until you reach the last byte, which carries the overpunch sign. After EBCDIC-to-ASCII conversion, +12345 (F1 F2 F3 F4 C5) comes out as 1234E, and -12345 (F1 F2 F3 F4 D5) comes out as 1234N. The field looks like a number with a trailing letter. Some systems silently strip the letter and load 1234, losing both the last digit and the sign. Others reject the value as non-numeric.
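The sketch below reproduces the trailing-letter effect with Python's cp037 codec. It also shows one possible repair for a field that has already been decoded as text. The repair works only if the text is exactly what a cp037 decode produced. If a conversion tool substituted or dropped characters, the original bytes are gone. The function name is hypothetical.

```python
def zoned_from_converted_text(text: str) -> int:
    """Recover a zoned value that was wrongly decoded as cp037 text (sketch)."""
    raw = text.encode("cp037")  # back to the original EBCDIC bytes
    if any((b >> 4) != 0xF for b in raw[:-1]) or any((b & 0x0F) > 9 for b in raw):
        raise ValueError(f"not a zoned decimal field: {text!r}")
    value = int("".join(str(b & 0x0F) for b in raw))
    sign = raw[-1] >> 4  # zone of the last byte carries the sign
    if sign == 0xD:
        return -value
    if sign in (0xC, 0xF):
        return value
    raise ValueError(f"unexpected sign zone in {text!r}")

positive = bytes.fromhex("F1F2F3F4C5")      # +12345, zoned
negative = bytes.fromhex("F1F2F3F4D5")      # -12345, zoned
print(positive.decode("cp037"))             # 1234E
print(negative.decode("cp037"))             # 1234N
print(int(negative.decode("cp037")[:-1]))   # 1234: dropping the letter loses a digit and the sign
print(zoned_from_converted_text("1234N"))   # -12345
```

A final digit of zero shows up as `{` (positive) or `}` (negative) under the same overpunch convention, so those characters deserve the same suspicion as trailing letters.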
Scale errors are the quietest failure mode. If your code decodes the nibbles correctly but applies no implied decimal conversion, a five-digit packed field for a dollar amount with two decimal places comes out as an integer 100 times too large. Because the magnitude is consistent across all records, downstream systems often accept the values. Reports show dollar amounts in the hundreds of millions for transactions that should be in the millions. If nobody is validating against an expected range, the error can persist for a long time. This is one of those things I check specifically when reviewing someone's mainframe extraction code.
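A simple range check catches this class of error before load. The sketch below flags values outside a business-supplied range. It also tests whether the flagged values fall back into range once the expected scale is applied, which suggests a missing implied decimal. The bounds, function names, and the scale of 2 are hypothetical.

```python
from decimal import Decimal

def out_of_range(amounts, low: Decimal, high: Decimal):
    """Return (record index, value) for every amount outside [low, high]."""
    return [(i, a) for i, a in enumerate(amounts) if not low <= a <= high]

def looks_like_missing_scale(flagged, low: Decimal, high: Decimal, scale: int = 2) -> bool:
    """True if every flagged value lands in range once divided by 10**scale."""
    factor = Decimal(10) ** scale
    return bool(flagged) and all(low <= v / factor <= high for _, v in flagged)

amounts = [Decimal("123.45"), Decimal("12345"), Decimal("98.10")]
flagged = out_of_range(amounts, Decimal("0"), Decimal("10000"))
print(flagged)
print(looks_like_missing_scale(flagged, Decimal("0"), Decimal("10000")))  # True
```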
Reading the schema, not the data
The practical workflow for consuming packed decimal data from IBM i or mainframe systems is straightforward, but it must start with the schema. Before touching the binary file, obtain the COBOL copybook, the RPG DDS source, or whatever record layout document the source team can provide. If no layout exists, ask the IBM i or mainframe administrator to generate one: IBM i's DSPFFD command produces a detailed field description for any database file, and COBOL copybooks for flat file extracts almost always exist somewhere in the source library.
Once you have the layout, annotate every field with its type (EBCDIC text, packed decimal, zoned decimal, binary integer, or date), its byte offset, its byte length, its digit count if numeric, and its scale. A packed decimal field declared as PIC S9(7)V99 COMP-3 is 5 bytes, 9 total digits, 2 decimal places. Apply the EBCDIC-to-ASCII conversion only to the text fields. Apply packed decimal decoding only to the COMP-3 fields. Apply zoned decimal decoding only to the DISPLAY numeric fields. Maintain the scale throughout the conversion and store it in a target type that preserves it, typically a SQL DECIMAL column with the matching precision and scale.
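For simple numeric PIC clauses, the annotation arithmetic can be automated. The sketch below handles only forms such as `S9(7)V99`, `9(5)`, and `S999V99`. It does not support P scaling, editing symbols, or anything else a full copybook parser would need. The function name and output keys are hypothetical.

```python
import re

_NINES = re.compile(r"9(?:\((\d+)\))?")     # matches "9" or "9(n)"
_PART = re.compile(r"(?:9(?:\(\d+\))?)*")   # a run of those tokens

def _count_digits(part: str) -> int:
    if not _PART.fullmatch(part):
        raise ValueError(f"unsupported PIC fragment: {part!r}")
    return sum(int(n) if n else 1 for n in _NINES.findall(part))

def describe_comp3(pic: str) -> dict:
    """Digits, scale, packed byte length and a matching SQL type for a COMP-3 PIC."""
    body = pic.upper().replace(" ", "")
    signed = body.startswith("S")
    if signed:
        body = body[1:]
    integer_part, _, fraction_part = body.partition("V")  # V marks the implied point
    scale = _count_digits(fraction_part)
    digits = _count_digits(integer_part) + scale
    if digits == 0:
        raise ValueError(f"no digits in PIC {pic!r}")
    return {
        "signed": signed,
        "digits": digits,
        "scale": scale,
        "bytes": (digits + 2) // 2,  # ceil((digits + 1) / 2)
        "sql_type": f"DECIMAL({digits},{scale})",
    }

print(describe_comp3("S9(7)V99"))  # 9 digits, scale 2, 5 bytes, DECIMAL(9,2)
```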
After the first conversion run, validate against known test values. Pick a set of records where you know what the values should be, ideally a parallel extract from the source system in a human-readable format, and compare field by field. Pay particular attention to negative values (sign nibble D rather than C), zero values (the sign nibble for zero is typically C or F depending on the source, not D), and fields near the end of each record, where offset errors accumulate. My EBCDIC conversion article covers the parallel process for character fields and the file-transfer considerations that affect both.
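Known test values can also be generated rather than hand-typed. The sketch below encodes a `Decimal` into packed bytes, then checks that decoding returns the same value. It covers the negative and zero cases called out above. The function names are hypothetical. The encoder writes C for zero and positive values and D for negatives. Your source system may use F for unsigned fields instead.

```python
from decimal import Decimal

def pack_comp3(value: Decimal, digits: int, scale: int) -> bytes:
    """Encode a Decimal as a packed field with the given digit count and scale."""
    scaled = value.scaleb(scale)                 # 123.45 with scale 2 -> 12345
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} decimal places")
    magnitude = str(abs(int(scaled)))
    if len(magnitude) > digits:
        raise ValueError(f"{value} does not fit in {digits} digits")
    nibbles = (digits + 2) // 2 * 2 - 1          # digit nibbles that fill whole bytes
    sign = "D" if scaled < 0 else "C"            # -0.00 compares equal to 0, so it gets C
    return bytes.fromhex(magnitude.zfill(nibbles) + sign)

def unpack_comp3_scaled(data: bytes, scale: int) -> Decimal:
    text = data.hex().upper()
    digits, sign = text[:-1], text[-1]
    if not digits.isdigit() or sign not in "CDF":
        raise ValueError(f"not a valid packed field: {text}")
    return Decimal((1 if sign == "D" else 0, tuple(int(d) for d in digits), -scale))

CASES = [Decimal("123.45"), Decimal("-123.45"), Decimal("0.00"), Decimal("-0.01")]
for expected in CASES:
    packed = pack_comp3(expected, digits=7, scale=2)
    decoded = unpack_comp3_scaled(packed, scale=2)
    assert decoded == expected, (expected, packed.hex(), decoded)
    print(f"{expected!s:>8} -> {packed.hex().upper()}")
```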
Tools that help with inspection and debugging
The Packed Decimal / COMP-3 Converter decodes individual packed and zoned values from hex input and shows the nibble-by-nibble breakdown, the sign, and the decoded decimal value. It also encodes a decimal number into packed and zoned hex, which is useful for building test cases and validating your encoding logic. You can specify the number of decimal places to apply the scale correctly. It runs entirely in your browser.
For field layout, the COBOL Copybook Explorer parses a COBOL copybook and computes byte offsets, byte lengths, and decimal positions for every field in the record, including COMP-3 and DISPLAY numeric fields. It handles REDEFINES clauses and level-group nesting, which is where most manual offset calculations go wrong.
For character encoding questions on the same file, the EBCDIC to ASCII Converter handles the text fields, and the character encoding reference shows the byte-value mapping for the common EBCDIC code pages side by side.
These tools handle individual values and small test cases well. Production files, typically megabytes to gigabytes of fixed-width binary records with multiple field types, require purpose-built parsing code. The tools are most useful during the design and debugging phases, for validating your offset calculations and checking edge cases before the pipeline runs against live data. If you are dealing with a complex extraction where the tools are not enough, Ask James to describe what you are working with. That is usually how I get started on this kind of engagement.
Getting it into production
A production extraction pipeline for IBM i or mainframe flat files is not a character encoding problem. It is a record parsing problem that happens to include character encoding as one of several conversion steps. The pipeline needs to: read fixed-length records (or variable-length records if an RDW prefix is involved), apply the correct conversion to each field by type, maintain implied decimal scale throughout, handle multiple record types if the file has a type indicator byte, and validate the output before loading.
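The first step of such a pipeline, splitting the byte stream into records, can be sketched as two generators: one for fixed-length files and one for RDW-prefixed variable-length files. The sketch makes these assumptions:

- The file arrived by binary transfer, so the RDWs are intact.
- There are no block descriptor words.
- Each RDW's first two bytes are a big-endian length that includes the 4-byte RDW itself.

The function names are hypothetical.

```python
import io
from typing import BinaryIO, Iterator

def fixed_records(stream: BinaryIO, record_length: int) -> Iterator[bytes]:
    """Yield fixed-length records; a short final read means a truncated file."""
    while True:
        record = stream.read(record_length)
        if not record:
            return
        if len(record) != record_length:
            raise ValueError(f"truncated record: {len(record)} of {record_length} bytes")
        yield record

def rdw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the data portion of variable-length records prefixed by a 4-byte RDW."""
    while True:
        rdw = stream.read(4)
        if not rdw:
            return
        if len(rdw) != 4:
            raise ValueError("truncated RDW")
        length = int.from_bytes(rdw[:2], "big")  # includes the RDW's own 4 bytes
        if length < 4:
            raise ValueError(f"invalid RDW length {length}")
        body = stream.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated record body")
        yield body

# Two hypothetical variable-length records: 3 data bytes and 1 data byte.
data = (7).to_bytes(2, "big") + b"\x00\x00" + b"abc" + (5).to_bytes(2, "big") + b"\x00\x00" + b"z"
print(list(rdw_records(io.BytesIO(data))))
print(list(fixed_records(io.BytesIO(b"abcdef"), 3)))
```

Each record these generators yield would then go through the per-field conversions: text, packed, and zoned, each with its own scale.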
I build these pipelines as part of IBM i data migration, ETL development, and system integration work. The setup work (reading the layout, computing offsets, writing and testing the field-level conversions) is not especially difficult once you have done it before. The pitfalls are almost entirely in the details: a REDEFINES clause that means a field occupies the same bytes as another, a sign convention that differs between two subsystems on the same platform, a date field that needs a different calculation depending on its format code. None of those are hard to spot for someone who has worked with these systems regularly.
If the work is a one-time migration, a recurring integration, or just a file you need decoded quickly, see the IBM i consulting page for what that work typically looks like, or Ask James to describe your file and what you need out of it.