James Allman | JA Technology Solutions LLC

Data Migration Testing: Why Record-Level Comparison Matters

Row counts match. Reports look right. But until you compare every record field by field, you don't know what the migration actually did to your data.

Migration Is Transformation, Not Just Movement

Data migration sounds straightforward: move data from one system to another. In practice, it is one of the highest-risk activities in any technology project. You are not copying files between folders. You are transforming data across schemas, encodings, platforms, and business rules, all while preserving the meaning that your organization depends on every day.

A migration from an IBM i system to a cloud-based ERP, for example, involves converting EBCDIC-encoded fixed-width records into UTF-8 relational tables. Date fields stored as seven-digit numeric values become ISO date strings. Packed decimal fields become floating-point or scaled integers. Every one of those transformations is an opportunity for data to lose fidelity.

The stakes are real. Incorrect account balances, missing transaction history, broken customer relationships, compliance gaps: these are the consequences teams discover weeks or months after go-live when someone finally looks closely enough at the data.

Common Migration Scenarios

Data migration comes in many forms, but the testing challenges are remarkably consistent. Platform migrations move data between fundamentally different environments. Database migrations shift data between database engines, where data type mappings and SQL dialect differences introduce subtle conversion issues.

ERP replacements are among the most complex because they involve not just moving data but remapping it to an entirely different data model. System consolidation projects add deduplication and conflict resolution to the list of things that can go wrong.

In every one of these scenarios, the source and target systems represent data differently. The question is never whether transformation is needed. It always is. The question is whether the transformation preserved what the data actually means.

What Can Go Wrong

The failure modes in data migration are varied and often invisible at first glance. Truncation is one of the most common: a 50-character name field mapped to a 40-character field in the target quietly clips data without raising an error. Encoding issues turn accented characters into question marks. Decimal precision changes round financial values in ways that pass casual inspection but fail an audit.

Referential integrity breaks are particularly dangerous. If parent records migrate before child records, or if key mappings are inconsistent, you end up with orphaned records. Business logic embedded in the migration can introduce errors that look like valid data.

Data loss through filtering is another silent failure. Migration scripts often include WHERE clauses that exclude records the developer assumed were unnecessary. Months later, someone searches for a historical transaction and it simply does not exist.

Why "It Loaded Successfully" Is Not Validation

The most dangerous moment in a data migration is when someone announces that all records loaded without errors. Row counts match. The ETL job completed with a zero return code. The application starts up and displays data. Everyone exhales. But none of that means the data is correct.

Row counts confirm quantity, not quality. A table can have exactly 500,000 rows in both source and target while every single row contains at least one incorrect field value. A date that shifted by one day, a currency amount that lost two decimal places, a status code that mapped to the wrong value: none of these produce load errors.

Summary-level checks are useful as a first pass. They catch catastrophic failures. But they are blunt instruments. The only way to know the migration actually worked is to compare records individually.

Record-Level Comparison Explained

Compare every record, every field: only fixed when each row reconciles.

Record-level comparison is exactly what it sounds like: matching each record in the source to its corresponding record in the target using key fields, then comparing every mapped field value between the two.

The key word is "expected transformed value." You are not checking that the target is identical to the source. You are checking that the transformation rules were applied correctly. If the migration converts dates from CYYMMDD format to ISO 8601, then source value 1260115 should become 2026-01-15 in the target.

This approach produces three categories: matches, mismatches, and missing records. Matches confirm the migration worked. Mismatches identify exactly which fields differ. Missing records reveal data that exists in one system but not the other.

What Good Migration Testing Looks Like

Effective migration testing starts with source-to-target reconciliation at multiple levels. High-level checks confirm row counts and aggregates. Record-level comparison validates individual rows. Field-level diff identifies exactly which columns differ. Profiling the source data before migration (examining column types, null rates, unique values, and outlier patterns) establishes a baseline that post-migration profiling can be compared against. The free Data Profiler on this site produces this kind of column-level analysis for CSV and TSV data.

Exception reporting is what turns raw comparison results into actionable information. A report showing that 12% of address records have truncated city names tells you something specific and fixable. A report showing 500,000 rows with 60,000 exceptions tells you very little without further breakdown.

Traceability matters too. When a mismatch is found, you need to trace it back through the migration pipeline to the specific transformation step that introduced the error.

The Role of Automation

You cannot manually compare 500,000 records. Migration testing must be automated because migrations are iterative. You will run the migration multiple times during development, each time fixing issues found in the previous run. Every iteration needs the same comprehensive comparison.

Automated comparison tools read source and target extracts, match records by key, compare all mapped fields, and generate exception reports, typically in minutes for datasets that would take a human team weeks.

The tooling does not need to be expensive. Purpose-built comparison scripts, open-source diff utilities, and lightweight web-based tools can handle many scenarios effectively.

Encoding and Format Traps

Platform migrations between IBM i and distributed systems are especially prone to encoding and format issues. EBCDIC to ASCII conversion affects every text field. Packed decimal fields must be unpacked and converted without losing precision.

Fixed-width record formats must be parsed into discrete fields before loading into relational tables. A single misaligned byte offset corrupts every field after the error, and the corruption often looks like plausible data.

Decimal handling deserves particular attention. A packed decimal field defined as (9,2) on IBM i holds values with exactly two decimal places. If the target field is floating-point, you may introduce rounding artifacts that produce material discrepancies across millions of records.

Tools That Help

I have built several free browser-based tools for migration testing scenarios. The Data Diff tool performs field-by-field comparison of two datasets, matching records by key columns. The EBCDIC Converter handles encoding translation for IBM i migrations. The Fixed-Width Converter parses positional record formats. The CSV & Delimited Text Converter normalizes extracts from different systems (including those using non-standard separators like 0xFE) before comparison.

These tools run entirely in the browser. No data is uploaded to any server, which matters when working with production data during migration testing.

When to Bring in Help

Migration projects often start with confidence and end with anxiety. The data is more complex than expected. The transformation rules have more edge cases than anticipated. The testing timeline is compressed.

With over 35 years of experience across IBM i, distributed systems, and modern cloud platforms, I have designed and executed data migrations where record-level comparison was the standard. Whether you need help building a migration testing framework, validating a migration in progress, or planning one that has not started yet, I can help you get it right.

Ask James to start a conversation about your migration project.

Have thoughts on this article? Share them.

Try the free Data Diff tool: compare two datasets field by field with key-field joining, right in your browser. Browse free tools →

Contact

Ready to discuss your environment?

If any of the topics in this article apply to your organization, I would be glad to have that conversation.

Call +1-888-882-8735 toll-free or +1-512-870-8510 direct.

Or send a message, schedule a call, or ask Ask James.

More Insights