James Allman / JA Technology Solutions LLC
2026-03-23
ETL: The Invisible Backbone of Enterprise Data
How data actually moves between the systems your business depends on, and what goes wrong when it is not done well.
Every time your merchandising system sends pricing updates to your POS terminals, every time your warehouse receives an electronic purchase order, every time your finance team runs a reconciliation report that pulls data from three different platforms — there is an ETL process behind it. Extract, Transform, Load. Data is pulled from a source, reshaped to fit the destination, and delivered where it needs to go.
ETL is one of the most critical and least visible functions in enterprise computing. When it works, nobody notices. When it fails, operations stop, reports are wrong, invoices don't match, and people scramble to figure out what happened to the data.
This article is written for business decision-makers who want to understand what ETL is, why it matters, when it is needed, and what to look for in the people and processes responsible for it.
What ETL Actually Means
ETL stands for Extract, Transform, Load. Each step is distinct and has its own challenges.
Extract is pulling data out of a source system. That source might be a DB2 for i database on an IBM i system, a PostgreSQL database behind a web application, flat files from an EDI trading partner, spreadsheets from a vendor, or an API response from an external platform. The extraction step must understand the source data format, handle access securely, and deal with the realities of production systems that are in use while the extraction is happening.
Transform is reshaping the data to fit the destination. This is where most of the complexity lives. Field names change. Data types differ. Business rules must be applied — a product code in one system may not match the product code in another. Dates are formatted differently. Currency values need conversion. Records that exist in the source may not have a corresponding entry in the destination, and vice versa. Transformation logic often encodes years of accumulated business knowledge about how data from one system maps to another.
Load is delivering the transformed data to the destination system. This might mean inserting rows into a database, writing formatted files for another system to pick up, posting records through an API, or updating existing records in place. The load step must handle errors gracefully, support rollback when something goes wrong, and often needs to run without disrupting the destination system's normal operations.
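The three steps can be sketched as a minimal in-memory pipeline. This is an illustrative sketch only: the record layout, field names, and pricing rule are invented, and a real extract or load would go through JDBC, files, or an API rather than hard-coded lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A minimal sketch of the three ETL steps over in-memory data.
// Field names and the cents-to-decimal rule are hypothetical.
public class EtlPipelineSketch {

    // Extract: in a real pipeline this would read from JDBC, a file, or an API.
    static List<Map<String, String>> extract() {
        return List.of(
            Map.of("ITEM_NO", "0001", "PRICE", "199"),   // price stored in cents
            Map.of("ITEM_NO", "0002", "PRICE", "250")
        );
    }

    // Transform: rename fields and convert cents to a decimal string.
    static Map<String, String> transform(Map<String, String> src) {
        long cents = Long.parseLong(src.get("PRICE"));
        return Map.of(
            "sku", src.get("ITEM_NO"),
            "unit_price", String.format("%d.%02d", cents / 100, cents % 100)
        );
    }

    // Load: here just an in-memory list; in production, a batch INSERT or API call.
    static List<Map<String, String>> run() {
        List<Map<String, String>> dest = new ArrayList<>();
        for (Map<String, String> row : extract()) {
            dest.add(transform(row));
        }
        return dest;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Even at this scale, the shape is the same as a production pipeline: each step is a separate, independently testable function.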
Two Kinds of ETL
ETL work falls into two broad categories, and the difference matters significantly for how it should be planned, built, and managed.
One-time ETL is data migration. You are moving data from an old system to a new one, converting formats, mapping fields, validating that nothing was lost or corrupted, and loading the result into the target platform. This happens during system replacements, platform migrations, database conversions, and consolidation projects. Once the migration is complete and validated, the ETL process is retired.
Permanent ETL is integration infrastructure. These are processes that run on a schedule or in real time, continuously moving data between systems that must stay synchronized. Examples include nightly inventory updates from a merchandising system to a warehouse, real-time transaction feeds from POS to finance, weekly data exchanges with wholesale distributors like UNFI, and EDI document processing for invoices and purchase orders. These ETL processes become part of the operational infrastructure and must be maintained, monitored, and supported indefinitely.
The distinction matters because the engineering, testing, and support requirements are fundamentally different. A one-time migration needs thorough validation and reconciliation but can be retired after go-live. A permanent integration pipeline needs monitoring, error handling, alerting, and ongoing support as the source and destination systems evolve over time.
Why ETL Fails
ETL failures are among the most common and most damaging problems in enterprise computing. They are also among the most preventable.
The most frequent cause is insufficient understanding of the source data. The source system has been running for years or decades. It contains edge cases, historical data that does not conform to current business rules, null values where none were expected, duplicate records, and undocumented fields that turned out to be important. If the extraction step does not account for the full reality of the source data, the transformation will produce incorrect results.
Transformation logic that is too simplistic is the second major cause. Mapping field A to field B seems straightforward until you discover that field A means different things depending on the value of field C, or that the source system uses a code table that has been modified over the years and no longer matches the documentation. Transformation must account for how the data actually exists, not how it was supposed to exist.
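A sketch of what "field A means different things depending on field C" looks like in practice. The department codes and unit rules here are entirely hypothetical; the point is that the mapping branches on a second field, and that an unmapped code fails loudly rather than being guessed at.

```java
import java.util.Map;

// Hypothetical example: in the source system, the UNIT field means
// different things depending on DEPT — a case count for grocery,
// a weight in pounds for fresh departments.
public class ConditionalTransform {

    static String normalizeUnit(Map<String, String> row) {
        String dept = row.get("DEPT");
        String unit = row.get("UNIT");
        switch (dept) {
            case "GROC": return unit + " CASE";
            case "MEAT":
            case "PROD": return unit + " LB";
            default:
                // Unknown department code: surface it instead of guessing.
                throw new IllegalArgumentException("Unmapped DEPT code: " + dept);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeUnit(Map.of("DEPT", "GROC", "UNIT", "12")));
        System.out.println(normalizeUnit(Map.of("DEPT", "MEAT", "UNIT", "1.5")));
    }
}
```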
Inadequate validation is the third. Data that looks correct in a spot check may contain systematic errors that only become visible when the full dataset is loaded and reconciled. Row counts don't match. Totals don't balance. Records that should be unique are duplicated. Without thorough validation and reconciliation as a formal step in the process, these errors reach production.
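Automated reconciliation can be as simple as comparing a row count and a control total between source and destination. The figures below are hypothetical; the structure, where validation is a formal step with a pass/fail result, is the point.

```java
import java.util.List;

// A sketch of post-load reconciliation: compare row counts and a
// control total between source and destination. Amounts are hypothetical.
public class ReconciliationCheck {

    record Result(boolean rowCountsMatch, boolean totalsMatch) {
        boolean passed() { return rowCountsMatch && totalsMatch; }
    }

    static Result reconcile(List<Long> sourceAmounts, List<Long> destAmounts) {
        boolean counts = sourceAmounts.size() == destAmounts.size();
        long srcTotal = sourceAmounts.stream().mapToLong(Long::longValue).sum();
        long dstTotal = destAmounts.stream().mapToLong(Long::longValue).sum();
        return new Result(counts, srcTotal == dstTotal);
    }

    public static void main(String[] args) {
        // One record was silently dropped during the load: counts disagree.
        Result r = reconcile(List.of(100L, 250L, 75L), List.of(100L, 250L));
        System.out.println("passed=" + r.passed());
    }
}
```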
Finally, ETL processes built without proper error handling will fail silently. A record that cannot be transformed is skipped without logging. A load that partially completes is not rolled back. A nightly process that fails at 2 AM is not noticed until someone runs a report the next morning and gets wrong numbers. Robust ETL requires logging, alerting, and recoverability.
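One pattern that prevents silent skipping is quarantining: a record that fails to transform is logged and set aside for review rather than dropped. The types and transformation below are illustrative, not a prescription.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: records that fail to transform are logged and collected into a
// quarantine list so the failure is visible and recoverable.
public class QuarantiningTransform {

    record Outcome(List<String> loaded, List<String> quarantined) {}

    static Outcome transformAll(List<String> rows, Function<String, String> transform) {
        List<String> loaded = new ArrayList<>();
        List<String> quarantined = new ArrayList<>();
        for (String row : rows) {
            try {
                loaded.add(transform.apply(row));
            } catch (RuntimeException e) {
                // Log and keep the bad record; never drop it silently.
                System.err.println("Quarantined: " + row + " (" + e + ")");
                quarantined.add(row);
            }
        }
        return new Outcome(loaded, quarantined);
    }

    public static void main(String[] args) {
        Outcome o = transformAll(List.of("42", "oops", "7"),
                s -> String.valueOf(Integer.parseInt(s) * 2));
        System.out.println("loaded=" + o.loaded() + " quarantined=" + o.quarantined());
    }
}
```

The quarantine list gives the overnight failure at 2 AM a place to land that someone can act on the next morning, instead of a gap that only shows up as wrong numbers on a report.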
ETL and EDI
EDI — Electronic Data Interchange — is a standardized format for exchanging business documents between trading partners. Invoices, purchase orders, shipping notices, and other transaction documents flow between companies in EDI format. Processing these documents is, at its core, an ETL problem.
An incoming EDI purchase order must be extracted from the communication channel, the EDI envelope must be parsed, the document content must be transformed into the format your internal systems expect, and the data must be loaded into your order management or ERP platform. An outgoing EDI invoice reverses the process: extracting data from your financial system, transforming it into the EDI format your trading partner expects, and transmitting it.
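The "parse the document content" step can be sketched in a few lines. This is deliberately simplified: real X12 processing reads the element and segment delimiters from the ISA envelope and validates control numbers, whereas this sketch assumes the common '*' element separator and '~' segment terminator, and the two-segment sample document is invented.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately simplified sketch of parsing X12 EDI segments.
// Assumes '*' as the element separator and '~' as the segment terminator;
// a real parser reads these delimiters from the ISA envelope.
public class EdiSegmentSketch {

    static List<String[]> parseSegments(String edi) {
        List<String[]> segments = new ArrayList<>();
        for (String seg : edi.split("~")) {
            if (!seg.isBlank()) {
                segments.add(seg.trim().split("\\*"));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Invented fragment of a purchase order: a BEG header and one line item.
        String sample = "BEG*00*NE*PO12345**20250301~PO1*1*12*CA*4.25~";
        for (String[] seg : parseSegments(sample)) {
            System.out.println(seg[0] + " has " + (seg.length - 1) + " elements");
        }
    }
}
```

Everything downstream of this, mapping segment elements to order fields, applying trading-partner-specific rules, loading into the ERP, is the same transform-and-load work as any other pipeline.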
Organizations that handle significant EDI volume — common in grocery retail, warehousing, and distribution — are running permanent ETL pipelines whether they think of them that way or not. The reliability, accuracy, and error handling of these pipelines directly affect invoicing accuracy, order fulfillment, and trading partner relationships.
ETL and Database Migration
Database migration is one of the most demanding ETL scenarios. Moving data from DB2 for i to PostgreSQL, from Informix to SQL Server, or from any legacy database to a modern platform requires deep understanding of both the source and destination systems.
Schema differences are just the beginning. Data types that exist in one database may not have direct equivalents in another. Character encoding differences can corrupt text data. Stored procedures, triggers, and computed columns in the source may embed business logic that must be preserved in the transformation. Sequences, identity columns, and key generation strategies differ between platforms.
Java and JDBC provide a significant advantage here. Because JDBC offers a uniform interface to virtually any relational database, the extraction and loading code can connect to both the source and destination databases through the same API. This makes it possible to build migration tools that are portable across database platforms and reusable across projects. I prefer Java for legacy data extraction where possible, even when the source system is IBM i, because the cross-platform portability and database connectivity reduce the risk of building tools that are tied to a single environment.
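One small piece of such a migration tool is a column-type mapping table. The sketch below shows a partial, illustrative DB2 for i to PostgreSQL mapping; a real tool would also carry precision, scale, encoding (CCSID), and default-value details, and would read the source schema through JDBC metadata rather than by hand.

```java
import java.util.Map;

// Illustrative only: a partial mapping from common DB2 for i column types
// to PostgreSQL equivalents. A real migration tool would also handle
// precision, scale, character encoding, and default values.
public class TypeMappingSketch {

    private static final Map<String, String> DB2_TO_PG = Map.of(
        "CHAR", "char",
        "VARCHAR", "varchar",
        "DECIMAL", "numeric",
        "NUMERIC", "numeric",
        "INTEGER", "integer",
        "SMALLINT", "smallint",
        "BIGINT", "bigint",
        "DATE", "date",
        "TIMESTAMP", "timestamp",
        "CLOB", "text"
    );

    static String mapType(String db2Type) {
        String pg = DB2_TO_PG.get(db2Type.toUpperCase());
        if (pg == null) {
            // Unknown type: surface it rather than guessing.
            throw new IllegalArgumentException("No mapping for DB2 type: " + db2Type);
        }
        return pg;
    }

    public static void main(String[] args) {
        System.out.println("DECIMAL -> " + mapType("DECIMAL"));
    }
}
```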
Validation and reconciliation must be built into every database migration. Row counts, checksum comparisons, referential integrity checks, and business-level reconciliation — do the totals balance, do the reports match, does the application produce the same results with the migrated data — are not optional steps. They are the difference between a successful migration and one that introduces data quality problems that take months to discover.
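A checksum comparison can verify that two row sets match regardless of the order in which they were extracted. The row format below is hypothetical; the technique, hashing a canonically ordered dump of each side and comparing digests, is the general one.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

// Sketch of a checksum comparison: hash each side's rows in a canonical
// order and compare digests. Row format and data are hypothetical.
public class ChecksumReconciliation {

    static String digest(List<String> rows) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Sort so the digest is independent of extraction order.
            rows.stream().sorted().forEach(r -> md.update(r.getBytes()));
            return HexFormat.of().formatHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        List<String> source = List.of("0001|WIDGET|4.25", "0002|GADGET|9.99");
        List<String> dest   = List.of("0002|GADGET|9.99", "0001|WIDGET|4.25");
        System.out.println("match=" + digest(source).equals(digest(dest)));
    }
}
```

Checksums answer "is the data identical"; the business-level checks, totals, reports, application behavior, answer "does the data still mean the same thing." A migration needs both.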
ETL and PCI Compliance
Any ETL process that handles payment card data falls under PCI DSS requirements. This includes extracting transaction records from POS systems, transforming financial data for reconciliation, and loading records into finance or reporting platforms.
PCI compliance for ETL means the data must be encrypted in transit and at rest. Access to the ETL processes and the systems they connect must be controlled and audited. Logs must be maintained. Temporary files created during transformation must be handled securely and cleaned up. The ETL infrastructure itself — the servers, the network paths, the credentials — must be within scope of the PCI assessment.
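Temp-file hygiene in the transformation step is one concrete example of what this looks like in code. The sketch below requests owner-only permissions where the filesystem supports POSIX attributes and guarantees cleanup even when the transformation fails; encryption of the file contents, which PCI requires for card data at rest, is out of scope of this fragment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Sketch of temp-file hygiene: owner-only permissions where supported,
// and guaranteed deletion even if the transformation step fails.
public class SecureTempFileSketch {

    static boolean transformViaTempFile(String data) {
        try {
            Path tmp;
            try {
                tmp = Files.createTempFile("etl-", ".tmp",
                    PosixFilePermissions.asFileAttribute(
                        PosixFilePermissions.fromString("rw-------")));
            } catch (UnsupportedOperationException e) {
                tmp = Files.createTempFile("etl-", ".tmp"); // non-POSIX filesystem
            }
            try {
                Files.writeString(tmp, data);
                // ... transformation work would happen here ...
            } finally {
                Files.deleteIfExists(tmp); // never leave intermediate data behind
            }
            return !Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cleaned up: " + transformViaTempFile("sample"));
    }
}
```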
For organizations in retail, grocery, and finance, this is not a theoretical concern. ETL processes that move payment data are audit targets. Building them without PCI awareness creates compliance gaps that may not be discovered until an audit or, worse, a breach.
Building ETL the Right Way
Well-designed ETL processes share several characteristics regardless of whether they are one-time migrations or permanent integration pipelines.
They are documented. The source and destination systems, the transformation logic, the business rules applied, and the validation criteria are recorded clearly enough that someone other than the original developer can understand and maintain them.
They are testable. Each step — extraction, transformation, loading — can be run and verified independently. Test data exists. Expected results are defined. Validation is automated, not manual.
They are recoverable. When something fails — and it will — the process can identify where it failed, what data was affected, and how to recover without re-running the entire process from scratch.
They are monitored. Permanent ETL pipelines have logging, alerting, and dashboards that show whether the process ran successfully, how many records were processed, and whether any errors occurred. No ETL process should fail silently.
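The minimum viable form of this is a per-run summary that a scheduler or dashboard can inspect and alert on. The field names below are illustrative; what matters is that every run emits an explicit record instead of failing silently.

```java
// Sketch of a per-run summary that monitoring can alert on.
// Field names are illustrative.
public class RunSummary {

    record Summary(String job, int read, int loaded, int errors) {
        boolean healthy() { return errors == 0 && read == loaded; }
        String logLine() {
            return String.format("job=%s read=%d loaded=%d errors=%d status=%s",
                job, read, loaded, errors, healthy() ? "OK" : "ALERT");
        }
    }

    public static void main(String[] args) {
        // Two records failed: this run should page someone.
        Summary s = new Summary("nightly-inventory", 1000, 998, 2);
        System.out.println(s.logLine());
    }
}
```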
And they are built by people who understand both the source and destination systems, not just the tooling in between. The most sophisticated ETL framework in the world will produce wrong results if the person building the transformation does not understand the business meaning of the data they are moving.
My Experience with ETL
ETL work has been a central part of my consulting practice for decades. I have built one-time migration tools for database conversions, platform transitions, and system replacements. I have built permanent integration pipelines that move data between IBM i systems, Linux-based platforms, cloud services, and external trading partners on daily and hourly schedules and in real time.
My approach favors Java and JDBC for extraction and loading where possible, because the cross-platform database connectivity means the same tooling works regardless of whether the source is DB2 for i, Informix, PostgreSQL, or SQL Server. For IBM i environments, this means extraction can happen through JDBC directly rather than requiring RPG programs to export data, which simplifies the process and reduces dependency on platform-specific code.
I have particular experience with ETL in grocery retail environments, where the systems involved — merchandising platforms like GSS, Retalix HQ, and BRData, fresh department tools like Invatron PLUM, ordering systems like Upshop MAGIC CGO, warehouse logistics like Power Enterprise, and wholesale distributor interfaces like UNFI — all need to exchange data reliably and accurately.
Whether you need a one-time data migration, a permanent integration pipeline, help with EDI processing, or an assessment of existing ETL processes that have become fragile or undocumented, I can help. With over 35 years of enterprise software development experience spanning IBM i, Java, Linux, and cross-platform integration, I bring the breadth of understanding needed to work across every layer of the ETL process.
Making Good Decisions About ETL
If your organization is planning a system migration, building new integrations, or struggling with data quality issues that trace back to how data moves between systems, ETL is the conversation you need to have.
The questions that matter: Do you know what ETL processes are currently running in your environment? Are they documented? Are they monitored? Are they built to handle errors and recover gracefully? Do the people maintaining them understand both the source and destination systems? And if you are planning a migration or new integration, do you have ETL expertise on the team that understands the full scope of the work?
If you are not confident in the answers, that is exactly the kind of conversation I have with clients. An honest assessment of your data movement infrastructure, whether it is for a migration project or ongoing operations, can prevent expensive problems before they happen. For a hands-on look at what modernized data can do, see it in action on this site.