
ETL flow testing

This page highlights several options for testing ETL flows.

Getting started

ETL flow testing can be started in small steps: first, validate that the ETL flows execute; then add additional steps such as cleaning and initialization; finally, introduce validations. It is these validations that can then be scaled and grown quickly using configuration-based automation components.

Simple ETL flow execution

Is it possible to run the ETL flow?
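The simplest possible test is to execute the flow and check its exit code. A minimal shell sketch, using a stand-in run.sh in place of the real flow launcher (path and script contents are assumptions for illustration):

```shell
# Create a stand-in for the flow's run script (hypothetical path)
cat > /tmp/run.sh <<'SH'
#!/bin/sh
echo "ETL flow executed"
exit 0
SH
chmod +x /tmp/run.sh

# The test: run the flow and check the exit code
if /tmp/run.sh; then echo "flow OK"; else echo "flow FAILED"; fi
```

The same pattern applies when the flow is started through a scheduler or launcher: the exit code is the first, coarsest validation.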

Add the necessary cleaning steps

Use delete/truncate statements to perform the necessary cleaning upfront:
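A cleaning step is typically captured in a small SQL file that the test harness executes against the target database before the run. A sketch, with hypothetical table names (stg_customer, stg_order) and paths:

```shell
# Generate a clean.sql with the upfront cleaning statements
# (table names and load_date filter are assumptions for illustration)
mkdir -p /tmp/etl1/data
cat > /tmp/etl1/data/clean.sql <<'SQL'
-- Remove leftovers from previous test runs
TRUNCATE TABLE stg_customer;
DELETE FROM stg_order WHERE load_date = CURRENT_DATE;
SQL
cat /tmp/etl1/data/clean.sql
```

The harness would then submit this file to the database client for the environment under test.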

Add the necessary initialization steps

Insert relevant initialization values in tables or configuration files:
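As with cleaning, initialization can be kept in a small SQL (or config) file that the test executes before the flow starts. A sketch with a hypothetical control table and parameter values:

```shell
# Generate an init.sql seeding the values the flow expects
# (ctl_run_config and its values are assumptions for illustration)
mkdir -p /tmp/etl1/data
cat > /tmp/etl1/data/init.sql <<'SQL'
-- Seed reference values needed by the flow under test
INSERT INTO ctl_run_config (param, value) VALUES ('batch_date', '2024-01-01');
INSERT INTO ctl_run_config (param, value) VALUES ('run_mode', 'test');
SQL
cat /tmp/etl1/data/init.sql
```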

Add a source data file transfer for loading into the target table

Transfer the source file to be loaded:
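In the simplest case the transfer is a copy of a prepared test file into the landing directory the flow reads from. A sketch, with hypothetical paths and file contents:

```shell
# Prepare a small test source file and stage it into the landing directory
# (/tmp/landing and the CSV contents are assumptions for illustration)
mkdir -p /tmp/landing
printf 'id,name\n1,Alice\n2,Bob\n' > /tmp/source.csv
cp /tmp/source.csv /tmp/landing/source.csv
ls /tmp/landing
```

In a real setup the copy would typically be an sftp/scp transfer to the server where the ETL flow runs.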

Add a check for the data loaded

Check if the data has been loaded correctly from the source file:
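A basic load check compares the source row count with the target row count. A self-contained sketch where a flat file stands in for the target table (paths, file formats, and the header convention are assumptions):

```shell
# Stage a source file and a "loaded" target standing in for a table
mkdir -p /tmp/landing /tmp/target
printf 'id,name\n1,Alice\n2,Bob\n' > /tmp/landing/source.csv
printf '1|Alice\n2|Bob\n' > /tmp/target/customer.dat

# Compare row counts: source minus its header line versus target
src_rows=$(($(wc -l < /tmp/landing/source.csv) - 1))
tgt_rows=$(($(wc -l < /tmp/target/customer.dat)))
if [ "$src_rows" -eq "$tgt_rows" ]; then
  echo "load OK"
else
  echo "load FAILED"
fi
```

Against a real database the target count would come from a `SELECT COUNT(*)` in checkload.sql; a stricter variant also compares checksums or sample values.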

Integration with technical ETL frameworks

Use case: Check if any file is still queued for processing. This avoids clashes between processing runs:
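The queue check can be as simple as testing whether the framework's intake directory is empty before transferring anything. A sketch with a hypothetical queue directory:

```shell
# Only proceed when nothing is still queued for the technical ETL framework
# (/tmp/etl_queue is an assumption standing in for the framework's intake dir)
queue_dir=/tmp/etl_queue
mkdir -p "$queue_dir"
if [ -n "$(ls -A "$queue_dir")" ]; then
  echo "queue not empty - waiting"
else
  echo "queue empty - safe to proceed"
fi
```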

Transfer the source data file for processing:

Wait for the file to be picked up by the technical ETL framework:

If the file has been picked up, the ETL flow can be started:
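The transfer-wait-start sequence above can be sketched as a polling loop: drop the file, poll until the framework has picked it up (i.e. it disappears from the landing directory), then start the flow. Paths are assumptions, and a background remove simulates the framework's pickup for illustration:

```shell
# Stage a file and wait for it to be picked up before starting the flow
landing=/tmp/landing2
mkdir -p "$landing"
printf 'id\n1\n' > "$landing/source.csv"
( sleep 1; rm "$landing/source.csv" ) &   # simulates the framework picking it up

# Poll with a bounded timeout instead of waiting forever
for i in $(seq 1 10); do
  if [ ! -e "$landing/source.csv" ]; then break; fi
  sleep 1
done

if [ ! -e "$landing/source.csv" ]; then
  echo "file picked up - starting ETL flow"
  # bin/iesi-launch.sh -script ETL1 -env <arg>   # actual start, per this page
else
  echo "timeout waiting for pickup"
fi
```

The bounded loop matters: a missing pickup should fail the test with a timeout rather than hang the test run.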

Run end-to-end chains of ETL flows

Run multiple ETL flows one after the other across the different repositories:
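A chain is a sequential loop over flow names that stops at the first failure. A sketch modeled on the launcher commands shown later on this page (flow names are assumptions; the actual launch line is left as a comment):

```shell
# Run a chain of flows in order, logging progress and stopping on failure
log=/tmp/chain.log
: > "$log"
for flow in ETL1 ETL2 ETL3; do
  echo "running $flow" >> "$log"
  # bin/iesi-launch.sh -script "$flow" -env <arg> || { echo "$flow failed"; exit 1; }
done
echo "chain complete" >> "$log"
cat "$log"
```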

Creating reusable scripts for testing multiple ETL flows

Reuse a common script for testing multiple ETL flows:

Note: additional parameters may apply depending on the script design.
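The reusable script takes the flow-specific values as parameters and runs the same clean/load/check sequence for each flow. A minimal sketch of that shape (function name, parameters, and the echoed steps are assumptions for illustration):

```shell
# A common test routine, parameterized by flow name and source file
run_etl_test() {
  flow="$1"
  src="$2"
  echo "cleaning for $flow"      # would execute the flow's clean.sql
  echo "loading $src"            # would transfer the file and run the flow
  echo "checking load for $flow" # would execute the flow's checkload.sql
}

run_etl_test ETL1 source.csv > /tmp/etl_test.out
cat /tmp/etl_test.out
```

The same routine is then reused for ETL2, ETL3, and so on by changing only the parameters.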

Execute using a parameter file

ETL1.conf
---------
file=source.csv
path=/sourcePath
project=DStage1
Job=ETL1
Etc.
bin/iesi-launch.sh -script commonScript1 -env <arg> -paramfile /path/ETL1.conf
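Inside the common script, the parameter file resolves to simple key=value lookups. A shell sketch of that parsing, with file contents mirroring the ETL1.conf example above (the parsing loop itself is an assumption, not the framework's actual mechanism):

```shell
# Write a parameter file like the ETL1.conf example, then read it key by key
conf=/tmp/ETL1.conf
printf 'file=source.csv\npath=/sourcePath\nproject=DStage1\nJob=ETL1\n' > "$conf"

while IFS='=' read -r key value; do
  echo "param $key = $value"
done < "$conf" > /tmp/params.out
cat /tmp/params.out
```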

Organize the configuration files using a predefined structure

Folder structure
----------------
base
  * ETL1
    * Data
      * clean.sql
      * run.sh
      * checkload.sql
  * ETL2
    * Data
      * clean.sql
      * run.sh
      * checkload.sql
bin/iesi-launch.sh -script commonScript1 -env <arg> -paramlist ETL=ETL1
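With this structure, the ETL name passed via -paramlist is enough to resolve every per-flow file by convention. A sketch of that resolution (base path is an assumption; the folder layout follows the structure above):

```shell
# Resolve the per-flow configuration files from the predefined structure
base=/tmp/base
etl=ETL1   # value that would arrive via -paramlist ETL=ETL1
mkdir -p "$base/$etl/data"
touch "$base/$etl/data/clean.sql" "$base/$etl/data/run.sh" "$base/$etl/data/checkload.sql"

for f in clean.sql run.sh checkload.sql; do
  if [ -f "$base/$etl/data/$f" ]; then echo "found $f"; fi
done
```

Adding a new flow to the test suite then means adding one folder with the three conventional files, with no change to the common script.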