Skip to the content.

Query based testing

Query based testing is a popular approach for automating data-driven testing. It consists of executing a query and comparing its actual outcome to an expected outcome.

It requires both functional and technical skills to be designed. Functional skills are needed to define the appropriate tests for the data. Technical skills are needed to write these tests in a technical format (SQL, data expressions, etc.). And it is this technical format that will allow automating the execution and evaluation.

Instead of manually testing a couple of data points, it becomes possible to easily scale the number of tests and execute them regularly. For this reason, this technique is used a lot for regression testing.

This approach can be applied to multiple situations:

Are query based executions still relevant?

In the current context of big data solutions, one can wonder if query based execution are still relevant. They are still around and will continue to do so. After introduction of new technologies and patterns, at some point query based interfaces are added and maintained.

sqltimeline source: https://conferences.oreilly.com/strata/strata-eu/public/schedule/detail/74048; authors: Elliot West, Jaydene Green

we refer to query based interfaces rather that just SQL given the growth in other structure query languages across the data technology landscape.

Test examples

Examples of tests include:

Techniques

Typical techniques for performing query based testing include:

There is however not one technique that will be able to provide full coverage of data tests.

Therefore, the best strategy is to design an approach that makes use of each technique where most effective but also efficient in setup and execution. Test automation is all about reducing risks through execution of a high number of test cases. Scale the right test cases and issues are more likely to be detected in an earlier phase.

In the context of query based testing, an end-to-end test script will help to keep the system that is tested under control:

Use of minus queries

A minus query is a query that uses minus type operator (minus, except, intersect, etc.) to subtract one result from another giving the difference between both.

(add intersection diagram)

Minus queries compare two query results, input and output, allowing to verify inside a system if actual and expected outcome match:

The advantage is that the system will perform the work, not the testing application. The data is not duplicated. Only the test in the query set is performed. This is a disadvantage at the same time since that all data needs to be accessible by this same system.

This can result in different strategies if the data resides on different systems:

Distributed query engine

(add presto)

Good practices

Creating and comparing data structure statistics

Typical examples here include: