
Structuring logical tests
In a previous post, I argued that we should not categorize tests as black box or white box. However, since a lot of people know these terms, perhaps “whiteboxing blackbox tests” would make more sense to them than “structuring logical tests”.
Let me give you some context to clarify exactly what I mean by this.
When tests are created across different teams, it is common for those teams to want to make sure everything is covered and that the tests are correct, especially if there is little trust between the teams or team members, or if the people who created the tests have already left.
Rather than sharing test case definitions (some teams do not even keep a written record of them) and having to understand the logic behind integration or end-to-end tests, some people want automated checks to verify that everything is covered. This is where the structuring of logical tests happens: they want to turn logical tests into structural tests.
The most typical example is when someone wants to get the code coverage percentage of integration or end-to-end tests. While this is possible, it is expensive and heavyweight, and the resulting number only makes relative sense.
How do we turn a logical test into a structural test?
First, you need to instrument the code. This means the code is modified (and compiled, where applicable) so that it records line numbers and other bookkeeping information, making it possible to measure which lines are executed when they are hit by a call from a different piece of code.
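To make the idea concrete, here is a toy Python sketch (using `sys.settrace`, nothing like a production tool such as coverage.py) of what instrumentation boils down to: recording which lines actually run. The `discount` function is just a hypothetical stand-in for production code.

```python
# Toy illustration of instrumentation: record which (file, line) pairs execute.
# Real coverage tools do this far more efficiently and robustly.
import sys
from collections import defaultdict

executed_lines = defaultdict(set)

def tracer(frame, event, arg):
    if event == "line":
        executed_lines[frame.f_code.co_filename].add(frame.f_lineno)
    return tracer  # keep tracing inside nested calls

def discount(price, is_member):
    # Hypothetical production code we want coverage data for.
    if is_member:
        return price * 0.9
    return price

sys.settrace(tracer)
discount(100, is_member=True)   # exercises only the "member" branch
sys.settrace(None)

for filename, lines in executed_lines.items():
    print(filename, sorted(lines))
```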
Then you need to make that instrumented code available for the test package to interact with. This could be done with some sort of packaging tool such as Docker, by building both pieces together locally, or by building everything in the cloud. Whichever way you do it, it takes up quite a lot of space, since you need room for both pieces of code as well as all the instrumentation.
Once you execute the tests, you should be able to see how many lines were hit during their execution. However, what does that number mean as a quality measure?
Is code coverage the ultimate quality measure?
We talk about code coverage when we deal with unit tests, because with unit tests we are checking the structure of the code. In that scenario, it makes sense to use coverage as a quality measuring tool. We know what the code looks like (or will look like) when we write the tests, and we want to check the application at a structural level and validate it that way. Therefore, we validate that all lines or branches of the code are executed when the tests run, and if there are issues they should arise from that execution. We can also validate the quality of our tests: if pieces of code are never executed, we are likely missing tests too.
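As a small illustration (the function and test are hypothetical), a unit test that exercises only one branch leaves the others uncovered, and a coverage report points straight at the missing test cases:

```python
# Hypothetical function and unit test: the test exercises only the "member"
# branch, so a coverage report would flag the other branches as unexecuted,
# which maps directly to missing tests.
def shipping_cost(order_total, is_member):
    if is_member:
        return 0
    if order_total > 50:
        return 5
    return 10

def test_members_ship_for_free():
    assert shipping_cost(order_total=20, is_member=True) == 0

# Running e.g. `pytest --cov` (with pytest-cov installed) would report the
# non-member branches as uncovered.
```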
When we deal with integration tests, what we want to check is the integration between the different components. While this is commonly done knowing the structure of the code (a sort of unit test that uses the real integrations instead of mocking them), best practice does not require it. Generally, it is more convenient to know the list of API calls or services that our system provides and depends on, and to make sure those calls are all well tested.
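For instance, an integration test could be sketched like this, asserting on the agreed call between components rather than on internal code structure (the URL, payload, and expected response are placeholders, assuming a locally running instance of the service):

```python
# A sketch of an integration test: it exercises a real call to another
# component. Endpoint and payload are hypothetical.
import requests

BASE_URL = "http://localhost:8080"  # assumed local instance of the service under test

def test_create_order_returns_created():
    response = requests.post(f"{BASE_URL}/orders", json={"item": "book", "qty": 1})
    assert response.status_code == 201
    assert "id" in response.json()
```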
When we deal with end-to-end tests, we typically want to check the user behavior or flow. We don’t really care how it happens, only that it happens. We are checking that the logic of the application is valid, not the correctness of the code.
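An end-to-end check then reads like a user flow rather than a code structure, something along these lines (all endpoints are hypothetical, and a browser-driving tool could stand in for the HTTP client):

```python
# A sketch of an end-to-end test expressed as a user flow. It asserts on what
# the user experiences, not on which lines of code ran. Endpoints are made up.
import requests

BASE_URL = "http://localhost:8080"  # assumed deployed instance of the whole system

def test_user_can_buy_a_book():
    session = requests.Session()
    session.post(f"{BASE_URL}/login", json={"user": "alice", "password": "secret"})
    session.post(f"{BASE_URL}/cart", json={"item": "book", "qty": 1})
    order = session.post(f"{BASE_URL}/checkout").json()
    assert order["status"] == "confirmed"
```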
It makes little sense to measure lines of code covered for either of these test types, because what we are testing is a different concept altogether.
Furthermore, as we go higher up the test pyramid, we should have fewer tests. Measuring them against lines of code goes against this principle: by the time we are working with end-to-end tests, we could be missing most of the lines, and those lines should already have been checked by the unit tests anyway.
The only case in which we might want to check lines of code during integration testing is when we are not mocking during unit testing and are working with some kind of hybrid between unit and integration testing. This is not best practice, but it is done sometimes.
What should we be doing instead?
I think it’s wonderful that everyone cares about quality and is involved in the quality process, but rather than reinventing the wheel, there are other alternatives you could explore.
For starters, make sure the teams agree on a list of test cases. If there is one, feel free to review it and check what’s missing. If there is none, agree with the team in charge of that testing to create one together (and please, don’t write it in a spreadsheet). Make sure you have test cases written ahead of time for upcoming features. And, *sigh*, I hate to say this, but if this is hard, try using some sort of BDD solution, making sure you use it well and for its intended purpose. It might be boring to write down everything you want to test and review it with the entire team, but it is proven to work when done correctly.
With a list of test cases in hand, you should agree to create automation to understand the test coverage (not code coverage), as in how many of the test cases were automated or executed. Now that you know what each of them should cover, you can tell whether that coverage will be good even before the tests are written.
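Here is a minimal sketch of what such a check could look like, assuming test cases are identified by IDs like TC-001 and referenced from the automated test files (the ID format, file layout, and naming convention are just one possible choice):

```python
# A sketch of measuring *test* coverage: cross-referencing an agreed list of
# test case IDs against the automated tests that claim to cover them.
import re
from pathlib import Path

AGREED_TEST_CASES = {"TC-001", "TC-002", "TC-003", "TC-004"}  # hypothetical IDs

def covered_case_ids(test_dir="tests"):
    """Collect test case IDs referenced (in names, docstrings or comments) in test files."""
    found = set()
    for path in Path(test_dir).rglob("test_*.py"):
        found.update(re.findall(r"TC-\d{3}", path.read_text()))
    return found

def test_every_agreed_case_is_automated():
    missing = AGREED_TEST_CASES - covered_case_ids()
    assert not missing, f"Test cases without automation: {sorted(missing)}"
```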
You could take it a step further and automate these checks from the test definitions, or generate the test definitions from existing test automation. You could even use some intelligent system for all of this.
Finally, make sure you have some sort of “contract” tests to validate the previous test level. Keeping these tests separate from your normal feature tests (much as contract tests would be), you can have a few tests that cover the basic must-have scenarios. However, you need to be very careful with these and use them sparingly, so they don’t end up turning into the very problem described in this post.
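One way to keep them separated, sketched below with a pytest marker (the marker name, endpoint, and check are all assumptions), is to tag these must-pass tests so they can run on their own:

```python
# A sketch of a small, separate set of "must-pass" tests that validate the
# previous test level, isolated from feature tests via a pytest marker.
import pytest
import requests

BASE_URL = "http://localhost:8080"  # assumed instance of the dependency under contract

@pytest.mark.basic_contract  # register the marker in pytest.ini to avoid warnings
def test_orders_endpoint_still_exists():
    # Deliberately shallow: only checks that the agreed interface is still there.
    response = requests.get(f"{BASE_URL}/orders")
    assert response.status_code in (200, 401)  # reachable, even if auth is required

# Run only these with: pytest -m basic_contract
```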
At the end of the day, it might make more sense to “align the stars” than to try to test it all just in case, but that’s, well... another story.