Int4 Suite Machine Learning

Int4 Suite is designed to automatically capture and process high volumes of test cases. It’s not uncommon to create millions of test cases for hundreds of interfaces in a typical project, thanks to the Robotic Crawlers and Import features.

Following the simplest approach, capturing months of test data for a given interface should give close to 100% coverage of the business scenarios possible for that interface. While such an approach can be deemed safe, it may be superfluous. If the interface does not operate correctly (regardless of the reason), running all of these test cases would produce a high volume of repeated errors.

One simple option to reduce the number of test cases is the randomized test case selection available in the Robotic Crawlers. However, there is no certainty that the randomized sample maintains the same coverage.
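The coverage risk of random sampling can be illustrated with a minimal sketch. It assumes, purely for illustration, that every test case already carries a label naming the business scenario it exercises; the scenario names, the volumes, and the `coverage` helper are hypothetical and are not part of Int4 Suite.

```python
import random

def coverage(sample, population):
    """Fraction of distinct scenarios in `population` that also appear in `sample`."""
    all_scenarios = set(population)
    if not all_scenarios:
        return 1.0
    return len(set(sample) & all_scenarios) / len(all_scenarios)

# Hypothetical data: one very common scenario and a few rare ones.
population = (
    ["standard_order"] * 9900
    + ["rush_order"] * 90
    + ["return"] * 9
    + ["credit_memo"] * 1
)

rng = random.Random(42)               # fixed seed so the run is repeatable
sample = rng.sample(population, 100)  # 1% random sample

print(f"scenarios covered by the random sample: {coverage(sample, population):.0%}")
```

Because rare scenarios make up a tiny share of the captured volume, a small random sample can easily miss them, which is the gap the pre-selection is meant to close.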

To mitigate this risk, Int4 Suite introduces test case pre-selection using Machine Learning, together with the Test Case Analysis Report, to help reduce the number of test cases while keeping the coverage.

Technical Considerations

The intention of the introduced solution is to provide a “better than random” test case selection for massive testing scenarios. To achieve that, we selected an algorithm called LDA (Latent Dirichlet Allocation). Before applying the algorithm, we convert all the test case data into a set of tokens uniquely representing that data and feed the tokens into the algorithm. Its logic, based on statistics, groups test cases into classes (topics) containing messages that are statistically similar. Next, we select roughly equal numbers of messages from each of these groups, so that we end up with a set of messages that represents the variety found in the original sample. We also make sure to select both the most typical group representatives and the group outliers to maximize the differentiation.
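The tokenization and selection steps can be sketched as follows. This is an illustration under stated assumptions, not the Int4 Suite implementation: the `field=value` token format, the function names, and the half-and-half split between representatives and outliers are all hypothetical. The topic assignments are assumed to come from an LDA run that is not shown here.

```python
from collections import defaultdict

def tokenize(test_case):
    """Turn a flat test case (field -> value) into 'field=value' tokens.

    Assumption: a real tokenizer would also have to handle nested payloads.
    """
    return [f"{field}={value}" for field, value in sorted(test_case.items())]

def select_per_topic(assignments, per_topic):
    """Pick up to `per_topic` test cases from every topic.

    `assignments` maps case id -> (topic, probability), where probability says
    how strongly the case belongs to its topic. Roughly half of the quota goes
    to the highest-probability cases (typical representatives) and the rest to
    the lowest-probability cases (outliers).
    """
    by_topic = defaultdict(list)
    for case_id, (topic, prob) in assignments.items():
        by_topic[topic].append((prob, case_id))

    selected = []
    for topic in sorted(by_topic):
        ranked = sorted(by_topic[topic], reverse=True)  # strongest members first
        n_rep = (per_topic + 1) // 2
        n_out = per_topic - n_rep
        representatives = ranked[:n_rep]
        remaining = ranked[n_rep:]
        outliers = remaining[-n_out:] if n_out > 0 else []
        selected.extend(case_id for _, case_id in representatives + outliers)
    return selected

# Hypothetical topic assignments, as an LDA run might produce them.
assignments = {
    "TC1": (0, 0.95), "TC2": (0, 0.90), "TC3": (0, 0.55), "TC4": (0, 0.80),
    "TC5": (1, 0.99), "TC6": (1, 0.60),
}

print(tokenize({"doc_type": "ORDERS", "country": "DE"}))
print(select_per_topic(assignments, per_topic=2))
```

Taking the same quota from every topic keeps small groups visible in the final selection, where a plain random draw would favour the largest groups.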

While the LDA algorithm is field-proven and statistically well-founded, its inner workings are not human-readable and are thus difficult to explain. To ensure that results can be explained to non-technical stakeholders, Int4 Suite provides analytic tools alongside the LDA implementation.

Complexity of Algorithm Configuration

Our solution is based on the SAP S/4HANA database implementation of the LDA algorithm. This algorithm has a number of control parameters that influence its results. The most important parameter is the number of topics (classes) that the algorithm should identify. Beyond that, there are more detailed parameters that fine-tune the algorithm's behaviour.

To tune the algorithm well, one needs to configure these parameters and experiment until a satisfactory result is obtained. In real-life scenarios, engineers tuning these parameters use trial-and-error approaches, automating scans that go through many sets of parameter values.
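Such an automated scan can be sketched as a plain grid search. The parameter names (`num_topics`, `alpha`), the candidate values, and the scoring function below are placeholders chosen for illustration; they are not the actual control parameters or quality measure of the database LDA implementation.

```python
from itertools import product

def scan_parameters(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps a parameter name to a list of candidate values.
    `evaluate` is a caller-supplied function that runs the model with the
    given parameters and returns a score, where higher is better.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in for a real evaluation, which would run LDA and measure result quality.
def toy_score(params):
    return -abs(params["num_topics"] - 20) - abs(params["alpha"] - 0.1)

grid = {"num_topics": [5, 10, 20, 50], "alpha": [0.01, 0.1, 1.0]}
best, score = scan_parameters(grid, toy_score)
print(best, score)
```

The number of runs grows with the product of the candidate lists, which is one reason this style of tuning quickly becomes expensive.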

Since the engineering approach is complex and involved, we chose a different one. Our implementation provides good-enough default parameters to control the LDA algorithm. To ensure the explainability of the algorithm and the transparency of the result, we provide the Test Case Analysis Report, which allows you to analyze the data from the perspective of relevant business dimensions.

Combining the two gives a good view of the algorithm's results and allows for manual intervention after the LDA run, reducing test case numbers at least tenfold while ensuring 100% coverage with minimal manual effort.