IaC Defect Prediction

Context

Infrastructure-as-code (IaC) is a DevOps practice that manages and provisions infrastructure through machine-readable files known as IaC scripts. Like other source code artifacts, these scripts are prone to defects that can impair their functionality.

Objective

We conjecture that Program Dependence Graph (PDG) metrics may provide insights into the defectiveness of IaC scripts. Based on this conjecture, we propose to develop and empirically evaluate a new defect prediction model built on PDG metrics.

Method

We extracted 11 PDG metrics from 139 open-source Ansible projects and trained five machine learners to assess their predictive capability in a within-project scenario, comparing them with a state-of-the-art defect predictor based on structural and process IaC-oriented metrics. Finally, we assessed the performance of a combined model that merges PDG metrics with the existing IaC-oriented metrics.
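A within-project evaluation trains and tests on scripts from the same project. The page does not state the validation scheme or library used, so the following is only a minimal stdlib sketch assuming stratified k-fold cross-validation, where each fold keeps roughly the same share of defect-prone scripts. The helper names (stratified_kfold, combine) and the "pdg_"/"iac_" feature prefixes are hypothetical; the five learners themselves are not shown, and any classifier would be fit on each training split.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[int], k: int = 10, seed: int = 42) -> list[tuple[list[int], list[int]]]:
    """Hypothetical helper: split one project's scripts into k stratified folds.

    labels[i] is 1 if script i is defect-prone, 0 otherwise (assumed encoding).
    """
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: list[list[int]] = [[] for _ in range(k)]
    pos = 0  # running counter shared across classes keeps fold sizes balanced
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share of it.
        for idx in indices:
            folds[pos % k].append(idx)
            pos += 1

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def combine(pdg: dict[str, float], iac: dict[str, float]) -> list[float]:
    """Hypothetical helper: merge PDG and IaC-oriented metrics into one feature vector.

    Prefixes avoid name clashes; sorting by name gives a stable column order.
    """
    merged = {f"pdg_{n}": v for n, v in pdg.items()}
    merged.update({f"iac_{n}": v for n, v in iac.items()})
    return [merged[name] for name in sorted(merged)]


# Usage with made-up labels for ten scripts of one project.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
for train, test in stratified_kfold(labels, k=3):
    print(len(train), len(test))
```

Round-robin dealing per class is a simple way to stratify without external dependencies; in practice a library routine such as scikit-learn's StratifiedKFold would serve the same purpose.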

Results

The most frequently occurring predictors are MAXPDGVERTICES, EDGESTOVERTICESRATIO, EDGESCOUNT, and VERTICESCOUNT. PDG metrics-based models trained with RANDOM FOREST and DECISION TREE perform statistically better than those built with the remaining classifiers. PDG metrics-based models correctly predicted over 20% more bugs than models based on Delta and Process metrics. Finally, PDG metrics can improve the performance of Delta and Process metrics-based models; however, they have a negligible effect on models that employ ICO metrics.
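The four most frequent predictors can be illustrated with a toy computation. The page does not define these metrics or describe how PDGs are extracted from Ansible scripts, so this sketch rests on assumptions: a script yields one or more PDGs (for instance, one per play), VERTICESCOUNT and EDGESCOUNT are totals over those graphs, EDGESTOVERTICESRATIO divides the two, and MAXPDGVERTICES is the vertex count of the largest single PDG. The Pdg class and pdg_metrics function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pdg:
    """Hypothetical minimal PDG: vertex ids plus directed dependence edges."""
    vertices: set[str]
    edges: set[tuple[str, str]]  # (source, target): a data or control dependence


def pdg_metrics(pdgs: list[Pdg]) -> dict[str, float]:
    """Compute four script-level metrics under ASSUMED definitions (see lead-in)."""
    vertices = sum(len(g.vertices) for g in pdgs)
    edges = sum(len(g.edges) for g in pdgs)
    return {
        "VERTICESCOUNT": vertices,
        "EDGESCOUNT": edges,
        # Guard against scripts with no PDG to avoid division by zero.
        "EDGESTOVERTICESRATIO": edges / vertices if vertices else 0.0,
        # Size of the largest single PDG in the script (assumed meaning).
        "MAXPDGVERTICES": max((len(g.vertices) for g in pdgs), default=0),
    }


# Toy example: two PDGs from a made-up playbook.
play_a = Pdg({"t1", "t2", "t3"}, {("t1", "t2"), ("t1", "t3"), ("t2", "t3")})
play_b = Pdg({"t4", "t5"}, {("t4", "t5")})
print(pdg_metrics([play_a, play_b]))
# → {'VERTICESCOUNT': 5, 'EDGESCOUNT': 4, 'EDGESTOVERTICESRATIO': 0.8, 'MAXPDGVERTICES': 3}
```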

Information

Category: Defect Prediction Pipeline
Client: University
Project date: March-September 2023
GitHub: github.com/GerardoIuliano/IaC_defect_prediction_using_pdg_metrics
Paper: link

Technologies

Python