Context
Infrastructure-as-code (IaC) is a DevOps practice that manages and provisions infrastructure through machine-readable files known as IaC scripts. Like other source code artifacts, these scripts are susceptible to defects that may hinder their functionality.
Objective
We conjecture that Program Dependence Graph (PDG) metrics may provide insights into the defectiveness of IaC scripts. Based on this conjecture, we propose to develop and empirically evaluate a new defect prediction model built on PDG metrics.
Method
We extracted 11 PDG metrics from 139 open-source Ansible projects and trained five machine learners to assess their capabilities in a within-project scenario. We also compared them with a state-of-the-art defect predictor relying on structural and process IaC-oriented metrics. Finally, we assessed the performance of a combined model that uses both PDG and existing IaC-oriented metrics.
Results
The most frequently occurring predictors are MAXPDGVERTICES, EDGESTOVERTICESRATIO, EDGESCOUNT, and VERTICESCOUNT. PDG metrics-based models trained using RANDOM FOREST and DECISION TREE perform statistically better than those relying on the remaining classifiers. PDG metrics-based models correctly predicted over 20% more bugs than Delta and Process metrics-based models. Finally, PDG metrics can improve the performance of models based on Delta and Process metrics, but they have negligible effects on models employing ICO metrics.
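To make the four predictors named above more concrete, the following is a minimal sketch of how such graph-size metrics could be computed for a single script. It assumes the metric names mean what they suggest (total vertices, total edges, their ratio, and the vertex count of the largest graph); these readings, the `DependenceGraph` class, the `graph_metrics` function, and the task names in the example are all hypothetical illustrations, not the definitions or tooling used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class DependenceGraph:
    """Toy directed graph: vertices are statement ids, edges are dependences."""
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target) pairs

    def add_edge(self, source, target):
        # Adding an edge also registers both endpoints as vertices.
        self.vertices.update((source, target))
        self.edges.add((source, target))


def graph_metrics(graphs):
    """Aggregate size metrics over the dependence graphs of one script.

    Assumed (not confirmed by the source) interpretations of the metric names.
    """
    vertices_count = sum(len(g.vertices) for g in graphs)
    edges_count = sum(len(g.edges) for g in graphs)
    return {
        "vertices_count": vertices_count,
        "edges_count": edges_count,
        # Guard against an empty script with no vertices at all.
        "edges_to_vertices_ratio": (
            edges_count / vertices_count if vertices_count else 0.0
        ),
        "max_pdg_vertices": max((len(g.vertices) for g in graphs), default=0),
    }


# Hypothetical script containing two small dependence graphs.
first = DependenceGraph()
first.add_edge("set_fact", "template")
first.add_edge("set_fact", "service")
first.add_edge("template", "service")

second = DependenceGraph()
second.add_edge("apt", "copy")

metrics = graph_metrics([first, second])
print(metrics)
```

In a defect prediction pipeline, a dictionary like this would form one feature row per script, with the defect label attached separately.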
Information
- Category: Defect Prediction Pipeline
- Client: University
- Project date: March-September 2023
- GitHub: github.com/GerardoIuliano/IaC_defect_prediction_using_pdg_metrics
- Paper: link
Technologies
- Python