A Comprehensive Hybrid Model for Language-Independent Defect Prediction in Microservices Architecture
DOI: https://doi.org/10.51983/ajcst-2023.12.2.3763
Keywords: Software Defect Prediction, Machine Learning Approach, Predictive Accuracy, Hybrid Model, BiD-LSTM, BERT, ASTs
Abstract
This paper examines the transformation of software development from monolithic frameworks to microservices-based architectures. It focuses on the challenge of creating a unified defect prediction model that spans multiple programming languages within continuous integration, the practice of automatically integrating code modifications into a single codebase. It proposes a hybrid machine learning approach that improves defect prediction accuracy by integrating different data sources and algorithms, with the goal of a model that is independent of both language and project. The hybrid model combines Bi-Directional LSTM (BiD-LSTM) networks with Attention mechanisms, static code metrics, and BERT-based language models. The BiD-LSTM with Attention captures temporal (sequential) dependencies within Abstract Syntax Trees (ASTs), static code metrics provide insight into software complexity, and BERT interprets textual context; together they give a holistic understanding of code snippets. The research methodology is quantitative, beginning with a literature review to establish the theoretical foundation. An empirical study follows, encompassing data gathering, feature engineering and pre-processing, model building, training and evaluation, validation and analysis, and conclusions. The research's insights aim to improve defect prediction techniques, contributing to software engineering's pursuit of better quality and reliability.
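To make the three feature sources described in the abstract more concrete, here is a minimal, dependency-free Python sketch of how they might be fused into a single vector before classification. It is not the authors' implementation, and several parts are loudly hypothetical: Python's standard `ast` module stands in for a language-independent AST parser; `toy_embedding` (a CRC32-based mapping) stands in for learned token embeddings and BiD-LSTM hidden states; the attention query is fixed rather than learned; the static metrics are limited to non-blank lines of code and a rough McCabe-style cyclomatic estimate; and `text_vector` is a placeholder for a BERT embedding supplied by the caller.

```python
import ast
import math
import zlib
from typing import Dict, List, Sequence

EMBED_DIM = 4  # one dimension per byte of a CRC32 checksum

# Decision points for a rough McCabe-style estimate (assumption: the paper
# does not specify which static code metrics it uses).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def ast_node_sequence(source: str) -> List[str]:
    """Serialize a snippet's AST into node-type names in breadth-first order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def static_metrics(source: str) -> Dict[str, float]:
    """Non-blank lines of code, and cyclomatic estimate = 1 + decision points."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    loc = sum(1 for line in source.splitlines() if line.strip())
    return {"loc": float(loc), "cyclomatic": float(1 + decisions)}


def toy_embedding(token: str) -> List[float]:
    """Hypothetical stand-in for learned embeddings / BiD-LSTM hidden states:
    maps each byte of the token's CRC32 checksum to a value in [0, 1]."""
    checksum = zlib.crc32(token.encode("utf-8"))
    return [((checksum >> (8 * i)) & 0xFF) / 255.0 for i in range(EMBED_DIM)]


def attention_pool(vectors: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Dot-product attention pooling: softmax over scores v.q, then weighted sum."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def hybrid_features(source: str, text_vector: Sequence[float],
                    query: Sequence[float] = (1.0, -1.0, 0.5, 0.0)) -> List[float]:
    """Early fusion: [attention-pooled AST vector | static metrics | text vector]."""
    states = [toy_embedding(tok) for tok in ast_node_sequence(source)]
    ast_vector = attention_pool(states, query)
    metrics = static_metrics(source)
    return ast_vector + [metrics["loc"], metrics["cyclomatic"]] + list(text_vector)


if __name__ == "__main__":
    snippet = "def f(x):\n    if x > 0 and x < 10:\n        return x\n    return -x\n"
    features = hybrid_features(snippet, text_vector=[0.1, 0.2, 0.3])
    print(len(features))  # 9 = 4 (AST) + 2 (metrics) + 3 (text placeholder)
```

Concatenation (early fusion) is only one way to combine the branches; in a trained model the embeddings, the recurrent encoder, and the attention query would all be learned, and the fused vector would feed a classifier that outputs a defect probability.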
License
Copyright (c) 2023 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.