Context: Mining Software Repositories (MSR) is a growing area of Software Engineering (SE) research. Since the field emerged in 2004, many investigations have analysed different aspects of MSR studies. However, there are no guidelines on how to conduct systematic MSR studies. There is a need to evaluate how MSR research is approached, in order to provide a framework for conducting it systematically.
Objective: To identify how MSR studies are conducted in terms of repository selection and data extraction, and to uncover opportunities for making MSR research more systematic and for providing guidelines to do so.
Method: A systematic literature review of MSR studies was conducted following the guidelines and template proposed by Mian et al. (which refines those provided by Kitchenham and Charters). These guidelines were extended and revised to provide a framework for systematic MSR studies.
Results: MSR studies typically do not follow a systematic approach for repository selection, and many do not report selection or data extraction protocols. Furthermore, few manuscripts discuss threats to the study’s validity due to the selection or data extraction steps followed.
Conclusions: Although MSR studies are evidence-based research, they seldom follow a systematic process. Hence, there is a need for guidelines on how to conduct systematic MSR studies. New guidelines and a template have been proposed, consolidating related studies in the MSR field and strategies for systematic literature reviews.
(2021-12-11) I was invited to suggest changes to the preliminary ACM SIGSOFT Empirical Standard. Changes were made by pull request into the open repository, and later streamlined.
@article{Vidoni2021c,
title = "{A Systematic Process for Mining Software Repositories: Results from a Systematic Literature Review}",
journal = {Information and Software Technology},
pages = {106791},
year = {2022},
issn = {0950-5849},
doi = {10.1016/j.infsof.2021.106791},
url = {https://www.sciencedirect.com/science/article/pii/S0950584921002317},
author = {M. Vidoni},
keywords = {Mining Software Repositories, Systematic literature review, Evidence-based software engineering, Guidelines},
}