A Systematic Process for Mining Software Repositories: Results from a Systematic Literature Review


Context: Mining Software Repositories (MSR) is a growing area of Software Engineering (SE) research. Since MSR studies emerged in 2004, many investigations have analysed different aspects of them. However, there are no guidelines on how to conduct MSR studies systematically. There is a need to evaluate how MSR research is approached in order to provide a framework for conducting it systematically.
Objective: To identify how MSR studies are conducted in terms of repository selection and data extraction. To uncover potential for improvement in directing systematic research and providing guidelines to do so.
Method: A systematic literature review of MSR studies was conducted following the guidelines and template proposed by Mian et al. (which refine those provided by Kitchenham and Charters). These guidelines were extended and revised to provide a framework for systematic MSR studies.
Results: MSR studies typically do not follow a systematic approach for repository selection, and many do not report selection or data extraction protocols. Furthermore, few manuscripts discuss threats to the study’s validity due to the selection or data extraction steps followed.
Conclusions: Although MSR studies are evidence-based research, they seldom follow a systematic process. Hence, there is a need for guidelines on how to conduct systematic MSR studies. New guidelines and a template have been proposed, consolidating related studies in the MSR field and strategies for systematic literature reviews.

M. Vidoni, ‘A Systematic Process for Mining Software Repositories: Results from a Systematic Literature Review’, Information and Software Technology, 2022. ISSN 0950-5849. DOI: 10.1016/j.infsof.2021.106791


  • Systematic literature review examining the process of MSR studies.
  • Provides a systematic approach for MSR research with a protocol template.
  • Proposes guidelines derived from existing processes, consolidating prior works.


  • Assess the current practice of conducting MSR-based studies in Software Engineering (SE) through a systematic literature review (SLR).
  • Identify gaps in MSR-based studies and compare their process to that of SLRs (as defined in [8]).
  • Centralise the findings into guidelines for systematic MSR studies that support an unbiased aggregation of empirical results. These are derived from established SLR practices for SE (Petersen’s and Kitchenham’s guidelines) and supported by results from this SLR and previous related works by other authors.
  • Provide a protocol template as a guideline for conducting MSR research.

Results Impact

(2021-12-11) I was invited to suggest changes to the preliminary ACM SIGSOFT Empirical Standard. The changes were submitted by pull request to the open repository and later streamlined.


title = "{A Systematic Process for Mining Software Repositories: Results from a Systematic Literature Review}",
journal = {Information and Software Technology},
pages = {106791},
year = {2022},
issn = {0950-5849},
doi = {https://doi.org/10.1016/j.infsof.2021.106791},
url = {https://www.sciencedirect.com/science/article/pii/S0950584921002317},
author = {M. Vidoni},
keywords = {Mining Software Repositories, Systematic literature review, Evidence-based software engineering, Guidelines},

Venue Impact

The venue impact, according to the SCImago Journal Rank (SJR), is shown in the following badge:

[Image: SCImago Journal & Country Rank badge]