Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants

Abstract

Mining Software Repositories (MSR) cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques, as they allow researchers to unveil issues and flaws in software development and to analyse the different factors contributing to them. Hence, access to fine-grained information about the repositories and sources being mined (e.g., server names and contributors’ identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants’ privacy, as their identities may be linked to flawed or sub-optimal programming practices (e.g., code smells, improper documentation), or vice versa. Moreover, this can extend to close collaborators and community members, resulting in ‘guilt by association’. This position paper aims to start a discussion about indirect participation in MSR investigations, the dichotomy of ‘privacy vs. utility’ regarding sharing non-aggregated data, and its effects on privacy restrictions and ethical considerations for participant involvement.

Publication
In Workshop on Recruiting Participants for Empirical Software Engineering (RoPES, co-located with ICSE 2022)

Acknowledgements

This research did not receive any specific grants from funding agencies in the public, commercial, or not-for-profit sectors.

Venue

RoPES is the 1st International Workshop on Recruitment of Participants for Empirical Software Engineering Studies, co-located with ICSE 2022. The website is available here.

Citations

@INPROCEEDINGS{VidoniDiaz2022a,
author = {M. Vidoni and N. Diaz-Ferreyra},
booktitle = "{1st International Workshop on Recruiting Participants for Empirical Software Engineering}",
title = "{Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants}",
year = {2022},
pages = {1--3},
keywords = {Mining software repositories; developers' privacy; ethical considerations; privacy-preserving protocols; indirect participants},
doi = {},
publisher = {IEEE Computer Society},
address = {Pittsburgh, USA},
}