Towards a Taxonomy of Roxygen Documentation in R Packages

Abstract

Software documentation is often neglected, impacting maintenance and reuse and leading to technical issues. In particular, when working with scientific software, such issues in the documentation pose a risk to producing reliable scientific results as they may cause improper or incorrect use of the software. R is a popular programming language for scientific software with a prolific package-based ecosystem, where users contribute packages (i.e., libraries). R packages are intended to be reused, and their users rely extensively on the available documentation. Thus, understanding what information developers provide in their packages’ documentation (generally, through a system known as Roxygen, based on Javadoc) is essential to contribute to it. This study mined 379 GitHub repositories of R packages and analysed a sample to develop a taxonomy of natural language descriptions used in Roxygen documentation. This was done through hybrid card sorting, which included two experienced R developers. The resulting taxonomy covers parameters, returns, and descriptions, providing a baseline for further studies. Our taxonomy is the first of its kind for R. Based on previous studies in pure object-oriented languages, our taxonomy could be extensible to other dynamically-typed languages used in scientific programming.

Publication
in Empirical Software Engineering, vol. 28


Contributions

This paper addresses the need for better documentation standards in R programming (Monperrus et al., 2012) by extending prior work on library documentation to the R domain. Our other contributions are:

  • This is the first study conducted to explore and understand documentation practices in R packages.

  • A taxonomy of Roxygen directives for parameters, returns, and description elements. It is structured, including examples (taken from mined GitHub repositories), good practices, and anti-patterns. Our taxonomy is more detailed and complete than Roxygen’s own package documentation.

  • An analysis of the documentation directives, discussing frequencies, anti-patterns, and comparisons with existing taxonomies. The ‘documentation directives’ are natural-language statements explaining constraints and guidelines about correctly using a piece of code (Monperrus et al., 2012).

  • We make available an extensive, well-documented replication package. See Data Availability.
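
For readers unfamiliar with Roxygen, the three elements covered by the taxonomy (descriptions, parameters, and returns) can be illustrated with a minimal sketch. The function `rescale_unit` and its wording are hypothetical, written only for this illustration; they are not taken from the paper or from the mined repositories. Only the standard Roxygen tags `@description`, `@param`, `@return`, `@examples`, and `@export` are used.

```r
#' Rescale a numeric vector to the unit interval
#'
#' @description
#' Linearly maps the values of x so that the minimum becomes 0 and the
#' maximum becomes 1. Hypothetical helper, used only to illustrate the tags.
#'
#' @param x A numeric vector with at least two distinct finite values.
#' @param na.rm Logical. If TRUE, missing values are ignored when the range
#'   of x is computed. Defaults to FALSE.
#'
#' @return A numeric vector of the same length as x, with values between
#'   0 and 1.
#'
#' @examples
#' rescale_unit(c(2, 4, 6))
#' @export
rescale_unit <- function(x, na.rm = FALSE) {
  # range() returns c(min, max); na.rm controls how missing values are handled
  rng <- range(x, na.rm = na.rm)
  # Shift by the minimum, then divide by the spread
  (x - rng[1]) / (rng[2] - rng[1])
}
```

In this sketch, the `@param` lines state both the expected type and a usage constraint ("at least two distinct finite values"), which is the kind of natural-language directive the taxonomy classifies.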


Citation

@article{Vidoni2023,
title = "{Towards a Taxonomy of Roxygen Documentation in R Packages}",
journal = {Empirical Software Engineering},
volume = {28},
pages = {106},
year = {2023},
issn = {1573-7616},
doi = {10.1007/s10664-023-10345-4},
author = {Melina Vidoni and Zadia Codabux},
keywords = {R Programming, Package Documentation, Scientific Software, Documentation Taxonomy},
}


Venue Impact

Venue impact for Empirical Software Engineering is reported by the SCImago Journal & Country Rank.