Collaborations Workshop 2018 - 2018-03-26

Group N - CI6-CW18

Reporter

James Baker - drjameswbaker@gmail.com

Participants

Daniel S. Katz

James Baker

Martin Donnelly

Melodee Beals

Richard Adams

Stephen Dowsland


Melodee: data + software + article .. we don't just want lots of footnotes, which get messy .. better packaging is needed.

Richard: engagement with software in recalcitrant disciplines, a threshold question

Martin: planning and policies that ensure software and code aren't considered discrete from other research outputs.

Stephen: frontload and automate documentation, recording why things were done the way they were .. best practices

Dan: a Software Projects Carpentry group .. define a set of common things that a project needs to do .. best practices .. dissemination

James: lack of mandates

Cultures of checklists .. credit for improving, working on, and validating documentation .. badges .. which parts of checking documentation could be automated?

Dan: Could one build a service that checks the completeness and quality of documentation (for the developer, or for a user)? Could a group provide a set of repositories with documentation ratings, which could then be used to train an ML/DL model, which could in turn power such a service?
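
One way to picture Dan's suggestion in code: train a model on documentation paired with human quality ratings. Below is a minimal sketch, assuming a hypothetical rated corpus of READMEs (no such corpus exists yet) and using scikit-learn's TF-IDF features with ridge regression as a stand-in for whatever ML/DL model the group might eventually choose.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: README texts and 0-5 human quality ratings.
readmes = [
    "Install with pip. Usage examples below. MIT licence. How to cite.",
    "TODO: write docs",
    "Full API reference, tutorials, usage examples, licence, citation file.",
]
ratings = [4.0, 0.5, 5.0]

# TF-IDF features feeding a ridge regressor; a placeholder for a real model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
model.fit(readmes, ratings)

# Score previously unseen documentation.
print(model.predict(["Quick-start guide with worked examples and a FAQ."]))
```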

Title

Crediting Super Awesome Documentation to get Research Software Certification

Context / Research Domain

Documentation of research software, its use, and its connections to datasets (real and dummy) and narrative research outputs (articles, papers, etc.).

Problem

Software documentation is hard. It takes time to write; it often fails to meet the needs of its various audiences; it can alienate users and make software projects seem inaccessible; and it rarely makes clear where its creators are coming from intellectually or professionally. Moreover, there is no easy mechanism for rating the quality of that documentation, so users cannot make informed choices about reusing the software or about the quality of the research that uses it.

Solution

A service (based on a trained ML/DL model) that checks the completeness and quality of documentation (for the developer, or for a user), together with a template that signals best practice for software-supported/based research projects. This might allow semi-automated and user-supplied documentation of the following (a minimal sketch of such a template appears after the list):

  • Software(s) used
    • Source Code Documentation
    • Narrative documentation / usage instructions
  • Data sets used
    • Deposit location
    • Field / Element descriptors
    • Provenance information
  • Research outputs
    • Data papers
    • Method Papers
    • Journal Articles / Narrative Outputs
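
As promised above, here is a minimal sketch of what such a template might capture, written as a Python data structure. All field names, URLs, and DOIs are illustrative assumptions, not a settled schema.

```python
# Hypothetical project record covering the three areas in the list above.
project_record = {
    "software": [
        {
            "name": "example-tool",                         # hypothetical name
            "source_code_docs": "https://example.org/api",  # placeholder URL
            "narrative_docs": "https://example.org/guide",  # placeholder URL
        }
    ],
    "datasets": [
        {
            "deposit_location": "https://doi.org/10.0000/xxxx",  # placeholder DOI
            "field_descriptors": "codebook.csv",
            "provenance": "collected 2017; see README",
        }
    ],
    "research_outputs": {
        "data_papers": [],
        "method_papers": [],
        "journal_articles": ["https://doi.org/10.0000/yyyy"],   # placeholder DOI
    },
}
```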

The service takes user inputs (URLs, DOIs, etc.) and makes automated checks of the completeness and quality of documentation (for the developer, or for a user). It uses programming-language documentation to flag non-standard elements in the research software that need further explanation in the software documentation. It combines these checks to produce a rating (a certificate, badge, or retitling; see, e.g., the F1000 model) for the quality of the documentation. These ratings allow users to find all relevant parts of a project, facilitate informed choices about reusing the software or judging the quality of the research that uses it, and encourage article writers to reflect upon and acknowledge/document software and data use.
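
The combination step might look something like the sketch below: run simple keyword checks over a README and map the combined score to a badge tier. The check names, patterns, and thresholds are all illustrative assumptions, far cruder than a trained model would be.

```python
import re

# Hypothetical completeness checks: each maps a topic to a keyword pattern.
CHECKS = {
    "installation": r"\binstall",
    "usage": r"\busage\b|\bexample",
    "licence": r"\blicen[cs]e\b",
    "citation": r"\bcite\b|\bcitation\b",
    "data": r"\bdata\b|\bdataset\b",
}

def badge(readme_text: str) -> str:
    """Return a badge tier based on how many checks the text passes."""
    passed = sum(
        bool(re.search(pattern, readme_text, re.IGNORECASE))
        for pattern in CHECKS.values()
    )
    score = passed / len(CHECKS)
    if score >= 0.8:
        return "gold"
    if score >= 0.5:
        return "silver"
    return "bronze"

print(badge("Install with pip. Usage: see examples. MIT License."))
# prints "silver" (3 of 5 checks pass under these assumed thresholds)
```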

Diagrams / Illustrations

From reading documentation to achievement.

** All images are CC0 from Pixabay