Collaborations Workshop 2018 - 2018-03-26
Group N - CI6-CW18
Reporter
James Baker - drjameswbaker@gmail.com
Participants
Daniel S. Katz
James Baker
Martin Donnelly
Melodee Beals
Richard Adams
Stephen Dowsland
Melodee: data + software + article .. don’t just want lots of footnotes .. messy .. packaging.
Richard: engagement with software in recalcitrant disciplines, threshold question
Martin: planning and policies that ensure software and code aren’t considered discrete.
Stephen: frontload and automate documentation, why things done the way they are, best practices
Dan: Software Projects Carpentry group .. define a set of common themes that a project needs to do .. best practices .. dissemination
James: lack of mandates
Cultures of checklists .. Credit for improving, working on, validating documentation .. badges .. what parts of checking documentation could be automated? ..
Dan: Could one build a service that checks the completeness and quality of documentation (for the developer, or for a user)? Could a group provide a set of repositories with documentation ratings, which could then be used to train an ML/DL model, which could in turn sit behind the service?
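As a rough illustration of Dan's idea (not a design the group agreed on), a text classifier could be trained on documentation paired with human-assigned quality ratings. A minimal sketch using scikit-learn; the README snippets, labels, and rating scale below are entirely hypothetical:

```python
# Sketch: train a classifier on README text paired with human-assigned
# documentation-quality ratings, then use it to score unseen documentation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: README contents and 0-2 quality labels
# (0 = poor, 1 = adequate, 2 = good), as gathered by a rating group.
readmes = [
    "Install with pip. Usage: run main.py. See examples/ for tutorials.",
    "TODO: write documentation",
    "Overview, install steps, API reference, and citation info included.",
]
ratings = [1, 0, 2]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(readmes, ratings)

# Score a new project's documentation.
print(model.predict(["Quick start guide plus full API docs and examples."]))
```

In practice the training corpus would need to be far larger and the ratings agreed by the group, but the pipeline shape (text features in, quality rating out) would stay the same.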
Title
Crediting Super Awesome Documentation to get Research Software Certification
Context / Research Domain
Documentation of research software, its use and connections to datasets (real and dummy) and narrative research outputs (articles, papers, etc).
Problem
Software documentation is hard and time-consuming to write. It often fails to meet the needs of its various audiences, can alienate readers and make software projects seem inaccessible, and rarely makes clear where the creators are coming from intellectually or professionally. And there is no easy mechanism for rating the quality of that documentation, so users cannot make informed choices about reusing the software or about the quality of the research that uses it.
Solution
A service (based on a trained ML/DL model) that checks the completeness and quality of documentation (for the developer, or for a user), plus a template that signals best practice for software-supported/based research projects. This might allow semi-automated and user-authored documentation of the following (see the metadata sketch after this list):
- Software(s) used
  - Source code documentation
  - Narrative documentation / usage instructions
- Data sets used
  - Deposit location
  - Field / element descriptors
  - Provenance information
- Research outputs
  - Data papers
  - Method papers
  - Journal articles / narrative outputs
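As an illustration only, the template above could be captured as machine-readable project metadata. Every field name and URL below is an assumption about what the template might record, not an established standard:

```python
# Illustrative sketch of the project template as machine-readable metadata,
# mirroring the checklist above. All names and URLs are hypothetical.
project_record = {
    "software": [
        {"name": "example-tool", "version": "1.2.0",
         "source_docs": "https://example.org/example-tool/docs",
         "usage_notes": "docs/usage.md"},
    ],
    "datasets": [
        {"name": "sample-corpus",
         "deposit": "https://doi.org/10.xxxx/example",  # hypothetical DOI
         "field_descriptors": "docs/codebook.md",
         "provenance": "docs/provenance.md"},
    ],
    "research_outputs": {
        "data_papers": [],
        "method_papers": [],
        "journal_articles": ["https://doi.org/10.xxxx/article"],  # hypothetical
    },
}
```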
It takes user inputs (URLs, DOIs, etc.) to make automated checks of the completeness and quality of documentation (for the developer, or for a user). It uses programming language documentation to flag non-standard elements in the research software that need further explanation in the software documentation. It combines these checks to produce a rating (certificate / badge / retitling [e.g., see the F1000 model]) for the quality of documentation. These ratings allow users to find all relevant parts of a project, facilitate informed choices about reusing the software and about the quality of the research that uses it, and encourage article writers to reflect upon and acknowledge / document software and data use.
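A hedged sketch of the rating step: individual checks could be combined with weights into a badge. The check names, weights, and thresholds below are illustrative assumptions, not something the group specified:

```python
# Sketch: combine per-check documentation scores into a badge rating.
# Check names, weights, and thresholds are all illustrative assumptions.
def badge_for(checks: dict) -> str:
    """Map per-check scores (0.0-1.0) to a bronze/silver/gold badge."""
    weights = {"completeness": 0.4,
               "model_quality_score": 0.4,
               "nonstandard_elements_explained": 0.2}
    score = sum(weights[name] * checks.get(name, 0.0) for name in weights)
    if score >= 0.8:
        return "gold"
    if score >= 0.5:
        return "silver"
    return "bronze"

# Example: complete docs but weak coverage of non-standard code elements
# yields a weighted score of 0.68, i.e. a silver badge.
print(badge_for({"completeness": 0.9,
                 "model_quality_score": 0.7,
                 "nonstandard_elements_explained": 0.2}))
```

Keeping the checks separate and weighting them at the end would let different audiences (developers vs. users) apply different weightings to the same underlying checks.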
Diagrams / Illustrations
* All images are CC0 from Pixabay.