This lesson is being piloted (Beta version)
If you teach this lesson, please tell the authors and provide feedback by opening an issue in the source repository

Intermediate Research Software Development: Glossary

Key Points

Setting the Scene
  • This lesson focuses on core, intermediate skills covering the whole software development life-cycle that will be of most use to anyone working collaboratively on code.

  • For code development in teams, you need more than just the right tools and languages. You need a strategy (best practices) for how you’ll use these tools as a team.

  • The lesson follows on from the novice Software Carpentry lesson, but this is not a prerequisite for attending as long as you have some basic Python, command line and Git skills and you have been using them for a while to write code to help with your work.

Section 1: Setting Up Environment For Collaborative Code Development
  • In order to develop (write, test, debug, backup) code efficiently, you need to use a number of different tools.

  • When there is a choice of tools for a task you will have to decide which tool is right for you, which may be a matter of personal preference or what the team or community you belong to is using.

Introduction to Our Software Project
  • Programming interfaces define how individual modules within a software application interact among themselves or how the application itself interacts with its users.

  • MVC is a software design architecture which divides the application into three interconnected modules: Model (data), View (user interface), and Controller (input/output and data manipulation).

  • The software project we use throughout this course is an example of an MVC application that manipulates patients’ inflammation data and performs basic statistical analysis using Python.
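The split between Model, View and Controller can be sketched in a few lines. This is an illustration only: the function names and the sample values are hypothetical and are not taken from the course project.

```python
"""Minimal MVC sketch; all names here are hypothetical."""


# Model: holds the data and the calculations on it.
def daily_mean(data):
    """Return the mean of each day (column) across all patients (rows)."""
    n_patients = len(data)
    return [sum(day) / n_patients for day in zip(*data)]


# View: turns results into something a user can read.
def format_means(means):
    """Render daily means as one line of text."""
    return ", ".join(f"day {i}: {m:.1f}" for i, m in enumerate(means, start=1))


# Controller: takes input, calls the model, passes the result to the view.
def main(data):
    return format_means(daily_mean(data))


if __name__ == "__main__":
    patients = [[0, 2, 4], [2, 4, 6]]  # made-up values for illustration
    print(main(patients))
```

Because the model knows nothing about how results are displayed, the view could be swapped (for example, for a plot) without touching the calculation.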

Virtual Environments For Software Development
  • Virtual environments keep Python versions and dependencies required by different projects separate.

  • A virtual environment is itself a directory structure.

  • Use venv to create and manage Python virtual environments.

  • Use pip to install and manage Python external (third-party) libraries.

  • pip allows you to declare all dependencies for a project in a separate file (by convention called requirements.txt) which can be shared with collaborators/users and used to replicate a virtual environment.

  • Use python3 -m pip freeze > requirements.txt to take a snapshot of your project’s dependencies.

  • Use python3 -m pip install -r requirements.txt to replicate someone else’s virtual environment on your machine from the requirements.txt file.
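To see that a virtual environment really is just a directory structure, the standard library's venv module can also be driven from Python. The sketch below is one possible way to do this, not the workflow the lesson prescribes; the helper name create_environment is hypothetical, and pip is deliberately left out to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / "venv"
    # with_pip=False keeps this sketch fast; the command-line tool
    # (python3 -m venv) installs pip by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_environment(tmp)
        # The environment is nothing more than files and folders on disk.
        for entry in sorted(env_dir.iterdir()):
            print(entry.name)
```

The listing includes a pyvenv.cfg file, which records the environment's configuration.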

Integrated Software Development Environments
  • An IDE is an application that provides a comprehensive set of facilities for software development, including syntax highlighting, code search and completion, version control, testing and debugging.

  • PyCharm recognises virtual environments configured from the command line using venv and pip.

Software Development Using Git and GitHub
  • A branch is one version of your project that can contain its own set of commits.

  • Feature branches enable us to develop / explore / test new code features without affecting the stable main code.

Python Code Style Conventions
  • Always assume that someone else will read your code at a later date, including yourself.

  • Community coding conventions help you create more readable software projects that are easier to contribute to.

  • Python Enhancement Proposals (or PEPs) describe a recommended convention or specification for how to do something in Python.

  • Style checking to ensure code conforms to coding conventions is often part of IDEs.

  • Consistency with the style guide is important - whichever style you choose.

Verifying Code Style Using Linters
  • Use linting tools on the command line (or via continuous integration) to automatically check your code style.

Section 2: Ensuring Correctness of Software at Scale
  • Using testing requires us to change our practice of code development, but saves time in the long run by allowing us to more comprehensively and rapidly find faults in code, as well as giving us greater confidence in the correctness of our code.

  • The use of test techniques and infrastructures such as parametrisation and Continuous Integration can help scale and further automate our testing process.

Automatically Testing Software
  • The three main types of automated tests are unit tests, functional tests and regression tests.

  • We can write unit tests to verify that functions generate expected output given a set of specific inputs.

  • It should be easy to add or change tests, understand and run them, and understand their results.

  • We can use a unit testing framework like Pytest to structure and simplify the writing of tests in Python.

  • We should test for expected errors in our code.

  • Testing program behaviour against both valid and invalid inputs is important and is known as data validation.
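As a rough illustration of the points above, here is what a pair of Pytest unit tests might look like: one checks expected output for a specific input, the other checks that an expected error is raised. The function daily_max and its behaviour on empty input are assumptions made for this sketch.

```python
import pytest


def daily_max(data):
    """Return the maximum of each day (column) across all patients (rows)."""
    if not data:
        raise ValueError("data must contain at least one patient")
    return [max(day) for day in zip(*data)]


def test_daily_max_known_values():
    # A specific input with a known, hand-checked output.
    assert daily_max([[1, 5, 3], [4, 2, 6]]) == [4, 5, 6]


def test_daily_max_rejects_empty_input():
    # This test passes only if the expected error is raised.
    with pytest.raises(ValueError):
        daily_max([])
```

Saved in a file whose name starts with test_, these functions would be collected and run by the pytest command.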

Scaling Up Unit Testing
  • We can assign multiple inputs to tests using parametrisation.

  • It’s important to understand the coverage of our tests across our code.

  • Writing unit tests takes time, so apply them where it makes the most sense.
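Parametrisation can be sketched with Pytest's parametrize marker, which runs one test function once per set of inputs. The function daily_min and the test cases are hypothetical examples, not the lesson's own code.

```python
import pytest


def daily_min(data):
    """Return the minimum of each day (column) across all patients (rows)."""
    return [min(day) for day in zip(*data)]


@pytest.mark.parametrize(
    "data, expected",
    [
        ([[0, 0], [0, 0]], [0, 0]),       # all zeros
        ([[1, 2], [3, 4]], [1, 2]),       # positive values
        ([[-1, 5], [2, -3]], [-1, -3]),   # mixed signs
    ],
)
def test_daily_min(data, expected):
    # Pytest calls this function once for each (data, expected) pair.
    assert daily_min(data) == expected
```

Adding a new case is then a one-line change to the list rather than a new test function.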

Continuous Integration for Automated Testing
  • Continuous Integration can run tests automatically to verify changes as code develops in our repository.

  • CI builds are typically triggered by commits pushed to a repository.

  • We need to write a configuration file to inform a CI service what to do for a build.

  • We can use a build matrix to specify multiple platforms and programming language versions to test against.

  • Builds can be enabled and configured separately for each branch.

  • We can run - and get reports from - different CI infrastructure builds simultaneously.

Diagnosing Issues and Improving Robustness
  • Unit testing can show us what does not work, but does not help us locate problems in code.

  • Use a debugger to help you locate problems in code.

  • A debugger allows us to pause code execution and examine its state by adding breakpoints to lines in code.

  • Use preconditions to ensure correct behaviour of code.

  • Ensure that unit tests check for edge and corner cases too.

  • Using linting tools to automatically flag suspicious programming language constructs and stylistic errors can help improve code robustness.
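A precondition is a check at the start of a function that rejects input the function cannot handle. The following is a minimal sketch under assumed rules (readings must be non-empty and non-negative); the function name normalise and those rules are illustrative assumptions.

```python
def normalise(readings):
    """Scale a patient's readings so that the largest value becomes 1.0."""
    # Preconditions: fail early, with a clear message, on unsupported input.
    if not readings:
        raise ValueError("readings must not be empty")
    if any(value < 0 for value in readings):
        raise ValueError("readings must not be negative")

    largest = max(readings)
    if largest == 0:
        # Edge case: all zeros would otherwise cause a division by zero.
        return [0.0 for _ in readings]
    return [value / largest for value in readings]
```

The all-zeros branch is the kind of edge case that unit tests should exercise explicitly.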

Section 3: Software Development as a Process
  • Software engineering takes a wider view of software development beyond programming (or coding).

  • Ensuring requirements are sufficiently captured is critical to the success of any project.

  • Following a process makes software development predictable, saves time in the long run, and helps ensure each stage of development is given sufficient consideration before proceeding to the next.

  • Once you get the hang of a programming language, writing code to do what you want is relatively easy. The hard part is writing code that is easy to adapt when your requirements change.

Software Requirements
  • When writing software used for research, requirements will almost always change.

  • Consider non-functional requirements (how the software will behave) as well as functional requirements (what the software is supposed to do).

  • The environment in which users run our software has an effect on many design choices we might make.

  • Consider the expected longevity of any code before you write it.

  • The perspective and language of a particular requirement stakeholder group should be reflected in requirements for that group.

Software Architecture and Design
  • ‘Good’ code is designed to be maintainable: readable by people who did not author the code, testable through a set of automated tests, adaptable to new requirements.

  • Use abstraction and decoupling to logically separate the different aspects of your software within design as well as implementation.

  • Use refactoring to improve existing code, making it more consistent internally and with its overall architecture.

  • Include software design as a key stage in the lifecycle of your project so that development and maintenance become easier.

Code Decoupling & Abstractions
  • Code decoupling is separating code into smaller components and reducing the interdependence between them so that the code is easier to understand, test and maintain.

  • Abstractions can hide certain details of the code behind classes and interfaces.

  • Encapsulation bundles data into a structured component along with methods that operate on the data, and provides a mechanism for restricting access to that data, hiding the internal representation of the component.

  • Polymorphism describes the provision of a single interface to entities of different types, or the use of a single symbol to represent different types.
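Encapsulation and polymorphism can be illustrated together in a short sketch. The Patient and Doctor classes below are hypothetical and chosen only to fit the inflammation theme of the course.

```python
class Patient:
    """Encapsulation: data and the methods that operate on it live together."""

    def __init__(self, name):
        self.name = name
        # Leading underscore: internal by convention, not part of the interface.
        self._observations = []

    def add_observation(self, value):
        self._observations.append(value)

    @property
    def last_observation(self):
        return self._observations[-1]

    def describe(self):
        return f"Patient {self.name} ({len(self._observations)} observations)"


class Doctor:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"Doctor {self.name}"


def describe_all(people):
    # Polymorphism: the same call works on any object that provides describe().
    return [person.describe() for person in people]
```

Callers of describe_all never need to know which class they are dealing with, and callers of Patient never touch the internal list directly.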

Code Refactoring
  • Code refactoring is a technique for improving the structure of existing code.

  • Implementing regression tests before refactoring gives you confidence that your changes have not broken the code.

  • Using pure functions that process data without side effects whenever possible makes the code easier to understand, test and maintain.
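The difference between a function with side effects and a pure function is easiest to see side by side. Both functions below are hypothetical examples.

```python
def scale_in_place(readings, factor):
    """Impure: modifies the caller's list as a side effect."""
    for i in range(len(readings)):
        readings[i] *= factor


def scaled(readings, factor):
    """Pure: returns a new list and leaves the input untouched."""
    return [value * factor for value in readings]


if __name__ == "__main__":
    original = [1, 2, 3]
    print(scaled(original, 2))  # a new list; original is unchanged
    scale_in_place(original, 2)
    print(original)             # original itself has now been modified
```

The pure version is simpler to test because its result depends only on its arguments, so a test needs no setup or clean-up of shared state.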

Software Architecture Revisited
  • Sometimes new, contributed code needs refactoring for it to fit within an existing codebase.

  • Try to leave the code in a better state than you found it.

Section 4: Collaborative Software Development for Reuse
  • Agreeing on a set of best practices within a software development team will help to improve your software’s understandability, extensibility, testability, reusability and overall sustainability.

Developing Software In a Team: Code Review
  • Code review is a team software quality assurance practice where team members look at parts of the codebase in order to improve their code’s readability, understandability, quality and maintainability.

  • It is important to agree on a set of best practices and establish a code review process in a team to help sustain good, stable and maintainable code for many years.

Preparing Software for Reuse and Release
  • The reuse battle is won before it is fought. Select and use good practices consistently throughout development and not just at the end.

Packaging Code for Release and Distribution
  • Poetry allows us to produce an installable package and upload it to a package repository.

  • Making our software installable with Pip makes it easier for others to start using it.

  • For complete control over building a package, we can use a setup.py file.

Section 5: Managing and Improving Software Over Its Lifetime
  • For software to succeed it needs to be managed as well as developed.

  • Estimating the effort to deliver work items is a foundational tool for prioritising that work.

Managing a Collaborative Software Project
  • We should use GitHub’s Issues to keep track of software problems and other requests for change - even if we are the only developer and user.

  • GitHub’s Mentions play an important part in communication between collaborators and are used as a way of alerting team members to activities and referencing one issue/pull request/comment/commit from another.

  • Without a good project and issue management framework, it can be hard to keep track of what’s done, or what needs doing, and particularly difficult to convey that to others in the team or to share the responsibilities.

Assessing Software for Suitability and Improvement
  • It’s as important to take a critical attitude to adopting software as it is to developing it.

  • As a team, agree on who will support the software you make available to others, and to what extent.

Software Improvement Through Feedback
  • Prioritisation is a key tool in academia where research goals can change and software development is often given short shrift.

  • In order to prioritise things to do we must first estimate the effort required to do them.

  • For accuracy, effort estimation should be done by the people who will actually do the work.

  • Aim to reduce cognitive biases in effort estimation by being honest about your abilities.

  • Ask other team members - or do estimation as a team - to help make accurate estimates.

  • MoSCoW is a useful technique for prioritising work to help ensure projects deliver successfully.

  • Aim for a 60%/20%/20% ratio of Must Haves/Should Haves/Could Haves for requirements within a timebox.
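The 60%/20%/20% guideline is simple arithmetic on the effort available in a timebox. As a rough sketch (the function name and the choice of days as the unit are assumptions):

```python
def moscow_budget(timebox_days):
    """Split a timebox's effort using the 60/20/20 guideline."""
    percentages = {"must": 60, "should": 20, "could": 20}
    return {
        category: timebox_days * percent / 100
        for category, percent in percentages.items()
    }


if __name__ == "__main__":
    # A 10-day timebox: 6 days of Must Haves, 2 of Should Haves, 2 of Could Haves.
    print(moscow_budget(10))
```

If the estimated effort for Must Haves exceeds its share, that is a signal to re-prioritise or extend the timebox before work starts.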

Wrap-up
  • Collaborative techniques and tools play an important part in research software development in teams.
