Content from Lesson 1: Integrated Development Environments
Last updated on 2025-04-17
Overview
Questions
- What is an Integrated Development Environment (IDE)?
- Why use an IDE?
- What are common IDE features that code developers find extremely useful?
- What are some commonly used IDEs among the RSE community?
Objectives
- Explain what an Integrated Development Environment (IDE) is and list a number of IDEs used by RSEs
- List common IDE features and explain how they contribute to easier and quicker code development
What is an Integrated Development Environment (IDE)?
An Integrated Development Environment (IDE) is a software application that provides a comprehensive workspace for writing, editing, testing, and debugging code, all in one place. It combines several tools that developers need into a single interface to streamline the code development process. IDEs are extremely useful, and much of modern software development relies on them.
Why use an IDE?
An IDE brings everything you need to write, test, and debug code into one place — saving time by helping you write better code faster. IDEs help by:
- reducing setup and development time - everything you need for editing, running, and debugging code is in one place and the need to switch between different tools/applications/windows is significantly reduced
- offering helpful tools such as syntax checking, code suggestions, autocomplete and error checking, which lead to fewer errors thanks to real-time feedback
- making it easier to debug and test code, so that issues are easier to detect and fix
- providing a consistent environment across projects
For beginners, IDEs lower the barrier to entry by making it easier to spot mistakes and understand code structure. For experienced developers, IDEs boost productivity and streamline complex workflows.
Common IDE Features
The following is a list of the most commonly seen IDE features:
- code editing - providing a place to write and edit code
- syntax highlighting - to show programming language constructs, keywords and syntax errors with visually distinct colours and font effects for better readability
- code completion - to speed up programming by suggesting a set of possible (syntactically correct) code options
- code search - to find package, class, function and variable declarations, as well as their usages and references
- version control support - to interact with source code repositories
- debugging support - for setting breakpoints in the code editor, step-by-step execution of code and inspection of variables to help find and fix bugs in code
- integrated terminal - to run commands without leaving the IDE
- project/file explorer - to navigate the code project structure easily
Popular IDEs
Here are a few widely used IDEs across different languages and fields:
- Visual Studio Code (VS Code) – lightweight and highly customisable; supports many languages
- PyCharm – great for Python development
- RStudio – designed specifically for R programming
- Eclipse – often used for Java and other compiled languages
- JupyterLab – interactive environment for Python and data science
- Spyder – popular among scientists using Python
What is Code Debugging?
Code debugging is the process of identifying, isolating, and fixing errors or bugs in your software or script. Bugs can manifest as unexpected behavior, crashes, or incorrect outputs. Debugging is an essential step in software development, ensuring that your code runs as intended and meets quality standards.
Why Does Debugging Matter?
Even small mistakes in code can cause unexpected behavior or crashes. Debugging helps with:
- code correctness - debugging ensures your program works as expected and meets requirements
- error resolution - debugging helps you understand why your code isn’t behaving correctly, so you can find and fix the underlying issue rather than just guessing.
- improving code quality - regular debugging leads to cleaner, more reliable and performant code and reduces the risk of problems in production.
- efficient code development - familiarity with debugging tools and techniques can significantly reduce the time spent on troubleshooting and enhance overall productivity.
Common Debugging Techniques
- Adding print statements at certain points in the code to print and trace variable values and check code flow is one of the simplest debugging methods
- Using a debugger integrated in your IDE allows you to set breakpoints, step through your code line by line, and inspect variables at runtime
- Incorporating logging into your application can provide insights into its behavior over time and help diagnose issues that occur in specific runtime conditions
- Writing tests that automatically check your code’s functionality can help catch bugs early in the development process
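To make the print, logging and testing techniques above more concrete, here is a minimal sketch built around a hypothetical fahr_to_celsius function (an illustrative name of our own, not part of any library). A print statement traces an intermediate value, the standard logging module records the same information at DEBUG level, and a small test function checks known reference points automatically.

```python
import logging
import math

# Configure the standard library logger to show DEBUG-level messages.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)


def fahr_to_celsius(fahr):
    """Convert a temperature from Fahrenheit to Celsius (hypothetical example)."""
    celsius = (fahr - 32) * 5 / 9
    # Print statement: trace an intermediate value while the code runs.
    print("fahr_to_celsius:", fahr, "->", celsius)
    # Logging: record the same information in a way that can be switched
    # off or redirected by changing the logging configuration, not the code.
    logger.debug("Converted %s F to %s C", fahr, celsius)
    return celsius


def test_fahr_to_celsius():
    # Automated test: check two well-known reference points.
    assert math.isclose(fahr_to_celsius(32), 0.0)
    assert math.isclose(fahr_to_celsius(212), 100.0)


if __name__ == "__main__":
    test_fahr_to_celsius()
    print("All tests passed")
```

A test runner such as pytest can collect functions named test_* from test files automatically; here the test is simply called directly.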
Content from 1.1 Setup & Prerequisites
Last updated on 2025-04-17
Overview
Questions
- What prerequisite knowledge is required to follow this topic?
- How do I set up my machine to follow this topic?
Objectives
- Understand what prerequisite knowledge is needed before following this topic
- Set up your machine to follow this topic
Prerequisite
- Familiarity with using any code editor and navigating a filesystem structure
Setup
The hands-on part of this topic will be conducted using Visual Studio Code (VSCode), a widely used IDE. Please download the appropriate version of Visual Studio Code for your operating system (Windows, macOS, or Linux) and system architecture (e.g., 64-bit, ARM).
Content from 1.2 Getting Started with VSCode
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Running VSCode
Let’s start by running VSCode on our machines. How you launch VSCode will differ depending on which operating system you have installed.
TODO: add screenshot of opening vscode for first time
The first thing you’ll likely see is a “Welcome” page. You may find it asks you which kind of theme you’d like - you can select from either a dark or light theme.
Navigating Around VSCode
So let’s take a look at the application. You’ll see some icons on the left side, which give you access to its key features. Hovering your mouse over each one will show a tooltip that names that feature:
TODO: add screenshot with highlighted icons for what we’ll cover here
Explorer
- the top one is a file navigator, or explorer - we can use this to open existing folders containing program files.
Search
- the next one down is a search capability, so you can search for things (and replace them with other text) across your code files.
Source control
- this gives you access to source code control, which includes Git version control functionality. This feature means you can do things like clone Git repositories (for example, from GitHub), add and commit files to a repository, and so on.
Callout
If you’re not familiar with Git, that’s totally fine - you don’t have to use this feature, although it’s worth looking into using version control for writing your code. Version control systems like Git allow you to manage your code by storing it, and all the changes you make to it, within a repository, which can also be hosted elsewhere, for example on GitHub.
Run and Debug
- this allows you to run programs you write in a special way with a debugger, which lets you check the state of your program as it is running. This is very useful, and we’ll look into it later.
Extensions
- this allows you to install extensions to VSCode that extend its functionality in some way. We’ll look into this right now.
There are many other features and ways to access them, and we’ll cover key ones throughout this lesson.
Installing Extensions
Extensions are a major strength of VSCode. Whilst VSCode appears quite lightweight, and presents a simple interface (particularly compared to many other IDEs!), this is quite deceptive. You can extend its functionality in many different ways: for example, installing support for other languages, greater support for version control, and even support for working with databases. There are literally tens of thousands of extensions available now.
VSCode already comes with built-in support for JavaScript, TypeScript and Node.js, and there are extensions for many other languages too (C++, C#, Java, PHP, Go, and many others). Installing a language extension will allow you to do more things with that particular language in VSCode, as we’ll see now.
Let’s install an extension now:
- First, select the Extensions icon, then type Python into the search box at the top, and it’ll give you a list of all Python-related extensions.
- Select the one named Python from Microsoft. This is Microsoft’s official Python extension.
- Then select Install.
It might take a minute - you can see a sliding blue line in the top left to indicate it’s working. Once complete, you should see a couple of “Welcome” windows introducing you to two of its key features: support for Python and for Jupyter notebooks. Jupyter notebooks are a way of writing Python programs as a series of cells that you can run one at a time from within an editor as you write the program; if you use them, you may find this support useful.
For now, let’s configure this extension for our Python development. To do that, we need to tell VSCode which Python installation on our machine we’d like it to use. In the Python Welcome window, select Select a Python interpreter, and then Select Python interpreter. You may find you have many installations of Python, or only one. Try to select one later than 3.8 if you can. Then select Mark done, and close the welcome windows.
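If you want to double-check which interpreter ended up selected, one option (a small sketch of our own, not a step the Python extension requires) is to save the script below in a file and run it with the selected interpreter. It prints the path and version of whichever Python runs it; the version threshold simply mirrors the "later than 3.8" suggestion above.

```python
import sys

# Full path of the interpreter running this script; it should match the
# interpreter you selected in VSCode.
print("Interpreter:", sys.executable)

# The first three parts of the version, e.g. (3, 11, 4).
version = tuple(sys.version_info[:3])
print("Version:", ".".join(str(part) for part in version))

# "Later than 3.8" means 3.9 or newer.
if sys.version_info >= (3, 9):
    print("This interpreter is later than 3.8.")
else:
    print("Consider selecting a newer Python interpreter if one is available.")
```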
A Sample Project
FIXME: copy code-style-example repo to softwaresaved’s organisation
Next, let’s obtain some example Python code and edit it from within VSCode. First, download the example code we’ll use from https://github.com/UNIVERSE-HPC/code-style-example/releases/tag/v1.0.0, either as a .zip or .tar.gz compressed archive file. If you’re unsure, download the .zip file. Then, extract all the files from the archive into a convenient location. You should see the files contained within a new directory named code-style-example-1.0.0.
Now we need to load the code into VSCode to see it. You can do this in a couple of ways, either:
- Select the Source control icon from the middle of the icons on the left navigation bar. You should see an Open Folder option, so select that.
- Select the File option from the top menu bar, and select Open Folder... from it.
In either case, you should then be able to use the file browser to locate the directory with the files you just extracted, and then select Open. Note that we’re looking for the folder that contains the files, not a specific file.
What about using Git Version Control?
If your system has the Git version control system installed, you may see a Clone Repository option here too. If you are familiar with Git and wish to use this option, select it and enter the repository’s location as https://github.com/UNIVERSE-HPC/code-style-example. Then use the file browser that is presented to find a convenient location to store the cloned code and click on Select as Repository Destination, then select Open when ‘Would you like to open the cloned repository?’ appears.
You’ll then likely be presented with a window asking whether you trust the authors of this code. In general, it’s a good idea to be at least a little wary, since you’re obtaining code from the internet, so be sure to check your sources! Be careful here - I found on Windows the “Trust” option appears on the left, whilst on Mac, it appears on the right! In this case, feel free to trust the repository! You’ll then see the explorer present you with some files in a small window (or pane) on the left you can use to navigate and find files.
So far within VSCode we have downloaded some code from a repository and opened a folder. Whenever we open a folder in VSCode, this is referred to as a “Workspace” - essentially, a collection of a project’s files and directories. So within this workspace, you’ll see the following:
- A data folder, containing a single data file (click on the folder to see the data file within it).
- Two files: a climate_analysis.py Python file, and a LICENSE.md file.
So next, let’s look at editing code.
Key Points
- FIXME
Content from 1.3 Using the Code Editor
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Now that we’ve acquainted ourselves with running VSCode, let’s take a look at our example code. Select the climate_analysis.py file in the explorer window, which will bring up the contents of the file in the code editor.
The File Explorer has Disappeared!
You may find, perhaps on reopening VSCode, that the explorer is no longer visible. In this case, you can select Explorer from the sidebar to bring it back up again, and if you don’t currently have a workspace loaded, you can select Open Folder to select the code folder.
Note that, being an example for this lesson, the code is deliberately written to have flaws. For instance, the line spacing is inconsistent, there are no code comments, there’s a variable that’s not used, and you may spot other issues too. But in essence, the code is designed to do the following:
- Open a file in the CSV (comma separated value) format
- Go through the file line by line, and:
  - If the line begins with a # symbol, ignore it.
  - Otherwise, extract the fourth column (which is in Fahrenheit), convert it to Celsius and Kelvin, and output those readings.
Let’s take a look at some of what the code editor gives us.
Syntax Highlighting
You’ll notice that the Python syntax is being highlighted for us, which helps readability.
FIXME: add screenshot of code editor with syntax highlighting of code example
Here, colour is used to distinguish the various parts of our program: functions are yellow, Python statements are purple, variables are light blue, strings are reddy-orange, and so on. This feature is, perhaps unsurprisingly, known as syntax highlighting, and you can edit the colour scheme for a particular language to your liking, although we won’t go into that now.
This is really handy for giving you immediate feedback on what you are typing, and can help you to identify common syntax mistakes. For example, if you delete the closing parenthesis on the call to open, the opening one goes red, with a squiggly line underneath, indicating an issue.
So this is great, and helps us understand what we are writing, and highlights some mistakes.
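The red highlight reflects a genuine syntax error: Python itself refuses to parse such a line. As a small illustration (using only Python’s built-in compile function; this is not how VSCode checks code internally), the sketch below compiles the open line from climate_analysis.py with and without its closing parenthesis. compile() only parses the text, so no file is actually opened.

```python
# The original line from climate_analysis.py, and the same line with the
# closing parenthesis of the open call deleted.
good_line = "climate_data = open('data/sc_climate_data_10.csv', 'r')\n"
bad_line = "climate_data = open('data/sc_climate_data_10.csv', 'r'\n"


def has_syntax_error(source):
    """Return True if the source text cannot be parsed as Python."""
    try:
        # Parse and compile only; the code is never executed.
        compile(source, "<example>", "exec")
    except SyntaxError as error:
        print("SyntaxError:", error.msg)
        return True
    return False


print(has_syntax_error(good_line))  # False
print(has_syntax_error(bad_line))   # True, after printing the error message
```

The exact wording of the error message differs between Python versions, but it is a SyntaxError in each case.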
Code Completion
Something that’s also useful is VSCode’s ability (via the Python and Pylance extensions) to help you write and format your code whilst you’re typing.
For example, on a blank line somewhere, enter for x in something: and press Enter. On the next line, we can see that it has automatically indented the line for us, knowing that we’re inside a loop.
Another really helpful feature is known as code completion (in VSCode, this is referred to as IntelliSense). This is a great time saver, and a really useful feature of IDEs. Essentially, as you type, it works out the context of what you are doing, and gives you hints. For example, if we start typing a variable we’ve already defined, such as climate_data, we can see that it narrows down the options for what we might be trying to type as we go. When we see climate_data, we can press Tab to complete it for us. As another example, if we wanted to open another file, we might type new_file = open(. In this case, it provides information on the open function and its arguments, along with a description of what it does. This is really handy, since we don’t have to take the time to look all this information up on the web, for example.
Code Linting
FIXME: intro to code linter (add to intro section)
In the introduction we covered code linting tools, which go even further than syntax highlighting to analyse and identify deeper issues with our code. The good news is that we can give VSCode this code analysis functionality by installing a Python linter extension. As before, select the Extensions icon, and this time search for Pylint, select the one by Microsoft, and click Install.
What is Pylint?
Pylint is a tool that can be run from the command line or via IDEs like VSCode, and it can help our code in many ways:
- Ensure consistent code style: like the context-sensitive highlighting provided by VSCode, it works as we write code, highlighting infractions to help us stay consistent with established code style standards such as PEP 8.
- Perform basic error detection: Pylint can look for certain kinds of Python errors, such as references to undefined variables.
- Check variable naming conventions: Pylint often goes beyond PEP 8 to include other common conventions, such as naming variables outside of functions in upper case.
- Customisation: you can specify which errors and conventions you wish to check for, and those you wish to ignore.
Going back to our code you should now find lots of squiggly underlines of various colours.
I don’t see any Squiggly Underlines!!
If you don’t see any squiggly underlines in the editor, it could be that the linter extension hasn’t looked at your code yet. To trigger the linter, try saving the file: go to File then Save on the menu bar, and you should now see a lot of squiggly underlines in the code.
These squiggly lines indicate an issue, and by hovering over them, we can see details of the issue. For example, by hovering over the variables shift or comment, we can see that the variable names don’t conform to what’s known as an UPPER_CASE naming convention. Put simply, the linter has identified these variables as constants, which are typically written in upper case. We should rename them, e.g. to SHIFT and COMMENT. Having done so, we also need to update the reference to comment further down in the code so that it’s also upper case.
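After these renames, the affected parts of climate_analysis.py would look something like the sketch below. Only the two names change. To keep the sketch self-contained, it loops over a short list holding two lines copied from the data file instead of opening the real file, and it collects the Fahrenheit readings rather than converting them; both simplifications are ours, not part of the lesson code.

```python
# Module-level constants renamed to UPPER_CASE, as the linter suggests.
# (SHIFT is still unused, just as shift was in the original file.)
SHIFT = 3
COMMENT = '#'

# Stand-in for the open data file: two lines copied from
# data/sc_climate_data_10.csv, so this sketch runs on its own.
climate_data = [
    '# POINT_X,POINT_Y,Min_temp_Jul_F,Max_temp_jul_F,Rainfall_jul_inch\n',
    '461196.8188,1198890.052,47.77,58.53,0.76\n',
]

max_temps = []
for line in climate_data:
    data = line.split(',')
    # The reference must be renamed too: the old name 'comment' no longer
    # exists, so leaving it would raise a NameError.
    if data[0][0] != COMMENT:
        max_temps.append(float(data[3]))

print(max_temps)  # [58.53]
```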
Now if we save the file by selecting File then Save, we should see the linter rerun, and those highlighted issues disappear.
We can also see a comprehensive list of all the issues found by opening the Problems pane. In the menu, go to View then Problems, and you’ll see a complete list of issues we can work on, displayed in a pane at the bottom of the code editor. We don’t have to address them all, of course, but by following them we bring our code style closer to a commonly accepted and consistent form of Python.
The linter also picks up things like functions that don’t have docstrings, which we’ll take a look at shortly. For now, we can close the Problems pane.
Need a Thing? Install an Extension!
As we just saw, included in the list of issues with our code was the lack of docstrings. If we want to write good code, we should be adding code comments, including docstrings for our functions, methods, and modules.
Let’s try to find an extension that might help us with writing docstrings. Select the Extensions icon, and type docstring - you should see an autoDocstring extension by Nils Werner at the top. Select that, and you’ll see a page outlining what it is. Also note from the number of downloads that it’s very widely used.
What’s really handy is the little video that shows us what it does. This looks exactly like what we’re after! Select Install.
Now, if we go to a function, for example FahrToCelsius, start a new line just below its def line, and type """, we’ll see a small pop-up offering to add a docstring. Press Tab to do so.
FIXME: add screenshot snippet showing docstring boilerplate being added
It does all the hard work of adding in the structure of a docstring for us, so we just need to fill in the blanks. This is another good example of us realising it would be nice to have something to help us, searching for an extension, and trying it out.
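Once the blanks are filled in, the function might look something like the sketch below. The layout follows the Google docstring style, which we believe is autoDocstring’s default format (check the extension’s settings to confirm); the description text is our own example wording, not something the extension writes for you.

```python
def FahrToCelsius(fahr):
    """Convert a temperature from Fahrenheit to Celsius.

    Args:
        fahr (float): Temperature in degrees Fahrenheit.

    Returns:
        float: The same temperature in degrees Celsius.
    """
    celsius = ((fahr - 32) * (5/9))
    return celsius


# Docstrings are available at runtime, e.g. via help(FahrToCelsius).
print(FahrToCelsius.__doc__.splitlines()[0])
```

Because the docstring is attached to the function object, tools such as help() and VSCode’s hover tooltips can display it wherever the function is used.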
Using a Git Code Repository?
For those of you familiar with version control and who retrieved the example code via cloning its repository instead of downloading it, there are some other editor features that help with using version control. One of these is that the filename changes colours in the file explorer depending on its status within version control:
- White - an existing file is unchanged from the copy in the local repository.
- Orange - the content of an existing file has changed, and the change(s) have not yet been committed to version control.
- Green - a new file has been added and is unknown to version control.
So at a glance, you can get an idea of what’s changed since your last commit.
Summary
So in summary, many of these editing features are typical of IDEs in general, and the great thing is that they are really helpful at saving us time. Things like syntax highlighting, code completion, automatic code formatting and inserting docstrings, may not seem like much, but it all adds up!
Key Points
- FIXME
Content from 1.4 Running and Debugging Code
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Running Python in VSCode
Now let’s try running a Python file. First, make sure your Python code doesn’t have any errors! Then, select the “Play”-looking icon at the top right of the code editor.
FIXME: screenshot snippet of the play icon?
You should see the program run, with its output displayed in a terminal pane that pops up at the bottom:
OUTPUT
steve@laptop:~/code-style-example$ /bin/python3 /home/steve/code-style-example/climate_analysis.py
Max temperature in Celsius 14.73888888888889 Kelvin 287.88888888888886
Max temperature in Celsius 14.777777777777779 Kelvin 287.92777777777775
Max temperature in Celsius 14.61111111111111 Kelvin 287.76111111111106
Max temperature in Celsius 13.838888888888887 Kelvin 286.9888888888889
Max temperature in Celsius 15.477777777777778 Kelvin 288.62777777777774
Max temperature in Celsius 14.972222222222225 Kelvin 288.1222222222222
Max temperature in Celsius 14.85 Kelvin 288.0
Max temperature in Celsius 16.33888888888889 Kelvin 289.4888888888889
Max temperature in Celsius 16.261111111111113 Kelvin 289.4111111111111
Max temperature in Celsius 16.33888888888889 Kelvin 289.4888888888889
steve@laptop:~/code-style-example$
Error: the term conda is not recognised
If you’re running an Anaconda distribution of Python on Windows and you see this error, it means that VSCode is not looking in the right place for Anaconda’s installation. In this case, you may need to configure VSCode accordingly.
VSCode has a sophisticated way to access its inner functionality, known as the Command Palette, which we’ll use to address this. Activate the Command Palette by pressing Ctrl + Shift + P simultaneously, then type Terminal: Select Default Profile. From the options, select Command Prompt C:\WINDOWS\..., and hopefully that should resolve the issue.
The pop-up window is known as the “Console”, and is essentially a terminal, or command prompt, where the program is run. You’ll notice we can also type commands here. For example, on Windows you could type dir, and on Mac or Linux you could type ls, to get a listing of files.
We can also close this terminal/console at any time, and start a new one by selecting Terminal from the menu and then New Terminal. So for most tasks, we can write and run our code without ever leaving VSCode.
Debugging Code
Finally, let’s look at an IDE feature which is often overlooked: the debugger.
A debugger is a bit like performing exploratory surgery on a patient. You know there’s something wrong, but you don’t know exactly where the problem resides. What’s useful with debuggers is that you go looking within the codebase as it’s actually running to find the source of the problem.
In order to run a debugging session we first need to tell the IDE where we’d like to examine the code. Then you run the code in a special way, using a debugger, and it pauses the execution of the code at that point. You then have the freedom to take a look around and examine the state of variables, which functions have been called up until this point, and so on, and hopefully identify the cause of the issue.
Now, many people when starting out with coding disregard debuggers as complicated and tough to understand. And 30 or 40 years ago, debuggers were indeed quite complicated to set up and use. But these days, debuggers are perhaps a little more straightforward, with IDEs doing a lot of complex stuff for us.
Introducing a Problem
Let’s assume we have a problem with our code, by introducing one. In our climate_analysis.py code, where it says if data[0][0] != COMMENT, replace COMMENT with '!'. We might imagine that one of our colleagues made this change by mistake, and we haven’t spotted it yet. We try to run the code as before, and now it doesn’t work. We get a ValueError, which tells us that a value extracted from the data file couldn’t be converted to a float as part of the temperature conversion.
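To see where this ValueError comes from, we can reproduce the failing step on its own. With the check changed to '!', the header line of the data file is no longer skipped, so the code tries to convert its fourth column, the text Max_temp_jul_F, to a float. The sketch below uses the header line copied from the data file; the exact wording of the error message may vary between Python versions.

```python
# The first (comment) line of data/sc_climate_data_10.csv.
header = '# POINT_X,POINT_Y,Min_temp_Jul_F,Max_temp_jul_F,Rainfall_jul_inch\n'

data = header.split(',')
print(data[3])  # Max_temp_jul_F

try:
    # This is the conversion the faulty code attempts on the header line.
    fahr = float(data[3])
except ValueError as error:
    print('ValueError:', error)
```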
Adding a Debugging Breakpoint
Now we know where the error is occurring, but we don’t know the source of the problem, which may not be in the same place. So let’s add what is known as a breakpoint to our code. This is where the debugger will stop running the code and pause for us. Let’s add it at the start of the for line in climate_data: line. We do that by clicking in the left margin for that line: when hovering in the margin, you’ll see a faded red dot appear, and selecting it on that line sets the breakpoint.
Using the Debugger
Let’s run the code using the debugger. Select the Run and Debug icon on the left, and select Run and Debug. It will then likely ask two questions in a pop-up pane near the top:
- It asks you to Select debugger, so select the suggested Python Debugger.
- Then it asks you to Select a debug configuration, so select Python File to debug the current file.
Now the Python script is running in debug mode. You’ll see the execution has paused on the line where we set the breakpoint, which is now highlighted. Some new information is now displayed in various panes on the left of the code editor. In particular:
FIXME: show screenshot of debugging panes (esp. variables and call stack)
- VARIABLES - on the left, we can see a list of variables and their current values at this point in the script’s execution, such as COMMENT and SHIFT, and climate_data (which is a reference to our open data file). We don’t have many at the moment. It also distinguishes between local variables and global variables, which relates to the “scope” of the variables, i.e. where in the code they are accessible from. Global variables can be seen from anywhere in the script, whereas local variables are only visible within the current scope, such as the function currently being run. If we were within a function here, we would see the variables that are defined and used only within that function listed as local variables. For example, if we set a breakpoint within the FahrToKelvin function, we would see its fahr parameter, and kelvin once it has been assigned, listed as local variables.
- CALL STACK - this is a record of the journey the script has taken, in terms of functions called, to get to this position in its execution. It shows us that we are at the top level of our script, which makes sense, since our breakpoint is at the top level of the script, and not within any function. If it were within the FahrToKelvin function, for example, we’d see that function added to the call stack. It also shows us the line number where execution has paused at this level of the call stack.
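The local/global distinction shown in the VARIABLES pane mirrors Python’s own scoping rules. As a rough illustration outside the debugger (using Python’s built-in locals() and globals() functions, with a copy of the lesson’s conversion functions), the sketch below records which names are local inside FahrToKelvin at the point where kelvin has just been assigned, which is roughly what the debugger would list as local variables if paused there.

```python
SHIFT = 3  # a global variable, visible throughout the script


def FahrToCelsius(fahr):
    return ((fahr - 32) * (5/9))


def FahrToKelvin(fahr):
    kelvin = FahrToCelsius(fahr) + 273.15
    # locals() returns the names defined in this function's scope only.
    local_names = sorted(locals())
    return kelvin, local_names


kelvin_value, names_inside = FahrToKelvin(58.53)
print(names_inside)  # ['fahr', 'kelvin']

# At the top level, SHIFT is a global, but the function's local fahr is not.
print('SHIFT' in globals(), 'fahr' in globals())  # True False
```

Note that in the lesson’s own script, names such as fahr and kelvin are also assigned at the top level inside the for loop, so once the loop has run they appear among the globals too, separately from the function’s locals of the same name.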
Now, we can also see some new icons at the top to do with debugging:
FIXME: show screenshot snippet of debugging icons
- The first one is continue, which allows the script to keep running until the next breakpoint.
- The next one allows us to step over - or through - the script one statement at a time.
- The next two allow us to choose to step into or out of a function call, which is interesting. If we want to examine the inner workings of a function during this debug session, we can do that.
- The green circular arrow restarts the debug process.
- The red square stops debugging completely.
So let’s step through our code by selecting the second icon and see what happens. As we do so, we can see the variable state changing. By looking in the variables section, we can see that the line variable contains the first line read from the data file. On the next step, we’ve reached the if statement. If we step again, and then again, our program halts because it has run into the problem we saw before.
This tells us something useful: the problem occurs in the first iteration of the loop. Since Python is going through the data file line by line, this implies the problem might be with the first line of data being processed. If we re-run the debugger, we can go through this process again, and we see something interesting when we get to the if statement. From the code, we know that the if statement is looking for an exclamation mark at the beginning of the line to indicate a comment. However, the first character of the line, data[0][0], is a ‘#’ instead. Therefore, in this case, the code will assume it’s a data line and attempt to process it as such, and then fail with the exception we saw before.
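The logic the debugger revealed can also be checked directly. In the sketch below, the first character of the header line is '#', so the faulty test against '!' evaluates to True and the line is treated as data, whereas the original test against COMMENT evaluates to False and the line is skipped.

```python
COMMENT = '#'
header = '# POINT_X,POINT_Y,Min_temp_Jul_F,Max_temp_jul_F,Rainfall_jul_inch\n'

data = header.split(',')
first_char = data[0][0]
print(repr(first_char))  # '#'

# Faulty check: '#' is not '!', so the header is treated as a data line.
print(first_char != '!')  # True

# Correct check: '#' equals COMMENT, so the header is skipped.
print(first_char != COMMENT)  # False
```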
Fixing the Issue
So now we’ve identified the problem, we can fix it.
First, stop the debug process by selecting the red square. Then edit the if line to search for COMMENT instead, reverting the code to what it was before. We can then rerun the debugger if we wish, to check our understanding. As we step through the code, we can see it correctly identifies the first line as a comment, and ignores it, continuing to the next iteration of the for loop and the next line of data. Now that we’ve fixed the problem, we can stop the debugger again.
We’ve now solved our problem, so we should remove the breakpoint. Running our code again as normal, we can see it now works as expected.
Debugging in Context
Typically, we’d use debugging when we’ve discovered a problem. Other techniques, such as testing, are great at identifying that there are problems, but not always the root cause and location of the actual problem. Debugging is the next step in that process. Sometimes we discover a problem, perhaps because our code tests show us there’s an issue, or maybe we find out some other way. If we’re lucky, we can identify and fix the problem quickly. Where we can’t, debugging is there to help us. With particularly complex programs, it can be very difficult to reason about how they work and where the problems are, and debugging allows us to pick apart that process and, step by step, find the source of those issues.
Key Points
- FIXME
Content from Lesson 2: Code Style & Linting
Last updated on 2025-03-21
Overview
Questions
- FIXME
Objectives
- FIXME
Intro to code style
- Why does code style matter?
- Key styling practices, conventions and specifications
- Maintain your code quality to reduce bugs and errors
Intro to linters
- What is a linter and why use one?
- Example linting tools (e.g. in Python)
- Taking linting automation further using continuous integration
Content from 2.1 Setup
Last updated on 2025-02-28
Setup
Overview
Questions
- FIXME
Objectives
- FIXME
Content from 2.2 Some Example Code
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Obtaining Some Example Code
FIXME: copy code-style-example into softwaresaved org
For this lesson we’ll be using some example code available on GitHub, which we’ll clone onto our machines using the Bash shell. So first, open a Bash shell (via Git Bash on Windows or Terminal on a Mac). Then, on the command line, navigate to where you’d like the example code to reside, and use Git to clone it. For example, to clone the repository into our home directory and change into the repository’s directory, we could run cd ~, then git clone https://github.com/UNIVERSE-HPC/code-style-example, and then cd code-style-example.
Examining the Code
Next, let’s take a look at the code, which is in the root directory of the repository in a file called climate_analysis.py.
PYTHON
import string


shift = 3
comment = '#'
climate_data = open('data/sc_climate_data_10.csv', 'r')

def FahrToCelsius(fahr):
    celsius = ((fahr - 32) * (5/9)) 
    return celsius
def FahrToKelvin(fahr):
    kelvin = FahrToCelsius(fahr) + 273.15
    return kelvin



for line in climate_data:
    data = line.split(',')
    if data[0][0] != comment:
        fahr = float(data[3])
        celsius = FahrToCelsius(fahr)
        kelvin = FahrToKelvin(fahr)
        print('Max temperature in Celsius', celsius, 'Kelvin', kelvin)
The code is designed to process temperature data from a separate data file. It reads the data file line by line and, for each data line, prints the maximum temperature (given in Fahrenheit) converted to both Celsius and Kelvin.
The code expects to find the data file sc_climate_data_10.csv (formatted in the Comma Separated Values (CSV) format) in the data directory, and the file looks like this:
# POINT_X,POINT_Y,Min_temp_Jul_F,Max_temp_jul_F,Rainfall_jul_inch
461196.8188,1198890.052,47.77,58.53,0.76
436196.8188,1191890.052,47.93,58.60,0.83
445196.8188,1168890.052,47.93,58.30,0.74
450196.8188,1144890.052,48.97,56.91,0.66
329196.8188,1034890.052,49.26,59.86,0.78
359196.8188,1017890.052,49.39,58.95,0.70
338196.8188,1011890.052,49.28,58.73,0.74
321196.8188,981890.0521,48.20,61.41,0.72
296196.8188,974890.0521,48.07,61.27,0.78
299196.8188,972890.0521,48.07,61.41,0.78
It contains a number of lines, each containing a number of values, each separated by a comma. There’s also a comment line at the top, to tell us what each column represents.
Now let’s take a look at the Python code, using any text or code editor you like to open the file. You can also use nano if you’d prefer to use the command line, e.g.
The code opens the data file, and also defines some functions to do two temperature conversions from Fahrenheit to Celsius and Fahrenheit to Kelvin. Note that for the purposes of this lesson, the code is deliberately written to contain some issues!
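As a quick check of the two conversion formulas, here is a minimal standalone sketch (not part of the lesson’s repository) that applies them to the first maximum temperature in the data file, 58.53°F. The snake_case function names are our own, chosen for illustration; running it should print the same values as the first line of output produced when the full script is run.

```python
def fahr_to_celsius(fahr):
    """Convert a temperature in Fahrenheit to Celsius."""
    return (fahr - 32) * (5 / 9)


def fahr_to_kelvin(fahr):
    """Convert a temperature in Fahrenheit to Kelvin, via Celsius."""
    return fahr_to_celsius(fahr) + 273.15


# First Max_temp_jul_F value from data/sc_climate_data_10.csv
fahr = 58.53
print('Max temperature in Celsius', fahr_to_celsius(fahr), 'Kelvin', fahr_to_kelvin(fahr))
```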
Why Write Readable Code?
QUESTION: who has seen or used code that looks like this? Yes/No? QUESTION: who has written code like this? Yes/No
No one writes great code that’s readable, well formatted, and well designed all the time. Sometimes you need to explore ideas with code to understand how the code should be designed, and this typically involves trying things out first. But the key is that once you understand how to do something, it’s a good idea to make sure the code is readable and understandable by other people, which may include a future version of yourself six months from now. So it’s really helpful to end up with good, clean code that’s easier to understand.
Another key benefit to writing “cleaner” code is that it’s generally easier to extend and otherwise modify in the future. When code is initially written it’s often impossible to tell if it will be reused in some way elsewhere. A familiar scenario is that you stop developing a piece of code for a while, and put it to one side. Maybe it’s not needed any more, or perhaps a project has finished. You forget about it, until suddenly there’s a need to use the code again. Maybe all of it needs to be reused in another project, or maybe just a part of it. But when you come back to your code, it’s a mess you can’t understand. By spending a little time now to write good code while you understand it, you can save yourself (and possibly others) a lot of time later!
Does my Code Smell?
Developers sometimes talk about “code smells”. Code smells are cursory indications from looking at the source code that a piece of code may have some deeper issues. And looking at this code, it smells pretty terrible. For example, we can see that there is inconsistent spacing, with lines bunched together in some places, and very spread out in others. This doesn’t engender a great deal of confidence that the code will work as we expect, and it raises the question that if the style of the code appears rushed, what else has been rushed? How about the design of the code? Something to bear in mind when writing code!
Running the Example Code
Despite the issues with the code, does it work? Let’s find out by running it in the shell, from the root directory of the repository:
OUTPUT
Max temperature in Celsius 14.73888888888889 Kelvin 287.88888888888886
Max temperature in Celsius 14.777777777777779 Kelvin 287.92777777777775
Max temperature in Celsius 14.61111111111111 Kelvin 287.76111111111106
Max temperature in Celsius 13.838888888888887 Kelvin 286.9888888888889
Max temperature in Celsius 15.477777777777778 Kelvin 288.62777777777774
Max temperature in Celsius 14.972222222222225 Kelvin 288.1222222222222
Max temperature in Celsius 14.85 Kelvin 288.0
Max temperature in Celsius 16.33888888888889 Kelvin 289.4888888888889
Max temperature in Celsius 16.261111111111113 Kelvin 289.4111111111111
Max temperature in Celsius 16.33888888888889 Kelvin 289.4888888888889
And we can see that the code does indeed appear to work, with celsius and kelvin values being printed to the terminal. But how can we improve its readability? We’ll use a special tool, called a code linter, to help us identify these sorts of issues with the code.
Key Points
- FIXME
Content from 2.3 Analysing Code using a Linter
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Installing a Code Linter
The first thing we need to do is install pylint, a very well established tool for statically analysing Python code.
Now fortunately, pylint can be installed as a Python package, and we’re going to create what’s known as a virtual environment to hold this installation of pylint.
QUESTION: who has installed a Python package before, using the program pip? Yes/No QUESTION: who has created and used a Python virtual environment before? Yes/No
Benefits of Virtual Environments
Virtual environments are an indispensable tool for managing package dependencies across multiple projects, and could be a whole topic in itself. In the case of Python, the idea is that instead of installing Python packages at the level of our machine’s Python installation, which we could do, we’re going to install them within their own “container”, which is separate from the machine’s Python installation. Then we’ll run our Python code only using packages within that virtual environment.
There are a number of key benefits to using virtual environments:
- It creates a clear separation between the packages we use for this project, and the packages we use for other projects.
- We don’t end up with a machine’s Python installation containing a clutter of a thousand different packages, where determining which packages are used for which project often becomes very time consuming and prone to error.
- Since we are sure what our code actually needs as dependencies, it becomes much easier for someone else (which could be a future version of ourselves) to know what these dependencies are and install them to use our code.
- Virtual environments are not limited to Python; for example, there are similar tools available for Ruby, Java and JavaScript.
Setting up a Virtual Environment
Let’s now create a Python virtual environment and make use of it. Make sure you’re in the root directory of the repository, then type
Here, we’re using the built-in Python venv module (short for virtual environment) to create a virtual environment directory called venv. We could have called the directory anything, but naming it venv (or .venv) is a common convention, as is creating it within the repository root directory. This makes sure the virtual environment is closely associated with this project, and not easily confused with another.
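The standard library’s venv module can also be used from Python itself. The following is a minimal sketch, assuming you want to create the environment programmatically rather than from the shell; the helper name create_project_venv is hypothetical, and with_pip=True asks venv to bootstrap pip into the new environment, as python -m venv does by default.

```python
import venv
from pathlib import Path


def create_project_venv(path="venv", with_pip=True):
    """Create a virtual environment directory at `path`.

    with_pip=True bootstraps pip into the environment (via ensurepip).
    Returns the path of the environment's pyvenv.cfg file.
    """
    env_dir = Path(path)
    venv.create(env_dir, with_pip=with_pip)
    # Every virtual environment has a pyvenv.cfg file at its root
    return env_dir / "pyvenv.cfg"
```

For example, create_project_venv("venv") run from the repository root would create the same venv directory described above.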
Once created, we can activate it so it’s the one in use:
BASH
[Linux] source venv/bin/activate
[Mac] source venv/bin/activate
[Windows] source venv/Scripts/activate
You should notice the prompt changes to reflect that the virtual environment is active, which is a handy reminder. For example:
OUTPUT
(venv) $
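The changed prompt is only a visual cue. As a minimal sketch (assuming an environment created with venv, as here), Python can report whether it is running inside a virtual environment by comparing sys.prefix, which points at the active environment, with sys.base_prefix, which points at the Python installation the environment was created from. The function name in_virtual_environment is our own.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter is running inside a venv.

    Inside a venv, sys.prefix is the environment directory, while
    sys.base_prefix is still the original Python installation.
    """
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtual_environment())
print("Environment prefix:", sys.prefix)
```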
QUESTION: who has successfully created and activated their virtual environment? Yes/No?
Now it’s created, let’s take a look at what’s in this virtual environment at this point.
OUTPUT
Package Version
---------- -------
pip 22.0.2
setuptools 59.6.0
We can see this is essentially empty, aside from some default packages that were installed when the environment was created. Note that whilst within this virtual environment, we no longer have access to any globally installed Python packages.
Installing Pylint into our Virtual Environment
The next thing we can do is install any packages needed for this codebase. As it turns out, there aren’t any needed for the code itself, but we wish to use pylint, and that’s a Python package. So we can install pylint into our virtual environment:
Now if we check the packages, we see:
OUTPUT
Package Version
----------------- -------
astroid 3.3.9
dill 0.3.9
isort 6.0.1
mccabe 0.7.0
pip 22.0.2
platformdirs 4.3.7
pylint 3.3.6
setuptools 59.6.0
tomli 2.2.1
tomlkit 0.13.2
typing_extensions 4.13.1
So in addition to pylint, we see a number of other dependent packages installed that are required by it.
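Assuming Python 3.8 or later, the standard library’s importlib.metadata module offers a rough way to list, from within Python, the packages the active environment can see. This is a sketch for illustration rather than a replacement for pip list, and the helper name installed_packages is our own.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted (name, version) pairs for packages visible to this interpreter."""
    packages = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip any broken installs that have no name
            packages.add((name, dist.version))
    return sorted(packages, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(f"{name:<20} {version}")
```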
We can also deactivate our virtual environment:
You should see the (venv) prefix disappear, indicating we have returned to our global Python environment. Let’s reactivate it, since we’ll need it to use pylint.
Analysing our Code using a Linter
Let’s point pylint at our code and see what it reports:
We run this, and it gives us a report containing issues it has found with the code, and also an overall score.
OUTPUT
************* Module climate_analysis
climate_analysis.py:9:35: C0303: Trailing whitespace (trailing-whitespace)
climate_analysis.py:9:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
climate_analysis.py:1:0: C0114: Missing module docstring (missing-module-docstring)
climate_analysis.py:4:0: C0103: Constant name "shift" doesn't conform to UPPER_CASE naming style (invalid-name)
climate_analysis.py:5:0: C0103: Constant name "comment" doesn't conform to UPPER_CASE naming style (invalid-name)
climate_analysis.py:6:15: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
climate_analysis.py:8:0: C0116: Missing function or method docstring (missing-function-docstring)
climate_analysis.py:8:0: C0103: Function name "FahrToCelsius" doesn't conform to snake_case naming style (invalid-name)
climate_analysis.py:8:18: W0621: Redefining name 'fahr' from outer scope (line 20) (redefined-outer-name)
climate_analysis.py:9:4: W0621: Redefining name 'celsius' from outer scope (line 21) (redefined-outer-name)
climate_analysis.py:11:0: C0116: Missing function or method docstring (missing-function-docstring)
climate_analysis.py:11:0: C0103: Function name "FahrToKelvin" doesn't conform to snake_case naming style (invalid-name)
climate_analysis.py:11:17: W0621: Redefining name 'fahr' from outer scope (line 20) (redefined-outer-name)
climate_analysis.py:12:4: W0621: Redefining name 'kelvin' from outer scope (line 22) (redefined-outer-name)
climate_analysis.py:6:15: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
climate_analysis.py:1:0: W0611: Unused import string (unused-import)
------------------------------------------------------------------
Your code has been rated at 0.59/10 (previous run: 0.59/10, +0.00)
For each issue, it tells us:
- The filename
- The line number and text column where the problem occurred
- An issue identifier (what type of issue it is)
- Some text describing this type of error (as well as a shortened form of the error type)
You’ll notice there’s also a score at the bottom, out of 10. Essentially, for every infraction, it deducts from an ideal score of 10. Note that it is perfectly possible to get a negative score, since it just keeps deducting from 10! But we can see here that our score appears very low - 0.59/10, and if we were to now resolve each of these issues in turn, we should get a perfect score.
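To see how the message counts turn into a score, here is a minimal sketch assuming pylint’s default evaluation expression, in which each error counts five times as much as a warning, refactor or convention message, and informational messages don’t count. The statement count of 17 is an assumption: it is the value consistent with the scores shown in this episode. Recent pylint versions may also clamp the result so that it cannot fall below 0; the evaluation option in a generated .pylintrc shows the exact expression your version uses.

```python
def pylint_score(error, warning, refactor, convention, statement):
    """Approximate pylint's global score from message counts per category.

    Assumes the default evaluation expression:
    10 - ((5 * error + warning + refactor + convention) / statement) * 10
    """
    penalty = (5 * error + warning + refactor + convention) / statement
    return 10.0 - penalty * 10


# First report above: 0 errors, 6 warnings (W), 1 refactor (R), 9 conventions (C).
# 17 statements is an assumption consistent with the reported scores.
print(round(pylint_score(0, 6, 1, 9, 17), 2))  # prints 0.59
# After removing the trailing whitespace: one fewer convention message
print(round(pylint_score(0, 6, 1, 8, 17), 2))  # prints 1.18
```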
Identifying and Fixing an Issue
We can also ask for more information on an issue identifier. For example, we can see that at line 9, near column 35, there is trailing whitespace (C0303):
OUTPUT
:trailing-whitespace (C0303): *Trailing whitespace*
Used when there is whitespace between the end of a line and the newline. This
message belongs to the format checker.
This is helpful if we need clarification on a particular message.
If we now edit the file, and go to line 9, column 35, we can see that there is an unnecessary space.
QUESTION: who’s managed to run pylint on the example code? Yes/No
Let’s fix this issue now by removing the space, save the changed file, and then re-run pylint on it.
OUTPUT
------------------------------------------------------------------
Your code has been rated at 1.18/10 (previous run: 0.59/10, +0.59)
And we see that the C0303 issue has disappeared and our score has gone up! Note that it also gives us a comparison against our last score.
As a gentle warning: it can get quite addictive to keep increasing your score, which might well be the point!
So looking at the issue identifiers, e.g. C0303, what do the C, W and R prefix symbols mean?
At the end, we can see a breakdown of what they mean:
- I is for informational messages
- C is for a programming standards violation: part of the code is not conforming to the normally accepted conventions of writing good code (e.g. things like variable or function naming)
- R is for a need to refactor, due to a “bad code smell”
- W is for warning: a stylistic problem or a minor programming issue
- E is for error: pylint thinks it’s spotted a bug (useful, but don’t depend on this to find errors!)
- F is for a fatal error that prevented pylint from processing further
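To make the prefixes concrete, here is a minimal sketch that counts messages per category in pylint’s text output, using the first letter of each message identifier. The regular expression assumes the default filename:line:column: ID: text format shown above, and the names CATEGORIES, MESSAGE_LINE and count_categories are our own.

```python
import re
from collections import Counter

# Map each message-ID prefix to its category
CATEGORIES = {
    "I": "informational",
    "C": "convention",
    "R": "refactor",
    "W": "warning",
    "E": "error",
    "F": "fatal",
}

# Matches lines such as
# climate_analysis.py:9:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
MESSAGE_LINE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<msg_id>[ICRWEF]\d{4}): ")


def count_categories(report_text):
    """Count pylint messages per category in a text report."""
    counts = Counter()
    for line in report_text.splitlines():
        match = MESSAGE_LINE.match(line)
        if match:
            counts[CATEGORIES[match.group("msg_id")[0]]] += 1
    return counts


report = """************* Module climate_analysis
climate_analysis.py:9:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
climate_analysis.py:6:15: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
climate_analysis.py:6:15: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
climate_analysis.py:1:0: W0611: Unused import string (unused-import)
"""
print(count_categories(report))
```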
So if we run it again on our code:
OUTPUT
************* Module climate_analysis
climate_analysis.py:9:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
climate_analysis.py:1:0: C0114: Missing module docstring (missing-module-docstring)
climate_analysis.py:4:0: C0103: Constant name "shift" doesn't conform to UPPER_CASE naming style (invalid-name)
climate_analysis.py:5:0: C0103: Constant name "comment" doesn't conform to UPPER_CASE naming style (invalid-name)
climate_analysis.py:6:15: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
climate_analysis.py:8:0: C0116: Missing function or method docstring (missing-function-docstring)
climate_analysis.py:8:0: C0103: Function name "FahrToCelsius" doesn't conform to snake_case naming style (invalid-name)
climate_analysis.py:8:18: W0621: Redefining name 'fahr' from outer scope (line 20) (redefined-outer-name)
climate_analysis.py:9:4: W0621: Redefining name 'celsius' from outer scope (line 21) (redefined-outer-name)
climate_analysis.py:11:0: C0116: Missing function or method docstring (missing-function-docstring)
climate_analysis.py:11:0: C0103: Function name "FahrToKelvin" doesn't conform to snake_case naming style (invalid-name)
climate_analysis.py:11:17: W0621: Redefining name 'fahr' from outer scope (line 20) (redefined-outer-name)
climate_analysis.py:12:4: W0621: Redefining name 'kelvin' from outer scope (line 22) (redefined-outer-name)
climate_analysis.py:6:15: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
climate_analysis.py:1:0: W0611: Unused import string (unused-import)
------------------------------------------------------------------
Your code has been rated at 1.18/10 (previous run: 1.18/10, +0.00)
We can see that most of our issues are to do with coding conventions.
Key Points
- FIXME
Content from 2.4 Advanced Linting Features
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
More Verbose Reporting
We can also obtain a more verbose report by adding --reports y to the command, which gives us a lot more detail:
Here’s a part of that output:
OUTPUT
...
Messages
--------
+---------------------------+------------+
|message id |occurrences |
+===========================+============+
|redefined-outer-name |4 |
+---------------------------+------------+
|invalid-name |4 |
+---------------------------+------------+
|missing-function-docstring |2 |
+---------------------------+------------+
|unused-import |1 |
+---------------------------+------------+
|unspecified-encoding |1 |
+---------------------------+------------+
|superfluous-parens |1 |
+---------------------------+------------+
|missing-module-docstring |1 |
+---------------------------+------------+
|consider-using-with |1 |
+---------------------------+------------+
...
QUESTION: for those doing activity, who’s managed to run this command? YES/NO
It gives you some overall statistics, plus comparisons with the last time you ran it, on aspects such as:
- How many modules, classes, methods and functions were looked at
- Raw metrics (which we’ll look at in a minute)
- Extent of code duplication (none, which is good)
- Number of messages by category (again, we can see that it’s mainly convention issues)
- A sorted count of the messages we received
Looking at raw metrics, we can see that it breaks down our program into how many lines are code lines, python docstrings, standalone comments, and empty lines. This is very useful, since it gives us an idea of how well commented our code is. In this case - not very well commented at all! For normal comments, the usually accepted wisdom is to add them to explain why you are doing something, or perhaps to explain how necessarily complex code works, but not to explain the obvious, since clearly written code should do that itself.
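As a rough illustration of what a raw line breakdown involves, here is a simplified sketch that classifies each line as code, comment or empty. It is not pylint’s algorithm: unlike pylint, it doesn’t count docstrings separately, and a # inside a multi-line string would be miscounted as a comment. The function name raw_line_counts is hypothetical.

```python
def raw_line_counts(source):
    """Roughly classify source lines as code, comment-only or empty."""
    counts = {"code": 0, "comment": 0, "empty": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["empty"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            counts["code"] += 1  # note: docstring lines land here too
    return counts


example = '''import string

# Convert temperatures
def FahrToCelsius(fahr):
    celsius = ((fahr - 32) * (5/9))
    return celsius
'''
print(raw_line_counts(example))
```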
Increasing our Pylint Score - Adding a Docstring
QUESTION: Who’s familiar with Python docstrings? Yes/No
Docstrings are a special kind of comment for a function, that explain what the function does, the parameters it expects, and what is returned. You can also write docstrings for classes, methods, and modules, but you should usually aim to add docstring comments to your code wherever you can, particularly for critical or complex functions.
Let’s add one to our code now, within the fahr_to_celsius function.
PYTHON
"""Convert fahrenheit to Celsius.
:param fahr: temperature in Fahrenheit
:returns: temperature in Celsius
"""
If we re-run pylint, we can see we have one fewer docstring error, and a slightly higher score.
If you’d like to know more about docstrings and commenting, there’s an in-depth RealPython tutorial on these and the different ways you can format them.
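For reference, here is a sketch of where the docstring sits, directly below the function declaration, using the snake_case name fahr_to_celsius and the conversion formula from the script. Python stores the docstring on the function object, so inspect.getdoc (or help(fahr_to_celsius) in an interactive session) will display it.

```python
import inspect


def fahr_to_celsius(fahr):
    """Convert fahrenheit to Celsius.

    :param fahr: temperature in Fahrenheit
    :returns: temperature in Celsius
    """
    return (fahr - 32) * (5 / 9)


# The docstring travels with the function and can be read back at runtime
print(inspect.getdoc(fahr_to_celsius))
```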
Ignoring Issues
We can instruct pylint to ignore particular types of issues, which is useful if we consider them unimportant or pedantic, or if we need to see other types more clearly. For example, to ignore any unused imports:
Or, to disable all issues of type “warning”:
This is useful if we wish to ignore rules we find pedantic, such as the check that flags lines longer than 100 characters.
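Pylint also recognises inline comments that disable a check for a single line, which can be preferable to disabling it for the whole run when only one occurrence is intentional. A minimal sketch, using the unused-import message from the report above (the function is only there for illustration):

```python
# Disable the unused-import check for this line only
import string  # pylint: disable=unused-import


def fahr_to_celsius(fahr):
    """Convert fahrenheit to Celsius."""
    return (fahr - 32) * (5 / 9)
```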
Challenge
Edit the climate_analysis.py file and add in a comment line that exceeds 100 characters. Then re-run pylint and determine the issue identifier for this message, and re-run pylint again disabling this specific issue.
OUTPUT
************* Module climate_analysis
climate_analysis.py:3:0: C0301: Line too long (111/100) (line-too-long)
climate_analysis.py:17:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
climate_analysis.py:1:0: C0114: Missing module docstring (missing-module-docstring)
...
We can see that the identifier is C0301, so:
However, if we wanted to ignore this issue for the foreseeable future, typing this in every time would be tiresome. Fortunately we can specify a configuration file to pylint which specifies how we want to interpret issues.
We do this by first using pylint to generate a default .pylintrc file. Pylint writes this to the shell as output, so we need to redirect it to a file to capture it. Ensure you are in the repository root directory, then:
If you edit this generated file you’ll notice there are many things we can specify to pylint. For now, look for disable= and add C0301 to the comma-separated list of ignored issues already present, e.g.:
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=C0301,
        raw-checker-failed,
        bad-inline-option,
        locally-disabled,
        file-ignored,
        suppressed-message,
        useless-suppression,
        deprecated-pragma,
        use-implicit-booleaness-not-comparison-to-string,
        use-implicit-booleaness-not-comparison-to-zero,
        use-symbolic-message-instead
Every time you re-run it now, the C0301 issue will not be present.
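Since .pylintrc uses an INI-style format, with indented continuation lines for multi-line values, Python’s configparser can read it. Here is a minimal sketch for checking your edit, assuming the disable option lives in the [MESSAGES CONTROL] section as it does in files generated by pylint; the cut-down file contents and the function name disabled_messages are illustrative, and pylint’s own parsing may differ in detail.

```python
import configparser

# A cut-down .pylintrc; a generated file contains many more options
PYLINTRC = """
[MESSAGES CONTROL]
disable=C0301,
        raw-checker-failed,
        bad-inline-option,
        locally-disabled
"""


def disabled_messages(rc_text):
    """Return the message IDs/names listed in the 'disable' option."""
    # interpolation=None so any '%' characters in a real rcfile are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(rc_text)
    raw = parser.get("MESSAGES CONTROL", "disable", fallback="")
    return [item.strip() for item in raw.split(",") if item.strip()]


print(disabled_messages(PYLINTRC))
```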
Summary
Code linters like pylint help us to identify problems in our code, such as code styling issues and potential errors, and importantly, if we work in a team of developers such tools help us keep our code style consistent. Attempting to understand a code base which employs a variety of coding styles (perhaps even in the same source file) can be remarkably difficult.
But there are some aspects we should be careful of when using linters and interpreting their results:
- They don’t tell us that the code actually works and they don’t tell us if the results our code produces are actually correct, so we still need to test our code.
- They don’t give us any idea of whether it’s a good implementation, or whether the technical choices are good ones. For example, this code contains functions to conduct temperature conversions, but it turns out there are a number of well-maintained Python packages that do this (e.g. pytemperature), so we should be using a tried and tested package instead of reinventing the wheel.
- They also don’t tell us if the implementation is actually fit for purpose. Even if the code is a good implementation, and it works as expected, is it actually solving the intended problem?
- They also don’t tell us anything about the data the program uses which may have its own problems.
- A high score or zero warnings may give us false confidence. Reaching a 10.00 score doesn’t mean the code is actually good code, just that it’s likely well formatted and hopefully easier to read and understand.
So we have to be a bit careful. These are all valid, high-level questions to ask while you’re writing code, both as a team, and also individually. In the fog of development, it can be surprisingly easy to lose track of what’s actually being implemented and how it’s being implemented. A good idea is to revisit these questions regularly, to be sure you can answer them!
However, whilst taking these shortcomings into account, linters are a very low effort way to help us improve our code and keep it consistent.
Key Points
- FIXME
Content from Lesson 3: Intermediate Git
Last updated on 2025-03-21
Overview
Questions
- FIXME
Objectives
- FIXME
Intro to feature branch workflow
Intro to merging strategies
- Options for merging (fast forward, merge commit, rebase and merge)
- Other useful git features (cherry picking via git cherry-pick, stashing changes via git stash, resetting local state via git reset)
Content from 3.1 Setup
Last updated on 2025-02-28
Setup
Overview
Questions
- FIXME
Objectives
- FIXME
Content from 3.2 Some Example Code
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Creating a Copy of the Example Code Repository
FIXME: copy git-example repo into softwaresaved
For this lesson we’ll need to create a new GitHub repository based on the contents of another repository.
- Once logged into GitHub in a web browser, go to https://github.com/UNIVERSE-HPC/git-example.
- Select ‘Use this template’, and then select ‘Create a new repository’ from the dropdown menu.
- On the next screen, ensure your personal GitHub account is selected in the Owner field, and fill in Repository name with git-example.
- Ensure the repository is set to Public.
- Select Create repository.
You should be presented with the new repository’s main page. Next, we need to clone this repository onto our own machines, using the Bash shell. First, open a Bash shell (via Git Bash on Windows or Terminal on a Mac). Then, on the command line, navigate to where you’d like the example code to reside, and use Git to clone it. For example, to clone the repository into our home directory (replacing github-account-name with our own account) and change directory to the repository contents:
Examining the Code
Let’s first take a look at the example code on GitHub, in the file climate_analysis.py.
PYTHON
SHIFT = 3
COMMENT = '#'
climate_data = open('data/sc_climate_data_10.csv', 'r')
def FahrToCelsius(fahr):
    """COnverts fahrenehit to celsius
    Args:
        fahr (float): temperature in fahrenheit
    Returns:
        float: temperature in Celsius
    """
    celsius = ((fahr - 32) * (5/9))
    return celsius
def FahrToKelvin(fahr):
    kelvin = FahrToCelsius(fahr) + 273.15
    return kelvin
for line in climate_data:
    data = line.split(',')
    if data[0][0] != COMMENT:
        fahr = float(data[3])
        celsius = FahrToCels(fahr)
        kelvin = FahrToKelvin(fahr)
        print('Max temperature in Celsius', celsius, 'Kelvin', kelvin)
If you have been through previous Byte-sized RSE episodes, you may have already encountered a version of this code before!
It’s designed to perform some temperature conversions from fahrenheit to either celsius or kelvin, and the code here is for illustrative purposes. If we actually wanted to do temperature conversions, there are at least three existing Python packages we would ideally use instead that would do this for us (and much better). Similarly, this code should also use a library to handle the CSV data files, as opposed to handling them line by line itself.
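As an example of what using a library could look like here, the sketch below reads the same CSV format with Python’s standard csv module instead of splitting lines by hand. It is a minimal sketch: the function name read_max_temps is hypothetical, and io.StringIO stands in for the real data file, which you would instead open with open('data/sc_climate_data_10.csv', newline='').

```python
import csv
import io

# A few lines in the same format as data/sc_climate_data_10.csv
SAMPLE = """# POINT_X,POINT_Y,Min_temp_Jul_F,Max_temp_jul_F,Rainfall_jul_inch
461196.8188,1198890.052,47.77,58.53,0.76
436196.8188,1191890.052,47.93,58.60,0.83
"""


def read_max_temps(csv_file, comment="#"):
    """Yield the Max_temp_jul_F column as floats, skipping comment rows."""
    for row in csv.reader(csv_file):
        if not row or row[0].startswith(comment):
            continue
        yield float(row[3])


print(list(read_max_temps(io.StringIO(SAMPLE))))
```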
There are also a number of other rather large issues (which, it should be emphasised, are deliberate!):
- The code is quite untidy, with inconsistent spacing and commenting which makes it harder to understand.
- It contains a hardcoded file path, as opposed to having it within a separate configuration file or passed in as an argument.
- Function names are capitalised - perhaps we need to change these to be lower case, and use underscores between words - a naming style also known as snake case.
- The code is also missing some comments (known as docstrings) describing the function and the script (or module) itself. For those that haven’t encountered docstrings yet, they are special comments written in a particular format that describe what the function or module is supposed to do. You can see an example here in the FahrToCelsius function, where the docstring explains what the function does, its input arguments, and what it returns.
- An incorrect function name FahrToCels, which should be FahrToCelsius. This will cause it to fail if we try to run it.
Another thing to note about this repository is that it has a single main branch (which used to be called the master branch by default, a name you may still see in older repositories). You’ll also notice there’s already a commit on the main branch. One way to look at this is as a single “stream” of development: we’ve made changes to this codebase one after the other on this main branch. However, we may later want to add a new software feature, or fix a bug in our code. This may take more than a few commits to complete and make work, perhaps over a matter of hours or days. Of course, as we make changes to make this feature work, the commits along the way may well break the “working” nature of our repository, and users who get hold of our software by cloning the repository in the meantime would also get a version of the software that isn’t working. This is also true for developers: for example, it’s very hard to develop a new feature for a piece of software if you don’t start with software that is already working. The problem becomes compounded if other developers become involved, perhaps as part of a new project that will develop the software. What would be really helpful would be to be able to do all these things whilst always maintaining working code in our repository. Fortunately, version control allows us to create and use separate branches in addition to the main branch, which will not interfere with our working code on the main branch.
Create Example Issues
Before we look into branches, let’s create a few new issues on our repository, to represent some work we’re going to do in this session.
One thing that might be good to do is to tidy up our code. So let’s add issues to fix that script function naming bug, changing our function names to use snake case, and add the missing docstrings.
Let’s create our first issue about using snake case:
- Go to your new repository in GitHub in a browser, and select Issues at the top. You’ll notice a new page with no issues listed at present.
- Select New issue.
- On the issue creation page, add something like the following:
  - In the title add: Functions should use snake case naming style
  - In the description add: Naming of functions currently is using capitalisation, and needs to follow snake case naming instead.
- We can also assign people to this issue (in the top right), and for the purposes of this activity, let’s assign ourselves, so select Assign yourself.
- Select Create to create the issue.
Adding Work for Ourselves
Repeat this process for the other two issues in the following order:
- “Add missing docstrings to function and module”
- “Script fails with undefined function name error”

We’ll refer back to these issues later!
QUESTION: who’s created the three issues? Yes/No
Key Points
- FIXME
Content from 3.3 Feature Branch Workflow
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Create new feature branch to work on first issue
We’ll start by working on the missing docstring issue. For the purpose of this activity, let’s assume that the bug which causes the script to fail is being tackled by someone else.
So let’s create a feature branch, and work on adding that docstring, using that branch. But before we do, let’s take a look and see what’s already there.
Examining Existing Repository Branches
We’ve already checked out our new repository, and can see what branches it currently has by doing:
OUTPUT
* main
And we can see we have only our main branch, with an asterisk to indicate it’s the current branch we’re on.
We can also use -a to show us all branches:
OUTPUT
* main
remotes/origin/HEAD -> origin/main
remotes/origin/main
Note the other two remotes/origin branches, which are references to the remote repository we have on GitHub. Here, remotes/origin/main is a reference to the equivalent main branch in the remote repository, and remotes/origin/HEAD indicates the remote repository’s default branch, which in this case is origin/main. You can think of origin as a stand-in for the full repository URL. Indeed, if we do the following, we can see the full URL for origin:
OUTPUT
origin git@github.com:steve-crouch/git-example2.git (fetch)
origin git@github.com:steve-crouch/git-example2.git (push)
If we do git log, we can see only one commit so far:
OUTPUT
commit be6376bb349df0905693fdaad3a016273de2bdeb (HEAD -> main, origin/main, origin/HEAD)
Author: Steve Crouch <s.crouch@software.ac.uk>
Date: Tue Apr 8 14:47:05 2025 +0100
Initial commit
Creating a new Branch
So, in order to get started on our docstring work, let’s tell git to create a new branch.
When we name the branch, it’s considered good practice to include the issue number (if there is one), and perhaps something useful about the issue, in the name of the feature branch. This makes it easier to see what this branch was about:
Now if we use the following, we can see that our new branch has been created:
OUTPUT
issue-2-missing-docstrings
* main
However, note that the asterisk indicates that we are still on our main branch, and any commits at this point will still go on this main branch and not our new one. We can verify this by doing git status:
OUTPUT
On branch main
Your branch is up-to-date with 'origin/main'.
nothing to commit, working tree clean
QUESTION: who’s created their new feature branch? Yes/No
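The branch naming convention described above is easy to apply consistently with a small helper. This is a hypothetical sketch, not part of Git or GitHub: it lower-cases a short summary of the issue and joins its words with hyphens after an issue-&lt;number&gt;- prefix.

```python
import re


def feature_branch_name(issue_number, summary):
    """Build a branch name such as 'issue-2-missing-docstrings'.

    Lower-cases the summary and replaces each run of characters that
    aren't letters or digits with a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    return f"issue-{issue_number}-{slug}"


print(feature_branch_name(2, "Missing docstrings"))  # prints issue-2-missing-docstrings
```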
Switching to the New Branch
So what we need to do now is to switch to this new branch, which we can do via:
OUTPUT
Switched to branch 'issue-2-missing-docstrings'
Now if we do git branch again, we can see we’re on the new branch, and doing git status again confirms this.
Using git status before you do anything is a good habit. It helps to clarify which branch you’re working on, and it also shows any outstanding changes you may have forgotten about.
Now, one important thing to realise is that the contents of the new branch are in the state they were in when we created the branch. If we do git log to show the commits, we can see they are the same as when we first cloned the repository (i.e. just the first commit). Any commits we make now will go on our new feature branch. They will branch off from this commit on the main branch, and will be separate from any other commits made on the main branch.
Work on First Issue in New Feature Branch
Now that we’re on our feature branch, we can make some changes to fix this issue. Open the climate_analysis.py file in an editor of your choice, then add the following to the FahrToKelvin function (just below the function declaration):
PYTHON
    """Converts fahrenheit to kelvin

    Args:
        fahr (float): temperature in fahrenheit

    Returns:
        float: temperature in kelvin
    """
Then save the file.
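For reference, here is a minimal sketch of how the function might look with the docstring in place. The function bodies are assumptions for illustration (a standard Fahrenheit to Celsius conversion, plus 273.15 for Kelvin), and may not match the code in your copy of climate_analysis.py. Because the docstring is attached to the function object, it is available through __doc__ and help().

```python
def FahrToCelsius(fahr):
    # Assumed body, for illustration only: standard Fahrenheit to Celsius conversion
    celsius = (fahr - 32) * 5 / 9
    return celsius


def FahrToKelvin(fahr):
    """Converts fahrenheit to kelvin

    Args:
        fahr (float): temperature in fahrenheit

    Returns:
        float: temperature in kelvin
    """
    # Assumed body, for illustration only: convert to Celsius, then offset to Kelvin
    kelvin = FahrToCelsius(fahr) + 273.15
    return kelvin


# The docstring is stored on the function object, so help() and IDE tooltips can show it
print(FahrToKelvin.__doc__.splitlines()[0])  # Converts fahrenheit to kelvin
```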
QUESTION: Who has added this to the file, and saved it? Yes/No
Now that we’ve done this, let’s commit this change to the repository on our new branch.
Notice we’ve added the issue number and a short description to the commit message here. If you’ve not seen this before, it’s considered good practice: we’ve created an issue describing the problem, and the commit references that issue number explicitly. Later, we’ll see that GitHub picks up on this, and the issue itself will list the commits associated with it.
We also have a module docstring to add, so let’s do that now. Open the file in your editor again, and add the following to the top of the file:
Then, add and commit this change:
So again, we’re committing this change against issue number 2. Now let’s look at our new branch:
OUTPUT
commit 6bfc96e2961277b441e5f5d6d924c4c4d4ec6a68 (HEAD -> issue-2-missing-docstrings)
Author: Steve Crouch <s.crouch@software.ac.uk>
Date: Tue Apr 8 15:40:47 2025 +0100
#2 Add missing module docstring
commit 20ea697db6b122aae759634892f9dd17e6497345
Author: Steve Crouch <s.crouch@software.ac.uk>
Date: Tue Apr 8 15:29:37 2025 +0100
#2 Add missing function docstring
commit be6376bb349df0905693fdaad3a016273de2bdeb (origin/main, origin/HEAD, main)
Author: Steve Crouch <s.crouch@software.ac.uk>
Date: Tue Apr 8 14:47:05 2025 +0100
Initial commit
So, as we can see, on our new feature branch we now have our initial commit inherited from the main branch, and also our two new commits.
QUESTION: who’s edited the file and made the changes, and committed them - who’s done that twice? Yes/No
Push New Feature Branch and Commits to GitHub
Let’s push these changes to GitHub. Since this is a new branch, we need to tell Git where to push the new branch’s commits, by naming the branch on the remote repository.
If we just type git push:
OUTPUT
fatal: The current branch issue-2-missing-docstrings has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin issue-2-missing-docstrings
Git helpfully suggests the command we need, so let’s run it:
OUTPUT
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 20 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 805 bytes | 805.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 1 local object.
remote:
remote: Create a pull request for 'issue-2-missing-docstrings' on GitHub by visiting:
remote: https://github.com/steve-crouch/git-example2/pull/new/issue-2-missing-docstrings
remote:
To github.com:steve-crouch/git-example2.git
* [new branch] issue-2-missing-docstrings -> issue-2-missing-docstrings
Branch 'issue-2-missing-docstrings' set up to track remote branch 'issue-2-missing-docstrings' from 'origin'.
So here, we’re telling Git to push the changes on the new branch to a branch with the same name on the remote repository, and to track that remote branch from now on. origin here is shorthand for the originating repository (the one we cloned). You’ll also notice a message suggesting we could create a pull request to merge the changes with the main branch.
QUESTION: who’s committed that change and pushed the new branch with its commits to GitHub? Yes/No. Let’s do this now!
Key Points
- FIXME
Content from 3.4 Creating a Pull Request
Last updated on 2025-04-09
Overview
Questions
- FIXME
Objectives
- FIXME
How to Merge our Changes with main?
We’ve essentially fixed the docstring issue now, so next we need to somehow merge these changes on our feature branch with the main branch.
Test Code Before you Push it!
Now, before we do this, we would ordinarily test the changes and ensure the code is working properly. To save time we haven’t done that here, but it’s definitely worth noting:
- Before pushing any changes, always manually test your code first.
- If you have any unit tests, run those too to check that your changes haven’t broken anything.
- Particularly if this was a change implementing a new feature, consider writing a new unit test to test that feature.
We’ll do the merge by using what’s known as a pull request. A pull request is a way to propose the changes on a particular Git branch, and to request that they be merged into another branch. Pull requests are really useful as a tool in teams, since they provide a way to check the changes before doing a merge. They let you see the changes to files across all the commits in the branch, and look at how these commits will change the codebase. You can also assign reviewers to the pull request, who can each submit a review with their thoughts on whether to accept it, make changes, and so on. We could create the pull request from the command line, but we’ll use the GitHub interface, which is much clearer and gives a far better view of what’s going on.
So let’s go back to our repository on GitHub. You may see a message displayed at the top about protecting the main branch. We may come back to this later, so no need to worry about this for now.
If we select the dropdown where it says “main”, it gives us a list of branches. We can see all branches by selecting that option at the bottom. Our new branch now appears in the list, separate from our main branch. If we select it, we can see the state of the repository on this branch, including the latest commits to our climate_analysis.py file.
Create a Pull Request
Let’s create the pull request now, by selecting Compare & pull request. We could also do this from the Pull requests tab in the top menu, then selecting New pull request.
Now it shows us an interface for creating the pull request:
- Importantly, at the top, it shows us which branch will be merged into which, with the source (or comparison) branch on the right and the destination branch on the left. This should be our new branch for compare:, and main for base:.
- It tells us we are “able to merge”: in this case there are no conflicts to worry about, which is really useful to know. So what if there are conflicts? This is something we’ll look at later.
- Below this, it also shows us the commits associated with this branch, as well as the combined changes to the files made by these commits.
In the title, we’ll rename the PR to reference issue 2 directly, changing it to Fixes #2 - missing docstrings. We could add more contextual information in the description if needed. We could also assign others as reviewers, as we did in the previous session on code review. For simplicity, we’ll leave those for now. But we will assign the pull request (or PR for short) to ourselves, since it’s a good idea to assign responsibility where we can. So let’s create the pull request by selecting the Create pull request button.
QUESTION: who’s created the pull request? Yes/No
Now we get another screen describing the new PR itself. If we’d assigned any reviewers, we would now wait for their reviews of this pull request. For this exercise, let’s assume that has happened, and the PR has been approved and is ready to merge.
By contributing work through PRs and having those PRs reviewed, development is no longer just a number of people making changes in isolation. In collaborations around software, it’s very important to increase the flow of information between the people making changes, in case any new potential issues are introduced. PRs give us that discipline, and really an opportunity, to make sure that the changes we are making are well considered. This then becomes part of the overall cycle of development: we write code, we have it reviewed, it gets merged. But we also help by reviewing other people’s code too.
Coming back to the interface, it now tells us we can merge this branch automatically, and also lists the commits involved. Interestingly, even though we have created this PR to do a merge, we could continue developing our code on this branch indefinitely if we wanted. We could make and push new commits to this branch, which would show up here, and then merge at a later date. This may be particularly useful if we need a longer discussion about the PR as it develops. The discussion is captured in the PR’s comments, and when it is ready, we merge the PR.
How Long should PRs be Open?
Which raises the question of how long PRs, or branches for that matter, should be open. To some degree this depends on the nature of the changes being made. But branches in Git are designed to be short-lived, and should be wherever possible. The longer a branch is open, the more changes could be made to the main branch in the meantime. Then, when it comes time to merge the branch, we may get a lot of conflicts we need to manage. So generally, it’s a good idea to keep your branches open for a day or two, a few days maximum, before creating a PR and doing a merge if you can. Note that we can also see this PR, as well as any others, by selecting the Pull requests tab.
Key Points
- FIXME
Content from 3.5 Merging a Pull Request
Last updated on 2025-04-09
Overview
Questions
- FIXME
Objectives
- FIXME
How to Merge the Pull Request?
You’ll notice there’s a subtle dropdown on the Merge pull request button, which presents options for how to perform the merge.
FIXME: ensure rebase and merge is covered in intro?
You may remember from the introduction that doing a “rebase and merge”, as opposed to just doing a merge commit, leads to a cleaner repository history. For example, if we did a normal merge here, we’d end up with our two new commits and a merge commit on the main branch. But if we rebase and merge, our two commits are essentially just added to the top of the main branch. Let’s use this method, by selecting the third option in the dropdown: Rebase and merge.
Note that if there had been a conflict with any commits on the main branch, we very likely wouldn’t have been able to merge using this method. Which in itself raises a good question: even with a standard merge commit into the main branch, what would happen if there was a conflict? If we have time, we’ll look at this later.
The Golden Rule of Rebasing
Note that you can also do rebasing with branches on the command line. But a word of warning: when doing this, be sure you know what will happen.
Rebasing in this way rewrites the repository’s history, so there is a GOLDEN RULE of rebasing: only rebase a local branch, never a public (shared) branch you suspect is being used by others. Suppose someone else has a copy of the repository on their own machine and has worked on a particular branch. If you then rebase that branch, the history they are working from no longer matches the rewritten one, and they will run into difficulties when pushing their changes. It can get quite messy, so if in doubt, do a standard merge!
Merge the Pull Request
So now let’s go ahead and select Rebase pull request. We can add more information here if needed, but let’s just select Confirm rebase and merge. Note that it says the merge was done successfully, and suggests we can delete the branch.
QUESTION: who has merged the pull request? Yes/No
We said earlier that branches in Git should be short-lived where possible, and keeping branches hanging around may cause confusion. So let’s delete it now. If we now go to the main branch on the repository’s main page in GitHub, we can see that the changes have been merged. And if we look at “commits”, we can see the commits we made on our feature branch have been added to the main branch.
See Commits on Issues
Now, remember those commit messages with the issue numbers in them? If we go to our issues, we can see each issue along with its associated commits, listed in chronological order. This is really handy when checking on an issue’s progress. Plus, it means the reason behind each commit is now traceable back to the originating issue. So why are there two sets of commits, when we only made one set? That’s because we first made two commits on the feature branch. The rebase and merge then applied copies of those commits, as new commits, to the main branch.
Summary
So what are the benefits so far?
- By using different feature branches, as opposed to just committing directly to the main branch, we’ve isolated the “churn” of developing a feature from the main branch. This makes the work on any single branch easier to understand as a thread of work.
- It gives us the opportunity to abandon a branch entirely, with no need to manually change things back. In such a case, all we need to do is delete the branch.
- From a single developer’s perspective, we are also effectively isolated from the changes being made on other feature branches. So when a number of changes are being made, we still (hopefully!) only have to worry about our own changes.
- It gives us a process that helps us maintain a working version of the code on main for our users (which may very well include ourselves!). This holds as long as we ensure that work on other branches is properly tested and works as expected before we merge back to the main branch.
- It also gives us a mechanism, via pull requests, to have others review our code before changes are introduced into the codebase.
So what we’ve shown is one way to use a feature branch workflow: creating feature branches directly off the main branch, and merging to main when these changes are ready. We’ve chosen this way for the training, since it’s more straightforward to teach in a practical activity, but there are other “branching strategies” you can use. Another way is to use a long-lived branch off main, usually called something like dev or develop:
- This dev branch represents a general branch of development.
- Feature branches are created off the dev branch instead of main, and then merged back to the dev branch.
- Later, when a release of the software is due, or at an appropriate point after the software has been tested, the dev branch is merged with the main branch.
This approach puts greater distance between development and the main branch. It means you can merge and test all changes together on the dev branch before you merge with the main branch, to ensure everything works together first. However, it also means that merging back to main can be more difficult, since the dev branch could have built up a considerable number of changes that need to be merged. In either case, the key is to make sure that code is tested and checked with the right people in your team before you merge to main.
Key Points
- FIXME
Content from 3.6 Merge Conflicts
Last updated on 2025-04-09
Overview
Questions
- FIXME
Objectives
- FIXME
Work on Another Issue
Now we still have two remaining issues we can look at. Interestingly, both of them require changes that can cause a conflict when merging, so let’s look at those now.
First, let’s go to the main branch and pull the repository changes on this branch to our local repository. Generally, it’s good practice to use git pull to synchronise your local repository with the GitHub one before doing any further work.
So now, again, we have those two commits on main as we would expect. Let’s create a feature branch to fix our snake-case issue.
So now, edit the climate_analysis.py file, and change the functions to use a snake case naming style, e.g. change FahrToCelsius to fahr_to_celsius. Remember to also change the call to FahrToCelsius within the fahr_to_kelvin function.
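As a sketch of the end result, the renamed functions might look like this. The function bodies are assumptions for illustration (a standard conversion formula), and the docstrings are shortened; the real file may differ. The key point is that every call site must use the new name. If the call inside fahr_to_kelvin still said FahrToCelsius, Python would raise a NameError when it ran, because that name no longer exists.

```python
def fahr_to_celsius(fahr):
    """Converts fahrenheit to celsius"""
    # Assumed body, for illustration only
    return (fahr - 32) * 5 / 9


def fahr_to_kelvin(fahr):
    """Converts fahrenheit to kelvin"""
    # The call inside this function must use the new snake case name too
    return fahr_to_celsius(fahr) + 273.15


print(fahr_to_kelvin(32))  # 273.15
```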
Note that we’ve also changed the call to fahr_to_celsius near the bottom of the file. Let’s commit this to our new feature branch:
Now we can commit as before
Introducing a Conflict
At this point, we could follow this advice and merge this branch’s work into the main branch, which would be very neat and tidy. But life is rarely like that. What happens when someone else commits their own changes to the main branch? Where does that leave us when we come to merge?
You may recall we created an issue for fixing the call to the FahrToCelsius function, where the call referenced the function by the wrong name. Let’s assume that a colleague has made this fix and updated the main branch. Let’s pretend we’re that colleague, making the fix on the main branch. First, let’s switch to the main branch:
As we can see, this main branch is completely unaware of the commits in our new feature branch. It is still in the state it was in when we created our feature branch. Let’s make the fix. We could (and should) create a feature branch here, make and commit the change, then merge it with the main branch. But for expediency, we’ll commit directly to the main branch, and assume our colleague did it the right way. Edit the climate_analysis.py file, update the FahrToCels function call to FahrToCelsius, and save the changes.
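Why is the original call broken? Python looks up a name when the line runs, so calling a name that was never defined raises a NameError. The sketch below assumes an illustrative body for FahrToCelsius and a hypothetical argument of 212. It shows the failure with the misspelled name, and the result once the call uses the function's real name.

```python
def FahrToCelsius(fahr):
    # Assumed body, for illustration only
    return (fahr - 32) * 5 / 9


try:
    # The misspelled call: no function called FahrToCels exists,
    # so Python raises NameError when this line runs
    FahrToCels(212)
except NameError as err:
    print(f"Broken call: {err}")  # Broken call: name 'FahrToCels' is not defined

# The fixed call uses the function's real name
print(FahrToCelsius(212))  # 100.0
```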
BASH
git status
git add climate_analysis.py
git commit -m "#3 Fix incorrect function call"
git log
git push
Now we have this extra commit on main, which we can see if we do:
Resolving a Merge Conflict
Now let’s see what happens when we create a pull request on our feature branch, as before, and try to merge. Again, let’s go to GitHub and then:
- Go to Pull requests, and select New pull request.
- Select the new feature branch, issue-1-use-snake-case, in compare:, and ensure that base: says main.
Note that it now says we can’t automatically perform this merge. Essentially, GitHub has compared the source and destination branches and determined that there is a conflict. This is because commits on the main and feature branches make different edits to the same line.
But we’ll go ahead and create the PR anyway:
- Select Create pull request.
- For the title, add “Fixes #1 - use snake case naming”.
- Assign yourself to the pull request.
- Select Create pull request.
We should now see that “This branch has conflicts that must be resolved”. In this case there is only one conflicting file, climate_analysis.py, but there could be more than one. Now we can attempt to resolve the conflict by selecting Resolve conflicts.
The GitHub interface is really useful here. On the left, it tells you which files have conflicts (only climate_analysis.py in this case), and where in each file the conflicts are. So let’s fix the conflict. Near the bottom, we can see that our snake case renaming of that function call conflicts with the colleague’s fix to the same call.
Now, importantly, we have to decide how to resolve the conflict. Fortunately, our fix for the snake_case issue resolves this issue as well, since it calls the function correctly, which makes the other fix redundant. So let’s remove the other fix by deleting it along with the conflict markers (the <<<<<<<, ======= and >>>>>>> lines). We then select Mark as resolved, then Commit merge. Unfortunately, because of this conflict-resolution commit, we can no longer rebase and merge. So select the option to create a merge commit, then select Merge pull request, and Confirm merge.
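Before selecting Mark as resolved, it is worth checking that no conflict markers have been left behind, since a stray marker line makes a Python file fail to parse. The helper below, find_conflict_markers, is hypothetical and is not part of Git or GitHub. The conflicted text is an illustrative example of the general shape, not the exact lines in climate_analysis.py.

```python
# Git marks each side of a conflict with lines starting with these prefixes
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines starting with a conflict marker.

    This is a simple prefix check, so it may also flag unrelated lines
    that happen to start with seven or more '=' characters.
    """
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(CONFLICT_MARKERS):
            found.append((number, line))
    return found


# Hypothetical conflicted region, in the general shape GitHub's editor shows
conflicted = """\
<<<<<<< issue-1-use-snake-case
print(fahr_to_celsius(temp))
=======
print(FahrToCelsius(temp))
>>>>>>> main
"""

# The same region after keeping the snake case version and deleting the markers
resolved = "print(fahr_to_celsius(temp))\n"

print(find_conflict_markers(conflicted))
# [(1, '<<<<<<< issue-1-use-snake-case'), (3, '======='), (5, '>>>>>>> main')]
print(find_conflict_markers(resolved))  # []
```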
And as before, delete the feature branch which is no longer needed.
Commits Highlighted in Issues
If we now go to the repository’s list of commits on the main branch, we see two new commits. The first is a “merge branch main into issue-1-use-snake-case” commit, which resolves the conflict (which occurred on the feature branch). The second is a merge pull request commit, for when we merged the feature branch with main.
Key Points
- FIXME
Content from Lesson 4: Code Review
Last updated on 2025-03-21
Overview
Questions
- FIXME
Objectives
- FIXME
Intro to coding within a collaboration
- Benefits to coding with others
- Use collaborative practices and tools before you need them
Intro to code review
- Why do code reviews?
- Types of code review
- Code reviews in Git
Content from 4.1 Setup
Last updated on 2025-03-05
Setup
Overview
Questions
- FIXME
Objectives
- FIXME
Content from 4.2 Some Example Code
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Creating a Copy of the Example Code Repository
FIXME: copy review-example repo into softwaresaved org
So the first thing we need to do is create a new GitHub repository from a template repository. First, go to https://github.com/UNIVERSE-HPC/review-example (copy and paste this URL into your browser). Select ‘Use this template’, then ‘Create a new repository’. Set the owner and repository name (e.g. git-example), ensure it’s set to public, and then create it.
Key Points
- FIXME
Content from 4.3 Fixing a Repository Issue
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Adding an Issue to the Repository
The next thing to do is to add an issue to the repository, which will represent something we need to work on. For the sake of this exercise, it doesn’t really matter what the issue is. But perhaps we’ve spotted a problem with our codebase during development, and we need to note that this problem needs to be fixed.
For example, if we look at the README for the repository, we can see there’s a broken link. This is clearly a problem, so let’s register it as an issue:
- Select “Issues”, then “New issue”.
- Title: Broken link to article
- Description: The README link to the SSI website article is broken, resulting in a page not found error
- Select “Submit new issue”.
We then have the opportunity to assign someone to the issue (let’s say ourselves), and also to assign what type of issue it is. It’s a problem with the README, so that’s probably documentation; let’s set it as that.
QUESTION: who’s been able to create a new issue on the repository? Yes/No
Fixing the Issue
Next, perhaps a bit later on, we decide to fix the issue. So we navigate to the README on the repository’s main page. For the sake of the exercise, we’ll just use GitHub’s edit mechanism to edit the file directly. In most cases, we’d probably have the repository cloned on our machine, make the change there, and submit it that way. But in the interests of time and simplicity, we’ll use GitHub’s edit function. So select the edit icon, and edit the README to fix the link (remove the part that says “typo/”).
We now need to commit the change, so select “Commit changes” in the top right. When committing a change, it’s good practice to refer to the issue number in the commit message, since this gives us traceability for changes back to the originating issue. Our issue is number 1, so let’s refer to that: #1 - Fix broken article link. We could optionally put more information about the fix in the description if we wanted.
Importantly, we want to submit this change as a pull request on a new branch, which will allow others to review it. Selecting the second option here allows us to create a new branch for these changes, and we can give this new branch an identifiable name: readme-broken-link-fix.
Once we select Propose changes, this change is submitted, and our new branch containing that fix is created. Scrolling down, we can see our change highlighted.
QUESTION: who’s managed to commit their fix to a new branch? Yes/No
Key Points
- FIXME
Content from 4.4 Submitting a Pull Request
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Creating a Pull Request
But we still need to submit this new branch and the commit we made as a pull request, and GitHub nicely guides us through doing this. Select “Create pull request”.
Once we’ve done that, we can see that our pull request has been opened and is ready to be considered for merging into the codebase. We can also see that GitHub has determined that the change we’ve committed can be merged directly, without conflicts, into our main branch. We could optionally add more information about this pull request in comments here if we wanted.
QUESTION: who’s been able to create a new pull request? Yes/No
Swap Repository with Someone Else
For the next stage, you’ll be reviewing a pull request. Either:
- If you are attending a workshop with other learners, the instructor will enable you to swap the URL of your repository with the repository URL of another learner so you can review the pull request they made on their own repository.
- If you are going through this material on your own, you can review the pull request you made on your own repository instead.
Key Points
- FIXME
Content from 4.5 Reviewing a Pull Request
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Let’s now adopt a new role: that of the reviewer of a pull request (or PR for short). Let’s assume that a colleague has created a pull request of their own. For the purposes of this exercise, it’s on their own repository, but it could be a shared repository we are both working on. In a collaborative environment, this is most likely going to be the case. So let’s take a look at it and review it.
Write a Review of the PR
So we open the repository URL in a browser, go to “Pull requests” on the repository’s main page, and then select the pull request. To review the request’s changes, we can go to the ‘Files changed’ tab. Perhaps unsurprisingly, this shows us the changes in each file: in this case, just one file and one change. The view on the left (in red) is the old version, and the view on the right (in green) is the revised version of the changed line.
We have the option of adding comments or suggestions inline to the proposed changes, if we want. For example, perhaps we know there is a Zenodo record for the code that this article points to, which we think should be added. We can add a comment by hovering over a line and selecting the ‘+’ symbol at the start of that line. So select the changed line, and add something like: We should also link to the Zenodo record for the code that this article links to, at https://zenodo.org/record/250494#.Y2UUN-zP1R4. Then select ‘Start a review’. We can add as many comments as we want. If this were a larger pull request, we would review the other changes and add comments as needed.
So let’s assume we’ve done that
Finally, as a reviewer of this pull request, we finish our review by selecting “Finish your review”. We can add a comment, perhaps with some high-level observations or suggestions, for example: Overall, the changes look good, although we should consider adding the Zenodo repository link.
We can then choose one of three options:
- Comment: leave a comment for the author to consider
- Approve: the pull request is approved as is
- Request changes: there are some aspects that must be addressed before it can be merged
For simplicity, let’s just go with the first option for this exercise.
Then submit the review, and our role as reviewer on the pull request is complete. The other participant can then take our review into account when deciding whether to merge that pull request.
QUESTION: Who’s submitted a very brief code review? Yes/No
Key Points
- FIXME
Content from 4.6 Merge the Pull Request
Last updated on 2025-04-08
Overview
Questions
- FIXME
Objectives
- FIXME
Read Review and Merge the Pull Request
Now for the final step: back to our role as contributor. We created our own pull request, which hopefully another participant (or we ourselves) has reviewed.
Let’s take a look by going back to our repository, opening our own pull request, and looking at the review. We should now consider the review, and any observations or suggestions made. At this point, we could go ahead and make any needed changes. But for simplicity, let’s assume the review is positive and doesn’t ask for further changes. We can then go ahead and merge the pull request into our codebase by selecting ‘Merge pull request’, and then ‘Confirm merge’.
So now, our change has been integrated into our codebase
QUESTION: Who’s read the other participants’ review of their PR, and merged it? Yes/No
Housekeeping
But there’s a bit of housekeeping we should do. The pull request branch is no longer needed, since everything’s been merged, so let’s keep a tidy repository and delete the branch. If we go to our repository’s main page and select ‘branches’, we can delete our pull request branch.
Key Points
- FIXME
Content from Lesson 5: Unit Testing Code
Last updated on 2025-03-21
Overview
Questions
- FIXME
Objectives
- FIXME
Intro to testing
- Why test your code?
- Types of testing - levels (unit, integration, system)
- Types of testing - approaches (regression testing, property-based testing)
- Mocking
Content from 5.1 Setup
Last updated on 2025-03-05
Setup
Overview
Questions
- FIXME
Objectives
- FIXME
Content from 5.2 Some Example Code
Last updated on 2025-04-10
Overview
Questions
- FIXME
Objectives
- FIXME
Creating a Copy of the Example Code Repository
FIXME: copy factorial-example repo into softwaresaved
For this lesson we’ll be using some example code available on GitHub, which we’ll clone onto our machines using the Bash shell. So firstly open a Bash shell (via Git Bash in Windows or Terminal on a Mac). Then, on the command line, navigate to where you’d like the example code to reside, and use Git to clone it. For example, to clone the repository in our home directory, and change our directory to the repository contents:
Examining the Code
Next, let’s take a look at the code, which is in a file called factorial.py in the factorial-example/mymath directory, so open this file in an editor.
The example code is a basic Python implementation of the factorial function. Essentially, it multiplies all the whole numbers from a given number down to 1. For example, given 3, that’s 3 x 2 x 1 = 6, so the factorial of 3 is 6.
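The repository's own implementation isn't reproduced here, but a minimal iterative sketch of the same idea looks like this. Raising ValueError for negative input is our own assumption. The code in mymath/factorial.py may behave differently (for example, it may be recursive).

```python
def factorial(n):
    """Return n! for a non-negative integer n (illustrative sketch)."""
    if n < 0:
        # Assumption: reject negative input rather than silently returning 1
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    # Multiply the whole numbers from n down to 2 (multiplying by 1 changes nothing)
    for k in range(n, 1, -1):
        result *= k
    return result


print(factorial(3))  # 6
```

Note that factorial(0) returns 1 here, because the loop doesn't run, which matches the usual mathematical definition.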
We can also run this code from within Python to show it working. In the shell, ensure you are in the root directory of the repository, then type:
PYTHON
Python 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Then at the prompt, import the factorial function from the mymath library and run it:
This gives us 6, which is some evidence that this function is working. Of course, in practice our functions may well be more complicated than this, and they may call other separate functions. We could come up with a list of known input numbers and expected outputs, and run each of these manually to test the code, but this would take some time. Computers are really good at one thing, automation, so let’s use that to automate our tests and make things easy for ourselves.
Running the Tests
As it turns out, this code repository already has a test. Navigate to the repository’s tests directory, and open a file called test_factorial.py:
PYTHON
import unittest
from mymath.factorial import factorial


class TestFactorialFunctions(unittest.TestCase):

    def test_3(self):
        self.assertEqual(factorial(3), 6)
Here, we’re using a Python unit test framework called unittest. There are other such frameworks for Python, including nose and the very popular pytest. The advantage of using unittest is that it’s already built into Python, so it’s easier for us to use.
Before we look into this example unit test, some questions: Who here is familiar with object oriented programming? Yes/No. Who’s written an object oriented program? Yes/No
What is Object Oriented Programming?
For those who aren’t familiar with object oriented programming, it’s a way of structuring your programs around the data of your problem. It’s based around the concept of objects, which are structures that contain both data and functions that operate on that data. In object oriented programming, objects are used to model real-world entities, such as people, bank accounts, libraries, books, even molecules, and so on. Each object has its own:
- data - known as attributes
- functions - known as methods
These are bundled together (encapsulated) in a structure defined by a class, and each object is an instance of a class. An introduction to object oriented programming is beyond the scope of this session. If you’d like to know more, there’s a great introductory tutorial on the RealPython site, which is an excellent practical resource for learning how to do many things in Python!
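To make the terms concrete, here’s a small illustrative class modelling one of the entities mentioned above, a bank account. BankAccount, owner, balance and deposit are hypothetical names chosen for this sketch. They are not part of the lesson’s example code.

```python
class BankAccount:
    """A hypothetical class modelling a simple bank account."""

    def __init__(self, owner, balance=0):
        # Attributes: the data each object holds
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # Method: a function that operates on the object's own data
        self.balance += amount
        return self.balance


# Each object (instance) created from the class has its own attributes
account = BankAccount("Ada", balance=10)
account.deposit(5)
print(account.balance)  # 15
```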
For the purposes of this activity, we use object oriented classes to encapsulate our unit tests, since that’s how they’re defined in the unittest framework. You can consider them as a kind of syntactic sugar to group our tests together, with a single unit test being represented as a single function, or method, within a class.
In this example, we have a class called TestFactorialFunctions with a single unit test, which we’ve called test_3. Within that test method, we are essentially doing what we did when we ran it manually earlier: we’re running factorial with the argument 3, and checking it equals 6. We do this with a method called assertEqual, which our class inherits from unittest.TestCase. It checks that the two values are equal, and if they are not, the test fails.
So how do we run this test? In the shell, we can run this test by ensuring we’re in the repository’s root directory, and running:
OUTPUT
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
[CHECKPOINT - who’s run the tests and got this output? Yes/No]
So what happens? We see a single dot (.), a message saying that it ran one test very quickly, and OK. The single dot means the single test we have ran successfully, so our test passes!
But how does unittest know exactly what to run? Unit test frameworks like unittest follow a common pattern of finding tests and running them. When we give a single file argument to unittest, it searches the Python file for unittest.TestCase classes. Within those classes, it looks for methods whose names start with test (conventionally test_), and runs them. So we could add more tests to this class in the same way, and it would run each in turn. We could even add multiple unittest.TestCase classes here if we wanted, each testing a different aspect of our code, for example. unittest would search all of these classes and run each test_ method in turn.
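To see this discovery rule in action without the shell, the sketch below asks unittest’s TestLoader which methods it would collect from two classes. The second class, TestMoreFactorials, and the helper method are hypothetical additions. factorial is defined inline as a stand-in for the mymath version so the snippet runs on its own.

```python
import unittest


def factorial(n):
    # Stand-in for mymath.factorial.factorial so this sketch runs on its own
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


class TestFactorialFunctions(unittest.TestCase):
    def test_3(self):
        self.assertEqual(factorial(3), 6)

    def helper_not_a_test(self):
        # Never collected or run: the name doesn't start with "test"
        raise RuntimeError("should never run")


class TestMoreFactorials(unittest.TestCase):
    def test_4(self):
        self.assertEqual(factorial(4), 24)


loader = unittest.TestLoader()
# getTestCaseNames() lists the methods the loader treats as tests
print(loader.getTestCaseNames(TestFactorialFunctions))  # ['test_3']
print(loader.getTestCaseNames(TestMoreFactorials))      # ['test_4']

# Tests from several TestCase classes can be gathered into one suite
suite = unittest.TestSuite()
suite.addTests(loader.loadTestsFromTestCase(TestFactorialFunctions))
suite.addTests(loader.loadTestsFromTestCase(TestMoreFactorials))
print(suite.countTestCases())  # 2
```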
Testing for Failure
We’ve seen what happens if a test succeeds, but what happens if a test fails? Let’s find out by deliberately changing our test to be wrong. Edit the tests/test_factorial.py file, change the expected result of factorial(3) to be 10, and save the file.
We’ll rerun our tests slightly differently than last time:
In this case, we add -v for more verbose output, giving us detailed results test-by-test.
OUTPUT
test_3 (tests.test_factorial.TestFactorialFunctions) ... FAIL
======================================================================
FAIL: test_3 (tests.test_factorial.TestFactorialFunctions)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/steve/factorial-example/tests/test_factorial.py", line 8, in test_3
self.assertEqual(factorial(3), 10)
AssertionError: 6 != 10
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (failures=1)
In this instance we get a FAIL instead of an OK for our test, and we see an AssertionError reporting that 6 is not equal to 10, which is clearly true.
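Under the hood, a failing assertEqual() raises an AssertionError, which unittest catches and reports as a FAIL. You can check this directly in the interpreter. Since Python 3.2, a TestCase can be created without naming a test method, which is handy for experimenting like this.

```python
import unittest

# A bare TestCase instance, just so we can call its assertion methods
case = unittest.TestCase()

try:
    case.assertEqual(6, 10)
except AssertionError as err:
    # Keep the message so we can inspect it after the except block
    message = str(err)

print(message)  # 6 != 10
```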
Let’s now change our faulty test back by editing the file again, changing the 10 back to 6, and re-run our tests:
OUTPUT
test_3 (tests.test_factorial.TestFactorialFunctions) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
This illustrates an important point with our tests: it’s important to make sure your tests are correct too. So make sure you work with known ‘good’ test data which has been verified to be correct!
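As an aside, tests can also be run from within Python rather than from the shell. The minimal sketch below uses unittest’s TextTestRunner with verbosity=2 to get per-test output similar to the -v flag. factorial is again defined inline as a stand-in for the mymath version.

```python
import unittest


def factorial(n):
    # Stand-in for mymath.factorial.factorial so this sketch runs on its own
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


class TestFactorialFunctions(unittest.TestCase):
    def test_3(self):
        self.assertEqual(factorial(3), 6)


# Collect the tests from the class, then run them with per-test output
suite = unittest.TestLoader().loadTestsFromTestCase(TestFactorialFunctions)
result = unittest.TextTestRunner(verbosity=2).run(suite)

# The result object records what happened, e.g. tests run and failures
print(result.testsRun, result.wasSuccessful())
```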
Key Points
- FIXME
Content from 5.3 Creating a New Test
Last updated on 2025-04-10
Overview
Questions
- FIXME
Objectives
- FIXME
Add a New Test
As we’ve mentioned, adding a new unit test is a matter of adding a new test method. Let’s add one to test the number 5. Edit the tests/test_factorial.py file again:
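One way to write the new test is a test_5 method alongside test_3, checking that factorial(5) returns 120 (5 x 4 x 3 x 2 x 1). In the repository file, factorial is imported from mymath.factorial as before. In this sketch it is defined inline as a stand-in so the snippet runs on its own.

```python
import unittest


def factorial(n):
    # Stand-in for "from mymath.factorial import factorial"
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)


class TestFactorialFunctions(unittest.TestCase):
    def test_3(self):
        self.assertEqual(factorial(3), 6)

    def test_5(self):
        # 5! = 5 x 4 x 3 x 2 x 1 = 120
        self.assertEqual(factorial(5), 120)
```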
[CHECKPOINT - who’s finished editing the file Yes/No]
And then we can run it exactly as before, in the shell:
OUTPUT
test_3 (tests.test_factorial.TestFactorialFunctions) ... ok
test_5 (tests.test_factorial.TestFactorialFunctions) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK
We can see the tests pass. The really useful thing here is that we can rapidly add tests and rerun all of them. Particularly with more complex code that is harder to reason about, we can grow a set of tests into a test suite that verifies the code’s correctness. Then, whenever we make changes to our code, we can rerun our tests to make sure we haven’t broken anything. An additional benefit is that successfully running our unit tests can also give others confidence that our code works as expected.
[CHECKPOINT - who managed to run this with their new unit test Yes/No]
Change our Implementation, and Re-test
Let’s illustrate another key advantage of having unit tests. Let’s assume that during development we find an error in our code. For example, if we run factorial(10000) from within the Python interpreter, it crashes with an exception:
OUTPUT
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/steve/factorial-example/mymath/factorial.py", line 11, in factorial
return n * factorial(n-1)
File "/home/steve/factorial-example/mymath/factorial.py", line 11, in factorial
return n * factorial(n-1)
File "/home/steve/factorial-example/mymath/factorial.py", line 11, in factorial
return n * factorial(n-1)
[Previous line repeated 995 more times]
File "/home/steve/factorial-example/mymath/factorial.py", line 8, in factorial
if n == 0 or n == 1:
RecursionError: maximum recursion depth exceeded in comparison
It turns out that our factorial function is recursive, which means it calls itself. To compute the factorial of 10000, it needs nearly 10000 nested calls, each one inside the previous one. Python has a default recursion limit of 1000, hence the exception. This is a real limitation of our implementation.
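Here is a small sketch to see the limit for yourself. sys.getrecursionlimit() reports the current limit, and a recursive factorial along the lines of the original hits that limit long before reaching 10000. The function is renamed recursive_factorial here, a hypothetical name, so it doesn’t clash with anything.

```python
import sys


def recursive_factorial(n):
    # Recursive version, along the lines of the original implementation
    if n == 0 or n == 1:
        return 1
    return n * recursive_factorial(n - 1)


# The current recursion limit (CPython's default is 1000)
print(sys.getrecursionlimit())

# Small inputs are fine...
print(recursive_factorial(10))  # 3628800

# ...but 10000 needs far more nested calls than the limit allows
try:
    recursive_factorial(10000)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)  # True
```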
However, we can correct our implementation by changing it to use a different method of calculating factorials that isn’t recursive. Edit the mymath/factorial.py file and replace the function with this one:
PYTHON
def factorial(n):
    """
    Calculate the factorial of a given number.
    :param int n: The factorial to calculate
    :return: The resultant factorial
    """
    # Multiply a running total by each whole number from 1 to n
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result
Make sure you replace the code in the factorial.py file, and not the test_factorial.py file.
This is an iterative approach to calculating the factorial that isn’t recursive, and so won’t suffer from the previous issue. It simply goes through the numbers from 1 to n and multiplies a running total by each one in turn, rather than calling itself. Notice that we’re not changing how the function is called, or its intended behaviour. So we don’t need to change the Python docstring here, since it still applies.
We now have our updated implementation, but we need to make sure it works as intended. Fortunately, we have our set of tests, so let’s run them again:
OUTPUT
test_3 (tests.test_factorial.TestFactorialFunctions) ... ok
test_5 (tests.test_factorial.TestFactorialFunctions) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK
And they pass, which very rapidly gives us some confidence that our new implementation behaves the same as before, at least for the cases we test. So again, each time we change our code, whether the changes are small or large, we retest and check that all the tests pass.
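For extra reassurance beyond our two unit tests, a quick optional check is to compare the new iterative version against the standard library’s math.factorial. This includes the 10000 case that broke the recursive version. This is a sketch for experimenting in the interpreter, not something the lesson’s repository includes.

```python
import math


def factorial(n):
    # Iterative version, mirroring the updated mymath/factorial.py
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


# No recursion, so large inputs no longer hit Python's recursion limit;
# cross-check a spread of inputs against the standard library
for n in (0, 1, 3, 5, 10, 10000):
    assert factorial(n) == math.factorial(n), n

print("all values match math.factorial")
```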
[CHECKPOINT - who managed to write unit test and run it? Yes/No]
What makes a Good Test?
Of course, we only have 2 tests so far, and it would be good to have more. But what kind of tests are good to write? The more thoroughly our tests exercise our code, the more confidence we can have that it is correct. We could keep writing tests for e.g., 10, 15, 20, and so on. But these become increasingly less useful, since they’re in much the same “space”. We can’t test all positive numbers, and it’s fair to say that at a certain point, these types of low integers are sufficiently tested. So what test cases should we choose?
We should select test cases that test two things:
- The paths through our code, so we can check they work as we expect. For example, if we had a number of paths through the code dictated by if statements, we would write tests to ensure each of those paths is followed.
- The boundaries of the input data we expect to use, known as edge cases. For example, if we go back to our code, we can see that there are some interesting edge cases to test for:
  - Zero?
  - Very large numbers (as we’ve already seen)?
  - Negative numbers?
These are all good candidates for further tests, since they test the code in different ways and follow different paths through the code.
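As a sketch, tests for the first two edge cases might look like this. By definition, the factorial of 0 is 1. For a large input, we can compare against the standard library’s math.factorial rather than working the answer out by hand. The class name TestFactorialEdgeCases and the choice of 2000 as the "large" input are illustrative assumptions, and factorial is an inline stand-in for the iterative mymath version. Negative numbers are trickier, since we first need to decide what the correct behaviour is.

```python
import math
import unittest


def factorial(n):
    # Stand-in for the iterative mymath.factorial.factorial
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


class TestFactorialEdgeCases(unittest.TestCase):
    def test_0(self):
        # By definition, 0! is 1
        self.assertEqual(factorial(0), 1)

    def test_large(self):
        # A large input, checked against the standard library
        self.assertEqual(factorial(2000), math.factorial(2000))
```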
Key Points
- FIXME
Content from 5.4 Handling Errors
Last updated on 2025-04-10
Overview
Questions
- FIXME
Objectives
- FIXME
How do we Handle Testing for Errors?
But what do we do if our code is expected to throw an error? How would we test for that?
Let’s try our code with a negative number, which we’ve already identified as a good test case, from within the Python interpreter:
We can see that we get the result of 1, which is incorrect, since the factorial function is undefined for negative numbers.
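The reason is visible in the iterative loop. For n = -1, range(1, n + 1) is range(1, 0), which is empty. The loop body therefore never runs, and the initial running total of 1 is returned unchanged. Here is a quick sketch to confirm this, with factorial defined inline as a stand-in for the current mymath version:

```python
def factorial(n):
    # Stand-in for the current iterative version, with no input checking yet
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


# For n = -1, the range the loop walks over is empty...
print(list(range(1, -1 + 1)))  # []

# ...so the starting value of result comes straight back
print(factorial(-1))  # 1
```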
Perhaps what we want in this case is to treat negative numbers as invalid input, and raise an exception if we receive one. How would we implement that, and how would we test for the presence of an exception?
In our implementation let’s add a check at the start of our function, which is known as a precondition. The precondition will check the validity of our input data before we do any processing on it, and this approach to checking function input data is considered good practice.
Edit the mymath/factorial.py file again, and add the following at the start of the function, below the docstring:
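The exact lines to add aren’t shown here. However, the traceback below tells us the precondition raises ValueError('Only use non-negative integers.'). A minimal sketch of the whole function with such a check might look like this, assuming the test is simply n < 0:

```python
def factorial(n):
    """
    Calculate the factorial of a given number.
    :param int n: The factorial to calculate
    :return: The resultant factorial
    """
    # Precondition (assumed form): reject negative input before any processing
    if n < 0:
        raise ValueError('Only use non-negative integers.')

    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


try:
    factorial(-1)
except ValueError as err:
    message = str(err)

print(message)  # Only use non-negative integers.
```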
If we run factorial(-1) again now, we should see our error:
OUTPUT
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/steve/factorial-example/mymath/factorial.py", line 9, in factorial
raise ValueError('Only use non-negative integers.')
ValueError: Only use non-negative integers.
Sure enough, we get our exception as desired. But how do we test for this in a unit test, since this is an exception, not a value? Fortunately, unit test frameworks have ways to check for this.
Let’s add a new test to tests/test_factorial.py:
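The new test isn’t reproduced here, but the test output further down names it test_negative. A sketch of what it might look like, alongside the existing tests, is below. factorial is an inline stand-in (including the assumed n < 0 precondition) for the version imported from mymath.factorial in the real file.

```python
import unittest


def factorial(n):
    # Stand-in for mymath.factorial.factorial, including the new precondition
    if n < 0:
        raise ValueError('Only use non-negative integers.')
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result


class TestFactorialFunctions(unittest.TestCase):
    def test_3(self):
        self.assertEqual(factorial(3), 6)

    def test_5(self):
        self.assertEqual(factorial(5), 120)

    def test_negative(self):
        # Passes only if the code inside the with block raises ValueError
        with self.assertRaises(ValueError):
            factorial(-1)
```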
So here, we use unittest’s built-in assertRaises() (instead of assertEqual()) to test for a ValueError exception occurring when we run factorial(-1). We use assertRaises() together with Python’s with statement. The test then passes only if the code inside the with block, here our call to factorial(), raises a ValueError. So if we re-run our tests again, we should see them all succeed:
You should see:
OUTPUT
test_3 (tests.test_factorial.TestFactorialFunctions) ... ok
test_5 (tests.test_factorial.TestFactorialFunctions) ... ok
test_negative (tests.test_factorial.TestFactorialFunctions) ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
Brief Summary
So we now have the beginnings of a test suite! And every time we change our code, we can rerun our tests. So the overall process of development becomes:
- Add new functionality (or modify existing functionality) to our code
- Potentially add new tests to test any new functionality
- Re-run all our tests
Key Points
- FIXME