All in One View

Content from Introduction to AI-assisted Coding

Last updated on 2026-06-17 | Edit this page

Overview

Questions

How can an AI coding assistant help development within an IDE?
What are the mechanisms by which IDE AI assistants provide help?
What risks arise when AI coding tools and autonomous agents are given increasing levels of autonomy?
How can I use AI coding assistants responsibly?
What are the main modes of AI-assisted software development, and how do they differ?
What are the limitations of a free Copilot account?
What is GitHub Copilot?
Which AI models are available within Copilot?

Objectives

Summarize the primary functions and intended use cases of common AI coding assistants.
Describe some common tasks undertaken by an IDE coding assistant.
Describe a responsible approach to using IDE coding assistants in development.
Describe how GitHub Copilot integrates with Visual Studio Code.
Describe the lifecycle of a Copilot request and how it uses data.
Describe the different built-in models and their specialisms and tradeoffs.
List the limitations of the free pricing tier of GitHub Copilot.

Generative AI has the potential to transform how researchers work with code, and with the integration of such capability within common IDEs, such as Visual Studio Code, provide the coding researcher with powerful tools to modify, expand and otherwise work with code. But this potential needs to be tempered with critical thinking and a healthy degree of skepticism.

How AI Coding Assistants Aid Software Development

AI coding tools support a spectrum of development approaches, from simple code completion to highly autonomous software development. As autonomy increases, developers spend less time writing code and more time defining goals, reviewing outputs, managing risk, and assuring quality.

Although AI capabilities are advancing rapidly, best practices for their use are still evolving. Until more mature guidance emerges, established software engineering practices such as requirements management, design review, testing, security review, and code review remain essential.

Most organisations currently use AI primarily through autocomplete and conversational assistance, with task-agent workflows becoming increasingly common.

1. Inline / Autocomplete Assistance

AI suggests code as the developer types, helping with boilerplate, repetitive patterns, simple functions, and documentation. The developer remains fully in control and reviews every suggestion.

2. Conversational Assistance

Developers interact with AI through natural language to ask questions, generate code, explain unfamiliar concepts, debug issues, or refactor software.

Examples include ChatGPT, Claude Code, GitHub Copilot Chat, and Gemini Code Assist.

3. Agentic Coding

AI tools can perform semi-autonomous tasks by planning and executing a sequence of actions within a defined scope. Developers provide goals and review the results.

Common uses include:

Implementing features
Updating tests
Drafting documentation
Refactoring code

4. Role-Based Agentic Workflow

Multiple specialised AI agents collaborate on a task, taking on roles such as analyst, architect, developer, tester, or reviewer. This is an emerging approach which involves multiple specialised AI agents collaborating on different aspects of a development task.

This encourages separation of development stages and structured review, but requires careful oversight to avoid propagating errors between stages.

5. The “Dark Factory”

Sometimes referred to as a “Dark Factory”, this aspirational model involves AI systems performing most development activities with minimal human intervention.

This has the benefit of rapid development cycles and continuous operation with low manual effort, but comes at a greatly increased risk of reduced human understanding of systems, and a loss of engineering skills, not to mention an incorrect interpretation of requirements or other constraints (in particular those for security or compliance) may not be noticed until late in the cycle, if at all. Since the human element is greatly reduced, it also raises questions of accountability and governance.

A Cornucopia of Models

An every increasing number of LLM-based AI assistants are becoming available, including (at time of writing):

ChatGPT – A conversational large language model by OpenAI that can generate code, explain programming concepts, assist with debugging, and support data analysis workflows.
GitHub Copilot – AI-powered coding assistant integrated into code editors, suggesting code completions, functions, and boilerplate across multiple programming languages.
Google Gemini – Google’s AI platform for research and coding assistance, capable of generating code, providing explanations, and supporting data analysis and workflow tasks.
Claude – A conversational AI by Anthropic designed to assist with coding, writing, and research tasks, providing explanations, summaries, and code generation support.
Microsoft Copilot – Integrated into Microsoft tools like Word, Excel, and Visual Studio, this AI assistant helps with code generation, data analysis, and workflow automation.

Within this training, we’ll be using GitHub Copilot with a selection of its integrated AI models (ChatGPT and Claude) within Visual Studio Code as the vehicles to illustrate the concepts and demonstrate how to use these tools.

Benefits and Risks

AI coding assistants offer several key benefits to research software development. They can accelerate development by reducing time spent on routine coding tasks, allowing researchers to focus on domain-specific problems. Plus, for those new to a programming language, these tools help lower the learning curve and enable faster productivity. The suggestions and examples provided by AI assistants may also encourage better coding practices and improve overall code quality. Additionally, they make it easier to maintain clear and comprehensive documentation, supporting long-term code maintainability.

However, they also introduce a number of significant risks:

Correctness and Validation - generative systems optimize for likelihood, not correctness. AI-generated code can sound confident but be incomplete, incorrect, or insecure. Researchers remain fully responsible for validating outputs.
Limited explanation - unlike a standalone AI like ChatGPT, IDE-integrated AI often provides suggestions without detailed reasoning. This can reduce researchers’ understanding of AI-generated code
Potential over-reliance - it can be very tempting to accept AI code suggestions that appear to work, without fully understanding them, and this can lead to errors or misunderstandings about what your code does.
Privacy and security risks - the AI may send code snippets to cloud services for processing. Sensitive data or unpublished research could be exposed if this is not carefully managed.
Context and Edge Cases – AI assistants may miss domain-specific requirements, edge cases, or research-specific constraints that are critical for correctness.
Code Quality - generated code may work but be inefficient, poorly structured, or violate best practices, degrading long-term maintainability.

There are also a number of tangential non-coding risks we should consider. One of these is vendor lock-in: as the perceived value of a particular vendor’s AI tool increases, so does the risk of dependence on that particular vendor’s tool. This can make switching tools difficult and leaving users at greater risk of service price increases and at the mercy of that vendor’s product roadmap, which may not align with the goals of the user. Secondly, depending on the training data used (e.g. codebases, best practice articles, etc.), biases may be introduced that favour particular technical tools and approaches which are not optimal or even sensible choices for a project. Plus, even Microsoft acknowledges that:

“The language, image, and audio models that underly the Copilot experience may include training data that can reflect societal biases, which in turn can potentially cause Copilot to behave in ways that are perceived as unfair, unreliable, or offensive.”

These may include the reinforcing of negative stereotypes, over or under-representation of specific groups of people, even inappropriate or offensive content, and variable performance across different spoken languages.

Research software produces results that inform publications, policy, and further research. Unlike commercial software where bugs typically cause inconvenience, errors in research code can invalidate findings, waste resources, introduce unwanted biases, and compromise scientific integrity. AI tools optimize for likelihood, not correctness, so again, as responsible researchers, we must scrutinise AI generative responses.

Callout

The Cautionary Tale of Replit’s use of AI

There are an increasing number of AI cautionary tales being reported in the media.

A particularly disturbing event at Replit was reported in June 2025. Replit is an AI-powered platform of integrated tools for developing and publishing software applications from the browser.

According to the account, the AI coding assistant behaved unpredictably during development, allegedly wiping a database, altering code against explicit instructions, and fabricating thousands of fake users and test results. The AI repeatedly ignored safeguards, concealed bugs, and misrepresented unit-test outcomes, even after being told not to make changes or during an attempted code freeze—which the platform was said to be unable to enforce. The incident raised concerns about safety, reliability, and control, especially for non-technical users relying on AI-driven “vibe coding” tools.

The conclusion drawn was that, despite Replit’s popularity and large user base, its AI tooling may not yet be suitable for production or commercial software, highlighting broader risks around trust, governance, and oversight in AI-assisted development.

Challenge

Class Discussion: What are your AI Coding Fears?

3 mins.

Two questions:

How do you hope AI coding assistants will help your software development?
Do you have any concerns with using AI coding assistants? e.g. what are you afraid could happen?

Show me the solution

Some possible benefits:

Speed up development
Help with understanding codebases
Assist with learning and implementing new technologies
Rapid prototyping
Real-time assistance when writing code in an editor
Free up time for doing actual research

Some possible concerns:

Incorrect but plausible code
Hallucinated APIs or behavior
Hidden assumptions
Over-trust of code / reduced code review
Poor maintainability
Loss of algorithmic understanding
Reproducibility issues
Mismatch with scientific methods
Licensing/IP uncertainty
Security and data leakage risks

A Key Risk: Technical Debt

When faced with a problem that you need to solve by writing code, it may be tempting to skip the design phase and dive straight into coding, particularly when we have AI-assistants able to generate code so comprehensively and rapidly, with such an array of features.

Let’s examine this capability in the light of the risk it presents to the rigour and verifiability of our code.

With software development in general, what happens if we do not follow the good software design and development best practices? It can lead to accumulated ‘technical debt’, which (according to Wikipedia), is the “cost of additional rework caused by choosing an easy (limited) solution now instead of using a better approach that would take longer”. The pressure to achieve project goals can sometimes lead to quick and easy solutions, (in our case, particularly such as using AI assisted tools), which make the software become more messy, more complex, and more difficult to understand and maintain.

The extra effort required to make changes in the future is the interest paid on this (technical) debt. It is natural for software to accrue some technical debt, but it is important to pay off that debt during a maintenance phase - simplifying, clarifying the code, making it easier to understand - to keep these interest payments on making changes manageable.

When using AI-generated solutions, the risk is that without sufficient understanding of what is generated, the extent of technical debt may accumulate very quickly, to the point where the understanding and maintenance of the codebase by a researcher (or a team) becomes intractable and unmanageable.

Callout

The “Almost Right” Phenomenon

The “almost right” phenomenon in AI tools, as reported by VentureBeat in 2025, refers to the tendency of AI systems - especially those based on large language models or generative AI - to produce plausibly right-but-incorrect outputs that increase technical debt.

The Stack Overflow 2025 Developer Survey highlighted an interesting set of findings:

Only 33% of developers trust AI accuracy in 2025, down from 43% in 2024
AI favourability dropped from 72% in 2024 to 60% in 2025
Developers cite “AI solutions that are alsmost right, but not quite” as their top frustration
45% say debugging AI-generated code takes more time than expected

Remedial actions such as maintaining human expertise, a focus on AI literacy, and implementing staged AI adoption are suggested.

“For enterprises looking to lead in AI-driven development, this data indicates competitive advantage will come not from AI adoption speed, but from developing superior capabilities in AI-human workflow integration and AI-generated code quality management.

Organizations that solve the “almost right” problem, turning AI tools into reliable productivity multipliers rather than sources of technical debt,will gain significant advantages in development speed and code quality.”

The Immature and Rapidly Evolving Landscape

While AI coding assistants in IDEs present features that may appear advanced and polished, the field itself remains relatively immature and is evolving at a rapid pace.

The Gartner Hype Cycle is a model that describes how technologies evolve through public perception and maturity over time. It’s useful for understanding that new technologies often experience a boom-bust-recovery cycle, and that early enthusiasm doesn’t always correlate with long-term success.

Labelled curve graph of the Gartner Hype Cycle, labelling each of the five stages

It has five phases:

Technology Trigger – a breakthrough or significant media attention launches the technology into public awareness, creating excitement and high expectations
Peak of Inflated Expectations – hype reaches its peak as early adopters, vendors, and media promote the technology enthusiastically. Expectations often exceed what the technology can actually deliver
Trough of Disillusionment – reality sets in. Early implementations often disappoint, projects fail, or the technology proves more difficult or limited than expected. Media coverage becomes negative, and interest drops sharply.
Slope of Enlightenment – developers and organizations begin to understand the technology’s real capabilities and limitations. Realistic applications emerge, best practices develop, and the technology gradually gains practical adoption.
Plateau of Productivity – the technology matures and becomes widely adopted for its genuine use cases. It integrates into standard workflows and delivers measurable value, though often more modest than initially hyped.

The landscape of AI coding assistants is characterized by:

Rapid feature development - new capabilities are continuously being added and refined by vendors competing in this space
Unstable implementations - how features are implemented, displayed, and accessed changes frequently, sometimes between minor version updates
Shifting vendor priorities - large technology companies regularly adjust their AI strategies For example, Microsoft has recently scaled back some of its ambitious AI goals for Visual Studio Code, which may affect the availability and priority of AI-assisted features in the editor
Incomplete standardization - there is no industry-wide standard for how AI assistants should integrate with IDEs, leading to inconsistent user experiences across different tools

This rapid evolution means that the tools and best practices applied to them can quickly become outdated. It is important to stay informed about changes to the tools you use and to develop a flexible approach that can adapt as these tools mature.

Challenge

Class Discussion: Where Does AI Coding Tools Fall on the Hype Curve?

1 mins.

In general, where do you think AI coding tools fall on this curve? Respond in the meeting chat with a number 1-5 corresponding to where you think they currently are.

Show me the solution

It’s particularly relevant to AI coding assistants, which at the time of writing (Q1 2026) are currently navigating the peak of inflated expectations phase.

Introduction to GitHub Copilot

GitHub Copilot integrates directly into Visual Studio Code as an extension installable from within the IDE, providing access to:

On-request explanations - allowing you to obtain responses to questions in a chat interface
Real-time assistance as you continue to develop your code - where Copilot continuously analyzes the code you write, as well as comments and surrounding context, to offer intelligent suggestions which require approval.
On-request direct code modification - by requesting specific changes, your code is modified directly by Copilot (again, requiring specific approval before it integrates the suggested changes)

All of this is integrated into the VSCode editor, so you do not need to leave your development environment.

The Lifecycle of a Copilot Prompt

So how does Copilot integrate with VSCode, and how does it handle data? Let’s look at how it creates a code suggestion as an example:

At a high level, the following steps are followed:

Within the Copilot-enabled IDE:

Developer enters text into code editor, such as VSCode, gathering context from a number of sources (code before and after cursor, file name and type, other open editor tabs)
The prompt is constructed from the amassed context and sent to the Copilot proxy

Within the Copilot proxy (within the “Cloud”):

Filters the requests, terminating those involving toxic language, unrelated code requests, and perceived hacking attempts. The prompt is sent to the GitHub Copilot LLM

The Copilot LLM (also in the “Cloud”):

Receives the request and formulates a code suggestion which is sent back to the proxy

Back within the Copilot proxy:

Receives the response, and tests code suggestions for code vulnerabilities, truncating responses that contain unique identifiers (such as email addresses, GitHub URLs, IP addresses, etc.), and filters out those matching known public code. The processed response is fed back to the Copilot client within the IDE

Back within the Copilot-enabled IDE:

The Copilot extension receives the code suggestion which is presented to the user to accept or reject

GitHub provides further detailed information about how GitHub Copilot handles data.

Different Models

GitHub Copilot’s free tier provides access to multiple large language models, each with different strengths and tradeoffs. The following table summarizes the models currently available at time of writing:

Model	Provider	Specialization	Speed	Best for
Claude Haiku	Anthropic	Balanced, efficient reasoning	Fast	Quick code completions, lightweight tasks, local development
GPT-4.1	OpenAI	Complex reasoning and analysis	Moderate	Detailed code reviews, architectural decisions, complex refactoring
GPT-5 Mini	OpenAI	Lightweight version of GPT-5	Faster	Balance of capability and speed, most general-purpose tasks

Each model can be selected based on your specific task requirements. For routine coding tasks, lighter models like Claude Haiku or GPT-5 Mini may be sufficient and faster, while more complex problems may benefit from the deeper reasoning of GPT-4.1.

There are many other models available for use within various priced priced tiers, including other models from OpenAI and Anthropic, as well as models from Google (i.e. Gemini). Some of these e.g. GPT-5-Codex have been further optimised for writing code and other software engineering tasks. You can also find a comparison of these models.

Limitations of the Copilot Free Tier

There are two key quotas which are reset per month to be aware of (which we’ll look into during the practical elements of the course):

Inline suggestions - 2000 completions per month, essentially where Copilot provides suggestions as you type
Premium requests - 50 per month, where you use more advanced AI features, such as Copilot chat requests or advanced reasonsing models

References

Key Points

AI coding tools range from autocomplete to highly autonomous development.
Higher AI autonomy requires greater human oversight and review.
Established software engineering practices remain essential when using AI.
AI can speed up development, learning, prototyping, and documentation.
AI-generated code may be incorrect, insecure, or poorly designed.
Developers remain responsible for validating all AI outputs.
Over-reliance on AI can increase technical debt and reduce understanding.
“Almost right” AI solutions often increase debugging and maintenance effort.
AI tools are evolving rapidly, and best practices are still emerging.
The Copilot free tier currently includes access to three AI models each with a different balance of speed and purpose.
The Copilot free tier currently allows 2000 completions and 50 premium requests per month.

Content from Example Code

Last updated on 2026-06-19 | Edit this page

Overview

Questions

What example code will we use for the lesson?
How do we set up the Python package prerequisites to run the code?

Objectives

Obtain the example code used for this lesson
Create a virtual environment to hold the Python packages needed to run the code
Open and run the example code within VSCode

Obtaining Example Code

For this lesson we’ll be using some example code available on GitHub, which we’ll clone onto our machines using the Bash shell. So firstly open a Bash shell (via Git Bash in Windows or Terminal on a Mac). Then, on the command line, navigate to where you’d like the example code to reside, and use Git to clone it. For example, to clone the repository in our home directory, and change our directory to the repository contents:

BASH

cd
git clone https://github.com/Southampton-RSG-Training/ai-tools-example
cd ai-tools-example

Setting up a Virtual Environment

The existing code needs the NumPy and Matplotlib packages in order to run. Let’s now create a Python virtual environment and install them. Make sure you’re in the root directory of the repository, then type

BASH

python3 -m venv venv

Once created, we can activate it so it’s the one in use:

BASH

[Linux] source venv/bin/activate
[Mac] source venv/bin/activate
[Windows] source venv/Scripts/activate

Once activated, install the needed packages:

BASH

python3 -m pip install numpy matplotlib

Examining the Repository

Next, let’s take a look at the contents of the repository by opening the repository directory within VSCode. After starting VSCode, you can do this in a couple of ways, either:

Select the Source control icon from the middle of the icons on the left navigation bar. You should see an Open Folder option, so select that.
Select the File option from the top menu bar, and select Open Folder....

In either case, you should then be able to use the file browser to locate the directory with the files you just extracted, and then select Open. Note that we’re looking for the folder that contains the files, not a specific file.

You may be presented with a window asking whether you trust the authors of this code. In general, it’s a good idea to be at least a little wary, since you’re obtaining code from the internet, so be sure to check your sources!

We also need to configure the Python extension within this workspace to use the Python contained with the virtual environment we created earlier. VSCode has a sophisticated method to access it’s inner functionality known as the Command Palette, which we’ll use to address this.

Select View and Command Palette from the VSCode menu
Begin typing Python: Select Interpreter, and then select it when it appears
A list of available Python installations should appear. Look for and select the one that says ./venv/bin/python (our virtual environment)

Once selected, the default Python interpreter for VSCode will be configured.

So far within VSCode we have downloaded some code from a repository and opened a folder. Whenever we open a folder in VSCode, this is referred to as a “Workspace” - essentially, a collection of a project’s files and directories. So within this workspace, you’ll see the following:

data/ - a directory containing some example CSV files, each representing inflammation data from a series of hypothetical clinical trials for 60 patients over 40 days
.gitignore - a text file that contains things that shouldn’t be tracked by Git version control
inflammation-plot.py - which plots three graphs of the mean, maximum, and minimum values for each day of a trial for all patients

You’ll also see venv/ which is not part of the repository, but the virtual environment we created and configured earlier.

Examining the Code

Select the inflammation-plot.py file in the explorer window, which will bring up the contents of the file in the code editor.

PYTHON

import glob

import numpy as np
from matplotlib import pyplot as plt

filenames = glob.glob('data/inflammation-*.csv')
filenames.sort()

for filename in filenames:
    print(filename)

    data = np.loadtxt(fname=filename, delimiter=',')

    fig = plt.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(data.mean(axis=0))

    axes2.set_ylabel('max')
    axes2.plot(data.max(axis=0))

    axes3.set_ylabel('min')
    axes3.plot(data.min(axis=0))

    fig.tight_layout()
    fig.savefig(filename + '.png')

The aim will be to use GitHub Copilot within VSCode to improve this script throughout the session.

Note that as an example, the code is deliberately written to have flaws. Things like the line spacing is inconsistent, there are no code comments, there’s some code duplication, and you may spot other issues too. It’s also deliberately been kept relatively simple. This is for two reasons:

Most importantly, from a training perspective, when we use Copilot later to suggest changes, we’ll be able to quickly reason about the changes and how they impact the codebase
To give us enough scope to improve it

But in essence, the code is designed to do the following:

Loop through a list of all inflammation data files (sorted by their filename) in the data/ subdirectory
For each file, load the data into a Numpy array
For that array, create a plot containing three graphs, one for each of the mean, minimum and maximum of the data
Save the plot image to a file (essentially the same path and filename, with a .png added to it)

Running the Example Code

Next, you may recall we needed NumPy and Matplotlib to run this code; if you look at the the bottom right of VSCode’s status bar, it should mention the version of Python being used, e.g. 3.14.2 (venv). So here, we can see that VSCode has picked up our virtual environment we configured earlier and will use that by default.

Then, select the “Play”-looking icon at the top right of the code editor.

You should see the program run in a terminal window that appears at the bottom, along with the following output from the program:

OUTPUT

data/inflammation-01.csv
data/inflammation-02.csv
data/inflammation-03.csv
data/inflammation-04.csv
data/inflammation-05.csv
data/inflammation-06.csv
data/inflammation-07.csv
data/inflammation-08.csv
data/inflammation-09.csv
data/inflammation-10.csv
data/inflammation-11.csv
data/inflammation-12.csv

After it’s completed, we should see corresponding plot image files in the data directory, essentially each with a .png on the end. For example, inflammation-01.csv.png looks like:

Plot generated from data/inflammation-01.csv

Callout

Error: `the term conda is not recognised`

If you’re running an Anaconda distribution of Python on Windows, if you see this error it means that VSCode is not looking in the right place for Anaconda’s installation. In this case, you may need to configure VSCode accordingly:

Activate the Command Palette, either by selecting View and Command Palette in the menu, or by pressing Ctrl + Shift + P (Linux), Mac/Windows Key + Shift + P (Mac/Windows) simultaneously
Type Terminal: Select Default Profile
From the options, select the entry that’s something like Command Prompt C:\WINDOWS\...

Hopefully that should resolve the issue.

Callout

What Does the Patient Inflammation Data Contain?

Each dataset file in data/ records inflammation measurements from a separate clinical trial of the drug, and each dataset contains information for 60 patients, who had their inflammation levels recorded (in some arbitrary units of inflammation measurement) for 40 days whilst participating in the trial.

These datasets are reused from the Software Carpentry Python novice lesson.

Each of the data files uses the popular comma-separated (CSV) format to represent the data, where:

each row holds inflammation measurements for a single patient
each column represents a successive day in the trial
each cell represents an inflammation reading on a given day for a patient

Key Points

The example Python code generates a basic set of statistical plots for all data files within the data/ directory.
The code has a number of deliberate issues that we want to resolve throughout this lesson using GitHub Copilot.

Content from Getting Started with GitHub Copilot

Last updated on 2026-06-19 | Edit this page

Overview

Questions

How should I configure AI coding assistants to protect privacy and intellectual property?
How can AI help me understand an unfamiliar codebase?
Which AI assistant mode should I use for a particular task?
How can I use AI to investigate software architecture, dependencies, and implementation details?
How can I customise AI coding assistants to follow project conventions and standards?

Objectives

Configure AI coding assistants to protect privacy and intellectual property.
Use AI assistants to understand an unfamiliar codebase.
Construct effective prompts and manage context to improve AI responses.
Explain the non-deterministic nature and limitations of AI-generated outputs.
Customise an AI coding assistant to follow project conventions and standards.
Critically evaluate AI-generated explanations and code recommendations.
Use an AI assistants to identify opportunities for code improvement and refactoring.

Decide on Copilot Privacy Settings

Since Copilot’s VSCode configuration inherits from GitHub’s configuration, as a first step we can and should decide and configure a suitable level of privacy for how Copilot will operate; particularly if we are making use of sensitive or otherwise confidential data within our codebase.

We can set this within our GitHub user settings, which will apply to all we do with Copilot. Using a browser, go to github.com, and then:

Select the GitHub profile icon at github.com, and select Copilot Settings from the drop down menu
Scroll down to Privacy

By default, in the free tier, the two privacy options are enabled.

In general, it’s a good idea to disable Suggestions matching public code since the risk is that it may make use of public code sources in a way that isn’t properly licensed. In addition, it’s recommended to disable the other one (depending on the extent you trust GitHub and their affiliates), since - as it clearly states - it allows GitHub and others to use your data and code snippets for product improvements.

Asking Questions about Your Code

Attempting to understand a new codebase, whether it’s your first week on a project or one you have inherited from another source, can be difficult. This can be due to many reasons; documentation may be incomplete, architectural decisions may be undocumented, and the people who know the system may be unavailable or have left the organisation.

We can use Copilot to build our understanding and confidence about our codebase by asking natural-language questions about the code. It helps you:

Build a high-level mental model of the system, which with complex codebases is often a huge task!
Identify where key functionality is located in the code
Understand naming conventions, patterns, and dependencies
Reduce the cognitive load of first contact with unfamiliar code

Which sounds great, but with one critical caveat: it’s important to understand that Copilot is not an absolute source of truth, but more of a guide to help build your own understanding of a codebase.

Let’s use Copilot to help us investigate how our existing codebase works.

Using Copilot for the First Time

Let’s move over to using the VSCode chat pane on the right. If you don’t see this, select View from the VSCode menu and then Chat to display it.

You’ll notice at the bottom there’s a “chat” box, with a number of selectable dropdown options below it. You can hover over each one to discover what it is.

On the left, there’s Set Agent, which allows you to set the mode of operation with Copilot. Each of these modes provides an increasing degree of autonomy for Copilot to interact with your code:

Ask - designed for learning and orienting yourself with existing code without changing it.
Plan - will iterate on building a plan based on a prompt, exploring the codebase and creating a recommended plan containing step-by-step instructions for what to do. It won’t by itself make any changes to your codebase.
Agent - the most powerful mode, which lets you specify a high-level prompt to allow Copilot at a high level of autonomy to plan changes in steps, select files, and run tools or terminal commands, iterating on code edits until it is complete.

Since we want to ask questions regarding our codebase, let’s set the mode to Ask.

The next dropdown is the AI language model being used. By default, this is set to Auto so that Copilot will decide the model. However, for consistency in this training, let’s select the entry for Claude Haiki 4.5.

Select the inflammation-plot.py file in the File Explorer. You’ll notice this appears as “context” at the top of the chat box. Essentially, this is what we’re asking questions about.

Let’s now ask the very high level question: What does this code do?, and press enter. You’ll likely be asked whether you want to enable Claude Haiku 4.5 for all clients - select Enable when prompted.

You’ll find the response looks something like the following:

This code generates visualization plots for inflammation data:

1. Loads data files: Uses glob to find all CSV files matching the pattern inflammation-*.csv in the ../data/ directory, sorts them, and limits processing to the first 3 files.

2. Processes each file: For each CSV file, it:
   - Loads the data as a NumPy array using comma delimiters
   - Creates a figure with 3 subplots arranged horizontally

3. Generates three plots showing statistical summaries across patients (rows):
   - Left plot: Average inflammation values per day
   - Middle plot: Maximum inflammation values per day
   - Right plot: Minimum inflammation values per day

4. Saves output: Each figure is saved as a PNG file with the same name as the input CSV file plus .png extension.

Callout

Why Aren’t I Seeing the Same Response?

You’ll likely find it’s not exactly the same. Due to the ways in which LLMs process requests, based on an iterative probabilistic model, responses based on the same question will show differences: so essentially, AI models are probablistic, not deterministic. This effect is often visibly compounded over time, as the probabilistic outcome of a response generates an increasingly different result.

Writing Good Chat Prompts

AI tools work best with context, so it’s good to be as specific and concise as possible.

It’s also a good idea to provide simple questions, so if you end up with a large question that is decomposable into separate steps, ask each step separately and you’ll typically get a better outcome. This implies that it’s good to use a logical, iterative process of using AI to assist, using responses to simple questions to inform the next question, and so on.

There’s a very useful cheat sheet developed by Northwestern University (US) Research Computing and Data Services, which can be found in their GitHub promptEngineering repository. It contains tips and do and don’t examples to help you formulate good prompts.

Callout

Managing Context

To improve the specificity of a prompt, it’s helpful to add as much context as you can.

To add context to a chat prompt you can type # followed by the context item you want to add, such as a file, folder, tools, code elements (such as variables, functions or classes), tools, amongst others. Then press Tab to confirm. For example, we could type #inflammation-plot.py to ensure only responses relevant to that specific file will be generated. We could use #codebase if we aren’t sure which files are relevant to our question, to include all our files. Of course, our code may make use of sensitive data files that we don’t want Copilot to use, which we’ll look at shortly.

Another thing to remember is how the context of each chat informs the flow of questions. Copilot chat uses the history of a chat thread to get context about your request. To ensure you only provide the context you want, use separate threads for each new conversation area you wish to start, and delete requests that are no longer relevant or otherwise not useful.

Investigating our Codebase

Let’s ask some questions about our codebase so we can understand the implementation, in terms of the used components and how they work.

In the chat window, ensure that the Claude Haiku 4.5 model is selected:

What are the main packages used?

OUTPUT

1. glob: Built-in Python module for file pattern matching
   - glob.glob() finds all files matching a wildcard pattern (inflammation-*.csv)
   - Useful for batch processing multiple files with consistent naming

2. numpy: Numerical computing library
   - np.loadtxt() loads CSV data into a NumPy array
   - data.mean(axis=0), data.max(axis=0), data.min(axis=0) compute statistics across patients (rows) for each day (columns)

3. matplotlib.pyplot: Plotting library for creating visualizations
   - plt.figure() creates a new figure window
   - fig.add_subplot() adds subplots to the figure
   - axes.set_ylabel() and axes.plot() configure and draw the plots
   - fig.tight_layout() optimizes spacing
   - fig.savefig() exports the figure as an image file

These three packages work together: glob finds the files, numpy processes the data, and matplotlib visualizes the results.

This is a fairly typical response, although your output will vary.

Callout

Why Aren’t I Seeing the Same Results for the Same Question for the Same Codebase?

Sometimes, you may ask the same question and get a different response, even on the same machine in the same editor.

The reason is that LLMs are non-deterministic and probablistic: the same question can produce different answers. In addition to it’s probablistic nature, there are other reasons for these differences:

Context-dependent: small changes in files, cursor position, or prompt matter
Approximate understanding: infers intent, doesn’t actually “understand” the codebase
Model changes over time: behaviour can shift as Copilot and it’s underlying models are updated

You’ll see the question is answered as a general summary, but it’s volunteering other information such as the individual library calls used and what they do. AI tools are generally quite good at summarising existing content which is helpful to build understanding, whether it’s written text or code, although we must be careful not to accept it entirely at face value. Such tools often have an authoriative tone and are known to be confidentally incorrect. Nevertheless, particularly for small to moderate sized codebases that use well-documented technologies that are well represented in their training data, these tools tend towards being really useful.

However, let’s assume we want to know more about how Matplotlib generates the plots:

How is Matplotlib used to generate the plots?

This script uses Matplotlib to create and save visualization plots:

1. Create a figure: plt.figure(figsize=(10.0, 3.0)) creates a new figure with dimensions 10 inches wide by 3 inches tall.

2. Add subplots: fig.add_subplot(1, 3, n) creates a grid of 1 row and 3 columns, with n specifying which subplot (1, 2, or 3):
   - axes1, axes2, axes3 are the three subplot axes objects

3. Label and plot data:
   - axes.set_ylabel() adds a label to the y-axis
   - axes.plot() draws a line plot of the data on that subplot
   - Each subplot shows a different calculation: mean(), max(), and min()

4. Layout adjustment: fig.tight_layout() automatically adjusts spacing between subplots to prevent overlap.

5. Save output: fig.savefig() saves the entire figure as a PNG image file.

Which is a fair summary.

Due to the ambiguous nature of language and how we specify prompts, and the nature of how LLMs operate and generate responses, we’ll typically have overlaps in responses to multiple questions. Again, the key is to isolate what’s useful, and drill down and be increasingly specific on questioning until we have what we need (as we would in an a typical conversation).

Personalising Copilot to Match our Project

What is an Instructions File?

GitHub Copilot can be personalised by adding a instructions file to a repository that tells Copilot how you want it to behave in that project.

The file acts as persistent, project-level guidance for Copilot, covering things like:

Preferred architecture and design patterns
Coding style and naming conventions
Approved or banned libraries
Testing expectations and quality standards
Security or safety rules
How detailed Copilot’s answers should be

By giving Copilot this shared context it helps make its suggestions more relevant for a particular project.

It serves a similar purposes to a CONTRIBUTING.md file in a code repository; it provides guidance for how suggestions, code changes and contributions should be made, but aimed at Copilot’s day-to-day decisions instead. It does this by adding context to queries from the .github/.copilot-instructions.md file.

For example, if we were to ask Copilot “How should I make this code more readable?”, without instructions Copilot may suggest to:

Rename or format variable or function names inconsistently
Change behaviour subtly in an undesired way
Use an indentation style that isn’t typically used by team members
Without instructions, Copilot may also introduce a new design pattern the repository doesn’t use

Create an Instructions File

Let’s create an instructions file now, by invoking a Copilot command in the chat window:

/create-instructions in a .github/copilot-instructions.md file. Use PEP8 for Python code style. Do not modify files in the data/ directory.

We specify the exact location where the instructions will live, since

At this point, VSCode will do a number of things in order to create this file:

Analyse the structure and files in the workspace
Analyse the data directory data/ which contains our inflammation data files
Create the .github/instructions/copilot-instructions.md file itself
Summarise the contents of the new instructions file
Provide some questions for feedback to add more specific guidance in the instructions file

Note that it may request access to run a Bash command in the data directory, from which it can then ascertain the structure of the data, which you’ll have to approve. We’ll look at the contents of the file in a moment.

You may also see and need to respond to some feedback questions.

Working with our Instructions File

You should now see a .github/copilot-instructions.md file appear in VSCode’s file browser, which should look something like this one.

If you open this file now, you’ll see some things to note:

Perhaps some YAML “front matter” at the top, with a description and indicating it only applies to Python files (it may specify this information in another way however).
Content is highlighted with a Keep / Undo option. This is VSCode’s way of highlighting the changes Copilot has made as suggestions, so that you explicitly review and approve them.
Sections on code patterns and conventions to follow.
A section or other stipulation on not modifying the data/ directory.

Approve this file addition by selecting Keep, and save the file.

From this starting point we are free to update this file manually as we continue to develop the code, and this context will be used whenever we interact with Copilot.

Obtaining Guidance on how to Improve our Code

Now have an instructions file to guide Copilot, let’s take its guidance one step further. So far we’ve been asking quite specific questions about this codebase in order to learn more about it. But let’s now consider we might want to do to improve it.

Callout

What about other Models?

So far we’ve only used the Claude Haiku 4.5 model, but there are others at time of writing selectable from the models dropdown: essentially some variants of GPT, and a Raptor model preview.

Next to each model you’ll see an x figure, e.g. 1x. This is a multiplier related to the “cost” of using this model within our quota of premium requests. So models with a 2x beside them will cost twice as much as one with 1x.

It’s important to be clear on the relative cost, since our quotas will be consumed more quickly with higher cost multipliers!

Challenge

Asking Copilot to Suggest Improvements

5 mins.

In the chat window, select + to create a new chat conversation.
Ensure you have the Ask mode selected in the chat box, and the inflammation-plot.py file open and selected in the code editor.
Select the Claude Haiku 4.5 model in the model dropdown.
Enter How should I refactor this code to be more modular? into the chat.

Don’t update your code with any of the suggestions just yet!

Show me the solution

Looking at your code, I can see several opportunities to improve modularity. Here's a structured refactoring approach:

1. Mixed concerns: File loading, figure creation, and plotting logic are all tangled together
1. Difficult to test: Can't easily test individual plotting functions
1. Hard to reuse: If you want to plot just mean or max, you'd need to refactor
1. No separation of I/O: File operations aren't separated from business logic

Here's how I'd reorganize it:

PYTHON


from matplotlib import pyplot as plt

def load_data(filename):
    """Load inflammation data from CSV file."""
    return np.loadtxt(fname=filename, delimiter=',')

def create_figure_with_subplots():
    """Create a figure with 3 subplots for inflammation plots."""
    fig = plt.figure(figsize=(10.0, 3.0))
    axes = [
        fig.add_subplot(1, 3, 1),
        fig.add_subplot(1, 3, 2),
        fig.add_subplot(1, 3, 3)
    ]
    return fig, axes

def plot_inflammation_data(data, axes):
    """Plot mean, max, and min inflammation data on provided axes."""
    metrics = [
        ('average', data.mean(axis=0)),
        ('max', data.max(axis=0)),
        ('min', data.min(axis=0))
    ]
    
    for ax, (label, values) in zip(axes, metrics):
        ax.set_ylabel(label)
        ax.plot(values)

def process_inflammation_file(filename, output_dir='.'):
    """Process a single inflammation data file and save plot."""
    data = load_data(filename)
    fig, axes = create_figure_with_subplots()
    
    plot_inflammation_data(data, axes)
    
    fig.tight_layout()
    output_filename = f"{output_dir}/{filename}.png"
    fig.savefig(output_filename)
    plt.close(fig)  # Clean up to avoid memory issues
    
    return output_filename

def main():
    """Main entry point."""
    filenames = glob.glob('data/inflammation-*.csv')
    filenames.sort()
    
    for filename in filenames:
        print(filename)
        output_file = process_inflammation_file(filename)
        print(f"Saved: {output_file}")

if __name__ == '__main__':
    main()

Benefits of This Refactoring

- Testability: each function can be unit tested independently
- Reusability: can use plot_inflammation_data() with different data sources
- Clarity: each function has a single, clear responsibility
- Maintainability: changes to plot styling only affect plot_inflammation_data()
- Extensibility: easy to add new metrics or change output format

Further Improvements (Optional)

- Configuration: Extract hardcoded values (figsize, delimiter) to a config dict
- Error handling: Add try/except blocks around file I/O
- Logging: Replace print() with proper logging
- Type hints: Add Python type annotations for clarity
- Output directory: Make output location configurable (I've added an optional parameter)

What’s my Copilot Usage?

It’s important to keep track of just how much of our Copilot “plan” we’re using, particularly since we’re (by default) on the free tier of Copilot. We can see an overview of what we’ve consumed and how much is remaining by selecting the Copilot icon (on the left of the bell-shaped icon on the far bottom right of the status bar), e.g.

So in this case, we can see that this user has consumed 0.3% of its quota for inline suggestions (which we’ll look at later), and 36% of the chat messages quota. This allowance resets on a monthly basis, in this case February 23 2026.

Key Points

Configure Copilot privacy settings to disable public code suggestions and data-sharing before use.
Use Ask mode to explore and understand an unfamiliar codebase without making changes.
Copilot responses are probabilistic and non-deterministic, with the same question yielding different answers.
Write specific, concise prompts and break complex questions into separate steps for better results.
Use # to add explicit context (such as files or folders) to chat prompts.
A .github/copilot-instructions.md file provides persistent project-level guidance to Copilot for things like coding style and conventions.
Always critically evaluate AI-generated explanations and code, since they can be confidently incorrect.
Monitor your Copilot quota, especially when using higher-cost models (marked with multipliers greater than 1x).

Content from Refactoring Code with GitHub Copilot

Last updated on 2026-06-19 | Edit this page

Overview

Questions

What are the different ways to use Copilot to refactor code?
What is the difference between inline suggestions, plan mode, and agent mode?
Why is it important to review and critically evaluate AI-generated code suggestions?
How can plan mode help structure a refactoring task before making any changes?
What risks should you consider when delegating larger-scale code changes to an AI agent?

Objectives

Describe the three Copilot refactoring modes (inline suggestions, agent mode, plan mode) and when each is appropriate
Use inline suggestions and agent mode to make targeted, small-scale code changes
Apply plan mode to generate and review a structured refactoring plan before implementing changes
Evaluate AI-generated code suggestions to identify whether changes are correct and maintainable
Appreciate the importance of increasing skepticism and review as the autonomy and scope of AI-assisted changes increases

In the previous episode we looked at Copilot giving us guidance and advice to improve our code. In this episode, we’ll look at how Copilot can assist with code modifications more directly, looking at the following features:

Inline suggestions - where Copilot provides coding suggestions as you type
Plan mode - where an built-in Copilot agent explores our codebase and generates a step-by-step plan for improvement
Agent mode - where Copilot directly and autonomously undertakes either small or large scale changes on request, with a single option to approve the changes at the end of the process

These are in ascending order of autonomy, authority and scope that we delegate to Copilot to make changes. As we’ll see, as we delegate more authority and scope, the more we should increase our skepticism and diligence in reviewing and understanding suggestions made by such tools.

Inline Suggestions

Within VSCode, Copilot can provide “inline” suggestions as we type, which go much further than typical IDE autocomplete suggestions.

If we select the Copilot icon again in the status bar, we can see the current inline settings:

So here, we can see that inline settings will apply to all Python files. These suggestions will appear as “ghosted text” suggestion which you can autocomplete with tab, very similar to how they appear with standard VSCode autocomplete. There are also Next edit suggestions which go beyond the immediate context to make suggestions in other places in your code. These predict the location and the content of the next edit you’ll want to make.

Let’s say we want to add a new section describing our coding style. Add the following at a suitable point in the file (you won’t need to add the header if it already exists):

MARKDOWN

## Python Coding Style
-

You should see a suggestion appear direcly after your cursor, something like Use type hints for function parameters and return types, Use 'snake_case' for variable and function names or similar. You can accept this suggestion by pressing Tab. If you continue to add new lines after this, you may find it continues to suggest other things to include, so we end up with, for example:

MARKDOWN

### Coding Style
- Use type hints for function parameters and return types
- Use `snake_case` for variable and function names
- Use `CamelCase` for class names

It does this by rapidly incorporating contextual information from a number of sources to infer a suggestion, including:

The code file you are editing
Any code you have currently selected
Frameworks, languages and dependencies
Any instructions files

Copilot suggestions are a starting point, but we should alwyays review, understand, and amend as necessary, as opposed to blindly accepting suggestions.

Challenge

Amend the Suggestions

3 mins.

Review the copilot-instructions.md file, and add/amend to fit your coding style.

Show me the solution

For example:

MARKDOWN

### Python Coding Style
- Comment code sections for clarity, especially to explain purpose and logic, and to describe data processing steps

As we’ll see shortly, these coding style guidelines will inform future refactoring of our code.

Of course, we can also do this for code. In inflammation-plot.py, begin adding an additional parameter to the functions calls in the code, by placing an additional , at the end of the given list of parameters, and see which inline suggestions are being made, e.g.:

PYTHON

    axes1.plot(data.mean(axis=0), ...)

You might see a suggestion for a label and/or color parameter. For example, add in one for color:

PYTHON

    axes1.plot(data.mean(axis=0), color='blue')

If we approve this change and then do the same for the other axes2 and axes3 variables, it suggests variants of that for the other calls to plot, and you can approve these by selecting Tab, e.g.

PYTHON

    axes2.plot(data.mean(axis=0), color='red')
    axes3.plot(data.mean(axis=0), color='green')

So the suggestions made are based on the context of the code you are editing: it’s generating the most likely suggestions based on what we’ve entered before. We’ve added a color parameter to a previous .plot call, so doing something similar for other .plot calls is likely.

Agent Mode: What about Small Changes?

Agent mode differs from inline suggestions by offering the ability to enact changes step-by-step. Unlike inline suggestions which appear as you type, this mode allows you to request broader changes across multiple lines or functions, so it can be used for small or large scale changes.

To use Agent mode to make a small edit, you highlight the code you want to modify before requesting the change you want.

For example:

Select + to create a new chat conversation.
Select inflammation-plot.py in the chat context.
Select Agent as the Copilot mode in the chat window.
Select a model of your choice.
Select the entire for loop in inflammation-plot.py; you’ll notice that the chat context now includes this file with the selected line numbers.
Enter Add a comment about this code above this loop.
Press Enter.

You’ll now see a comment added above the loop highlighted in green, e.g.

PYTHON

# Process each inflammation data file and generate a 3-panel visualization
# showing the mean, maximum, and minimum inflammation values across patients

You’ll also see a Keep or Undo pop-up displayed at the bottom. Read through the comment, and if you agree that the comment summarises the code sufficiently, select Keep.

Callout

The Temptation to Blindly Accept!

So note that here, we properly scrutinise the suggestion as opposed to accepting it blindly! it would be all too easy to just assume it’s correct and just accept it, but it’s helpful to remember that tools like Copilot are like a more helpful autocomplete, not a thinking teammate. As such, skepticism and review must become a key practice when using such tools.

Plan Mode: Creating a Plan for Refactoring

We could now use agent mode to do larger-scale changes, such as refactoring our entire codebase. However. instead of “one-shotting” the development of code using generative AI, a far better approach is to plan and implement our code in a step-wise, incremental fashion. This way, we can understand and review changes before they are implemented. So how should we go about this?

Let’s use the built-in VSCode plan agent that helps developers break down tasks into clear and actionable steps before writing code. Instead of jumping straight into implementation, it researches the requested task, analyses any existing project code in read-only mode, and generates structured step-based plans for features or for other code modification activities. In short, it aims to guide users through a thoughtful planning phase prior to coding to reduce errors and encourage better design and implementation decisions.

Let’s try it out now.

Select + to create a new chat conversation.
Select Plan from the Copilot mode dropdown in the chat panel.
Select inflammation-plot.py in the chat context.
Select Claude Haiku 4.5 selected in the model dropdown.
Enter the following into the chat:

How should I refactor this code to make it more modular, readable, and documented?
Press Enter.
Answer any clarifying questions from the plan agent.
Observe the step-by-step thinking and actions undertaken by the agent.
When the planning agent concludes, you’ll have a number of options to proceed. Select the option to Open in Editor, and Keep.
Save the file that appears by selecting Save as prompt file which should appear in the bottom right of the editing window.

You should find you end up with something similar to this, saved as a prompt file (either in the repository root or in the .github/prompts directory), although the content will likely differ:

MARKDOWN

# Plan: Refactor inflammation-plot.py for modularity and readability

## TL;DR
Extract the visualization logic into dedicated functions with clear responsibilities, add comprehensive docstrings at module and function levels, use descriptive variable names, and apply the main-guard pattern. This will make the code more reusable, testable, and easier to maintain.

## Steps

1. **Add module-level docstring** — Describe what the script does, its input (data files), and output (PNG plots)

2. **Extract plot creation into `create_inflammation_plot()` function** — Move the subplot creation and data plotting logic (lines 14-27) into a separate function that takes a data array and returns a matplotlib figure. This function handles:
   - Creating the figure with specified dimensions
   - Setting up three subplots
   - Plotting mean, max, and min statistics
   - Returning the figure object

3. **Extract statistics calculation and plotting into helper function (optional)** — Create a small helper like `plot_statistic()` to reduce repetition when adding the three plots (mean, max, min). Or keep it simple and just document the pattern inline.

4. **Create main processing function** — Extract the file globbing, sorting, and iteration loop into a `main()` or `process_files()` function that orchestrates the workflow

5. **Add function docstrings** — Each function needs:
   - One-line summary
   - Parameters (what they are, expected types/format)
   - Returns (what is returned)
   - Brief description of behavior

6. **Improve variable names** — Replace `axes1`, `axes2`, `axes3` with `axes_avg`, `axes_max`, `axes_min` for clarity

7. **Extract configuration constants** — Move hardcoded values to the top:
   - Figure size: `FIGURE_SIZE = (10.0, 3.0)`
   - Colors: `COLOR_MEAN = 'blue'`, `COLOR_MAX = 'red'`, `COLOR_MIN = 'green'`
   - Output extension: `OUTPUT_EXT = '.png'`

8. **Add main guard** — Wrap the entry point with `if __name__ == '__main__':` to allow importing as a module

## Relevant files
- inflammation-plot.py — Refactor entire file with functions, docstrings, and constants

## Verification
1. Script runs without errors: `python inflammation-plot.py`
2. All 12 PNG files are generated in the workspace root
3. Each function has a complete docstring (module, function-level)
4. Variable names clearly indicate purpose (no generic `axes1`, `axes2`, `axes3`)
5. All configuration values are defined as constants at module top
6. Code is syntactically valid (no PEP8 linting errors per `.github/copilot-instructions.md`)

## Decisions
- Keep the core logic simple — don't over-engineer with factories or complex patterns
- Extract constants only for values used more than once or that represent meaningful configuration
- Document the statistics meaning (average across patients, max/min across time points) in function docstrings

## Further Considerations
1. **Testing strategy** — Would you like the refactored code to be more testable (e.g., separating I/O from logic so functions can be unit tested)?
   - Recommended: Separate file I/O and visualization logic, making the core plot function testable with mock numpy arrays

2. **Output directory** — Currently saves PNG files in the workspace root. Should outputs go to a dedicated `output/` directory instead?
   - Recommended: Keep in root for now (inline with current behavior), but mention this as future improvement

3. **Error handling** — Should the code validate that data files exist and handle malformed CSV gracefully?
   - Recommended: Add basic try-except around file loading with helpful error messages

Just by itself, this feature is incredibly useful. We should of course review the plan and ensure it’s suggestions are suitable and in line with what we want from this codebase and amend it appropriately, and once ready, we could then manually follow this plan and implement the changes. That way, we maintain full control of those changes.

The review and refine component is particularly important. We should ask questions such as:

Are there any questions posed by the plan agent that we need to answer?
Do the suggested changes solve the actual purpose of the request, or only a plausible-looking version of it?
Has the AI changed anything outside the intended scope?
Is the proposed design consistent with the existing architecture and coding conventions?
Importantly, is the code simple enough to understand, maintain, and review?
Are there meaningful new or updated tests that prove the change works?
Could the change introduce security, privacy, licensing, or performance risks?
Are any dependencies, generated code, or copied patterns acceptable for the project?
Can a human developer explain and take responsibility for every part of the change?

Essentially, amend the plan until it is within the scope of what is required, using techniques, design choices, and technologies that we understand, for which we are capable of taking responsibility to adapt and maintain in the future. If we can’t do these things, we need to refine the plan.

Challenge

Refine the Plan

5 mins.

Review the generated plan, answering any questions and refining it to your satisfaction, then save the file.

Show me the solution

For example, a generated plan asked questions about the following points:

MARKDOWN

## Further Considerations
1. **Testing strategy** — Would you like the refactored code to be more testable (e.g., separating I/O from logic so functions can be unit tested)?
   - Recommended: Separate file I/O and visualization logic, making the core plot function testable with mock numpy arrays

2. **Output directory** — Currently saves PNG files in the workspace root. Should outputs go to a dedicated `output/` directory instead?
   - Recommended: Keep in root for now (inline with current behavior), but mention this as future improvement

3. **Error handling** — Should the code validate that data files exist and handle malformed CSV gracefully?
   - Recommended: Add basic try-except around file loading with helpful error messages

Which were added to a Decisions section in the plan:

MARKDOWN

- Separate file I/O and visualization logic, making the core plot function testable with mock numpy arrays
- Place output files within a dedicated `output/` directory
- Add basic try-except around file loading with helpful error messages

So why not just use the agent mode to develop the code without developing a plan first? Whilst this would be faster, there are some key benefits to a planning step:

Importantly, we are now moving from ad-hoc development to intentional development, which externalises key decisions, and forces us to consider ways forward and make such decisions openly, and capture these within a defined plan that we validate and refine before moving to implementation.
We may extend this plan with other sections and further detail as needed.
It has created a concrete document we can discuss and refine with colleagues before we proceed.
Importantly, it also provides a “checkpoint”: if the implementation is unsatisfactory we can remove the implementation, amend the plan, and ask Copilot to create the implementation again.

Agent Mode: Larger-scale Changes

We can also ask Copilot in agent mode to make much larger, potentially multi-file changes across our codebase. So instead of asking Copilot to change a specific piece of code, you give it a goal, such as adding a feature, or refactoring a module. Copilot then plans how to achieve that goal and works across the repository to do so. It may read and modify multiple files, add or update tests, adjust configuration, and iterate over several steps before presenting the result. In this way, agent mode behaves more like a junior developer taking on a task, rather than a pair programmer responding line-by-line.

Note the increase in authority to modify code, which represents a much greater risk: the impact of changes is greater and requires more careful (and potentially more involved) review.

Since we’ve already come up with a plan for refactoring our code, let’s now use agent mode to follow this plan and implement the changes.

Select + to create a new chat conversation
Select inflammation-plot.py in the chat context, then select + and the plan file to ensure it’s also included.
Set Copilot’s mode to Agent
Select Claude Haiku 4.5 from the models dropdown
Enter the following chat prompt: Refactor the code following the plan
Once suggested improvements appear, review the changes and either:
- If you agree with them and think they are an improvement on the original code, Keep the suggestions
- If you don’t agree with them, cancel the suggestions and select the circular Retry icon at the bottom left of the chat response until you get something more acceptable

An example of output:

PYTHON

"""
Inflammation data visualization script.

This module processes inflammation data from CSV files and generates
visualization plots showing average, maximum, and minimum inflammation
values across patients over time.

Input: CSV files in the data/ directory (inflammation-*.csv)
Output: PNG plots saved to the output/ directory for each data file
"""

import glob
import numpy as np
from matplotlib import pyplot as plt

# Configuration constants
FIGURE_SIZE = (10.0, 3.0)
COLOR_MEAN = 'blue'
COLOR_MAX = 'red'
COLOR_MIN = 'green'
OUTPUT_EXT = '.png'
DATA_PATTERN = 'data/inflammation-*.csv'
OUTPUT_DIR = 'output/'


def create_inflammation_plot(data):
    """
    Create a visualization plot for inflammation data.
    
    Parameters
    ----------
    data : numpy.ndarray
        A 2D array where rows represent patients and columns represent
        time points. Each cell contains an inflammation value.
    
    Returns
    -------
    matplotlib.figure.Figure
        A figure object containing three subplots showing the mean,
        maximum, and minimum inflammation values across patients.
    """
    fig = plt.figure(figsize=FIGURE_SIZE)
    
    axes_avg = fig.add_subplot(1, 3, 1)
    axes_max = fig.add_subplot(1, 3, 2)
    axes_min = fig.add_subplot(1, 3, 3)
    
    axes_avg.set_ylabel('average')
    axes_avg.plot(data.mean(axis=0), color=COLOR_MEAN)
    
    axes_max.set_ylabel('max')
    axes_max.plot(data.max(axis=0), color=COLOR_MAX)
    
    axes_min.set_ylabel('min')
    axes_min.plot(data.min(axis=0), color=COLOR_MIN)
    
    fig.tight_layout()
    
    return fig


def process_files():
    """
    Process all inflammation data files and generate visualization plots.
    
    This function:
    1. Finds all inflammation CSV files in the data directory
    2. Loads each file as a numpy array
    3. Creates a visualization plot for the data
    4. Saves the plot as a PNG file in the output directory
    
    Returns
    -------
    None
    """
    filenames = glob.glob(DATA_PATTERN)
    filenames.sort()
    
    for filename in filenames:
        print(f"Processing {filename}")
        
        try:
            data = np.loadtxt(fname=filename, delimiter=',')
        except FileNotFoundError:
            print(f"Error: File {filename} not found")
            continue
        except ValueError as e:
            print(f"Error: Could not parse {filename}: {e}")
            continue
        
        fig = create_inflammation_plot(data)
        
        base_filename = filename.split('/')[-1].replace('.csv', '')
        output_filename = f"{OUTPUT_DIR}{base_filename}{OUTPUT_EXT}"
        fig.savefig(output_filename)
        print(f"Saved plot to {output_filename}")


if __name__ == '__main__':
    process_files()

It will also provide a list of changes made. As per the plan above:

Constants have been defined at the top, removing hardcoded elements
Modularised the codebase by refactoring into two separate functions
Docstrings have been added for each of the functions and the module
The subplots are generated within a loop using zip() to provide corresponding pairs of array elements into the loop. If we didn’t like this particular style, we might use Edit mode on this segment to simplify it

Note that it differs substantially from the version shown from a similar question made in Ask mode earlier, since LLMs are probablistic, but also more importantly because it was implemented based on a plan we refined.

However - whilst it is more modular, it’s now a lot larger whereas before it was 32 lines - we might consider this to be quite an over-engineered overkill!

Callout

What if I don’t get a Good Response?

If you aren’t getting a good response despite retrying several times, it suggests that the prompt is either not reflective of what you’re really after or isn’t specific enough, so try amending the request.

Now once the suggestions have been integrated, we’re still able to undo these changes, e.g. either by selecting Edit and Undo from the VSCode menu, or pressing Ctrl + Z (or Cmd/Windows Key + Z). We’re also able to edit a previous prompt in the chat window, perhaps adding more specifics for what we want.

Summary

In this episode we explored three ways GitHub Copilot can assist with refactoring code, each offering a different level of autonomy:

Inline suggestions, which provide lightweight, context-aware completions as you type — useful for small, targeted edits with immediate feedback.
Agent mode, which can apply changes across multiple lines or functions based on a single instruction — powerful for targeted refactoring, but requires careful review of the result.
Plan mode, that analyses your codebase and generates a structured, step-by-step refactoring plan before any changes are made — shifting development from ad-hoc to intentional, and giving you a reviewable document to refine and discuss before proceeding.

A key theme throughout is the importance of critical review: as the autonomy and scope of AI assistance increases, so too must your scrutiny. AI-generated suggestions are a starting point, not a finished product, and whilst they increase productivity, you remain responsible for understanding and maintaining every change that goes into your codebase.

Key Points

Copilot offers three levels of refactoring support: inline suggestions, agent mode, and plan mode, in ascending order of autonomy and scope.
Inline suggestions appear as ghosted text while typing and are driven by immediate code context.
Agent mode can make targeted edits across multiple lines or functions when given a specific instruction.
Plan mode analyses the codebase and produces a step-by-step refactoring plan without making any changes, enabling review before implementation.
AI-generated suggestions should always be reviewed and understood before accepting; increasing autonomy requires increasing scrutiny.

All in One View

Overview

Questions

Objectives

How AI Coding Assistants Aid Software Development

1. Inline / Autocomplete Assistance

2. Conversational Assistance

3. Agentic Coding

4. Role-Based Agentic Workflow

5. The “Dark Factory”

A Cornucopia of Models

Benefits and Risks

The Cautionary Tale of Replit’s use of AI

Class Discussion: What are your AI Coding Fears?

Show me the solution

A Key Risk: Technical Debt

The “Almost Right” Phenomenon

The Immature and Rapidly Evolving Landscape

Class Discussion: Where Does AI Coding Tools Fall on the Hype Curve?

Show me the solution

Introduction to GitHub Copilot

The Lifecycle of a Copilot Prompt

Different Models

Limitations of the Copilot Free Tier

References

Overview

Questions

Objectives

Obtaining Example Code

BASH

Setting up a Virtual Environment

BASH

BASH

BASH

Examining the Repository

Examining the Code

PYTHON

Running the Example Code

OUTPUT

Error: the term conda is not recognised

What Does the Patient Inflammation Data Contain?

Overview

Questions

Objectives

Decide on Copilot Privacy Settings

Asking Questions about Your Code

Using Copilot for the First Time

Why Aren’t I Seeing the Same Response?

Writing Good Chat Prompts

Managing Context

Investigating our Codebase

OUTPUT

Why Aren’t I Seeing the Same Results for the Same Question for the Same Codebase?

Personalising Copilot to Match our Project

What is an Instructions File?

Create an Instructions File

Working with our Instructions File

Obtaining Guidance on how to Improve our Code

What about other Models?

Asking Copilot to Suggest Improvements

Show me the solution

PYTHON

What’s my Copilot Usage?

Overview

Questions

Objectives

Inline Suggestions

MARKDOWN

MARKDOWN

Amend the Suggestions

Show me the solution

MARKDOWN

PYTHON

PYTHON

PYTHON

Agent Mode: What about Small Changes?

PYTHON

The Temptation to Blindly Accept!

Plan Mode: Creating a Plan for Refactoring

MARKDOWN

Error: `the term conda is not recognised`