Credit: Data Carpentry
This lesson is based on Data Carpentry’s “Data Organisation In Spreadsheets” lessons from the ecology and social sciences curricula.
Good data organisation is the foundation of any research project. Most researchers have data in spreadsheets, so it is the place that many research projects start.
We often organise data in spreadsheets in the ways that we as humans want to work with data. However, in order to use tools that make computation and data analysis more efficient, reusable and reproducible, such as programming languages like R or Python, we need to structure our data in a particular way so that computers can “understand” and “make use of” the data. Since this is where most research projects start, this is where we want to start too!
In this lesson, you will learn:
- What the basic principles for using spreadsheets for good data organisation are and how to format data in spreadsheets for effective data use
- What some common mistakes with formatting data in spreadsheets are and how to avoid them
- Peculiarities of storing and handling dates in spreadsheets and how to avoid mistakes with dates
- Validating data on input for basic quality assurance and control in spreadsheets and to limit incorrect data on entry
- How to export data from spreadsheets in a way that is useful for downstream applications or publications.
All these will help you to get your current data into a good shape and plan new data collections so less “data cleaning” and “data wrangling” is needed in the future.
In this lesson, however, you will not learn:
- How to use statistics and formulas in spreadsheets, or
- How to plot graphs using spreadsheets.
There are many good tutorials available online on the topic of data analysis in spreadsheets. If you are looking to learn any of the above, a good reference is Head First Excel, published by O’Reilly.
Screenshots
Throughout this lesson we will show content and screenshots from Excel. As an actively developed software, Excel is adding new features or changing visual elements. Screenshots in this lesson may then become slightly out-of-sync and may not look identical as on your computer.
Getting Started
This lesson assumes no prior knowledge of the computational skills or tools apart from the basic usage of spreadsheet software. To get started, follow the directions in the Setup section to download the lesson data to your computer.
For Instructors
If you are teaching this lesson in a workshop, please see the Instructor Notes.