Introduction
|
|
Organising data in spreadsheets
|
Organise your data according to the tidy data principles.
Never modify your raw data. Always make a copy before making any changes.
Keep track of all of the steps you take to clean your data in a plain text file.
Record metadata in a separate plain text file (such as README.txt) in your project root folder or folder with data.
|
Common spreadsheet errors
|
Avoid using multiple tables within one spreadsheet.
Avoid spreading data across multiple tabs.
Record zeros as zeros.
Use an appropriate null value to record missing data.
Do not use formatting to convey information or to make your spreadsheet look pretty.
Place comments in a separate column.
Record units in column headers.
Include only one piece of information in a cell.
Avoid spaces, numbers and special characters in column headers.
Avoid special characters in your data.
|
Dates as data
|
|
Quality assurance and control
|
Always copy your original spreadsheet file and work with a copy so you do not affect the raw data.
Use data validation to prevent accidentally entering invalid data.
Use sorting to check for invalid data.
Use conditional formatting (cautiously) to check for invalid data.
|
Exporting data
|
Data stored in common spreadsheet formats will often not be read correctly into data analysis software, introducing errors into your data.
Exporting data from spreadsheets to formats like CSV or TSV puts it in a format that can be used consistently by most programs.
|