Clean and Prepare Your Data

How do I use this guide?

This is the home page for a series of guides that help you clean and prepare your data for analysis or visualization.

Use the navigation to the left of the screen to proceed through the guide.

Why do I need to clean and prepare my data?

When you get your data, whether you've collected it yourself or found it somewhere else, it's often not in a format that's ready for analysis or visualization.

The process of cleaning and preparing data is often called data wrangling. Data wrangling covers a wide variety of tasks that you may need to perform to get your data into a usable format.

What are the tasks I need to do to clean and prepare my data?

These can vary greatly from project to project, but some of the common tasks include:

  • Formatting values
    • Changing date formats
    • Converting between units
  • Dealing with anomalies
    • Detecting and dealing with outliers
    • Detecting and removing duplicate values 
  • Standardizing values
    • Making sure all values are formatted consistently for each variable
    • Detecting and standardizing values which have the same meaning but are formatted differently
      • E.g., values like “Charles St” and “Charles Street” in an address.
  • Data augmentation or extension
    • Adding new columns to your dataset
    • Combining your dataset with another dataset
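As a rough illustration of a few of the tasks listed above, here is a minimal Python sketch that reformats dates, converts units, standardizes street names, and removes duplicate records. The sample records, field names (date, address, height_in), and helper functions (clean_row, remove_duplicates) are hypothetical and invented for this example; they do not come from any real dataset, and your own data will likely need different rules.

```python
from datetime import datetime
import re

# Hypothetical sample records; field names and values are invented for illustration.
rows = [
    {"date": "03/14/2023", "address": "12 Charles St", "height_in": "70"},
    {"date": "03/14/2023", "address": "12 Charles St", "height_in": "70"},  # exact duplicate
    {"date": "11/02/2022", "address": "40 Charles Street", "height_in": "65"},
]


def clean_row(row):
    """Return a cleaned copy of one record."""
    # Formatting values: rewrite MM/DD/YYYY dates as ISO 8601 (YYYY-MM-DD).
    iso_date = datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat()
    # Standardizing values: expand a trailing "St" or "St." to "Street".
    address = re.sub(r"\bSt\.?$", "Street", row["address"].strip())
    # Converting between units: inches to centimetres (1 in = 2.54 cm).
    height_cm = round(float(row["height_in"]) * 2.54, 1)
    return {"date": iso_date, "address": address, "height_cm": height_cm}


def remove_duplicates(records):
    """Drop records that exactly repeat an earlier one, keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


cleaned = remove_duplicates([clean_row(r) for r in rows])
for record in cleaned:
    print(record)
```

Cleaning each record before checking for duplicates is a deliberate choice in this sketch: two records that differ only in formatting (for example “Charles St” versus “Charles Street”) are only recognized as duplicates once their values have been standardized.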

What steps does this guide cover?

Step 1. Get to know your data

Learn to identify important information that you’ll need to know before you can start working with your data.

Step 2. Plan your approach

Learn how to stay organized by making a plan to guide your data work.

Step 3. Manage your data

Learn good practices for working with your data that will help you avoid costly mistakes.

Step 4. File management

Learn good habits to help you recover from mistakes or data loss.

Step 5. Choose your tool

Learn what kind of tools are available and how to pick the right tool for the job.

Step 6. Conduct your tasks

Learn how to get started cleaning and processing your data using OpenRefine.

Resources to help with data research

Guide: Find data and Statistics 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.