
Clean and Prepare Your Data

Why do I need to make a plan for my data?

Before you begin cleaning and processing your data, create a plan. A plan keeps you organized and helps you determine the process you'll follow to clean and prepare your data.

Your plan will also serve as documentation for anyone who wants to use your data in the future, and it can help you track down and fix mistakes.

How do I create a plan?

In general, the process for creating a plan is:

  1. Consider the current state of your data and how it is formatted.
  2. Determine the goal state of your data and how it needs to be formatted.
  3. Determine the necessary steps to transition from the current state to the goal state.
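
As a minimal sketch, these three parts of a plan can be recorded as a small data structure in Python. The `DataPlan` class, its fields, and the example file and column names are all hypothetical. They simply show how a plan written this way can double as plain-text documentation for future users of your data.

```python
from dataclasses import dataclass, field


@dataclass
class DataPlan:
    """Hypothetical record of a data cleaning plan (illustration only)."""
    current_state: str                              # how the data looks now
    goal_state: str                                 # how the data should look when finished
    steps: list[str] = field(default_factory=list)  # ordered steps from current to goal

    def add_step(self, description: str) -> None:
        """Append the next step needed to move toward the goal state."""
        self.steps.append(description)

    def to_text(self) -> str:
        """Render the plan as plain-text documentation, e.g. for a README."""
        lines = [
            f"Current state: {self.current_state}",
            f"Goal state: {self.goal_state}",
            "Steps:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


# Hypothetical example plan for a small survey file.
plan = DataPlan(
    current_state="survey.csv: comma-delimited, temperatures in Fahrenheit, includes respondent names",
    goal_state="survey.tsv: tab-delimited, temperatures in Celsius, no names",
)
plan.add_step("Convert temp_f to Celsius in a new temp_c column")
plan.add_step("Remove the name column")
plan.add_step("Write the result as a tab-delimited file")
print(plan.to_text())
```

Keeping the steps as an ordered list means the same record can be printed for documentation and later compared against what was actually done when tracking down mistakes.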

What is a goal state?

The goal state describes what you want your data files to look like after you have finished all of your cleaning and processing tasks. The goal state should document:

  • Creation
    • New variables added
    • New files created by filtering, subsetting, joining, or otherwise processing existing data files
  • Updates
    • Changes to units or formatting of values in existing data
    • Changes to the structure of existing files (for example, converting a comma-delimited text file to a tab-delimited one)
  • Deletion
    • Variables removed from a file, or whole files removed, to streamline analysis or to remove personally identifiable information
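
The kinds of changes listed above can be illustrated with a short Python sketch that uses only the standard `csv` module. The data, the `clean` function, and the column names (`name`, `site`, `temp_f`, `temp_c`) are hypothetical; the sketch assumes a small comma-delimited file that contains a personally identifiable column.

```python
import csv
import io

# Hypothetical input: a small comma-delimited file with an identifying "name" column.
raw = """name,site,temp_f
Ana,north,68.0
Ben,south,77.0
Cy,north,50.0
"""


def clean(text: str) -> str:
    """Apply example creation, update, and deletion steps; return tab-delimited text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Creation (new file by subsetting): keep only rows from the "north" site.
        if row["site"] != "north":
            continue
        # Creation + update: add a temp_c variable, converting Fahrenheit to Celsius.
        row["temp_c"] = round((float(row["temp_f"]) - 32) * 5 / 9, 1)
        # Deletion: drop the identifying column and the old-unit column.
        del row["name"]
        del row["temp_f"]
        rows.append(row)

    out = io.StringIO()
    # Update (structure): write tab-delimited instead of comma-delimited output.
    writer = csv.DictWriter(
        out, fieldnames=["site", "temp_c"], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(clean(raw))
```

Each commented block corresponds to one entry in the goal state, which makes it easier to check the script against the plan.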

How will I determine my goal state?

The goal state will be determined by a number of factors:

  • The analysis that you would like to perform or output you would like to generate.
    • These are often determined by the research question that you want to answer or the message you want to communicate.
    • In many cases, an analysis will expect your data to be formatted in a particular way.
  • The tools you would like to use to perform the analysis or generate an output.
    • Specific tools will also impose their own requirements for the structure and format of your data.
    • To determine these requirements, refer to the documentation for your tools.
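
Once you have read your tools' documentation, it can help to write the requirements down in a form you can check. This Python sketch assumes a hypothetical tool that needs a tab-delimited file with `site` and `temp_c` columns, no empty values, and numeric temperatures; the `check_goal_state` function and these requirements are illustrations, so substitute whatever your own tools actually state.

```python
import csv
import io

# Hypothetical requirements, as you might record them after reading a tool's documentation.
REQUIRED_COLUMNS = ["site", "temp_c"]
NUMERIC_COLUMNS = ["temp_c"]


def check_goal_state(text: str, delimiter: str = "\t") -> list[str]:
    """Return a list of problems; an empty list means the file meets the requirements."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in header]
    if problems:
        return problems
    # Assumes no quoted multi-line fields, so data rows start on line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not row[col]:  # empty string, or None if the row is too short
                problems.append(f"line {line_no}: empty value in {col}")
        for col in NUMERIC_COLUMNS:
            if row[col]:
                try:
                    float(row[col])
                except ValueError:
                    problems.append(f"line {line_no}: non-numeric value in {col}")
    return problems


print(check_goal_state("site\ttemp_c\nnorth\t20.0\nsouth\t\n"))
print(check_goal_state("site,temp_c\nnorth,20.0\n"))
```

The second call reports both columns as missing because a comma-delimited header is not split on tabs, which is exactly the kind of structural requirement a tool's documentation may impose.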

Once you've determined your goal state, create your plan by documenting the steps that you'll have to perform to get from your current state to the goal state.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.