Clean and Prepare Your Data

How do I familiarize myself with my data?

Depending on the complexity of your project, you may have many different files and file types, each containing varying amounts of information. Knowing the following can help you get the most out of your data:

  • How the data was created
  • How it is structured
  • What variables the files contain

What types of questions do I need to ask about my dataset?

Before you start working with a dataset, these are the types of questions you may need to ask yourself:

  • What kind of files do you have?
  • How were your files created?
  • How are your files formatted or structured?
  • What is the relationship between the files?
  • What variables do you have in each file?
  • What do these variables actually mean?
  • What values, and what types or ranges of values, would you expect to find for each variable?
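
Some of these questions, such as which variables a file contains and what values they actually hold, can be partly answered by inspecting the file itself. The following is a minimal sketch, assuming a comma-separated file with a header row and using only the Python standard library. The function name `profile_csv`, the sample data, and the choice to treat empty cells and "NA" as missing are all illustrative assumptions, not part of this guide.

```python
import csv
import io
from collections import Counter


def profile_csv(text, missing=("", "NA")):
    """Summarize each variable in CSV text: kind, missing count, observed values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            # Short rows yield None for absent cells; treat those as empty.
            columns[name].append((row[name] or "").strip())

    profile = {}
    for name, values in columns.items():
        present = [v for v in values if v not in missing]
        numbers = []
        for v in present:
            try:
                numbers.append(float(v))
            except ValueError:
                break  # at least one value is not a number
        is_numeric = bool(present) and len(numbers) == len(present)
        summary = {
            "kind": "numeric" if is_numeric else "text",
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        if is_numeric:
            summary["min"], summary["max"] = min(numbers), max(numbers)
        else:
            summary["most_common"] = Counter(present).most_common(3)
        profile[name] = summary
    return profile


# Hypothetical sample data, for illustration only.
SAMPLE = """site,age,height_cm
A,34,172.5
B,NA,168
A,29,
"""

for name, summary in profile_csv(SAMPLE).items():
    print(name, summary)
```

Comparing a summary like this against the values you expected is a quick way to spot surprises, such as a numeric variable that was read as text or an unexpected number of missing entries.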

How do I document this information?

Documenting this information makes it easier for you to collaborate and share work with others. 

For more information on how to do this, go to Step 3: Manage Your Data.
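
One possible way to record the answers to the questions above is a simple data dictionary stored alongside the files. The sketch below is one assumed layout, not a format prescribed by this guide. The field names, the file name `survey.csv`, and the example variable description are hypothetical.

```python
import json


def dictionary_template(file_name, variables):
    """Build a blank data-dictionary entry for one file."""
    return {
        "file": file_name,
        "how_created": "",       # e.g. instrument export, manual entry
        "related_files": [],     # other files this one links to
        "variables": [
            {"name": v, "meaning": "", "units": "", "expected_values": ""}
            for v in variables
        ],
    }


# Hypothetical file and variable names, for illustration only.
template = dictionary_template("survey.csv", ["site", "age", "height_cm"])
template["variables"][1].update(
    meaning="Age at enrolment", units="years", expected_values="18 to 99"
)

print(json.dumps(template, indent=2))
```

Saving the result as a JSON or text file next to the data keeps the documentation with the files it describes.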

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.