Clean and Prepare Your Data

How do I familiarize myself with my data?

Depending on the complexity of your project, you may have many different files and file types, each containing a different amount of information. Knowing the following can help you get the most out of your data:

  • how the data was created
  • how it is structured
  • what variables the files contain

What types of questions do I need to ask about my dataset?

Before you start working with a dataset, these are the types of questions you may need to ask yourself:

  • What kind of files do you have?
  • How were your files created?
  • How are your files formatted or structured?
  • What is the relationship between the files?
  • What variables do you have in each file?
  • What do these variables actually mean?
  • What values, or range of values, would you expect to find for each variable?
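
Some of the questions above, such as which variables a file contains and what values they hold, can be answered by inspecting the file programmatically. The following is a minimal sketch in Python, assuming your data is a CSV file with a header row. The function names (summarize_columns, looks_numeric) and the sample data are hypothetical and only for illustration.

```python
import csv
import io


def summarize_columns(csv_text):
    """Return a per-variable summary for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    summary = {name: {"missing": 0, "values": set()} for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            # Short rows yield None for absent cells; treat them as empty.
            value = (row[name] or "").strip()
            if value == "":
                summary[name]["missing"] += 1
            else:
                summary[name]["values"].add(value)
    return summary


def looks_numeric(values):
    """Return True if there is at least one value and all parse as floats."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return bool(values)


# Hypothetical sample data, for illustration only.
sample = """id,age,city
1,34,Toronto
2,,Ottawa
3,29,Toronto
"""

report = summarize_columns(sample)
for name, info in report.items():
    kind = "numeric" if looks_numeric(info["values"]) else "text"
    print(name, kind, "missing:", info["missing"], "distinct:", len(info["values"]))
```

Comparing a summary like this against the values you expected to find is a quick way to spot missing entries, unexpected categories, or a variable stored as text when you expected numbers.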

How do I document this information?

Documenting this information makes it easier for you to collaborate and share work with others. 

For more information on how to do this, go to Step 3: Manage Your Data.
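
One common way to record the answers to the questions above is a data dictionary: a small table listing each variable, the file it comes from, its meaning, and the values you expect. The sketch below is one possible layout, not a required format. The column names, file name, and entries are assumptions made for illustration.

```python
import csv
import io

# Hypothetical entries; replace with the variables from your own files.
data_dictionary = [
    {"file": "survey.csv", "variable": "age",
     "meaning": "Respondent age in years", "expected_values": "18-99"},
    {"file": "survey.csv", "variable": "city",
     "meaning": "City of residence", "expected_values": "free text"},
]


def write_data_dictionary(entries, out):
    """Write data dictionary entries as CSV to a file-like object."""
    fields = ["file", "variable", "meaning", "expected_values"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(entries)


# Write to an in-memory buffer here; an opened file object works the same way.
buffer = io.StringIO()
write_data_dictionary(data_dictionary, buffer)
dictionary_csv = buffer.getvalue()
print(dictionary_csv)
```

Keeping a file like this alongside the data gives collaborators one place to look up what each variable means.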

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.