
Anonymize Your Data

What is Data Anonymization?

  • Data anonymization involves assessing data for any information that could identify participants, and changing that data so participants remain anonymous. Done carefully, this process should not compromise the data’s integrity.
  • Direct identifiers that increase the risk of identification include:
    • name
    • contact information
    • account information
    • driver’s license
    • Social Insurance Number (SIN)
    • photos
  • Indirect identifiers, such as age, uncommon characteristics, ethnicity, occupation, or geographic information, also present a risk because they can be combined with other parts of the data, or with publicly available information, to identify participants.
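
One way to see the risk from combined information is to count how many records share the same combination of values. The Python sketch below is only an illustration: the records, the field names, and the `rare_combinations` helper are all hypothetical, and the threshold of 2 is an arbitrary choice rather than a recommended standard.

```python
from collections import Counter

# Hypothetical records; every field name and value here is made up.
records = [
    {"age": 34, "occupation": "teacher", "city": "Calgary"},
    {"age": 34, "occupation": "teacher", "city": "Calgary"},
    {"age": 71, "occupation": "beekeeper", "city": "Canmore"},
]

def rare_combinations(rows, fields, threshold=2):
    """Return value combinations shared by fewer than `threshold` rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n < threshold]

# A combination that appears only once may point to a single participant.
risky = rare_combinations(records, ["age", "occupation", "city"])
print(risky)  # [(71, 'beekeeper', 'Canmore')]
```

Combinations flagged this way are candidates for grouping into broader categories or for removal.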

Why should I Anonymize my Data?

The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans requires that sensitive information, or information that would personally identify a participant, not be disclosed in research findings unless participants have given explicit consent.

How do I Anonymize my Data?

Some methods to anonymize data include:

  • removing direct identifiers
  • aggregating variables into broader categories 
  • adding random variation to the data
  • providing a lower level of detail (e.g., birth year instead of birthdate)
  • giving a random sample instead of the entire data file 
  • using pseudonyms or vague descriptors
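
Several of the methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete anonymization procedure: the record, the field names, the 10-year age bands, and the noise range of ±1000 are all hypothetical choices, and the right settings depend on your data.

```python
import random
from datetime import date

# Hypothetical participant record; all names and values are made up.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "birthdate": date(1988, 5, 17),
    "age": 36,
    "income": 52000,
}

# Assumed list of direct identifiers for this example.
DIRECT_IDENTIFIERS = {"name", "email"}

def age_band(age, width=10):
    """Aggregate an exact age into a broader category, e.g. 36 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(row, rng, noise=1000):
    # Remove direct identifiers.
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    # Provide a lower level of detail: birthdate becomes birth year.
    out["birth_year"] = out.pop("birthdate").year
    # Aggregate a variable into a broader category.
    out["age_band"] = age_band(out.pop("age"))
    # Add random variation to a numeric value.
    out["income"] = out["income"] + rng.randint(-noise, noise)
    return out

def release_sample(rows, k, rng):
    """Give a random sample of k rows instead of the entire data file."""
    return rng.sample(rows, k)

rng = random.Random(0)  # fixed seed only so the example is repeatable
anon = anonymize(record, rng)
print(anon)
```

The original `record` is left unchanged, which fits the advice to keep a copy of the original data in a secure location.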

How do I get started?

  • Consider the degree of anonymization required, and choose methods that suit your data
  • Keep a copy of your original data in a secure location 
  • Record changes you make in a log that is also kept separately in a secure location 
  • Be consistent with codes, pseudonyms, and replacement terms 
  • Use a standard classification system
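
As a rough sketch of the advice on consistency and logging, the hypothetical `Pseudonymizer` class below always gives the same person the same code and records each new replacement. The class name, the `P001`-style code format, and the log wording are assumptions made for this example. In practice, the mapping and the log would be saved separately from the anonymized data, in a secure location.

```python
class Pseudonymizer:
    """Assign consistent pseudonyms and keep a log of changes (illustrative)."""

    def __init__(self, prefix="P"):
        self.prefix = prefix
        self.mapping = {}  # original value -> pseudonym; keep this secure
        self.log = []      # one entry per new replacement

    def pseudonym(self, original):
        # Reuse the existing code so the same person always gets the same one.
        if original not in self.mapping:
            code = f"{self.prefix}{len(self.mapping) + 1:03d}"
            self.mapping[original] = code
            self.log.append(f"Replaced a participant name with {code}")
        return self.mapping[original]

p = Pseudonymizer()
names = ["Jane Doe", "Sam Lee", "Jane Doe"]  # made-up names
codes = [p.pseudonym(n) for n in names]
print(codes)  # ['P001', 'P002', 'P001']
```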


Resources to Help with Data Research

  • Guide: Find data and Statistics
  • Guide: Choose the Best Info
  • Video: Thinking Critically About Data


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.