Organizing Your Research Data

Contributors: Carrie Breton and Lucia Costanzo

Enhance your research data organization skills

This guide was designed to benefit U of G researchers and outlines best practices for organizing research data. Whether you are a faculty member, staff member, or student, by the end of this guide you should be able to:

  • Describe the best practices for organizing research data.
  • Implement the best practices of research data organization when writing a Data Management Plan (DMP).
  • Apply these best practices when preparing data for deposit in the U of G Research Data Repositories.

Review and structure the content of your data files

What content should be included in the data?

Before sharing and preserving your project's data, you need to decide what content will be included in your data files. Consider whether to include raw data, analysis data, or a subset of your data. If the data contain sensitive or personally identifying information, ensure you have Research Ethics Board (REB) approval and informed consent from participants to share or deposit these data, and anonymize the data if necessary. If using secondary data, ensure you have the right to share, redistribute, and deposit it.

How should data be structured?

Consider data structure throughout the research life cycle to enhance readability, accessibility, and compatibility with different software, promoting long-term usability.

When working with spreadsheet data, use a rectangular format made up only of columns and rows, and exclude any dynamic or linked content such as charts, formulas, and inter-worksheet links.

The example below depicts an unstructured data table with structural issues including:

  • Blank columns, rows, and cells.
  • The header row is placed lower in the table.
  • Merged cells are used to group variables.
  • Mixed text and numbers in cells.
  • Field notes throughout the table.
  • Summary statistics in the last row of the table.

An unstructured, non-rectangular data table.

The example below depicts a structured data table presented in rectangular format. The data table includes:

  • Only filled rows and columns. Blank rows/columns have been removed.
  • Defined header row at the top of the table.
  • Every cell is filled, and missing values are declared with a code (e.g., -9999 or NA = missing).
  • Cell merging has been removed, and related variables are grouped by adding a common prefix to their names.
  • Cells are defined as text (string) or numeric.
  • Notes field at the end of the table to capture field notes.
  • Summary statistics have been removed.

A structured, rectangular data table.
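
The structural checks described above can be partly automated. The following is a minimal sketch in Python, assuming the table has been exported as CSV text; the function name `find_structure_issues` and the wording of its messages are illustrative, not part of any U of G tool. It flags unnamed columns, blank rows, rows of the wrong width, and empty cells that should carry a declared missing-value code.

```python
import csv
import io


def find_structure_issues(csv_text):
    """Return a list of human-readable structural problems found in csv_text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]

    issues = []
    header = rows[0]
    width = len(header)

    # The first row must be a complete header: every column needs a name.
    for index, name in enumerate(header, start=1):
        if not name.strip():
            issues.append(f"column {index} has no header name")

    for line_number, row in enumerate(rows[1:], start=2):
        # A row with no content at all breaks the rectangular layout.
        if all(not cell.strip() for cell in row):
            issues.append(f"row {line_number} is blank")
            continue
        if len(row) != width:
            issues.append(f"row {line_number} has {len(row)} cells, expected {width}")
        for index, cell in enumerate(row[:width], start=1):
            # An empty cell should be replaced by a declared missing-value code.
            if not cell.strip():
                issues.append(f"row {line_number}, column {index} is empty")
    return issues
```

A table with a header in the first row and a value or missing-value code in every cell yields an empty list. The sketch does not detect merged cells, embedded notes, or summary rows, which still need a manual review.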

Use consistent folder and file naming conventions and organization

How should folders and files be named?

Files/folders should be named consistently and descriptively to promote effective browsing and retrieval. Names should reveal file and folder content and provide context.

When naming your folders and files:

  • Keep names short (25 characters or fewer)
  • Make file names differ from folder names
  • Use alphanumeric characters only (underscores may be used as separators)
  • Avoid spaces and special characters
  • Use camelCase or underscores to separate parts of the name
  • Follow an international standard date format like YYYYMMDD
  • Include version numbers if needed (e.g., V1, V2)

An example of a filename convention:

  • YYYYMMDD_ContentDescription_Version.ext

Where ‘YYYYMMDD’ is the date in a standard format, ‘ContentDescription’ is a short description of the file's content, ‘Version’ is the version number, and ‘ext’ is the file extension.
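
As an illustration, the convention above can be built and checked with a short script. This is a minimal sketch in Python; the helper names `build_filename` and `follows_convention` are hypothetical, and the pattern assumes that the description contains only letters and digits and that the version is written as "V" followed by a number.

```python
import re
from datetime import date, datetime

# YYYYMMDD, a description of letters and digits only, V<number>, then an extension.
FILENAME_PATTERN = re.compile(r"^(\d{8})_([A-Za-z0-9]+)_(V\d+)\.([A-Za-z0-9]+)$")


def build_filename(day, description, version, ext):
    """Assemble a name of the form YYYYMMDD_ContentDescription_Version.ext."""
    # Turn "soil ph" into "SoilPh": capitalize each word and drop the spaces.
    words = re.findall(r"[A-Za-z0-9]+", description)
    compact = "".join(word[:1].upper() + word[1:] for word in words)
    return f"{day:%Y%m%d}_{compact}_V{version}.{ext.lstrip('.')}"


def follows_convention(filename):
    """Return True if filename matches the pattern and starts with a real calendar date."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True


print(build_filename(date(2024, 3, 15), "soil ph", 2, "csv"))  # 20240315_SoilPh_V2.csv
print(follows_convention("soil ph final.csv"))                 # False
```

The sketch does not enforce the 25-character limit; a length check could be added to `follows_convention` if needed.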

How should folders and files be organized?

Your folder directory structure should maximize the clarity and discoverability of different types of files. Keep it simple: limit the directory structure to no more than four levels, with ten or fewer sub-folders within each level.
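
These limits can be checked against an existing project folder. The following is a minimal sketch in Python using only the standard library; the function name `check_folder_structure` is hypothetical, and it assumes that "levels" are counted below the project's top-level folder, which is one possible reading of the guideline.

```python
import os

MAX_DEPTH = 4        # assumed: levels counted below the project's top-level folder
MAX_SUBFOLDERS = 10  # sub-folders allowed inside any one folder


def check_folder_structure(root):
    """Return a list of folders under root that break the depth or breadth limits."""
    root = os.path.abspath(root)
    problems = []
    for current, subfolders, _files in os.walk(root):
        relative = os.path.relpath(current, root)
        # The root itself counts as depth 0; each nested folder adds one level.
        depth = 0 if relative == "." else len(relative.split(os.sep))
        if depth > MAX_DEPTH:
            problems.append(f"{relative}: nested {depth} levels deep (limit {MAX_DEPTH})")
        if len(subfolders) > MAX_SUBFOLDERS:
            problems.append(
                f"{relative}: contains {len(subfolders)} sub-folders (limit {MAX_SUBFOLDERS})"
            )
    return problems
```

An empty list means the folder tree stays within both limits.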

Use preservation friendly file formats

What file formats should data be shared/preserved in?

To promote future readability, save files in open, well-documented, software-agnostic formats that are likely to remain usable over the long term (e.g., TXT, CSV, JPEG, MP3, MP4). Additionally, save and store files in an uncompressed and unencrypted form.

Please refer to the U of G Research Data Repositories Data Deposit guide for recommended file formats that promote long-term access and preservation of research data.

How should Microsoft Office Excel files be shared/preserved?

If your Excel files have important formatting needed to support interpretation and reuse of the data, keep the original Excel file, but save a copy in a preservation friendly format.

If you are depositing your Excel files in a repository, it is recommended that you split multi-sheet workbooks into separate files, one per worksheet, before depositing.
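
One way to split a workbook is to write each worksheet to its own CSV file. The following is a minimal sketch in Python, assuming the worksheets have already been read into memory as lists of rows (for example with a third-party spreadsheet library, not shown here); the function name `write_sheets_as_csv` and the `Workbook_Sheet.csv` naming are illustrative choices.

```python
import csv
import os
import re


def write_sheets_as_csv(sheets, workbook_name, out_dir):
    """Write each worksheet to its own CSV file and return the paths created.

    sheets maps a worksheet name to its rows (a list of lists of cell values).
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for sheet_name, rows in sheets.items():
        # Drop spaces and special characters so the file name stays portable.
        safe_name = re.sub(r"[^A-Za-z0-9]+", "", sheet_name)
        path = os.path.join(out_dir, f"{workbook_name}_{safe_name}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        paths.append(path)
    return paths
```

For a workbook named SoilSurvey with worksheets "Site A" and "Site B", this produces SoilSurvey_SiteA.csv and SoilSurvey_SiteB.csv. Two worksheet names that differ only in spaces or special characters would map to the same file, so check the names before running it.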
 

Provide detailed metadata and clear data access and terms of use

What is metadata?

Metadata is information that describes your data, and robust metadata is essential for understanding and reusing data. It covers all aspects of the data, including:

  • Why the data was collected
  • Who collected the data
  • What was collected
  • When it was collected
  • How it was collected

How can data access be controlled?

When depositing data in a repository, you can apply file restrictions, if required, so that files are available only to designated users or upon request. Alternatively, you can use an embargo to restrict access temporarily, for example until a manuscript is published or for patent/commercialization purposes.

How should terms of use be defined for data?

Clearly define the terms of use for the data so that potential users understand how they can use the data.

When depositing data, you can select terms that suit its copyright status, such as a CC0 Public Domain Dedication waiver, a Creative Commons license, or a customized data use agreement.

Check the Licensing your dataset in the U of G Research Data Repositories guide for details on choosing terms.

Include supplemental documentation with your data

What is supplemental documentation?

Supplemental documentation provides all the information needed to comprehend, evaluate, and analyze research data, and to reproduce research results, without contacting the data creator. It can include a readme file, a codebook, a data dictionary, a user guide, a commented script file, and a data management plan (DMP). You can find a readme template in the How to deposit research data in the University of Guelph Research Data Repositories guide.

What information should be included as supplemental documentation?

Supplemental documentation should:

  • Include basic descriptive information about the data such as citation details, data creators, research purpose, and temporal/spatial scope.
  • Explain naming conventions.
  • List folders and files with content descriptions and format/version details.
  • Clarify codes, acronyms, and abbreviations.
  • Highlight data file relationships and use instructions.
  • Outline data collection, processing, and analysis methodologies.
  • Provide variable level information including full variable names, descriptions, units of measure, coding explanations (if applicable), weighting information, and any additional notes.
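
A data dictionary can be started from the data file itself. The following is a minimal sketch in Python, assuming the data are available as CSV text with variable names in the first row; the function name `data_dictionary_template` and the column headings are illustrative choices based on the variable-level details listed above, not a required layout.

```python
import csv
import io

# Column headings follow the variable-level details listed above.
DICTIONARY_FIELDS = ["variable_name", "description", "units", "codes", "weighting", "notes"]


def data_dictionary_template(data_csv_text):
    """Return CSV text with one blank data dictionary row per variable in the data header."""
    header = next(csv.reader(io.StringIO(data_csv_text)))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(DICTIONARY_FIELDS)
    for variable in header:
        # Only the name is known from the data file; the rest is for the researcher to fill in.
        writer.writerow([variable] + [""] * (len(DICTIONARY_FIELDS) - 1))
    return output.getvalue()
```

The result lists every variable once, leaving the description, units, coding, weighting, and notes columns to be completed by hand.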

 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.