Organizing Your Research Data

Contributors: Carrie Breton and Lucia Costanzo

What is a README file?

The term “README” is likely inspired by the famous scene in “Alice’s Adventures in Wonderland” where the main character Alice is presented with the “Eat Me” and “Drink Me” treats. The use of README files dates to as early as 1974, and they have since evolved into essential components of datasets.

A README file serves as data documentation, which accompanies metadata and data files. It acts as an explanatory guide and offers an overview of your research project, detailing the contents and structure of the folders and files, and providing instructions for getting started with the dataset. The primary goal of a README file is to ensure both you and future users can understand and effectively use the dataset for years to come.

A README file should be in plain text format (e.g., .txt) to ensure it can be easily opened and read by anyone. It is a portable document that accompanies the data files, whether you are sharing a project folder with a colleague or depositing your final dataset in an online repository for others to download and reuse.

What information should be included in a README file?

A README file should include three main components:

  1. Project-level information: Contextual information about the research project.
  2. File-level information: Details related to individual data files.
  3. Variable-level information: Descriptions of the variables found in data files.

Core elements of a README file:

  • General information (Project Overview):
    • Title of the dataset.
    • Persistent identifier (e.g., DOI) of the dataset.
    • Long-term contact information where questions or clarifications about the dataset may be sent.
    • Author information, including ORCID identifier and affiliation details.
    • Brief description of the dataset’s content and scope.
    • Date of data collection.
    • Geographic location of data collection.
    • Funding sources and/or collaborator information.
  • Sharing/access information:
    • Restrictions or license information, including data access and terms of use details.
    • Citations for related publications (e.g., journal article, thesis, or dissertation).
    • Instructions on how to properly cite the dataset.
    • Citations for data sources if secondary data is included.
  • Folder and file overview:
    • A list of all files or folders included in the dataset, along with brief descriptions.
    • An explanation of the relationships between files.
  • Data-specific information:
    • Variable names, descriptions, labels, units of measure, coding explanations, notes.
  • Methodological information:
    • Details on data collection, cleaning, and analysis methods.
    • Information about pre-processing, cleaning, or transformations performed on the data.
    • Specifications of instruments or software used.
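
As an illustration, the core elements above could be turned into a blank, fill-in README skeleton with a short script. This is only a sketch and not an official template: the names CORE_SECTIONS and build_readme_skeleton are hypothetical, the field labels are paraphrased from the list above, and the "heading, underline, field" layout is an assumption chosen so that the output uses only letters, digits, spaces, slashes, and hyphens.

```python
# Hypothetical section and field names, paraphrased from the core elements
# listed in this guide. Adapt them to your own dataset.
CORE_SECTIONS = {
    "GENERAL INFORMATION": [
        "Title of dataset",
        "Persistent identifier such as a DOI",
        "Long-term contact information",
        "Author names with ORCID identifiers and affiliations",
        "Description of content and scope",
        "Date of data collection",
        "Geographic location of data collection",
        "Funding sources and collaborators",
    ],
    "SHARING/ACCESS INFORMATION": [
        "Restrictions or license",
        "Related publications",
        "How to cite this dataset",
        "Citations for secondary data sources",
    ],
    "FOLDER AND FILE OVERVIEW": [
        "Files and folders with brief descriptions",
        "Relationships between files",
    ],
    "DATA-SPECIFIC INFORMATION": [
        "Variable name",
        "Description",
        "Label",
        "Unit of measure",
        "Coding explanation",
        "Notes",
    ],
    "METHODOLOGICAL INFORMATION": [
        "Data collection and analysis methods",
        "Pre-processing or cleaning or transformations",
        "Instruments or software used",
    ],
}


def build_readme_skeleton(sections=CORE_SECTIONS):
    """Return a plain-text README skeleton with one empty field per element."""
    lines = []
    for heading, fields in sections.items():
        lines.append(heading)
        lines.append("-" * len(heading))  # underline the section heading
        for field in fields:
            lines.append(f"{field} -")  # the value is filled in by hand later
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_readme_skeleton())
```

The printed text can be saved as a .txt file and completed by hand. Sections that do not apply to a dataset can be removed from the dictionary before the skeleton is generated.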

Who should create and update a README file?

The README file should be created and updated by a member of the research project team, such as a data collector, data steward, or project manager.

Once the README file is created, ask someone less familiar with the research to review it. This reviewer can point out areas that need clarification or additional information, which helps ensure the dataset is understandable and easy to use.

Why should you create a README file?

Benefits of creating a README file:

  • Ensures the data is understandable and reusable.
    • Creating a README file aligns with the FAIR Principles (Findable, Accessible, Interoperable, and Reusable).
  • Increases collaboration.
    • Making the data easier to use enhances collaboration with other researchers.
  • Improves project visibility.
    • Making the data more accessible and understandable increases the likelihood of citations.
  • Supports future use.
    • Creating a README file is helpful when revisiting your project. It helps you remember what you did and why, and it assists in answering questions about your work (e.g., from journal reviewers or users of the data).

When should you create a README file?

A README file should be created early in the project. This helps the project team understand the dataset, ensures smooth onboarding for new team members, and promotes consistency in project procedures and research data management (RDM) practices.

It should be updated regularly throughout the project to document significant milestones, updates, or changes. Regular updates ensure that the README file remains accurate, detailed, and complete, making it ready for sharing with other researchers or for public release (e.g., depositing the dataset in a data repository).

How should you write a README file?

When writing a README file, follow these best practices:

  • Keep it simple. Use clear structure and straightforward formatting.
  • Use plain language. Write in everyday terms with short, concise sentences.
  • Define terms. Explain all terms, acronyms, and abbreviations to ensure clarity.
  • Use only alphanumeric characters.
    • Slashes (/), underscores (_), and hyphens (-) are acceptable.
    • Avoid using special characters (e.g., ~ ! @ # $ % ^ & * ( ) ` ; : < > ? . , [ ] { } ‘ “ |).
  • Use a template. Start with a standard README file template and adapt it for your dataset.
  • Follow a standard file naming convention.
    • Use a forced numbering system to ensure the README file appears at the top of your file list.
      • Example:
        • 001Readme.txt
        • 100A_Readme.txt
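
The naming advice above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names has_safe_name and sorts_first are hypothetical, the check covers only the part of the file name before the extension, and the ordering test uses a plain character-by-character sort, which may differ from how a particular file browser orders files.

```python
import re
from pathlib import PurePath

# Assumption: letters, digits, underscores, and hyphens are allowed in the
# part of the file name before the extension (the stem).
ALLOWED_STEM = re.compile(r"[A-Za-z0-9_-]+")


def has_safe_name(filename):
    """Return True if the stem of filename uses only allowed characters."""
    stem = PurePath(filename).stem
    return ALLOWED_STEM.fullmatch(stem) is not None


def sorts_first(readme_name, other_names):
    """Return True if readme_name comes first in a plain string sort."""
    return sorted([readme_name, *other_names])[0] == readme_name


if __name__ == "__main__":
    others = ["Analysis.R", "data.csv"]  # hypothetical project files
    for name in ["001Readme.txt", "100A_Readme.txt", "README.txt"]:
        print(name, has_safe_name(name), sorts_first(name, others))
```

In a plain string sort, digits come before letters, which is why a numeric prefix tends to place the README file ahead of files whose names start with a letter.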

Example of a completed README file

Examples of datasets with well-documented README files:

  1. Belanger, Catherine R.; Anderson, Maureen E.C.; Weese, J. Scott; Spence, Kelsey L.; Clow, Katie M., 2025, "Supplemental data for: Rabies antibody titres in imported dogs and a population of dogs residing in Ontario, Canada", https://doi.org/10.5683/SP3/6HUVRH, Borealis, V1.
  2. Byun, E., Rezanezhad, F., Slowinski, S., Lam, C., Saraswati, S., Wright, S., Quinton, W., Webster, K., Van Cappellen, P. (2024). Dataset for Examining the Effects of Nutrient Pulses on Biogeochemical Cycling in Subarctic Peatlands in the Context of Permafrost Thaw and Wildfires. Federated Research Data Repository. https://doi.org/10.20383/102.0712.
  3. Hampe, Beate; Gries, Stefan Th., 2025, "Replication Data for: Syntax from and for discourse II: More on complex sentences as meso-constructions", https://doi.org/10.18710/SIPOUV, DataverseNO, V1.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.