U of G Research Data Repositories

Contributors: Carrie Breton, Lucia Costanzo, and Kaitlyn DeWeerd

What types of content can be deposited in the Data Repositories?

Acceptable content includes primary or adapted research datasets. According to the Canadian Tri-Agency, research data refers to the materials “used as primary sources to support technical or scientific enquiry, research, scholarship, or creative practice”.

Research data can be generated throughout the course of a research project, graduate work (e.g., major projects, theses, dissertations), experiential learning or capstone projects.

Examples of research data include:

  • Audiovisual files
  • Finding aids, information maps and bibliographies
  • Geospatial data
  • Images (e.g., photographs, micrographs, slides)
  • Interview recordings and transcripts
  • Marked up digitized historical documents (e.g., diaries, letters)
  • Marked up text corpora
  • Models, algorithms, applications, codes, scripts
  • Questionnaires, transcripts, codebooks, data dictionaries
  • Sequence data
  • Tabular (spreadsheet) data
  • Web archives

What are the components of a dataset?

Datasets submitted to the Data Repositories should be composed of:

  • Data files.
  • Descriptive metadata.
  • Supplemental documentation (e.g., readme file, codebook, data dictionary, user guide, commented scripts/codes).

What types of data file formats are supported?

Data files should be deposited in open, platform-independent, nonproprietary file formats, whenever possible. Files should also be saved in an uncompressed and unencrypted format for long-term accessibility.

Recommended file formats

The following list includes common, recommended file formats but is not exhaustive. If you are using a different file format and are unsure about its long-term sustainability, please contact us for support.

Audio / Video

  • MP3 (.mp3)
  • WAV (.wav)
  • Free Lossless Audio Codec (.flac)
  • MPEG-4 (.mp4)

Images

  • JPEG 2000 (.jp2)
  • TIFF (.tiff, .tif)
  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • Scalable Vector Graphics (.svg)

Geospatial data

  • GeoTIFF (.tif)
  • Geography Markup Language Encoding Standard (.gml)
  • GeoJSON (.geojson)

Linked data

  • RDF/XML (.rdf)
  • Turtle (.ttl)
  • NTriples (.nt)
  • JSON-LD (.jsonld)

Tabular data

  • Comma separated values (.csv)
  • Tab separated values (.tsv, .tab)
  • OpenDocument – spreadsheet (.ods)

Textual documents

  • Plain text (.txt)
  • OpenDocument – text (.odt)

Other

  • Extensible Markup Language (.xml)
  • Hypertext Markup Language (.html)
  • Network Common Data Form, NetCDF (.nc)
  • R (.rdata, .rmd)
  • Sequence data (.bam, .fasta, .fastq, .seq)
  • Uncompressed ZIP (.zip)
  • Web archives (.warc)

Accepted file formats

The following formats are accepted because they are widely used; however, they are either proprietary or dependent on a particular system or application. We cannot guarantee the long-term sustainability of these formats. When files are submitted in an accepted file format, Data Repositories staff may convert them to a recommended format where appropriate.

  • AVI (.avi)
  • Microsoft Word (.docx)
  • Microsoft PowerPoint XML (.pptx)
  • Microsoft Excel XML (.xlsx)
  • QuickTime (.mov)
  • SAS (.sas, .sas7bdat, .sd2, .tpt)
  • Shapefile (.shp, .shx, .dbf, .prj)
  • SPSS (.dat, .sav, .sps)
  • Stata (.dta, .do)
  • Tagged PDF (.pdf)
  • Windows Bitmap (.bmp)
  • Windows Media Audio (.wma)
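
To check files against the two lists above before deposit, a short script can help. The following is a minimal Python sketch, not a Data Repositories tool: the function name classify_format and the two extension sets are illustrative assumptions, and the sets hold only a subset of the formats listed in this guide.

```python
from pathlib import Path

# Illustrative subsets of the extensions listed in this guide (not exhaustive).
RECOMMENDED = {
    ".csv", ".tsv", ".tab", ".ods", ".txt", ".odt", ".png", ".tif", ".tiff",
    ".jp2", ".wav", ".flac", ".geojson", ".xml", ".nc",
}
ACCEPTED = {
    ".docx", ".xlsx", ".pptx", ".sav", ".shp", ".pdf", ".bmp", ".avi",
    ".mov", ".wma",
}


def classify_format(filename: str) -> str:
    """Return 'recommended', 'accepted', or 'unlisted' based on the extension."""
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    if ext in RECOMMENDED:
        return "recommended"
    if ext in ACCEPTED:
        return "accepted"
    return "unlisted"


if __name__ == "__main__":
    for name in ["survey.CSV", "results.xlsx", "notes.xyz"]:
        print(name, classify_format(name))
```

A file reported as "unlisted" is not necessarily unsuitable; it only means the extension is missing from the sets in this sketch, in which case contacting the Data Repositories staff is the safer route.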

Are there any limitations on what types of research data can be deposited?

The Data Repositories are not suitable for:

  • Active storage or collaborative sharing during your research project.
  • Scholarly outputs like published articles, reports, protocols, theses and dissertations (for these resources, please refer to The Atrium).
  • Large volume data (e.g., datasets ranging from hundreds of gigabytes to terabytes).
  • Individual files that are greater than five gigabytes (5 GB).
  • Un-anonymized confidential data.
    • Please note, the data creator/depositor is responsible for obtaining Research Ethics Board (REB) approval and informed consent from research participants before sharing anonymized data via the Data Repositories.
  • Un-adapted secondary data.
    • Please note, if datasets contain adapted secondary data, the depositor must be able to provide proof of permission from the original source/creator or link to terms of use that allow for the deposit and sharing of the adapted data.
  • Third party/subscription-based datasets acquired from a data aggregator who does not collect the data themselves but compiles it from other sources (e.g., repurposed census data as aggregated by DMTI Spatial Inc.).

If your dataset falls into any of these categories, please book a Publishing & Author Support appointment to discuss your use case.
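
The per-file size limit listed above can be checked before deposit. The following is a minimal Python sketch under stated assumptions: it treats five gigabytes as 5 × 1000³ bytes (this guide does not say whether decimal or binary gigabytes are meant), and the function name oversized_files is hypothetical.

```python
from pathlib import Path

# Assumption: decimal gigabytes. Use 5 * 1024**3 if binary units are intended.
MAX_FILE_BYTES = 5 * 1000**3


def oversized_files(folder, limit=MAX_FILE_BYTES):
    """Return sorted (relative path, size in bytes) pairs for files over `limit`."""
    root = Path(folder)
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")  # walk the folder recursively
        if p.is_file() and p.stat().st_size > limit
    )


if __name__ == "__main__":
    for path, size in oversized_files("."):
        print(f"{path}: {size} bytes exceeds the per-file limit")
```

The script only reports file sizes; it does not assess any of the other limitations listed above, such as confidentiality or permissions for secondary data.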

Are there any deposit requirements?

Submitted datasets must be organized, structured, and described in a manner that promotes the findability, accessibility, interoperability and reusability of the data prior to deposit.

Best practices for preparing your dataset include:

  1. Select and structure your data.
  2. Organize and name your folders and files consistently and clearly.
  3. Save your files in preservation-friendly formats.
  4. Describe your dataset with metadata.
  5. Include supplemental documentation (e.g., readme file) with your data.
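
As an illustration of the last step, a readme file can be started from a simple plain-text skeleton. The following Python sketch is only one possible layout: the function name readme_skeleton and the fields it writes (title, creators, file descriptions) are assumptions for illustration, not a template required by the Data Repositories.

```python
def readme_skeleton(title, creators, files):
    """Build a plain-text readme body.

    `files` maps each file name to a one-line description.
    """
    lines = [
        f"Title: {title}",
        "Creators: " + "; ".join(creators),
        "",
        "Files:",
    ]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = readme_skeleton(
        "Soil survey",
        ["A. Researcher", "B. Student"],
        {"data.csv": "Raw measurements"},
    )
    print(text)
```

A complete readme would normally say more, for example how the data were collected and how variables are defined; the skeleton is only a starting point.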

For assistance with preparing your dataset for deposit in the Data Repositories, refer to the following resources:

If you have any questions or need assistance with preparing your dataset for deposit or using the Data Repositories, please book a Publishing & Author Support appointment.

Dataset accessibility compliance

Depositors are responsible for ensuring that all materials deposited in the Data Repositories meet accessibility standards under the Accessibility for Ontarians with Disabilities Act (AODA).

Available resources in support of making your dataset accessible include:

Not all content in the Data Repositories is fully accessible. If you would like to request an alternate version of any content, please use the Library Resource Alternate-Format Request Form.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.