
Licensing Datasets in the U of G Research Data Repositories

How does copyright apply to research data?

Copyright grants time-limited exclusive rights to the copyright holder of an original work of authorship. These exclusive rights allow the copyright holder to decide how the work may be used by others.

Ideas and facts are not protected under Canadian copyright law. Only the creative expression of ideas and facts may be copyright protected.

For instance, the following pieces of information would not be copyrightable:

  • A value collected using an instrument or sensor, such as an hourly air temperature reading or the concentration of heavy metals in a soil sample.
  • The age, gender, and occupation of an individual collected through the administration of a survey.

Example: factual information and copyright

I have administered a survey to collect community perspectives about on-street parking of trailers. The response data is presented in a tabular (spreadsheet) format with a simple readme file describing the survey questions and data coding values.

Simple table of coded data for a survey conducted on on-street parking of trailers.

Readme file that presents full survey questions and coding explanation for a survey conducted on on-street parking of trailers.

The spreadsheet and readme file are examples of factual data and would not be copyright protected as they are not original creative expressions of authorship.

Are any materials within a dataset copyrightable?

While factual information presented in a simple table or readme file may not be protected by copyright, a research dataset can contain many other types of research products. Some of these, such as photographs, audiovisual recordings, complex visualizations, and databases, may be copyrightable.

Therefore, different components of a dataset may have different copyright status and should be treated and marked as such.

Example: layers of copyright in a dataset

My research involves determining crop yield for maize (corn) grown under different herbicide treatments.

For each experimental treatment, I record measurements of plants per acre, ears per plant, rows per ear, kernels per row, and kernel weight in a spreadsheet. This simple table of experimental data is composed of factual data. Using the data in this table, I design an infographic that explains the findings in plain language and pictorial form. I also take photographs throughout the experiment (plot preparation, planting, crop growth, harvesting, and laboratory analysis) and use them to create a set of field diaries, which I annotate with descriptions, personal reflections, and discussions.

  • Spreadsheet data – Factual data that is not protected under copyright.
  • Infographic – Creative expression of my research findings that may be protected under copyright.
  • Photo diary – Creative expression of my research process that may be protected under copyright.

Can a license be applied to a dataset?

If a dataset (or any of its components) is protected by copyright, the copyright holder can license it. Before licensing and sharing a dataset in the U of G Research Data Repositories, you should ensure that you have:

  • The rights necessary to license the dataset.
  • Secured agreement and/or permission from any co-creators to license and share the dataset.

Can a Creative Commons license be applied to a dataset?

Creative Commons (CC) licenses give the copyright holder of a work a way to allow others to copy, share, use, and remix the work, while the holder retains their copyright.

It is important to note that CC licenses are built on top of copyright and therefore they can only be applied to work that is copyright protected. If your dataset, or components of your dataset, are copyrightable, you may consider applying a CC license to your dataset.

There are six CC licenses to choose from (listed from least to most restrictive):

  • Creative Commons Attribution (CC BY).
  • Creative Commons Attribution-ShareAlike (CC BY-SA).
  • Creative Commons Attribution-NonCommercial (CC BY-NC).
  • Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA).
  • Creative Commons Attribution-NoDerivatives (CC BY-ND).
  • Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND).

Every CC license requires users to attribute the creator of the original work. Beyond attribution, to decide which license best fits your needs, carefully consider the types of uses you want others to be able to engage in, such as:

  • Can they use the work for commercial purposes?
  • Can they modify, adapt, or remix (i.e., create derivatives of) your work?
  • Must works derived from your work be shared under the same (or a compatible) license as your work?
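The three questions above amount to a small decision table. As a rough sketch only, the Python function below maps yes/no answers to one of the six licenses listed earlier. The function `choose_cc_license` and its parameter names are hypothetical, invented for illustration; the sketch does not replace reading the license deeds themselves.

```python
def choose_cc_license(allow_commercial: bool,
                      allow_derivatives: bool,
                      require_sharealike: bool) -> str:
    """Map answers to the three licensing questions onto a CC license.

    Hypothetical helper for illustration only.
    """
    parts = ["BY"]  # every CC license requires attribution
    if not allow_commercial:
        parts.append("NC")  # NonCommercial
    if not allow_derivatives:
        # NoDerivatives: adaptations may not be shared, so ShareAlike is moot
        parts.append("ND")
    elif require_sharealike:
        parts.append("SA")  # adaptations must use the same (or compatible) license
    return "CC " + "-".join(parts)


if __name__ == "__main__":
    print(choose_cc_license(True, True, False))   # CC BY
    print(choose_cc_license(False, True, True))   # CC BY-NC-SA
    print(choose_cc_license(False, False, True))  # CC BY-NC-ND
```

When derivatives are not allowed, the ShareAlike answer is ignored, because ShareAlike only governs how adaptations are shared. This is why only six licenses exist rather than eight.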

Once you have answered these questions, you can select the CC license that best aligns with your desired permissible uses. For more detailed information about CC licenses, please refer to the Use Creative Commons Licenses guide or book a Publishing and Author Support Appointment.

How do CC licenses impact dataset reusability?

When selecting a license for your dataset, keep in mind that the license you choose impacts how and for what purposes the dataset can be used. Carefully consider how the permissible uses allowed under a ShareAlike, a NonCommercial, or a NoDerivatives CC license will impact the reuse potential of the dataset.

ShareAlike (SA)

Works based on modifications, adaptations, or remixes of the original SA CC-licensed work must also be distributed under the same SA (or compatible) license.

NonCommercial (NC)

The original dataset cannot be used for commercial purposes. Note that the NC element concerns the nature of the use, not the nature of the user. A commercial or for-profit entity could use the dataset, provided the use is not primarily intended for commercial advantage or monetary compensation.

NoDerivatives (ND)

Modifications or adaptations of the original dataset cannot be publicly distributed or shared.

Can a Public Domain Dedication waiver be applied to a dataset?

A Public Domain Dedication (CC0) waiver makes it possible for copyrighted works to be fully open access. CC0 is not a license but a legal tool that copyright holders can use to waive their copyright in their original work to the greatest extent possible, thereby dedicating the work to the public domain. CC0 works are available to the widest audience of users for the broadest range of uses (including reuse and adaptation for commercial and non-commercial purposes).

CC0 is an important tool for scientific research data as it:

  • Encourages open sharing of data, free of restrictions on use and barriers to access.
  • Removes uncertainty and ambiguity for potential users of the data regarding what they can and cannot do with the work.
  • Encourages maximum (re)use and sharing of research products, which in turn supports collaboration, knowledge sharing, transparency, and reproducibility.

CC0 is used by default in a growing number of research data repositories including the University of Guelph Research Data Repositories, hosted in Borealis, the Canadian Dataverse Repository.

How do I decide if a license or waiver meets my needs?

Selecting a license or waiver that permits copying, distribution, modifications, and adaptations for both commercial and non-commercial purposes is encouraged. The Public Domain Dedication (CC0) and the Creative Commons Attribution 4.0 International (CC BY) license are options which support and encourage open sharing and reuse of datasets.

If you wish to apply a CC license to your data, carefully consider the allowable uses under each CC license, then select the license that supports you in making your data as open as possible or as closed as necessary.

What if I just want to be attributed for my work?

All six CC licenses require attribution whenever the dataset, or an adaptation of it, is shared. If you want to make your dataset open to the broadest possible set of users and uses, and simply wish to be attributed for your work, CC0 may be a suitable option for you.

Although attribution is not legally required or enforceable for datasets released under CC0, community norms and good scientific practice encourage attribution and citation of these datasets. Community norms also encourage users to cite every dataset they use in full, including a persistent link (such as a digital object identifier) to the original dataset.

Can I apply a license to an adapted dataset?

An adapted dataset is one that is formed by combining, modifying, and/or remixing other copyrighted datasets to produce a significantly new dataset that is, itself, copyrightable. The adapted dataset can include your own primary datasets, copyrighted datasets you have been given permission to reuse and adapt, CC licensed datasets, datasets under CC0 or marked as in the Public Domain, or even Open Government licensed datasets.

When selecting datasets to include in an adapted dataset, you must ensure that the terms of use of the datasets chosen are compatible with each other. Additionally, if the resulting adapted dataset is copyrightable, the license you choose for the adaptation must be compatible with the individual licenses on the original datasets you have used in your adaptation. 

If you have questions about selecting compatibly licensed datasets to include in your adaptation or have questions about selecting an appropriate license for your adapted dataset, please book a Publishing and Author Support Appointment.

Can I apply a license or waiver to a controlled access dataset?

CC licenses and Public Domain Dedication (CC0) waivers support open access to creative works. They should not be applied to works whose access will be controlled. All six CC licenses and the CC0 waiver grant the right to copy and redistribute the work in any medium or format, which directly contradicts access controls.

Example: managed access dataset and licensing

I have decided to deposit my dataset in the U of G Research Data Repositories. Under the terms of my research funding, I must control access to the dataset, so I have placed it under restricted access: potential users must request access, which I may approve or reject. I am also considering assigning a Creative Commons Attribution 4.0 International (CC BY) license to the dataset.

The CC BY license allows a user to copy and redistribute the work. Therefore, once I give someone access to my restricted dataset, they are then free to copy, redistribute, and share the dataset with others without further obligation other than the requirement to attribute my work. Once I have provided an individual access to the dataset, by terms of the CC BY license, I can no longer control how and where the dataset is copied, (re)distributed and shared by that user. Therefore, managed/controlled access and the conditions of use under a CC BY license contradict each other.

How do I apply a license or waiver in the U of G Research Data Repositories?

The option to select a CC license or CC0 waiver is built into the U of G Research Data Repositories platform. When depositing in the U of G Research Data Repositories, all new datasets have CC0 applied to them by default.

Alternatively, you have the option to update the dataset terms of use and apply any of the six CC licenses to the dataset. 

You may also define custom terms of use, such as adding a Public Domain Mark.

You can set the terms of use for your dataset by:

  • Selecting the ‘dataset template’ for CC0 or for any of the six CC licenses in the New Dataset form when you create a new dataset.
  • Selecting CC0 or any of the CC licenses from the drop-down menu in the ‘Terms of Use’ metadata field under the dataset Terms tab.
  • Adding the CC license or CC0 waiver HTML code (including code for the license/waiver name, link to the license/waiver deed, and the license/waiver badge) in the Terms of Use field under the dataset Terms tab.
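As an illustration of the third option, the Python sketch below assembles an HTML snippet with the three pieces mentioned: the license or waiver name, a link to its deed, and a badge. The helper `terms_of_use_html` and the wording of the statement are hypothetical. The deed and badge URLs follow the pattern produced by the classic Creative Commons license chooser; treat that as an assumption and check it against creativecommons.org before pasting the output into the Terms of Use field.

```python
# Hypothetical helper: builds a Terms of Use HTML snippet containing the
# license/waiver name, a link to its deed, and its badge image.
# URL patterns assume the classic Creative Commons chooser output.

CC_NAMES = {
    "by": "Attribution",
    "by-sa": "Attribution-ShareAlike",
    "by-nc": "Attribution-NonCommercial",
    "by-nc-sa": "Attribution-NonCommercial-ShareAlike",
    "by-nd": "Attribution-NoDerivatives",
    "by-nc-nd": "Attribution-NonCommercial-NoDerivatives",
}


def terms_of_use_html(code: str) -> str:
    """Return an HTML snippet for a CC 4.0 license code (e.g. 'by-nc') or 'cc0'."""
    if code == "cc0":
        deed = "https://creativecommons.org/publicdomain/zero/1.0/"
        badge = "https://i.creativecommons.org/p/zero/1.0/88x31.png"
        name = "CC0 1.0 Universal Public Domain Dedication"
    elif code in CC_NAMES:
        deed = f"https://creativecommons.org/licenses/{code}/4.0/"
        badge = f"https://i.creativecommons.org/l/{code}/4.0/88x31.png"
        name = f"Creative Commons {CC_NAMES[code]} 4.0 International License"
    else:
        raise ValueError(f"unknown license code: {code!r}")
    # Badge linked to the deed, then a sentence naming the license/waiver.
    return (
        f'<a rel="license" href="{deed}">'
        f'<img alt="{name}" src="{badge}" /></a><br />'
        "This dataset is made available under the "
        f'<a rel="license" href="{deed}">{name}</a>.'
    )


if __name__ == "__main__":
    print(terms_of_use_html("by"))
```

Raising an error for unknown codes, rather than guessing, keeps a typo from silently producing a link to a nonexistent deed.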

Once applied, the CC0 or CC license badge is displayed on the dataset's landing page. The full name of the CC0 tool or CC license, a link to the legal deed, and the badge are also displayed under the Terms tab of the dataset record.

How do I mark dataset materials that are under different copyright and license status?

You can customize the Terms of Use fields (e.g., terms of use, special permission, conditions, etc.) under the Terms tab of the dataset record to clearly define what materials in the dataset are under copyright and therefore licensed under your chosen CC license and what materials are not.

For assistance with formatting and inputting a customized terms of use statement for your dataset, please book a Publishing and Author Support Appointment.



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.