"It won't be long before open access is old hat, taken for granted by a new generation of tools and services that depend on unrestricted access to research literature and data. As those tools and services come along, they will be the hot story. But historians will note that they all depend on open access and that open access was not easily won."
Free and libre/open source software (F/LOSS) movements have spawned similar solutions in many other contexts, each at a different stage of development. As F/LOSS enters the routine and familiarity of middle age, the open content movement--open source for non-software copyright and best embodied by the work of Creative Commons--has just graduated university and is getting a feel for the world. Even younger is the open data movement, whose legal tools have just started to come online.
Some may be surprised to learn that data and databases are not a "rights free" area where no intellectual property rights (IPRs) apply. For example, the Agreement on Trade Related Aspects of Intellectual Property Rights (TRIPs) requires that members of the World Trade Organisation, including Canada, the United States, and the UK, provide legal protection for databases.
Rights covering databases can include:
- copyright: both for the selection and arrangement of the database contents and over the contents of the database itself (the data), though factual information will generally not be protected by copyright
- database rights: the European Union's Database Directive requires member states to implement a "sui generis database right" covering the extraction and re-utilisation of the contents of protected databases (Editor's note: there is no North American equivalent to this directive)
- contract: contractual obligations about what users can and can't do with a database and its contents can also be used to provide for protection
- other rights: rights such as trade secrets and laws of unfair competition can also protect databases
This rights thicket protecting databases and data can form a significant obstacle to the use and re-use of data. This is true both for the scientific community wishing to expand knowledge through use of others' data and for the Internet and research communities aiming to enable the semantic web.
Open Data Commons
With the funding and support of the information management company Talis, the Open Data Commons project (ODC) was founded in the autumn of 2007 to provide legal tools for sharing data. The project began by funding licence development by Jordan Hatcher and Dr. Charlotte Waelde of the University of Edinburgh. This resulted in the creation of the Public Domain Dedication and Licence (PDDL), a legal tool which will be maintained by the Open Knowledge Foundation, a not-for-profit organisation promoting open knowledge. The PDDL dedicates data and databases to the public domain, a position that offers a wide degree of flexibility for users of data and helps enable semantic web projects that rely on large amounts of data.
Open Data Projects
Many projects of interest to the F/LOSS sector involve open data. These include:
Neurocommons: a Science Commons project which integrates data in the neurosciences.
CKAN: the Comprehensive Knowledge Archive Network, a registry of open knowledge projects maintained by the Open Knowledge Foundation and analogous to the freshmeat site for F/LOSS software.
Open Street Map: this site collaboratively produces open geodata.
Freebase: "an open database of the world's information" containing data from Wikipedia as well as US Government information.
Open data is also of enormous importance in the scientific community, where access to research data brings up many of the same issues as open access to scientific publications. For an overview of open access, see Peter Suber's introduction.
Science Commons Protocol
Science Commons was founded in 2005 and works on a variety of projects investigating rights issues related to scientific research. These include access to published research papers, material transfer agreements, and Neurocommons, a project creating an open source knowledge management platform for biological research. Science Commons is a project of Creative Commons and is overseen by its board. In December 2007, Science Commons released its Protocol for Implementing Open Access Data. This protocol, written in the same style as a Request for Comments (RFC), outlines a legal standard for open access to data based on three principles:
- the protocol must promote legal predictability and certainty
- the protocol must be easy to use and understand
- the protocol must impose the lowest possible transaction costs on users
Guided by these three principles and Science Commons' experience in maintaining their database FAQ on Creative Commons licences and data, they arrived at an approach that calls for waiver of relevant IPRs so that data could be treated as close to being in the public domain (without IPRs) as possible. Thus the protocol calls for waiver of:
- the sui generis database right in the European Union mentioned above and similar protections
- implied contract rights and rights in tort or delict such as unfair competition or trade secrets
This protocol is enforced through the use of an "Open Access Data Mark", which will be managed by Science Commons and the sister organisation Creative Commons. They will limit use of the mark to licensing schemes that comply with the protocol, so that users can be assured that data labelled with the mark meets the criteria of waiving IPRs. The Science Commons protocol thus sets a standard that any licensing scheme can implement.
Implementation in Open Data Commons
In implementing the Science Commons protocol, the ODC project set goals of:
- making the protocol international
- writing the legal document in plain language
- clearly stating what rights were and were not covered
From experience in the F/LOSS and open content communities, the ODC team thought it important to create a legal text as accessible as possible to its users. In terms of drafting style, ODC uses the same approach as the GPL in including such elements as a preamble, as well as the plain language approach of the Scottish implementation of the Creative Commons licences. Drafting efforts also drew heavily from the original Science Commons FAQ on databases, the Creative Commons unported licences, and the first generation Talis Community Licence. The result is the Open Data Commons Public Domain Dedication and Licence (ODC-PDDL) and an accompanying Community Norms document.
Copyright law as it relates to waiving copyright is unclear. No international treaty, such as the Berne Convention, sets a standard for waiver of copyright. Indeed, it is unclear whether copyright can be waived in the United Kingdom, the physical home of the ODC project, and the same could be said of the EU's sui generis database right. As a result, ODC decided on a two-pronged approach to implementation:
- waiver of database rights and copyright for jurisdictions that allow for it (see PDDL Sections 3.1 and 3.2)
- licensing of the rights for jurisdictions that do not allow for waiver (see PDDL Section 3.3)
This approach accommodates the many different jurisdictional approaches to copyright law throughout the world while still setting the goal of waiving rights.
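The two-pronged approach can be pictured as a simple fallback rule. The following is only an illustrative sketch, not legal logic taken from the PDDL text: the `Jurisdiction` type, its `allows_waiver` flag, and the placeholder jurisdictions are hypothetical names introduced here, and whether a real jurisdiction permits waiver is a legal question this sketch does not answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Jurisdiction:
    """Hypothetical model of a jurisdiction for illustration only."""
    name: str
    allows_waiver: bool  # assumption: supplied by the caller, not derived from law


def effective_mechanism(jurisdiction: Jurisdiction) -> str:
    """Return which prong of the PDDL would do the work in this jurisdiction."""
    if jurisdiction.allows_waiver:
        # First prong: rights are waived outright.
        return "waiver (PDDL Sections 3.1 and 3.2)"
    # Second prong: where waiver is not possible, the rights are licensed instead.
    return "licence (PDDL Section 3.3)"


# Placeholder jurisdictions; the names and flags are made up.
for j in (Jurisdiction("Jurisdiction A", True), Jurisdiction("Jurisdiction B", False)):
    print(f"{j.name}: {effective_mechanism(j)}")
```

Either way the user of the data ends up with the same practical freedom to use and re-use it; only the legal route differs.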
Moral rights arise in some jurisdictions in connection with the creation of a copyrighted work. Because databases may attract copyright, they may also attract moral rights. These protect the rights of personal (as opposed to corporate) authors over their association with a work, including the right to be identified as the author of the work, and the right to object to derogatory treatment of the work.
Moral rights can be waived in the United Kingdom according to Section 87 of the Copyright, Designs and Patents Act 1988. However, waiver of these rights may be impossible in many jurisdictions, especially those following the author's rights approach common in civil law jurisdictions. Thus, while the PDDL waives moral rights in the work, Section 3.4 advises users that these rights may nevertheless still be present in the work.
Rights Addressed in the PDDL
The protocol clearly calls for the waiver or licensing of copyright and database rights, but these rights do not cover all the legal rights that could potentially be at issue in a database. The PDDL approach specifically excludes patent rights and trademarks. Patents could have been included in the same style as the GPLv3, which requires software patent holders to license these rights as they relate to GPLv3 licensed software. However, waiver or licensing of patent rights is not required under the protocol and would have greatly limited the utility of the PDDL. The exclusion of patent rights is set out in Section 4.0 of the PDDL.
With regard to trademarks, Section 4.0 of the PDDL provides that the creator of the database should be able to maintain any marks associated with their own use of the database, even if they allow others to use the underlying database or data. In all cases, it was important that the provider of the data under the PDDL be placed in the same position as anyone else using the data.
The Science Commons protocol calls for waiver of unfair competition rights in Section 4.1. In US law (the United States being the home of Science Commons), unfair competition broadly refers to a group of distinctly different rights of action, including:
- trade secrets
- publicity rights
- trade mark claims
- passing off, a right similar to trademark and based on the goodwill of a business
- deceptive advertising
- other kinds of unfair methods of competition
As you can imagine, the areas outlined above have a variety of different legal requirements and fit differing social policies. These rights of action, however, all involve using some aspect of a business without permission. Because the PDDL grants permission to use the data, specifically addressing this area did not seem to be required, and the ODC team hasn't heard any feedback that the PDDL does not adequately address unfair competition.
As an example, the law protects secret or confidential information, and trade secrets come under the umbrella term for unfair competition. If you use the PDDL and make your data available via the Internet, this database is no longer a secret and thus addressing this again would be redundant.
Database versus its Contents
The PDDL can be applied to both a database and its contents or data, or to only a database without covering its contents, as follows:
- database and contents: the entire database and data are free to use and re-use under the waiver and licence of the PDDL
- database only: any rights, such as sui generis database rights, that would accrue by creating, maintaining, and designing the database are waived and licensed under the PDDL, but the contents remain under other licences
The option to include the database and data or just the database elements is implemented by the definition of "Work" in Section 1.0 of the PDDL.
The option to cover only the database, rather than the database plus its contents, is present in the PDDL so that users creating databases from information of varied rights status, such as Freebase's use of both US Government data in the public domain and Wikipedia content under the GFDL, can apply the PDDL only to the rights that arise from their creation of the database.
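The difference between the two scopes can be made concrete with a small model. This is a minimal sketch under stated assumptions: `Scope`, `Record`, and the two helper functions are hypothetical names introduced for illustration, and the sketch assumes the person applying the PDDL is entitled to do so for whatever it is applied to. The example records mirror the Freebase case described above.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """What the PDDL is applied to (chosen via the definition of "Work")."""
    DATABASE_AND_CONTENTS = "database and contents"
    DATABASE_ONLY = "database only"


@dataclass(frozen=True)
class Record:
    """Hypothetical piece of content held in the database."""
    source: str
    terms: str  # the terms the content carries on its own


def terms_for_database_rights(scope: Scope) -> str:
    # Rights arising from creating the database itself are covered
    # by the PDDL under either scope.
    return "PDDL"


def terms_for_record(scope: Scope, record: Record) -> str:
    if scope is Scope.DATABASE_AND_CONTENTS:
        # The contents are released together with the database.
        return "PDDL"
    # Database only: the contents keep whatever terms they already had.
    return record.terms


records = [
    Record("US Government data", "public domain"),
    Record("Wikipedia content", "GFDL"),
]

for record in records:
    print(record.source, "->", terms_for_record(Scope.DATABASE_ONLY, record))
```

With the database-only scope, a re-user still has to check the terms of each piece of content, while the database structure itself can be used freely.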
Community Norms Document
The PDDL works in conjunction with a non-binding Community Norms document. This document outlines in plain language a group of norms that users of a PDDL-licensed database should follow in order to create social obligations for data users. These norms include:
- reciprocal use of the PDDL: like the reciprocal or copyleft portions of the GPL or the Share Alike element in the Creative Commons licences, this norm asks users to release any changes to the database under the PDDL as well
- attribution: this norm asks users to "give credit where credit is due"
- open formats: we suggest the use of data formats that are accessible to all
- technical protection measures: technical restrictions such as digital rights management (DRM) are discouraged
The Community Norms document is flexible and adaptable to the norms of specific communities. Within the scientific community, for instance, a discipline such as archaeology or biology could specify the norms of citation and attribution relevant to it. The use of a non-binding and flexible Community Norms statement forms part of the Science Commons Protocol by not creating a strict legal obligation for the data.
Creative Commons (CC) has also implemented the Science Commons Protocol with their own public domain tool, CCZero, based in part on their earlier work on the Public Domain Dedication currently available on the CC site. CCZero is at the same time an implementation of the Protocol for data and an expanded and clarified version of their public domain dedication. The CCZero tool applies to all types of content, not just data. The following comments are based on the beta CCZero tool available for comment and discussion at the time of this writing.
CCZero uses two underlying legal tools: one waives rights and the other asserts that rights do not exist. The waiver works for authors and rights holders and the assertion means that someone believes that the work has no copyright in the United States.
Both variations of the CCZero tool are based on US law, and CC anticipates that CCZero will continue to be "ported" to jurisdiction-specific implementations via their international affiliates. The Canadian version is available here. In comparison to CCZero, the Open Data Commons PDDL is:
- based exclusively around databases and data
- international in scope in one single document
- seamlessly integrated with the Community Norms statement
These differences primarily arise from the different focus and infrastructure of the ODC project, though both projects implement the Science Commons Protocol.
The Open Data Commons PDDL interoperates with CCZero via the CCZero assertion tool. Under the current CCZero beta tool, users go through a point-and-click process and, when prompted, enter the reason why they believe that the information covered by CCZero has public domain status. At this point, the user can indicate that the PDDL covers the work as the reason for public domain status. This way, ODC can take advantage of the framework being developed for CCZero and the high profile of Creative Commons licensing activities. Users are not confronted with stand-alone licence silos where information covered by one licence cannot be integrated with information under another licence: the PDDL and CCZero fully integrate.
The end result of the Science Commons Protocol and its implementation by ODC is a set of solutions for those wishing to further data integration projects and to openly share their data. The PDDL, together with the accompanying Community Norms statement, will be particularly useful for scientists wishing to share their research data. But scientists are not the only anticipated users: government sector data services and private companies involved in data generation and sharing will all have an interest--as both consumers and producers of data--in having an option that allows for use and re-use of databases without restriction. The goal of the ODC project is to grow with the support of its users to meet the need for accessible legal tools for the creation of a web of open data of all types.
Should you wish to support the ODC's efforts to create data licensing solutions either financially or with your time, please contact us.