How can Blockchain technology support Open Science data infrastructures?

To provide an illustration, we refer to Blockcerts, a system developed for the self-management of educational credentials that is already in commercial use. Blockcerts stores a fingerprint of an academic credential (a hash of the underlying credential) on a public blockchain. The solution relieves the issuing body, e.g. a university, from having to verify the correctness of a credential each time it is accessed. Using this technology, individuals can take control of their own credentials by holding verified records, which they can use as needed.

Accordingly, we suggest a similar use case describing how selected metadata associated with an open science (OS) data set can be securely stored and shared on a blockchain, giving all relevant parties a mechanism to verify the correctness of the metadata. Relevant metadata can include data descriptors, author identification credentials, and any licenses or other conditions for use. These metadata may be produced by the responsible researchers themselves, their institution or their publisher, and/or other relevant (authorised) bodies. In addition, a digital fingerprint of the data set itself may be stored on-chain.

Metadata can, for example, be organised into different categories and formatted using the Dublin Core metadata standard. A “fingerprint” of the metadata is calculated and stored on the blockchain together with the hash of a certificate identifying the responsible research organisation. The science data itself is not stored on the blockchain; only the address pointing to the data is part of the metadata. It is, however, important to note that the authenticity of the certificate needs to be assessed by a qualified party, most likely the certificate issuer, when the fingerprint is first uploaded to the blockchain.
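
A fingerprint of this kind is only reproducible if everyone serialises the metadata in the same way before hashing. The following is a minimal sketch of that idea, assuming SHA-256 as the hash function and a JSON serialisation with sorted keys; the field values, the address and the certificate bytes are hypothetical placeholders, and nothing here is taken from Blockcerts or from a specific blockchain API.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON form of `record`."""
    # Sorted keys and fixed separators make the byte string independent of
    # the order in which the fields were entered.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical Dublin Core style descriptors for a data set.
metadata = {
    "dc:title": "Example survey data set",
    "dc:creator": "Example Research Organisation",
    "dc:rights": "CC BY 4.0",
    "dc:identifier": "https://example.org/datasets/42",  # address of the data
}

# Placeholder bytes standing in for the organisation's certificate.
certificate = b"-----BEGIN CERTIFICATE-----(placeholder)"

# What would be written on-chain: two fingerprints, not the data itself.
on_chain_entry = {
    "metadata_hash": fingerprint(metadata),
    "certificate_hash": hashlib.sha256(certificate).hexdigest(),
}
```

Because the serialisation is canonical, two parties holding the same metadata obtain the same fingerprint regardless of field order, while changing any single value yields a different one.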

For a more robust and immutable storage of OS data, a distributed solution such as the InterPlanetary File System (IPFS) may be used. Identical copies of the data are then spread over a number of individual nodes (PCs), which helps to prevent blocking of or tampering with the data and also offers better accessibility. Such nodes can be part of an EU OS cloud infrastructure. This is, however, optional, and the process described here will work just as well with centrally stored data.
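
The reason replication helps against tampering is that the address of the data can be derived from its content, so any copy can be checked against the address it was requested under. The sketch below illustrates this with a toy in-memory store; the `Node` class and `fetch` function are hypothetical, plain SHA-256 hex digests are used as addresses, and this is not the actual IPFS identifier format or API.

```python
from __future__ import annotations

import hashlib


class Node:
    """A toy storage node that keeps blocks under their SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data
        return address

    def get(self, address: str) -> bytes | None:
        return self.blocks.get(address)


def fetch(nodes: list[Node], address: str) -> bytes:
    """Return the first copy whose content matches the requested address."""
    for node in nodes:
        data = node.get(address)
        # A copy is accepted only if it hashes to the requested address, so
        # a tampered replica is skipped rather than trusted.
        if data is not None and hashlib.sha256(data).hexdigest() == address:
            return data
    raise LookupError(f"no valid copy found for {address}")


nodes = [Node(), Node(), Node()]
dataset = b"temperature,humidity\n21.5,40\n"
address = [node.put(dataset) for node in nodes][0]

nodes[0].blocks[address] = b"tampered"  # one replica is altered
del nodes[1].blocks[address]            # one replica is unavailable
recovered = fetch(nodes, address)       # the intact third copy is returned
```

With centrally stored data the same check still applies; only the redundancy is lost.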

To summarize, the use of this system would involve the following steps:

  • A fingerprint of the OS data is stored as part of other relevant metadata related to the OS data
  • Selected metadata is hashed and stored on the blockchain, along with a digital certificate of the owner, e.g. a research performing organisation (RPO)
  • The OS data together with the metadata is stored on a distributed system such as IPFS (or it could be stored centrally, e.g. in components of the European Open Science Cloud, EOSC)
  • The correctness of the metadata, and thus the research data, can be assured by comparing a “fingerprint” computed from the retrieved metadata with the “fingerprint” stored on the blockchain
  • The correctness of the research data itself can be assured by comparing a fingerprint of the data with the fingerprint stored as part of the metadata
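
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than a definitive implementation: SHA-256 is assumed as the fingerprint function, a plain Python list stands in for the blockchain, the function names `publish` and `verify` are hypothetical, and the certificate is a placeholder byte string stored as a hash.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def metadata_fingerprint(metadata: dict) -> str:
    # Canonical JSON so that the same metadata always gives the same hash.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


def publish(ledger: list, data: bytes, descriptors: dict, certificate: bytes) -> dict:
    """Embed the data fingerprint in the metadata, then record the hashes."""
    metadata = dict(descriptors, data_fingerprint=sha256_hex(data))
    ledger.append({
        "metadata_hash": metadata_fingerprint(metadata),
        "certificate_hash": sha256_hex(certificate),
    })
    return metadata  # kept off-chain, next to the data


def verify(ledger: list, data: bytes, metadata: dict) -> bool:
    """Check the metadata against the ledger, then the data against the metadata."""
    metadata_ok = any(entry["metadata_hash"] == metadata_fingerprint(metadata)
                      for entry in ledger)
    data_ok = metadata.get("data_fingerprint") == sha256_hex(data)
    return metadata_ok and data_ok


ledger = []  # append-only list standing in for the blockchain
data = b"sample,value\nA,1\nB,2\n"
metadata = publish(ledger, data,
                   {"title": "Example data set", "license": "CC BY 4.0"},
                   certificate=b"placeholder certificate of the RPO")

print(verify(ledger, data, metadata))                      # untouched copy
print(verify(ledger, data + b"C,3\n", metadata))           # altered data
print(verify(ledger, data, {**metadata, "license": "?"}))  # altered metadata
```

Only the first check succeeds: altering the data breaks the comparison with the fingerprint held in the metadata, and altering the metadata breaks the comparison with the fingerprint held on the ledger.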

Important issues when designing an OS data infrastructure that includes the use of Blockchain technology (BCT) will be:

  • The degree of decentralization and distribution of data
  • The capability of the participating RPOs to deliver digital certificates covering the authenticity of the research data, the compliance of the research protocols and other processes that yielded the data with the best RE/RI standards, etc.
  • The capability of OS research and data infrastructures to manage certificates and, more generally, the use of BCT
  • The choice between permissionless versus permissioned blockchains, and public versus private blockchains
  • Transparency versus confidentiality
  • Authentication/authorisation, access control and logging of its use
  • Legal and ethical questions linked to auditability, accountability, responsibility, etc.
  • Processes and procedures needed when parts of the research data have been updated


This passage is part of D6.3: Comparison of existing blockchain technologies to safeguard responsible OS written by Arild Johan Jansen & Svein Ølnes.