Challenges for Open Science Infrastructures
Examining the challenges related to research ethics, to research integrity and to the FAIR principles arising for Open Science Infrastructures
Based on the workshops held within the framework of the ROSiE project, we examined the challenges encountered by OS Infrastructures (OSI). We classified those challenges into three main categories, according to the ROSiE research objectives: 1) challenges mainly related to research ethics (RE); 2) challenges mainly related to research integrity (RI); and 3) challenges mainly related to the FAIR principles. All challenges have RI implications, and several challenges are interrelated.
1. Challenges related to research ethics
The challenges mainly related to research ethics (i.e., related to data collected on human subjects) were:
“The whole principle of open data is that you get unexpected results from open data, as people see new patterns that you don’t necessarily see in the first instance. So, giving a consideration on how you do that for really successful citizen science projects is absolutely crucial in my mind[1].”
“Our ethics committees are not equipped to help us with such complicated cases and methodology”
“The complaint about broad consent had been that it kind of limits the tissue donors’ autonomy because the particular uses of the data is not transparent to them”
“I searched for DMP models in order to benchmark how others had addressed these issues. Another important feature [of data sets] is that they often contain private information that cannot be made public. The available DMPs are not always super useful[2].”
This challenge is highly dependent on the nature of the data (i.e., sensitive or personal data, digital or physical data) and thus indirectly dependent on the discipline involved. Those limitations are legally grounded in Europe, e.g., in GDPR requirements.
“I think the tagline is ‘as open as possible and as legally as necessary’”
Privacy and Confidentiality may thus limit data accessibility and reusability if not properly addressed ab initio, which they cannot always be.
2. Challenges related to research integrity
“There’s always this hesitation: that releasing the data, somebody might discover something really interesting and then, you know, you put all that effort in and somebody else gets the Nobel Prize.”
Although this challenge is not specific to OS, the OS context (namely, wider dissemination of data, a greater diversity of actors with access to it, etc.) can strengthen the impression that the environment is conducive to this issue. It is important to be aware of this perception, as it may be an obstacle to the opening of data.
Sometimes the difficulty lies in giving appropriate credit to all individuals who contributed to the data collection, for example for historical data:
“So usually, let’s say during a colonial expedition, the scientist were primarily the European scientists, when a lot of local helpers and local experts and local scientific helped them to collect the data. And they’re usually not mentioned. […] So how do you connect those things and find those persons and provide attribution to and give them the credit to that?”
Data curation is one of the 14 types of contribution to research recognized in the CRediT taxonomy[3], but there is no formalized system for acknowledging the contribution of ‘data curators’, for instance when citing a paper: at best, they are mentioned in a footnote or in the ‘Acknowledgment’ section.
“You have identified a fundamental issue about how you guarantee that the quality of the data is high. And it is an issue that we, as a citizen science community, are going to have to solve if we want to see the scale of our activities growing.”
Part of this challenge is related to data collection and dissemination: the need to limit missing information and human errors (which are more likely to occur with a high number of users) and to ensure that the data were not fabricated or falsified. Ensuring data quality and integrity is not a challenge for OS alone, but OS practices amplify the impact of a lack of quality or integrity because data are collected and disseminated widely, including inaccurate or otherwise unsuitable data.
3. Challenges related to some of the FAIR principles
“Using images or video to support a discussion implies questions of copyright. Very often, in disciplines like history of art, free circulation of articles and publications is restricted, or it would imply removing images that are fully part of the scientific argument.[5]”
The difficulty for OSI to secure long-term funding can also hamper open access, the underlying question being how to reconcile the requirements of open access with the need for income (for example, editors’ income). Some workshop participants raised the question of whether open data can be freely accessible (without charging users). Access limitations may also differ according to the origin of the funds (public or private) and the specific requirements associated with their use (for example, geographical restrictions). Participants raised a need for guidance and policy to ensure the appropriate use of public funds in this situation:
“We have some right concept […] and some proper policies that ensure that we don't waste taxpayers’ money but that we have it used according to the aims that we have and towards what we are supposed to deliver for getting this money that we have from the countries from commission and ultimately from taxpayers. So for me, this is still also a point to consider when you talk about responsible Open Science”.
Finally, some ethical and legal requirements may challenge open access (for example, for sensitive data, see Privacy and Confidentiality).
“I mean, we are dealing with a similar situation and our extra challenges is that we were working with 270 different museums across 23 different countries. So everybody has their own local practices and then we have to come up with the European practice and sometimes they don't match.”
“So the main problem there is, as I understand it, is if there was unity of the regulations across countries, there are also some cultural differences. Then how do you implement everything into an infrastructure research that tries to enable some transnational work?”
This challenge is closely interrelated with the re-use challenge.
“So, making it available, you know, more generally, even not to the public, but just to a wider scientific community is not only difficult but also maybe not so meaningful because nobody else besides the people who actually gather the data, know how to analyse it.”
“Originally, sometimes, they were written for microbes (for instance). And so people working on plants or animals were interpreting as they wanted and it makes them very heterogeneous in the end on the dataset.”
“If you have metadata that, for instance, contains references on specific lab equipment that is used to create that data, and then one is state-of-the-art and another is not seen as state-of-the-art, then this devalues the data automatically through the metadata, and it's again a question of resources that some have and some don’t.”
“I was just thinking about […], you know, partnerships that you have within Europe... Do they tend to be, you know […] researchers from resource-rich institutions or do you feel that that's really that your resources so therefore anyone to use […], also from resource-poor areas.”
[1] All quotes come from the discussions in the workshops.
[2] Translated into English by the authors of the report.
[4] Research papers may be considered as data, especially in the humanities and social sciences.
[5] Translated into English by the authors of the report.
This passage is part of D6.2: Final analysis and mapping of existing European and national OS infrastructures with regard to promoting responsible OS written by Carole Chapin, Nathalie Voarino.