Our repository list is one of the ways in which Springer Nature has engaged with the research community on data sharing, starting several years before Open Science became mainstream. In this post, Varsha Khodiyar looks back on the historical context for our repository list, how it has evolved over the years, what we have learned along the way, and what comes next for repository guidance in the ever-evolving research data landscape.
Written by Varsha Khodiyar, Data Curation Manager
The recommended repository list began life as an extension of author guidance for the Nature portfolio data journal Scientific Data. The data associated with manuscripts submitted to Scientific Data must be made available for peer review, and so to assist authors who may be unsure where to deposit their data, the journal launched with an initial list of 60 commonly used public repositories.
In response to increasing requests from newly launched repositories for additions to the repository list, Scientific Data implemented a ‘repository evaluation questionnaire’ in January 2015. This allowed us to gather basic information about each repository in a standardized way, which helped to ensure consistency in the information requested. The completed questionnaire was the starting point for a conversation between Scientific Data editors and repository managers (sometimes taking place over several months), eventually leading to one of the following outcomes for the initial request:
Some of the major reasons why a repository was considered unsuitable included:
The decision as to whether or not a repository was suitable for wider recommendation via the repository list was based solely on the repository’s scope for data deposition. Many repositories are only open to data deposition from researchers associated with a particular institution, project, funder or nation. Although limited deposition repositories are not suitable for wider recommendation via the repository list, we promoted and facilitated their use alongside generalist repositories, which do not place limits on who can deposit data. We include limited deposition scope repositories in our FAIRsharing collection, and have implemented policies to ensure inclusivity of these repositories. For example, in cases where a limited deposition repository is unable to facilitate anonymous peer review of embargoed data, Scientific Data facilitates data peer review via temporary deposition to an integrated repository and, after manuscript acceptance, ensures citation and referencing of the now public data at the limited deposition repository.
Following feedback on the usefulness of Scientific Data’s repository list for multiple stakeholders in the research community, in July 2016 we mirrored Scientific Data’s repository list for use by all Springer Nature journals, editors and authors. At the same time we continued to provide guidance and feedback upon request to newly launched repositories.
In many cases the repository evaluation process led to improvements made by repositories (sometimes over several months or years), for the ultimate benefit of repository users and the wider research community. As an illustration of the types of outcomes observed, we can consider 46 repositories evaluated in the period 2017-2021. Of these 46 repositories, 19 (41%) were unsuitable for one or more of the reasons detailed above. Of the 27 repositories suitable for primary archiving of data associated with a journal article, 21 underwent improvements to some aspect of the repository as a result of the feedback process.
The conversation around research data sharing and management has changed significantly since 2014. The vital role of data repository infrastructure in supporting an Open Science future is now very well established; for example, multiple large-scale research data infrastructure initiatives have been launched in recent years, including EOSC in Europe, CSTCloud in China, and ILDA in Latin America. With research data management now firmly established as an integral part of the research process, we have also seen the data repository community formally codifying much of the good practice Springer Nature had been glad to advise on. From overarching repository frameworks to practical guidance for repositories on data citations, there are now many community-generated resources available to repository managers looking to develop and improve the resources they provide for their user communities. As a result of this welcome evolution in research data practice and guidance, Springer Nature’s repository guidance will move away from ‘recommending’ repositories and towards endorsing the core principles of good research data management and sharing. We will support and encourage our authors to explore the practical guidance and data sharing standards established within their own research community; for example, see our recommendation for proteomics data.
However, we will still uphold the following guiding principles on data publishing as initially implemented and refined at the Scientific Data journal:
We are proud to have played a pivotal role in the conversation on research data repositories, and the subsequent improvements in data sharing standards and practices. We look forward to continuing to play our part in improving Open Science practice for the benefit of our authors and the wider research community.