publications

Knowledge in the dark: scientific challenges and ways forward

A key dimension of our current era is Big Data, the rapid rise in produced data and information; a key frustration is that we are nonetheless living in an age of ignorance, as people's real knowledge and understanding do not seem to be substantially increasing. This development has critical consequences.

Certify Reproducibility with Confidential Data

Many government data, such as sensitive information on individuals' taxes, income, employment, or health, are available only to accredited users within a secure computing environment. Though they can be cumbersome to access, such microdata allow researchers to pursue questions that could not be addressed with public data alone (1). However, researchers using confidential data face an inescapable challenge to research reproducibility (2): empirical results cannot easily be reproduced by peers and journal referees, because access to the underpinning data is restricted. We describe an approach that allows researchers who analyze confidential data to signal the reproducibility of their research. It relies on a certification process conducted by a specialized agency that is accredited by the confidential-data producers and can guarantee that the code and the data used by a researcher indeed produce the results reported in a scientific paper.

In general, research is said to be reproducible if the researchers provide all the resources, such as computer code, data, and documentation, required to obtain the published results (3, 4). Reproducibility has the potential to serve as a minimum standard for judging scientific claims (5, 6). Recent promising initiatives to facilitate reproducible research include new research environments (e.g., Code Ocean, Popper convention, Whole Tale), workflow systems (e.g., Kepler, Kurator), and dissemination platforms or data repositories (e.g., DataHub, Dataverse, openICPSR, IEEE DataPort, Mendeley). The journal Biostatistics introduced a process in which an editorial board member aims to reproduce the results in a new submission using the code and data provided by the authors (7).
Other examples of internal reproducibility assessments include the Applications and Case Studies section of the Journal of the American Statistical Association (8) and the artifact evaluation process of the Principles and Practice of Parallel Programming (PPoPP) conferences. Alternatively, the reproducibility review can be outsourced to a third party, as in the partnership between the American Journal of Political Science (AJPS) and the Odum Institute (9). Yet, despite the proliferation of such efforts, current computational research does not always comply with the most basic principles of reproducibility (10, 11). Use of confidential data is often mentioned as a major impediment (12).

Since the end of the 1970s, with the development of computers and growing concern about privacy protection, legal frameworks governing access to sensitive personal information have been reinforced in most countries. Though initially only direct identification (e.g., name, address, and social security number) was considered in defining confidentiality, the perimeter has gradually been enlarged to indirect identification, that is, the use of multiple variables that together create a risk of identification (6). Given this extension of the legal scope of confidential data, a growing fraction of the data used in science can fall into this category.

One possible approach to engaging in reproducible research with confidential data is to generate synthetic data by applying an information-preserving but anonymizing transformation to the initial data (13). By contrast, all Public Library of Science (PLOS) journals request that whenever data cannot be accessed by other researchers, a public dataset that can be used to validate the original conclusions be provided.
An alternative approach relies on improving the accessibility of restricted data for researchers, for example, the research passport proposed by the Inter-university Consortium for Political and Social Research, or the DataTags framework developed within Dataverse and the multidisciplinary Privacy Tools Project.

The natural tension between confidentiality and reproducibility could be alleviated by using a third-party certification agency to formally test whether the results in the tables and figures of a given scientific article can be reproduced from the computer code and the confidential data used by the researcher. A first attempt to implement such an external certification process is under way in France. Explicitly designed to deal with confidential data, the reproducibility assessment, like the original analysis, is conducted within a restricted-access data environment.

In France, the Centre d'Accès Sécurisé aux Données (CASD) is a public research infrastructure that allows users to access and work with confidential government data under secure conditions. The center currently provides access to data from the French Statistical Institute and the French Ministries for Finance, Justice, Education, Labor, and Agriculture, as well as social security contribution and health data. The application process for researchers to gain access to CASD datasets takes around 6 months and requires presenting the research project before the French Statistical Secrecy Committee, which gathers data producers. Once accreditation is granted, CASD creates a virtual machine that gives the researcher access to the specific source datasets required for the project, as well as the required statistical software. Remote access to the virtual machine is provided through a dedicated piece of hardware supplied by CASD, which includes a fingerprint biometric reader.
Since the inception of CASD in 2010, the question of allowing journal referees to access the data used by researchers has been heavily debated. However, both the legal framework and technical restrictions have made intermittent, short-period access for referees difficult. The Certification Agency for Scientific Code and Data (cascad, www.cascad.tech) is a not-for-profit certification agency created by academics (all coauthors of this paper) with the support of the French National Centre for Scientific Research, foundations, universities, and local governments. During initial meetings between the cascad and CASD teams, the two quickly recognized the mutual benefit of joining forces to design a reproducibility certification process for confidential data. Thanks to this partnership, cascad was granted permanent accreditation by the French Statistical Secrecy Committee for all 280 datasets available on CASD. This first-of-its-kind accreditation was motivated by the fact that cascad provides a solution to the long-recognized problem of the lack of reproducibility of research based on confidential data. Also key to the approval was that the whole certification process remains within the CASD environment and that no data can ever be downloaded.

When an author requests a cascad certification for a paper, he or she needs to provide the paper, the computer code used in the analysis, and any additional information (software version, readme files, etc.) required to reproduce the results. A reproducibility reviewer, a full-time cascad employee specialized in the software used by the author, then accesses a CASD virtual machine that is a clone of the one used by the author. It includes a copy of the source datasets and of the author's computer code, as well as all software required to run the code.
The reviewer executes the code, compares the output with the results displayed in the tables and figures of the paper, and lists any discrepancies in an execution report. In practice, such discrepancies can arise from typos in the manuscript, numerical convergence issues, or differences in software package versions. The execution report is transferred to a cascad reproducibility editor, a senior researcher specialized in the author's research field, who ultimately decides on the reproducibility of the article. Lastly, a reproducibility certificate is sent to the author and stored in the cascad database.

An example of a study recently certified by cascad is one that proposes a direct measure of tax-filing inefficiency in French cohabiting couples (14). The analysis relies on the Echantillon Démographique Permanent, an administrative dataset available only through CASD that combines information from birth, death, and marriage registers; electoral registers; censuses; tax returns; and pay slips. All the tables and figures of the paper were reproduced by a cascad reproducibility reviewer from the source datasets and the Python scripts provided by the authors [see certificate (15)].

While complying with strict data confidentiality rules, the cascad-CASD partnership offers several advantages. (i) Signal research reproducibility when data are confidential: the author can transfer the reproducibility certificate to an academic journal when submitting a new manuscript, similar to the reproducibility badges introduced by the Association for Computing Machinery. (ii) Outsource the reproducibility review: external certification enriches the peer review process, but the extra step is outsourced by academic journals, as in the AJPS-Odum partnership; the staff of the certification agency is specialized and has more time than editorial teams at academic journals. (iii) Provide economies of scale to the research community.
This model connects a data-provision organization (CASD, a single entry point to a large number of data producers) and a reproducibility certification organization (cascad, a single entry point to a large number of journals and researchers). The cascad-CASD model is a generalization of the standard reproducibility process, in which a single researcher goes through the whole process alone, obtaining similar data and redoing the analysis. (iv) Speed up the reproducibility review: any skeptical researcher can still seek to reproduce the work on his or her own, but unlike researchers, who must go through a 6-month application process, cascad reviewers benefit from a fast-track process (2 days) to access any data necessary to conduct the reproducibility assessment; the certification process is expected to be completed within 2 weeks. (v) Ease replication and robustness tests: once certification is completed, the computer code and detailed information about the source datasets (metadata) can be publicly posted on the Zenodo archive and used to facilitate additional replication and robustness analyses.

Undoubtedly, the biggest challenge for any new certification service is to build trust. To build trust and increase credibility, cascad implements a transparent and detailed certification process. For each certification, all the actions, interactions, and problems that occurred during the process are recorded. Furthermore, all operations carried out within the virtual machine by the reviewer are recorded and traceable. Once the reviewing process is over, the environment is closed, sealed, and archived. The recorded operations and output can be referenced externally and shared with journal editors after proper accreditation. Trust by data producers makes the process feasible, and trust by academic journals makes it useful and worthwhile.
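The reviewer's core task, rerunning the author's code and checking each reported statistic against the regenerated value, can be sketched in a few lines. This is a minimal illustration, not part of any cascad tooling: the function name, the dictionary representation of table values, and the tolerance are all assumptions made for the example.

```python
# Hypothetical sketch of a reproducibility reviewer's comparison step:
# check each statistic reported in the paper's tables against the value
# regenerated by rerunning the author's code, within a relative tolerance
# that absorbs harmless numerical noise (e.g., solver or package versions).
import math

def compare_results(reported, reproduced, rel_tol=1e-4):
    """List discrepancies between reported and regenerated values."""
    discrepancies = []
    for key, reported_value in reported.items():
        if key not in reproduced:
            discrepancies.append((key, "missing from regenerated output"))
        elif not math.isclose(reported_value, reproduced[key], rel_tol=rel_tol):
            discrepancies.append(
                (key, f"reported {reported_value}, reproduced {reproduced[key]}")
            )
    return discrepancies

# A typo in the manuscript shows up as a discrepancy in the execution report.
reported = {"table1_mean_income": 32450.0, "table1_gini": 0.294}
reproduced = {"table1_mean_income": 32450.0, "table1_gini": 0.2904}
print(compare_results(reported, reproduced))
```

In practice the tolerance would be chosen per field and per statistic; an empty result feeds into the execution report as "reproduced," while any entries are escalated to the reproducibility editor.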
Cascad has the trust of the data producers at CASD, who want their data to be useful to society and accessible for reproducibility. Now cascad needs to convince researchers and journals to value its certificates. Overall, the experience with cascad thus far suggests that preserving confidentiality and privacy does not necessarily have to lead to opaque and nonreproducible research.

References

1. L. Einav, J. Levin, Science 346, 1243089 (2014).
2. C. Lagoze, L. Vilhuber, Chance 30, 68 (2017).
3. J. B. Buckheit, D. L. Donoho, in Wavelets and Statistics, A. Antoniadis, G. Oppenheim, Eds., Lecture Notes in Statistics (Springer, New York, 1995), vol. 103, pp. 55-81.
4. V. Stodden et al., Science 354, 1240 (2016).
5. G. Christensen, E. Miguel, J. Econ. Lit. 56, 920 (2018).
6. Y.-A. de Montjoye et al., Sci. Data 5, 180286 (2018).
7. R. D. Peng, Science 334, 1226 (2011).
8. M. Fuentes, Amstat News (2016).
9. T.-M. Christian, S. Lafferty-Hess, W. G. Jacoby, T. Carsey, Int. J. Digit. Curation 13, 114 (2018).
10. A. C. Chang, P. Li, Am. Econ. Rev. 107, 60 (2017).
11. V. Stodden, J. Seiler, Z. Ma, Proc. Natl. Acad. Sci. U.S.A. 115, 2584 (2018).
12. J. K. Harris et al., PLOS ONE 13, e0202447 (2018).
13. B. E. Shepherd, M. Blevins Peratikos, P. F. Rebeiro, S. N. Duda, C. C. McGowan, Am. J. Epidemiol. 186, 387 (2017).
14. O. Bargain, D. Echevin, N. Moreau, A. Pacifico, working paper (2019).
15. https://doi.org/10.5281/zenodo.3256633
Acknowledgments: We are grateful to the anonymous referees and to conference participants at the 2018 Conference of European Statistics Stakeholders (Bamberg, Germany) and the 2019 Advances in Social Sciences using Administrative and Survey Data conference (Paris, France) for their comments. We thank the French Statistical Secrecy Committee and its president Jean-Eric Schoettl, the French National Research Agency (ANR-10-EQPX-17), the French National Center for Scientific Research (CNRS), the region Centre-Val de Loire, and the HEC Paris Foundation for their support.

Reproducible Research and GIScience: an Evaluation Using AGILE Conference Papers

We reviewed current recommendations for reproducible research and translated them into criteria for assessing the reproducibility of articles in the field of geographic information science (GIScience). Feedback from the authors indicates that although they support the concept of reproducible research, the incentives for practicing it are too small. We therefore propose concrete actions for individual researchers and the GIScience conference series to improve transparency and reproducibility.

Digitisation of Higher Education: Systemic Framework Conditions and Influencing Political Factors

Which factors have a strong systemic influence on the digitisation of higher education? Which can be influenced politically? The authors look at areas of action related to open science and discuss the extent to which future scenarios such as "disruption" can endanger university locations.

Discrepancy in Scientific Authority and Media Visibility of Climate Change Scientists and Contrarians

The role of climate change (CC) contrarians is neglected in climate change communication studies. Here the authors used a data-driven approach to identify CC contrarians and CC scientists and found that CC scientists have much higher citation impact than CC contrarians but lower media visibility.

What Difference Do Retractions Make? An Estimate of the Epistemic Impact of Retractions on Recent Meta-analyses

Every year, several hundred publications are retracted due to fabrication or falsification of data, plagiarism, and other breaches of research integrity and ethics. However, the extent to which a retraction requires revising previous scientific estimates and beliefs is unknown.

Study Examines How Media Around the World Frame Climate Change News

While richer countries tend to frame climate change coverage as a political issue, poorer countries more often frame it as an international issue that the world at large needs to address.

Replication and the Manufacture of Scientific Inferences: A Formal Approach

The field of replication studies remains controversial and misunderstood. To help bring order to the chaos, the author proposes a theory of manufactured inferences.

Mind the Gap

Mind the Gap: A Landscape Analysis of Open Source Publishing Tools and Platforms catalogs and analyzes available open-source publishing software and warns that open publishing must grapple with the dual challenges of siloed development and the organization of a community-owned ecosystem.

The Impact of Open Access on Teaching-How Far Have We Come?

This article seeks to understand how far the United Kingdom higher education (UK HE) sector has progressed towards open access (OA) availability of the scholarly literature it requires to support courses of study. It uses Google Scholar, Unpaywall and Open Access Button to identify OA copies of a random sample of articles copied under the Copyright Licensing Agency (CLA) HE Licence to support teaching. The quantitative data analysis is combined with interviews of, and a workshop with, HE practitioners to investigate four research questions. Firstly, what is the nature of the content being used to support courses of study? Secondly, do UK HE establishments regularly incorporate searches for open access availability into their acquisition processes to support teaching? Thirdly, what proportion of content used under the CLA Licence is also available on open access and appropriately licenced? Finally, what percentage of content used by UK HEIs under the CLA Licence is written by academics and thus has the potential for being made open access had there been support in place to enable this? Key findings include the fact that no interviewees incorporated OA searches into their acquisitions processes. Overall, 38% of articles required to support teaching were available as OA in some form but only 7% had a findable re-use licence; just 3% had licences that specifically permitted inclusion in an ‘electronic course-pack’. Eighty-nine percent of journal content was written by academics (34% by UK-based academics). Of these, 58% were written since 2000 and thus could arguably have been made available openly had academics been supported to do so.

A Literature Review of Scholarly Communications Metadata

The purpose of this literature review is to identify the challenges, opportunities, and gaps in knowledge with regard to the use of metadata in scholarly communications. This paper compiles and interprets literature in sections based on the professional groups, or stakeholders, within scholarly communications metadata: researchers, funders, publishers, librarians, service providers, and data curators.

Meta-Research: Use of the Journal Impact Factor in Academic Review, Promotion, and Tenure Evaluations

Almost a quarter of faculty evaluation documents from US and Canadian universities mention Journal Impact Factor and often imply that it measures research quality.

Why We Publish Where We Do: Faculty Publishing Values and Their Relationship to Review, Promotion and Tenure Expectations

A survey of academics finds that respondents most value journal readership, while they believe their peers most value prestige and related metrics such as impact factor when submitting their work for publication.

What Science Looks Like

The publication of our first two Registered Reports marks a major milestone for Nature Human Behaviour. These studies demonstrate what many researchers know, but is often hidden from the published literature: confirmatory research doesn't always confirm the authors' hypotheses.

Interdisciplinary Comparison of Scientific Impact of Publications Using the Citation-Ratio

Article shows that the Citation-Ratio is more consistent across disciplines than total numbers of citations.

Establishing, Developing, and Sustaining a Community of Data Champions

While research data support units now exist in many universities, these are typically not able to provide discipline-specific expertise or resources. This article focuses on the Data Champion Programme at the University of Cambridge, which empowers discipline-specific experts already embedded within each unit to advocate for good research data management (RDM) and to deliver support locally.

The Definition of Reuse

Article postulates that a clear definition of use and reuse is needed to establish better metrics for a comprehensive scholarly record of individuals, institutions, organizations, etc. Hence, this article presents a first definition of reuse of research data.

Releasing a Preprint is Associated with More Attention and Citations

Preprint examines whether having a preprint on bioRxiv.org was associated with the Altmetric Attention Score and number of citations of the corresponding peer-reviewed article.

The Citation Advantage of Linking Publications to Research Data

Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging authors to provide data availability statements. As a consequence of this, there has been a strong recent uptake of data availability statements, but it is still unclear what proportion of these statements actually contain well-formed links to data, and if there is an added value in providing them.

Comparing Journal and Paper Level Classifications of Science

The classification of science into disciplines is at the heart of bibliometric analyses. While most classifications systems are implemented at the journal level, their accuracy has been questioned, and paper-level classifications have been considered by many to be more precise.

Ten Simple Rules for Researchers Collaborating on Massively Open Online Papers (MOOPs)

The authors provide recommendations for a highly open and participatory interactive process of collaboration using digital tools and environments, discuss potential issues that come with working with large and diverse authoring communities, and provide possible solutions should these arise.

Universities and Knowledge Sharing

The authors explore the extent to which universities are functioning as effective open knowledge institutions, as well as the types of information that universities, funders, and communities might need to understand an institution's open knowledge performance and how it might be improved. The challenges of collecting data on open knowledge practices at scale, and across national, cultural, and linguistic boundaries, are also discussed.

OpenCitations

OpenCitations is a scholarly infrastructure organization dedicated to open scholarship and the publication of open bibliographic and citation data as Linked Open Data using Semantic Web technologies, to the development of software tools and services that enable convenient access to these open data, and to community advocacy for open citations. This paper describes OpenCitations and its datasets, tools, services and activities.

The Effect of BioRxiv Preprints on Citations and Altmetrics

Article finds that bioRxiv-deposited journal articles received a sizeable citation and altmetric advantage over non-deposited articles.