published quarterly by the university of borås, sweden

vol. 27 no. 2, June, 2022

A light in the dark: open access to medical literature and the COVID-19 pandemic

Marco Capocasa, Paolo Anagnostou, and Giovanni Destro Bisol

Introduction. This study was designed to evaluate the accessibility of peer-reviewed literature regarding COVID-19 and the ten diseases with the highest death toll worldwide.
Method. We conducted extensive searches of studies concerning COVID-19 and other diseases using the Web of Science, and the Google and Google Scholar search engines.
Analysis. Open access rates were obtained from the Web of Science database, taking into account different types of publications and research areas. Quantitative analyses based on random samplings were used to estimate the potential increase of open access rates achievable with open archiving of post-prints.
Results. The open access rate of COVID-19 papers (89.5%) far exceeded that of the ten most deadly human diseases (48.8%, on average). We estimated that most of the gap (70%) could be bridged by making post-print manuscripts available online.
Conclusions. The pandemic represents a real breakthrough in scientific publishing towards the goal of health information for all, demonstrating that much greater access to medical literature is possible. The green road may be the best way to bring the open access rates of peer-reviewed papers on other major diseases closer to that of COVID-19. However, it needs to be implemented more effectively, combining bottom-up and top-down actions and making the open science culture more widespread.

DOI: https://doi.org/10.47989/irpaper929


The availability of scientific information is widely recognised as a key factor in fostering medical research and intervention activities, and in supporting programmes to reduce health disparities and facilitate informed health choices (Bell et al., 2016; Brown et al., 2019; Spring, 2020). In 2013, the World Health Organization recognised the importance of the sharing and accessibility of research methods, results and data as the basis for their exploitation ‘for practical purposes, including the improvement of health’ (World Health Organization, 2013, p. 110). In 2019, this principle was reaffirmed by the World Medical Association (2019), with the promotion of initiatives for the improvement of access to ‘timely, current, evidence-based healthcare information’. These objectives echo the goal of a ‘transparent and accessible knowledge that is shared and developed through collaborative networks’ pursued by the open science movement (Vicente-Sáez and Martínez-Fuentes, 2018, p. 434). However, although many funders and academic institutions now promote open access practices, these are still not the norm in global health research (Smith et al., 2017).

The past three decades have seen an increase in outbreaks of highly contagious and pathogenic diseases, a trend that could be further driven by zoonotic spillover events caused by the occupation of new land by human populations and by the expansion of vector distribution favoured by climate change (Houlihan and Whitworth, 2019). Immediate and full sharing of new scientific evidence could play a decisive role in countering outbreaks of infectious diseases (Centers for Disease Control and Prevention, 2003; King et al., 2012). As an example, the Ebola virus outbreak in Liberia in 1982 remained hidden to some public health institutions because the paper reporting this information was published in a subscription-only journal (Knobloch et al., 1982). A more timely dissemination of this study would likely have led to faster and more effective actions to reduce the scale of the epidemics that occurred later, in 2014 (Smith et al., 2017).

Today, all national health systems are facing the Coronavirus disease-19 (COVID-19) challenge. Due to its novelty, the infectivity, pathogenesis and clinical course of COVID-19 remain under scrutiny, and while waiting for a vaccine, a wide range of pharmacological approaches are being tested. In this rapidly evolving situation, having new knowledge on COVID-19 readily available may be of great help for researchers and practitioners. Furthermore, combined with effective data sharing, full and immediate access to scientific articles could allow more researchers to compare and verify the results of experimental and clinical studies, an essential step towards ensuring scientific integrity and reproducibility (Moorthy et al., 2020; Wallach et al., 2018). The adoption of open access policies for COVID-19 papers by major publishers (e.g., Elsevier, the Nature Publishing Group, Wiley, and the American Medical Association) represents a move in this direction (Tavernier, 2020; Wellcome, 2020). However, the proportion of the peer-reviewed literature on COVID-19 that is openly accessible, and how this compares to other major diseases, is unknown. In this study, the rates of COVID-19 peer-reviewed articles published in open access were first evaluated. Second, the results were compared with those obtained for the ten diseases with the highest number of deaths worldwide. Third, the usefulness of the green road, the quickest and least expensive way to share peer-reviewed literature, was evaluated as a means of bringing the open access rates of studies on other diseases closer to those of studies on COVID-19. Fourth and finally, a scalable strategy is proposed, aimed at maximising the effectiveness of the green road as a means of increasing comprehensive and timely access to peer-reviewed medical literature.

Materials and methods

The dataset was built by querying the Web of Science database (core collection), which was preferred over others (PubMed or Scopus) as it is based on a more rigorous selection of peer-reviewed articles and provides open access rates for different research fields (Bosman and Kramer, 2018; Pranckutė, 2021).

All the data supporting the results of this study are included in Appendix. Web of Science was accessed on 9 July 2020 to retrieve information on published papers concerning COVID-19, four past outbreaks (avian influenza, Middle East respiratory syndrome, severe acute respiratory syndrome, swine influenza) and the ten human diseases with the highest worldwide death toll (listed in Figure 2). Topic was set as the search field, and five types (article, review, editorial material, letter, book) and four sub-types (early access, proceedings paper, book chapter, data paper, clinical trial and case report) of publications were considered. In order to narrow the search to peer-reviewed articles with an obvious impact on medical research and intervention activities, publications belonging to the following Web of Science categories were excluded: abstracts, meeting abstracts, corrections, news items, book reviews, retractions, biographical items, reprints, fiction creative prose papers, reference materials, biographies, other and unspecified papers. It should also be noted that Web of Science does not currently index pre-prints.

For each disease, multiple search terms were used to ensure adequate coverage of published papers (see Table S1 in Appendix). For diseases other than COVID-19, the terms were taken from levels 3 and 4 of the death cause hierarchy in Global Burden of Disease 2017 Causes of Death Collaborators (2018). A six-month period was considered in all cases. Since the data can only be obtained for whole months, for the aforementioned four past outbreaks the Web of Science database was queried using the month following the one reported by WHO for their emergence as the lower limit (see Table S1 in Appendix). For COVID-19 and the ten above-mentioned diseases, we searched Web of Science for papers published between 1 January and 9 July 2020, in order to encompass a six-month period from the identification of the etiological agent of COVID-19 (SARS-CoV-2; 9 January 2020; Wang et al., 2020).

The open access rates were obtained directly from Web of Science, which includes “freely accessible peer-reviewed versions of an article from a publisher's website or repository” under the open access category (Clarivate Analytics, 2020). When a paper was assigned more than one open access status, the one with the highest rank was selected (gold > bronze > green). To estimate the potential increase in open access rates for non-COVID-19 diseases achievable with self-archiving of post-prints (i.e., drafts of articles that have been peer-reviewed and accepted for publication but have yet to be formatted by the journals, also referred to as author accepted manuscripts), fifty papers published in closed access according to Web of Science were randomly extracted from the dataset for each disease category. As an exception, only forty-two papers were available for maternal and neonatal disorders. The selection was limited to papers published within the first three months of 2020 (as reported in PubMed), in order to avoid false open access negatives due to latency in post-print archiving procedures. Information on journal policies was obtained from Sherpa Romeo (Curry, 2017). To check the availability of post-prints online, Web of Science was used, together with Google and Google Scholar, using the title and DOI as search terms (accessed on 9 July in both 2020 and 2021). All data were obtained using double-blind procedures.
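The status-selection rule and the sampling step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the dataset, which was obtained through the Web of Science interface; the record layout, the function names and the fixed seed are assumptions introduced only for illustration.

```python
import random

# Rank order stated in the text: gold > bronze > green.
OA_RANK = {"gold": 3, "bronze": 2, "green": 1}


def highest_oa_status(statuses):
    """Return the top-ranked open access status, or "closed" if none applies."""
    ranked = [s for s in statuses if s in OA_RANK]
    if not ranked:
        return "closed"
    return max(ranked, key=OA_RANK.get)


def sample_closed_papers(papers, k=50, seed=0):
    """Draw up to k closed-access papers at random, without replacement.

    `papers` is a list of (paper_id, statuses) pairs; this layout is hypothetical.
    """
    closed = [pid for pid, statuses in papers
              if highest_oa_status(statuses) == "closed"]
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(closed, min(k, len(closed)))
```

Capping the draw at the number of available papers mirrors the exception noted for maternal and neonatal disorders, where fewer than fifty closed-access papers were available.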

The numbers of global deaths for COVID-19 (from 9 January to 9 July) and for the ten human diseases with the highest death toll worldwide were obtained from the Johns Hopkins Coronavirus Resource Center and Global Burden of Disease 2017 Causes of Death Collaborators (2018), respectively (Table S2 in Appendix). To make the figures comparable, the average number of non-COVID-19 deaths per day (based on the 365 days of 2017, the year of observation) was multiplied by the number of days between 9 January and 9 July (181 days) and rounded down to the nearest integer (Table S2 in Appendix).
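As a worked illustration of this rescaling, the following sketch applies the rule to an invented annual figure. The function name and the example value of one million deaths are assumptions; the actual inputs are those reported in Table S2.

```python
def deaths_in_window(annual_deaths, days_in_year=365, window_days=181):
    """Scale an annual death count to the observation window, rounding down.

    Integer arithmetic is used so that the floor is exact for whole-number inputs.
    """
    return annual_deaths * window_days // days_in_year


# Hypothetical disease with 1,000,000 deaths in 2017 (not a value from the study).
example = deaths_in_window(1_000_000)
print(example)  # 495890
```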


Estimate of COVID-19 peer-reviewed papers

Using the Web of Science database, a total of 13,655 peer-reviewed papers on COVID-19 published in a six-month period were retrieved. This number was 9.5 times more than the sum of papers concerning four recent viral outbreaks published in a comparable length of time (Table S3 in Appendix). The largest proportion of papers (40.0%) was in the form of research articles (including data and proceedings papers), followed by editorial material (27.1%), letters (23.1%) and reviews (9.8%).

Previous studies have reported counts of COVID-19 documents that differ from the one noted above, being either lower (Arrizabalaga et al., 2020; Yeo-Teh and Tang, 2021) or higher (Else, 2020). This is expected given the different databases chosen (other studies used PubMed) and further differences in search terms, database coverage and types of publication considered. Of particular significance, this study was the only one to focus on peer-reviewed papers (see above).

Open access to peer-reviewed papers on COVID-19 and other diseases

In this dataset, the overall open access rate of papers indexed by Web of Science within six months of the identification of SARS-CoV-2 as the aetiological agent of COVID-19 was 89.5% (Figure 1). Rates ranged from 88.3% for letters to 93.5% for reviews, with editorial material (88.4%) and research articles (90.0%) in between. Most of the articles (56.8%) were published under an indefinite or non-CC license (bronze), followed by the gold road (30.4%; articles published in DOAJ or hybrid journals), while the green road was used to a much lesser extent (2.3%). Interestingly, a notable difference was observed in the type of open access depending on the type of publication: bronze far exceeded gold in editorial material (64.7% versus 22.5%) and in letters (68.2% versus 17.8%), while the difference was moderate for articles (47.6% versus 39.3%) and very small for reviews (45.2% versus 45.8%). Most papers published in journals with a first quartile impact factor were in open access: values ranged from 92.9% for ‘medicine, general & internal’ to 99.2% for ‘medicine, research & experimental’, with ‘multidisciplinary sciences’ (97.0%) and ‘public, environmental & occupational health’ (97.1%) in between (see Table S4 in Appendix). According to data collected exactly a year later, the open access rate of COVID-19 documents decreased only slightly (from 89.5% to 86.7%).

Figure 1: Rates for papers on COVID-19 published with gold, bronze and green open access status based on the Web of Science definition (Clarivate Analytics, 2020). When a paper was found with more than one open access status, the one with the highest rank was selected (open access status rank: any type of gold > bronze > green).

To compare the open access rates obtained for COVID-19 papers with those for other diseases of primary public health importance, the ten diseases with the highest death toll were identified (Global Burden of Disease 2017 Causes of Death Collaborators, 2018). The number of deaths attributed to the new viral pandemic six months after its emergence (547,931) is lower than that of each of them in an equivalent time period and corresponds to 23.0% of their average value (2,379,931; Global Burden of Disease 2017 Causes of Death Collaborators, 2018).

The open access rates obtained from Web of Science ranged from 44.0% for maternal and neonatal disorders to 58.9% for respiratory infections and tuberculosis, with an average of 48.8% (see Figure 2 and Table S5 in Appendix).

The potential impact of post-prints on open access rates

Among the papers randomly selected from those classified as closed access by Web of Science, 273 out of 492 (55.5% of the total random sample) could have been immediately shared as post-prints in compliance with the publishers' policies, with values ranging from 48.0% (other non-communicable diseases) to 66% (chronic respiratory diseases) (see Tables S6 and S7 in Appendix for more details). However, only nine of these post-prints (3.3%), related to six different diseases, were available online (Figure 2). Two were found through Web of Science, five through Google or Google Scholar, and the other two by both methods. Overall, only 1.5% of the post-prints were available to download through Web of Science, while a more time-consuming Google or Google Scholar search produced only a further 1.8%. When the same search was repeated exactly one year later, the total number of online post-prints increased to 47 (9.3% of potentially shareable post-prints), with all but one disease (maternal and neonatal disorders) represented. Three out of 47 were found through Web of Science, 10 through Google or Google Scholar, and the remaining 34 using both search methods (see Table S8).
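The percentages reported for the first search follow directly from the counts. The short sketch below recomputes them; the variable names are illustrative, and the share attributed to Web of Science is obtained by adding the post-prints found by both methods to those found through Web of Science alone.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


sampled = 492      # closed-access papers in the random sample
shareable = 273    # post-prints that could be shared immediately

# Post-prints actually found online in July 2020, by search method.
wos_only, google_only, both = 2, 5, 2
found_online = wos_only + google_only + both

shareable_share = pct(shareable, sampled)       # 55.5
online_share = pct(found_online, shareable)     # 3.3
via_wos = pct(wos_only + both, shareable)       # 1.5
extra_via_google = pct(google_only, shareable)  # 1.8
```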

By making these post-prints available online, the average open access rate across the ten diseases would increase by 28.3 percentage points (from 48.8% to 77.1%), which corresponds to 70% of the gap with COVID-19 (Δ = 40.7%). For single diseases, the smallest increase could be obtained for respiratory infections and tuberculosis (22.2%; 95% confidence interval = 16.5% to 27.9%), and the highest for chronic respiratory diseases (31.4%; 95% confidence interval = 25.2% to 37.6%) (green bars in Figure 2; see Table S9 in Appendix for more details). Publishers' policies were found to preclude the immediate online post-print archiving of 210 papers (42.7% of the total random sample), with the lowest (32.0%) and highest (50.0%) percentages for chronic respiratory diseases and other non-communicable diseases, respectively (see Tables S6 and S7 in Appendix for more details). The increase in open access rate achievable with the removal of limitations to open online archiving was 21.8%, with values for single diseases ranging from 15.2% (95% confidence interval = 9.1% to 21.3%) for chronic respiratory diseases to 27.3% (95% confidence interval = 19.7% to 34.8%) for other non-communicable diseases (orange bars in Figure 2; see Table S8 in Appendix for more details). The remaining 1.8% of the total random sample (nine papers) were published in journals which do not allow post-print archiving under any condition (see Tables S6 and S7 in Appendix for more details).
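The arithmetic behind these projections can be sketched as follows. The text does not state how the confidence intervals were computed; a normal-approximation (Wald) interval for the sampled proportion, scaled by the closed-access share, is assumed here because it appears to reproduce the values reported for respiratory infections and tuberculosis. The count of 27 shareable post-prints out of 50 is likewise an assumption chosen to match the reported 22.2%; the observed counts are those given in the Appendix.

```python
import math


def projected_gain(oa_rate, shareable, sample_size, z=1.96):
    """Projected gain in open access rate (percentage points) with a Wald interval.

    oa_rate: current open access rate, in percent.
    shareable: sampled closed-access papers whose post-print could be shared.
    sample_size: number of closed-access papers sampled.
    """
    closed = 100.0 - oa_rate                 # share of papers in closed access
    p = shareable / sample_size              # sampled proportion that is shareable
    half_width = z * math.sqrt(p * (1.0 - p) / sample_size)
    return closed * p, closed * (p - half_width), closed * (p + half_width)


# Respiratory infections and tuberculosis: 58.9% open access; 27/50 is assumed.
gain, low, high = projected_gain(oa_rate=58.9, shareable=27, sample_size=50)
print(f"{gain:.1f} ({low:.1f} to {high:.1f})")  # 22.2 (16.5 to 27.9)

# Share of the gap with COVID-19 closed by the average gain across diseases.
covid_rate, mean_rate, mean_gain = 89.5, 48.8, 28.3
gap = covid_rate - mean_rate                 # about 40.7 percentage points
share_of_gap = 100 * mean_gain / gap         # about 70%
```

Under these assumptions, the width of each interval reflects only the sampling uncertainty of the fifty-paper sample, since the open access rate taken from Web of Science is treated as known.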

Figure 2: Open access rates for papers (publishers' PDFs and post-prints) on COVID-19 and other human diseases. All observed and estimated data are provided in the Appendix (Tables S6 and S9).

Combining the post-prints shareable immediately and those shareable after removal of publishers' restrictions with the open access publishers' PDF files would potentially produce open access rates close to 100% (99.4%), with the lowest value for maternal and neonatal disorders (96%) and the highest (100%) for cardiovascular diseases, respiratory infections and tuberculosis, diabetes and kidney diseases, and digestive diseases.


The response of the scientific community to the pandemic

The very large number of COVID-19 related articles published in just six months highlights an unprecedented effort by the scientific community to help combat the global threat posed by the pandemic (Else, 2020). The most significant finding of this study is the high rate of open access observed for COVID-19 related documents. The original estimate is slightly higher than that observed by Teixeira da Silva et al. (2021) for the same duration (83%; 1 January-30 June 2020), but lower than that of Arrizabalaga et al. (2020) (97.4%), which was based on a period of only four months (January-April 2020). Compared with the latter authors, a more rigorous approach was applied here by excluding articles of lesser scientific relevance (see Materials and methods section).

In general, these estimates are significantly higher than the open access rates observed in the past, both overall and for each research area. For example, Bosman and Kramer (2018) estimated that the levels of open access for research and review papers published between 2010 and 2017 are less than 30%. A slightly higher value (36.1%) was observed by Piwowar et al. (2018) from a random sample of 100,000 journal articles published between 2009 and 2015. More recently estimated open access rates exceed 50%, probably driven by the choices of researchers, publishers and funding agencies (Brainard, 2021). The estimates based on the most recent papers (year 2019, data from Curtin Open Knowledge Initiative; see Brainard, 2021) reported a positive picture for the so-called "hard science" disciplines, for some of which more than half of the published articles were found to be freely accessible. In particular, biology and physics have shown open access rates above 60%.

Although they show a growing trend, these values are still very far from the rates observed for COVID-19. This gap underlines the role of the current cooperation agreement entered into by many scientific publishers to ensure the rapid open access of scientific publications related to the coronavirus outbreak (Tavernier, 2020). Eighteen months after the start of the pandemic, these policies are still in place as, unfortunately, the health emergency is far from over.

In analogy to the broadly open attitude to data sharing observed among human paleo-geneticists (Anagnostou et al., 2015), we argue that this finding should not be seen simply as a symbol for the open science movement. In fact, it suggests that the need to open up access to new knowledge regarding high-impact human diseases is finally making headway in the medical field.

Why and how the green road may help increase open access

The data presented in the results section raise an inevitable question: how can we bridge the gap between open access rates for COVID-19 and other high-impact human diseases? The green road is considered to be the most feasible and easiest way to make scientific information available to the widest possible public (Goben and Akers, 2020; Zhang and Watson, 2017). In this study, we provided evidence that it is not widely used in the health sciences: only a small minority (around 3-10%) of the publications potentially available as post-prints were found online. These results are consistent with previous studies, whose estimates range from 9.7% for clinical medicine to 14.7% for health sciences (Martín-Martín et al., 2018; Piwowar et al., 2018).

All of this tells us that while the green road can be a valuable approach, it needs to be implemented more effectively. To this end, we have developed a strategy consisting of three scalable steps designed along a bottom-up to top-down gradient (see Table 1):

Table 1: Schematic representation of the proposed strategy to increase the rates of open access of the articles concerning the ten diseases responsible for the highest number of deaths in the world.
 | Step 1 | Step 2 | Step 3
Level | Individual | Institutions and associations individually | Institutions, associations, and some stakeholders, individually or, better, in partnership
Action | Authors archive post-prints online whenever publishers' rules allow it. | Any academic, research and health centres, scientific and professional association, and funding agency, incentivise the open archiving of post-prints. | Any academic, research and health centre, scientific professional and patient associations, undertake to make publishers remove restrictions to online post-print archiving.
Time needed | Shorter: can be realised immediately after acceptance of the paper, depending on the authors' willingness. | Intermediate: requires institutional governance, deliberation, and action. | Longer: requires creating synergies and conducting negotiation.
Estimated effects | Increase up to 28.3% in open access rates. | | Increase up to 21.1% in open access rates.

1. Researchers and clinicians should from now on take greater care to archive their post-prints in the quickest and most easily accessible way that complies with publishers' policies (Curry, 2017). We estimated that authors could close a substantial part (70%) of the gap with the COVID-19 open access rate with just a little additional effort, which would also bring them a potential benefit. On the one hand, since most journals permit the immediate download of post-prints only from personal and departmental Web pages, authors should optimise their search engine indexing to make sharing effective. On the other hand, through open post-print archiving, they would be able to comply with the open access mandates from institutions and/or funders without the need for additional resources.

2. As the data indicate, the possibility of sharing post-prints in accordance with the publishers' policies results in their open archiving in only a small minority of cases, raising a problem that could be addressed more effectively by individual institutions than by large-scale initiatives (see Scholarly Publishing and Academic Resources Coalition, 2018). Any academic or health centre, scientific or professional association, or funding agency should go beyond simply indicating the green road as a way to open access in their policies. Using the COVID-19 case as a flagship reference, they should make trainees and investigators more aware of the value of combining complete and immediate open access with effective open data practices for research advancements and applications to health care. In addition to committing their members to share papers as openly as possible, given the publisher's policies and privacy constraints, they should recognise the open online archiving of accepted manuscripts in funding policies, and in hiring and promotion decisions. Finally, authors should be supported in making their papers easily findable online as outlined above. These lines of action overlap with those defined by Piwowar et al. (2008) for data sharing in academic health centres.

3. Representatives of academic, research and health centres and of scientific and professional associations who are directly involved in negotiations with publishers should take action to make publishers remove embargoes and other restrictions on online post-print archiving, either individually or, more effectively, in partnership with one another and in synergy with patients, caregivers, and patient advocacy organizations. Combined with that of step 1, the resulting gain could lead to open access rates higher than that of COVID-19 and even close to totality. Such an objective is reasonable for two reasons. First, it would not only be coherent with the principle of opening access to medical information of primary importance for human health, which has been implemented with COVID-19, but would also be less demanding for publishers than making final PDF files available, which they did in response to the pandemic. Second, the large and increasing profit margins of major publishers (Larivière et al., 2015) might be used as an argument.

Putting our proposal in context

Implementing this strategy could complement more demanding initiatives for open access to medical literature. In fact, by exploiting the potential of post-prints as a tool for sharing information, an alternative and feasible path can be introduced to achieve immediate open access to publicly funded research. Recently, an intense debate has arisen about the feasibility of increasing access to scientific literature through transformative agreements, that is, contracts negotiated between publishers and institutions that combine subscription access to journals with the possibility of open access publishing, shifting costs from authors to institutions (Borrego et al., 2021; Farley et al., 2021). It is worth noting that the green road has three fundamental advantages over transformative agreements: firstly, it does not risk increasing the costs that institutions have to pay to publishers, which is a concern for countries with limited financial support for research (European Research Council, 2020); secondly, it does not suffer from the uncertainties and latency of negotiation processes; thirdly, the green road poses no equity problem, whereas transformative agreements can be more or less advantageous, depending on the negotiating strength of institutions or their consortia with the publishers.

Several tools may help increase access to medical scientific literature (Masic and Milinovic, 2012). Of particular importance is the Health Internetwork Access to Research Initiative (HINARI), launched in January 2002 by the World Health Organization in collaboration with the main scientific publishers to provide access to papers by research groups operating in low and middle-income countries (Aronson and Long, 2003; Katikireddi, 2004). Together with other similar programmes (AGORA, OARE, ARDI and GOALI), HINARI established a partnership named Research4life, providing free or low-cost access to scientific literature and databases in the developing world (Bartol, 2013).

However, concerns have been raised regarding the effectiveness of the accessibility of research products through HINARI. It has been pointed out that even the low-cost model can be too expensive for institutions in middle-income countries, while the highest-impact journals of the major publishers (e.g., Blackwell, Elsevier and Nature Publishing Group) became unavailable just a few years after the HINARI launch (Villafuerte-Gálvez et al., 2007). If our proposal were put into practice, staff members and students of non-profit institutions (e.g., healthcare centres, professional schools, and research institutes) might also use peer-reviewed literature which is not currently covered by HINARI. More particularly, researchers and practitioners who belong to institutions that are ineligible to join HINARI (e.g., rural and community-based practitioners) or are unable to pay the fees would benefit from more widespread post-print sharing, having a readily available range of peer-reviewed information which is significantly wider than that offered by open access journals (Villafuerte-Gálvez et al., 2007). In both cases, increasing post-print availability would reduce the use of pirated repositories of scientific articles (Bendezú-Quispe et al., 2016; Till et al., 2019).


Overall, the actions proposed could help move towards the goal of ‘health information for all’ (Godlee, 2004). Obviously, such an ambitious aim cannot be achieved without a more widespread culture of scientific and human cooperation (Anagnostou et al., 2021; Bolukbasi, 2013; Destro Bisol et al., 2014; Gurteen, 1999). Today there is something that can help in this difficult task, a small but intense light in the dark of the immense tragedy of COVID-19: a better awareness among all those involved in the research cycle of the importance of open science for human health and a demonstration that much greater access to health information is possible. It's up to us to make sure the light does not go out.


This work was funded by the University of Rome "La Sapienza" (Italy), through the project "Setting up a network for Responsible Research and Innovation - RRI-Net" (PI116154C845CA8D) and received support from the Istituto Italiano di Antropologia (project "Open Access and Open Data"). We would like to thank Mauro Mandrioli (University of Modena and Reggio Emilia, Italy) for his helpful comments. We dedicate our work to the late Carlo Urbani, who first warned the world about SARS and helped prevent a global epidemic.

About the authors

Marco Capocasa deals with the ethical issues related to data sharing and the dissemination of scientific knowledge in anthropological and biomedical research. He is also interested in the study of the relationships among genetic, geographical and cultural isolation and between social structures and genetic diversity in human populations. He can be contacted at: marco.capocasa@uniroma1.it
Paolo Anagnostou is a professor of Anthropology at the University of Rome “La Sapienza”. He is currently involved in several research activities related to Responsible Research and Innovation, which mainly focus on Open Access and Open Data in Life Sciences. His interests also include human genetic diversity, more specifically in relation to cultural and social factors. He can be contacted at: paolo.anagnostou@uniroma1.it
Giovanni Destro Bisol, currently professor of Anthropology at the University of Rome “La Sapienza”, secretary of the Istituto Italiano di Antropologia and member of the board of the Associazione Italiana per la Scienza Aperta (AISA), is a biological anthropologist interested in interdisciplinary studies of Open Science and Open Data, both from an empirical and theoretical point of view, and works on the implementation of Open Access policies. He can be contacted at: destrobisol@uniroma1.it



How to cite this paper

Capocasa, M., Anagnostou, P., & Destro Bisol, G. (2022). A light in the dark: open access to medical literature and the COVID-19 pandemic. Information Research, 27(2), paper 929. Retrieved from http://InformationR.net/ir/27-2/paper929.html (Internet Archive https://bit.ly/3zymX5J) https://doi.org/10.47989/irpaper929

Appendix - Supplementary tables S1 to S9
