Published quarterly by the University of Borås, Sweden

vol. 26 no. 1, March, 2021



An analysis of use statistics of electronic papers in a Korean scholarly information repository


Jayhoon Kim, Su-yeong Bae, Hyekyong Hwang, and Sung-Seok Ko.


Introduction. This study aimed to analyse the current use status of Korean scholarly papers accessible in the repository of the Korea Institute of Science and Technology Information in order to assess the economic validity of the maintenance and operation of the repository.
Method. This study used the modified historical cost method and performed regression analysis on the use of Korean scholarly papers by year and subject area.
Analysis. The development cost of the repository and the use volumes were analysed based on 1,154,549 Korean scholarly papers deposited in the Institute repository.
Results. Approximately 86% of the deposited papers were downloaded at least once and on average, a paper was downloaded over twenty-six times. Regression analysis showed that the ratio of use of currently deposited papers is likely to decrease by 7.6% annually, as new ones are added.
Conclusions. The currently deposited papers need to be managed for at least thirteen years into the future. The findings provide empirical evidence that the repository has contributed to Korean researchers conducting research and development in the fields of science and technology. The benefit-cost ratio was approximately nineteen, confirming the economic validity of the repository.

DOI: https://doi.org/10.47989/irpaper892

Introduction

Recent advances in information technology have facilitated the sharing of and access to scholarly papers in digital format. However, researchers’ free access to scholarly papers is still limited due to commercial publishers charging for their content. The open-access initiative emerged as an alternative to overcome the paradox in the academic environment. This initiative aims to facilitate the sharing of information between researchers and users such that anyone can freely access and utilise information without legal, financial, and technical barriers (Suber, 2005). Open access refers to the removal of sharing and free access limitations posed by existing commercial publishers and the return of the academic environment to its original aspirations of sharing and communication. Specific open access strategies include gold open access, where an author often pays an article processing charge to publish their paper in an open access journal. The journal then allows everyone to freely view academic papers. Green open access is where an author publishes their paper in any journal and then self-archives a copy in a freely accessible institutional or specialist online archive (a repository or a Website).

Repositories are important sources of easy access to the scholarly information needed in academic research and practice. The repository is a model that allows free use of copyrighted scholarly publications (even for publications whose free access is limited by commercial publishers) and offers a new direction for scholarly information management systems (Bhardwaj, 2014). The first repository, launched in 1991, was the physics repository now known as arXiv, which started as a server for articles on theoretical physics and expanded to other areas of physics, mathematics and computer science as the Internet developed. Other early repositories include EconPapers, CogPrints and PubMed Central.

As in many countries, Korea has witnessed an increasing number of institutional repositories being developed by universities and research institutes, making a major contribution to vital scholarly exchanges. Among these, the oldest and largest repository of science and technology scholarly papers was established by the Korea Institute of Science and Technology Information (hereafter, the Institute), an institution that has provided scholarly information services since 1962. The repository is made available in Korea through the National Digital Science Library, a science and technology information service platform developed to foster scientific and technological innovations in the country by providing high-quality information to industry and academic research collaborators. The platform offers a content search function consisting of approximately 100 million records of diverse scholarly information such as scholarly papers, patent information and technical reports, as well as a specialised search service by type of content. As of March 2020, the Institute repository of scholarly papers had 100 million entries of metadata and full text of scholarly papers published inside and outside Korea. For scholarly papers published overseas, typically, only the metadata are deposited and a link to the full paper is provided. For papers published in Korea, original texts and metadata are stored in the repository, as the Institute has signed agreements with Korean academic institutions in the fields of science, technology, interdisciplinary convergence and integration. The Institute increases the quality of scholarly information in the repository through several curation processes, including author identification, institution identification, funding organisation identification, personal information processing, encryption, digital object identifier registration, reference extraction and quantitative analysis. 
The Institute also processes the original texts of papers provided by academic institutions, in addition to developing metadata and making them freely accessible in a timely manner. Yoon and Tibbo (2012) verified that the Institute repository has been effective according to several criteria, by enabling reuse through content curation, storing and maintaining original data, developing ways to use even more data in future projects, and enhancing the ability to verify research findings to promote the use of data in education.

However, a considerable amount of resources is necessary for the development and maintenance of a repository. In addition, continuous investment in a repository requires the assessment of its economic validity. The value of offline academic libraries handling scholarly information has been evaluated in many studies, among which Oakleaf (2010) stands out for having summarised years of work on the value of academic libraries.

With regard to the economic assessment of online-based repositories, Houghton and Beagrie’s research (2012) is a representative example. They conducted a fully developed quantified economic impact study on one of the UK Data Archives, the Economic and Social Data Service (ESDS), and found that quantifiable benefits and returns significantly exceeded the value of the funding invested in the platform. The study estimated a range of values, moving from methods focusing on minimum values towards methods measuring some of the broader impacts, and simultaneously performed qualitative analysis, such as case studies and the Keep Research Data Safe (KRDS) benefit framework. The KRDS benefit framework is intended to identify, assess and communicate the benefits of investing resources in the curation and long-term preservation of research data (Beagrie, 2011). The authors then studied the Archaeology Data Service (ADS) by applying the framework used in the evaluation of ESDS (Beagrie and Houghton, 2013), and went on to compare three data centre services, including the British Atmospheric Data Centre (BADC) (Beagrie and Houghton, 2014). In these studies, qualitative assessment was conducted using the contingent value method (CVM), mainly based on surveys and interviews. The contingent value of a non-market good or service refers to the amount that users are willing to pay for it and/or willing to accept in return for it.

The value of scholarly information stored in a repository can be assessed from a data or information perspective. This valuation methodology is largely based on asset valuation methods that rely on exchange value and use value. However, it is not easy to evaluate the exchange value of scholarly information as the market is not fully developed. Regarding the use value, it is not easy to estimate the economic effect through use. Moody and Walsh (1999) argued that information has seven unique attributes that distinguish it from conventional tangible or intangible assets, and suggested the modified historical cost method, which takes these attributes into account. This method assumes that data producers are rational and therefore invest on the assumption that the future economic value will be greater than the current investment. However, the value of the data ranges from potentially no value to enormous value. For example, sometimes data are never used and are only an overhead. At the other end of the spectrum, data can be used in crucial decision-making processes. In this sense, the method adjusts the value of the data based on their unique attributes, such as their quality and the potential for an infinite number of users, in addition to considering data duplication. The most prominent feature of the modified historical cost method is that it relies heavily on statistical data on the current status of information use. Because statistical data can be easily obtained from information systems and the modified historical cost method is based on objective data, this method has the advantage over others that assumptions and subjective viewpoints are largely removed.

This study aimed to analyse the current use of Korean scholarly papers available in the Institute repository in order to assess the economic validity – benefit-cost (B/C) ratio – of the maintenance and operation of the repository. To this end, this study was based on the modified historical cost method and analysed the use of Korean scholarly articles by year and by subject area adopting regression analysis to show the economic validity of the repository and provide basic data to be used in the development of operational strategies.

The remainder of this article is organised as follows. The methods section describes the data used in the study and the methodological approach. The results section presents the overall analysis and the specific analyses by year and by subject area. The discussion section considers economic validity and future operational strategies in light of these results. The final section presents the conclusions of the study.

Methods

Dataset

The data used in this study were the numbers of full-text scholarly papers published in Korean academic journals deposited in the Institute repository between 2007 and 2019 and their corresponding use. For the period considered, 1,154,549 Korean scholarly papers across twenty-six subject areas were identified with their respective information on years of deposit, years of publication and subject area, as well as general information (including descriptive metadata on publication). The details of the deposited papers, that is the numbers of papers by year of deposit and year of publication, are summarised in Tables 6 and 7 in the Appendix, respectively.

The use volume through the National Digital Science Library was defined as the number of downloads of Korean scholarly articles between 2013 and November 29, 2019. Papers published outside Korea were excluded from this study because their full texts were not deposited in the repository, which often serves only as a gateway to overseas publishers for such papers. Prior to conducting the main analysis, abnormal use values were examined during data cleansing. A small number of papers showed an abnormally high number of downloads. After a review of citation information and the contents of the papers, three papers were excluded from the study because their download counts of over 10,000 per year were attributed to a system error or an abnormal phenomenon. In the final dataset, 996,151 articles were downloaded at least once during the study period and 26,164,642 downloads occurred in total.
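The headline figures used later in the article follow directly from the counts given in this section. The short Python check below is only an illustrative sketch; the variable names are chosen here for readability and are not part of the Institute's systems.

```python
# Counts reported in the Dataset section
deposited_papers = 1_154_549   # Korean papers deposited, 2007-2019
downloaded_papers = 996_151    # papers downloaded at least once, 2013-2019
total_downloads = 26_164_642   # downloads during the study period

# Share of deposited papers that were downloaded at least once
share_downloaded = downloaded_papers / deposited_papers

# Average number of downloads per paper that was downloaded at least once
mean_per_downloaded = total_downloads / downloaded_papers

print(f"Share downloaded at least once: {share_downloaded:.1%}")
print(f"Mean downloads per downloaded paper: {mean_per_downloaded:.2f}")
```

Note that the mean of about 26.27 is taken over the 996,151 papers that were downloaded at least once, not over all deposited papers.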

Procedure

This study is based on the modified historical cost method suggested by Moody and Walsh (1999), who proposed modifications to the historical cost method for valuing information that incorporate the unique attributes of information.

This method consists of three parts: the cost of data collection, the quality of the data collected and the use of the data. However, in this study, we are mainly concerned with the use of data.

First, due to the confidentiality of the cost data, it is not appropriate to provide a detailed view of the cost of data collection. Two types of cost data, however, were considered in this study: labour cost and repository system operating cost. Labour cost and system operating cost consist of acquisition cost and maintenance cost, respectively. Taking into account all these costs, we have obtained the value of each deposited product. Second, the quality of the collected data is guaranteed by the well-established Institute data acquisition and standardised data management processes.

Therefore, to apply the modified historical cost method in this paper, we analysed the use of the collected papers. The use of papers tends to decrease over time, but since a collected paper will continue to be used in the future, we estimate the total use through regression analysis of current and future use.

Results

Basic statistics

As shown in Figure 1, a Pareto analysis was performed to examine the total number of deposited papers by subject area, as classified by the Korea Institute of Science and Technology Information (full names are given in Table 5 of the Appendix). Engineering accounted for 25% of all deposited papers. Although areas such as chemical were listed separately, engineering presumably still encompasses numerous subareas in which many academic journals and papers are produced each year. The subject area with the second-highest number of deposited papers was medicine, comprising 15.8% of the total. As in engineering, many academic journals and research papers are produced in medicine; however, a large number of them are published in overseas academic journals and, thus, the number of papers published in Korean academic journals is smaller than in engineering. With the addition of agriculture, the top three areas accounted for 50% of all deposited papers. In addition, the upper half of the subject areas represented approximately 90% of all deposited papers, showing a concentration in particular subject areas. This phenomenon is believed to be due to general features of scholarly information rather than to a feature of Korean academic policy.

Figure 1: Number of deposited papers by subject area

Data comparing the volumes of papers deposited and published each year are presented in Figure 2.

The yearly volume of deposited papers varied considerably over the study period. In 2010, 230,000 papers were deposited, while only approximately 36,000 papers were deposited in 2019. The number of papers in 2010 was relatively high because, in that year, several scholarly databases were integrated into the repository. Thereafter, a similar number of papers were deposited each year with the exception of 2016, suggesting a steady acquisition of scholarly papers.

Compared with yearly deposits, the yearly volume of publications was relatively constant. However, fewer deposited papers have recent publication years, which seems to be due to the lag between the years of publication and deposit and to embargo periods in academic publishing.

Figure 2: Deposited year vs publication year

Data on the use volumes of the papers are summarised in Table 1. From 2013 through 2019, approximately 574,000 papers per year were downloaded at least once, and a total of 996,151 papers were downloaded during the same period. Thus, approximately 86% of all deposited papers were downloaded at least once. This rate is considered high, given that researchers can download papers not only from the National Digital Science Library (NDSL), but also through various other routes, such as publisher homepages and portals. Except for 2016, the number of papers downloaded did not vary greatly, which suggests the presence of a stable user base.

Nevertheless, the number of downloads fluctuated over time. With the exception of 2017, when articles were downloaded 2.9 million times, the annual number of downloads was over three million. Over the past three years, both the number of downloads and the number of deposited papers have been increasing. However, this increase has occurred only recently, and it is therefore difficult to determine whether the number of downloads will increase in the long run.


Table 1: Basic use statistics
Year    Number of papers downloaded    Total number of downloads    Average number of downloads per paper
2013    518,777                        4,199,139                    8.09
2014    582,475                        4,512,645                    7.75
2015    547,646                        3,856,069                    7.04
2016    752,429                        3,902,914                    5.19
2017    530,040                        2,959,242                    5.58
2018    536,067                        3,221,081                    6.01
2019    550,863                        3,513,552                    6.38
Mean    574,042                        3,737,806                    6.58
Total   996,151                        26,164,642                   26.27

During the study period, a paper was downloaded 26.27 times on average and a total of approximately 26 million downloads occurred. However, the number of times each article was downloaded varied considerably, as shown by the Pareto analysis presented in Figure 3. According to Figure 3, 10.91% of the papers were downloaded only once and the number of papers decreased exponentially as the number of downloads increased. Overall, however, Figure 3 reveals that the long tail principle, rather than the Pareto principle, was at work. This means that not many papers were downloaded numerous times, but such papers did exist. Due to this phenomenon, although the mean was 26.27, the first and the third quartiles were three and twenty-seven, respectively, and the median was ten; that is, 50% of all deposited papers were downloaded ten or fewer times during the study period (2013-2019), with a yearly average of 1.5 or fewer times.
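The per-year averages and the summary rows of Table 1 can be recomputed from the yearly counts. The following Python sketch does so; the dictionary layout and variable names are illustrative assumptions rather than anything used by the authors.

```python
# Yearly figures from Table 1: year -> (papers downloaded, total downloads)
table1 = {
    2013: (518_777, 4_199_139),
    2014: (582_475, 4_512_645),
    2015: (547_646, 3_856_069),
    2016: (752_429, 3_902_914),
    2017: (530_040, 2_959_242),
    2018: (536_067, 3_221_081),
    2019: (550_863, 3_513_552),
}

# Average downloads per downloaded paper in each year
per_paper = {year: downloads / papers
             for year, (papers, downloads) in table1.items()}

# "Mean" row: simple means over the seven years
mean_papers = sum(p for p, _ in table1.values()) / len(table1)
mean_downloads = sum(d for _, d in table1.values()) / len(table1)
mean_of_yearly_averages = sum(per_paper.values()) / len(per_paper)

# "Total" row: all downloads divided by the distinct papers ever downloaded
total_downloads = sum(d for _, d in table1.values())
distinct_papers_downloaded = 996_151  # not the sum of the yearly counts
overall_average = total_downloads / distinct_papers_downloaded

for year, avg in per_paper.items():
    print(year, f"{avg:.2f}")
print("mean per year:", round(mean_papers), round(mean_downloads),
      f"{mean_of_yearly_averages:.2f}")
print("overall:", total_downloads, f"{overall_average:.2f}")
```

The overall average (26.27) is much larger than any yearly average because the same paper can be downloaded in several years, while the denominator counts each paper only once.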

Figure 3: Pareto analysis of the number of downloads

Use statistics by subject area

Figure 4 shows the number of downloaded papers, the number of deposited papers, and the average number of downloads per paper for each subject area. In general, the number of deposited papers was high in subject areas with a high number of downloads. Hence, it is believed that the subject areas of the deposited papers have been properly chosen. Among the subject areas, mathematics showed a ratio of downloaded to deposited articles of 61.25%, while the corresponding ratio was 77.58% for building & construction and 79.59% for technology. With the exception of these three subject areas, the remaining twenty-three areas showed a ratio above 80%. Hence, the ratio of use was very high. Although the number of articles was on the low side, in home & family management and history, the ratios were above 95%, clearly showing that almost all of the deposited papers in these subject areas were used. This finding can be interpreted as a demonstration that the characteristics of academic areas were highly influential, and it seemed that the influence of a handful of papers raising issues from a socio-environmental perspective was significant.

However, the average number of downloads per article varied widely by subject area. In home & family management and history (subject areas with very high use ratios), the average number of downloads per article was the highest (forty-five and thirty-seven, respectively), while in the subject areas with low use ratios (such as fossils & prehistoric life and technology), the average number of downloads per paper was also low. The correlation between the use ratio and the average number of downloads per paper was positive, with a coefficient of ρ=0.48.

Figure 4: Number of downloaded papers by subject area and average download per article

Based on their amount of use and the use trends, the subject areas were divided into three groups: heavy use subject area, medium use subject area, and light use subject area. The summary statistics for each group are presented in Table 2.

The heavy use group consisted of two subject areas, engineering and medicine, which showed ratios slightly over 40%. When this group and the medium use group were combined, a total of seven subject areas showed ratios higher than 70%. Given the comparability between the ratios of deposited articles and usage, it is believed that the scholarly information deposited in the repository adequately reflects the usage volume.


Table 2: Statistics for each use group
Group        Subject areas                                     Ratio of deposited papers    Ratio of downloaded papers    Download ratio
Heavy use    Engineering, Medicine                             40.2%                        40.8%                         41.3%
Medium use   Agriculture, Chemical, Computers, Home, Social    32.4%                        32.5%                         36.3%
Light use    Others                                            27.4%                        26.7%                         22.4%

Use statistics by year

Table 8 in the Appendix presents the annual number of downloaded papers by year of publication. The columns show the year of download and the rows show the year of publication. For example, in 2013, 518,777 papers were downloaded, of which 330,184 were published in or before 2007, while 31,240 were published in 2008.

Although the specific values shown in the table are meaningful in themselves, the overall trend is even more significant. In each column, the values continuously increase up to the most recent one or two years, showing that the most recently published papers were downloaded more frequently. However, the most recently published papers do not follow this trend, apparently because the volume of recently deposited papers is not very high and because more time must pass before papers are exposed to researchers.

An examination of the values in each row showed that the number of downloaded papers decreased, except for in 2016 (the year when the number of downloads was higher compared with other years), suggesting a steady demand for the most recent papers. However, the number did not drop abruptly indicating that the demand for papers published in the past did not decrease suddenly. This finding is consistent with what Starr and Williams (2008) reported and demonstrates the need to continue managing papers deposited in the past, as well as recent publications.

Table 9 in the Appendix shows the yearly volume of downloads by the paper’s year of publication. The columns list the year of download and the rows list the year of publication. The trend in the volume of downloads is the same as that found in the analysis of the data in Table 8.

Figure 5: Download ratio by difference between year of use and year of publication

For a more detailed analysis, the data in Table 9 were manipulated to obtain the results shown in Figure 5. In Figure 5, the x-axis shows the difference in years between publication and download, while the y-axis shows the ratio of number of downloads of papers published in a given year to the total number of downloads in that year. For example, in 2016, the number of downloads of papers published in 2010 was 219,404, representing 5.62 % of the total use volume in 2016 (219,404/3,902,914 = 5.62%). This data point is the fourth of the six points corresponding to the x value of 6 (2016-2010) in Figure 5.
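The worked example in the paragraph above can be expressed as a small calculation. This Python sketch reproduces the single data point quoted in the text; the function name `download_ratio` is a hypothetical helper introduced here for illustration.

```python
def download_ratio(downloads_for_pub_year, total_downloads_in_use_year):
    """Share of one use year's downloads that went to papers of one publication year."""
    return downloads_for_pub_year / total_downloads_in_use_year


use_year, pub_year = 2016, 2010

# x-axis of Figure 5: years elapsed between publication and download
x = use_year - pub_year

# y-axis of Figure 5: downloads of 2010 papers in 2016 / all downloads in 2016
y = download_ratio(219_404, 3_902_914)

print(x, f"{y:.2%}")
```

Repeating this for every pair of download year and publication year in Table 9 would yield the full set of points plotted in Figure 5.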

The graph reveals the trend of downloads according to the number of years passed since publication, which also has a bearing on the lifespan of the deposited papers. The data points in Figure 5 show that the ratio of downloads gradually decreases with the age of the paper, and the rate of decrease appears to be non-linear. Accordingly, although it is crucial to acquire new papers to meet information needs, it is also important to manage those already deposited. Another trend that can be seen in the graph is that the volatility in the ratio of downloads decreases over time. This finding could be interpreted as convergence over time; however, statistical analysis is limited because fewer data points are available as the number of years between publication and download increases.

Figure 6: Cumulative download ratio by difference between use year and year of publication

In addition to the ratio of papers by year of publication, it is also crucial to examine cumulative ratios. The cumulative ratio enables the prediction of the lifespan of the papers currently deposited, which can be strategically used in the management of the information deposited in the repository. Cumulative download ratios are shown in Figure 6. For example, in 2016, the number of downloads of papers published before 2014 was 3,399,951 representing 86.88% of the total use volume in 2016 (3,399,951/3,902,914 = 86.88%). This data point is the third one of the six points corresponding to the x value of 2 (2016-2014) in Figure 6. This figure presents useful information to identify the use trends of the information currently deposited, as it provides information on future use levels. According to Figure 6, the cumulative ratios are gradually decreasing. Regression analysis was conducted to statistically confirm the observation.
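To make the construction of the cumulative ratio concrete, the Python sketch below computes it for a single download year. The download counts are hypothetical toy values, not figures from Table 9, and the function name is introduced here for illustration. The sketch also assumes that the cumulative ratio at X counts papers that are at least X years old in the year of download; whether the boundary year is included is an assumption made here.

```python
def cumulative_ratios(downloads_by_pub_year, use_year):
    """Map each age X to the share of the use year's downloads that went to
    papers published in or before (use_year - X)."""
    total = sum(downloads_by_pub_year.values())
    ratios = {}
    for pub_year in sorted(downloads_by_pub_year, reverse=True):
        x = use_year - pub_year
        # Downloads of all papers at least x years old in the use year
        older = sum(n for year, n in downloads_by_pub_year.items()
                    if year <= pub_year)
        ratios[x] = older / total
    return ratios


# Hypothetical download counts by publication year for one download year
toy_downloads = {2016: 100, 2015: 300, 2014: 250, 2013: 200, 2012: 150}

ratios = cumulative_ratios(toy_downloads, use_year=2016)
for x, ratio in ratios.items():
    print(x, f"{ratio:.0%}")
```

By construction the ratio starts at 100% for X = 0 and falls as X grows, which is the shape described for Figure 6.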

The response variable, Y, was the cumulative download ratio and the independent variable, X, was the difference between use year and publication year. Cases with X≥8 were excluded due to lack of data and a total of fifty-three data points were submitted to regression analysis. The results are presented in Table 3 below.

Table 3: Results of regression analysis

Analysis of variance
             df    SS       MS       F          Significance F
Regression   1     1.510    1.510    271.480    0.000
Residual     51    0.284    0.006
Total        52    1.794

Coefficients
             Coefficient    Standard error    t statistic    P-value
Intercept    0.995          0.018             53.977         0.000
Difference   -0.076         0.005             -16.477        0.000
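The regression output is internally consistent, which can be verified with a few lines of Python. This is only a sanity check on the rounded values reported in the table, and it relies on two standard identities for simple linear regression: the coefficient of determination equals the regression sum of squares divided by the total sum of squares, and the F statistic equals the square of the slope's t statistic. Variable names are illustrative.

```python
# Values as reported in the regression table (rounded to three decimals)
ss_regression, ss_residual, ss_total = 1.510, 0.284, 1.794
df_regression, df_residual = 1, 51
f_reported = 271.480
t_slope_reported = -16.477

# Coefficient of determination
r_squared = ss_regression / ss_total

# F statistic rebuilt from the rounded sums of squares
ms_residual = ss_residual / df_residual
f_from_ss = (ss_regression / df_regression) / ms_residual

# In simple linear regression, F equals the squared t statistic of the slope
t_squared = t_slope_reported ** 2

print(f"R^2 = {r_squared:.2f}")
print(f"F from sums of squares = {f_from_ss:.1f} (reported {f_reported})")
print(f"t^2 for the slope = {t_squared:.1f}")
```

The small gap between the recomputed and reported F values is what one would expect from rounding the sums of squares to three decimals.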

The explanatory power of the model was high (R² = 0.84). The fitted equation was:

Y = 0.995 - 0.076X

That is, the use ratio for articles currently deposited decreases by 7.6 percentage points per year. Setting Y = 0 gives X = 0.995/0.076 ≈ 13.1, so the download ratio is likely to reach 0% after approximately thirteen years (that is, the lifespan of the papers currently deposited) and, thereafter, the most downloaded papers will be those deposited after the current point in time. This means that the average lifespan of deposited papers is thirteen years or more. Therefore, the minimum period for the management of papers should be set at no less than thirteen years. As scholarly articles vary widely, it is reasonable to set the management period considerably longer than thirteen years. In addition, Kryder’s Law states that information storage costs continue to decrease (Rosenthal, 2017); the costs of storing and managing already deposited articles are therefore relatively low compared with the cost of acquiring new papers, and the unit cost continues to fall. It would thus not be difficult to plan for a management period longer than the average lifespan.
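The thirteen-year lifespan is the point at which the fitted line crosses zero. A minimal Python sketch of that calculation, using the coefficients reported in Table 3 (the function and variable names are illustrative):

```python
# Coefficients of the fitted line Y = intercept + slope * X (Table 3)
intercept, slope = 0.995, -0.076


def predicted_ratio(x):
    """Predicted cumulative download ratio for papers at least x years old."""
    return intercept + slope * x


# The line reaches zero where intercept + slope * x = 0
lifespan_years = -intercept / slope

print(f"Estimated lifespan: {lifespan_years:.1f} years")
print(f"Predicted ratio at 13 years: {predicted_ratio(13):.3f}")
print(f"Predicted ratio at 14 years: {predicted_ratio(14):.3f}")
```

Because the line is an extrapolation beyond the fitted range (cases with X≥8 were excluded), the zero crossing is best read as a rough planning horizon rather than a precise estimate.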

Table 4 shows the ratio of downloaded papers to deposited papers. The data after 2008 show that at least 50% of the deposited papers were downloaded. In other words, more than half of the deposited papers were downloaded at least once in a given year. The numbers in each column indicate a trend of more recent papers having higher download ratios. However, the ratios of download in the same year of publication are somewhat low because not enough time has passed for the papers to be exposed to researchers. An examination of the numbers in each row (except in 2016, a year that showed abnormally high download ratios) clearly confirms the aforementioned finding that the download ratio decreased over time, although the rate of decrease was not high.


Table 4: Ratio of number of downloaded papers to number of deposited papers
Columns give the year of download.
Publication year   2013   2014   2015   2016   2017   2018   2019
~2007              52%    55%    49%    65%    42%    41%    40%
2008               65%    66%    58%    73%    51%    49%    51%
2009               67%    68%    60%    75%    52%    50%    53%
2010               70%    72%    65%    79%    55%    53%    56%
2011               73%    73%    65%    80%    56%    54%    56%
2012               77%    78%    69%    86%    57%    55%    56%
2013               37%    79%    72%    91%    62%    59%    59%
2014               -      49%    72%    89%    63%    59%    58%
2015               -      -      32%    96%    66%    62%    62%
2016               -      -      -      55%    68%    75%    62%
2017               -      -      -      -      44%    71%    66%
2018               -      -      -      -      -      31%    61%
2019               -      -      -      -      -      -      49%
Total              56%    60%    54%    71%    48%    47%    48%

Discussion

This study examined the current use status of the Korea Institute of Science and Technology Information repository of Korean scholarly papers by year and by subject area. The findings have several implications for the repository’s economic validity and efficient operation.

First, over the past eight years, approximately 50,000 papers per year were deposited in the Institute's repository and the average number of downloads per year was 3.7 million. On average, the number of downloads was approximately seventy-five times higher than the number of deposited papers, indicating that the repository has been actively used. Second, approximately 86% of the deposited papers were downloaded at least once, which demonstrates that the scholarly information stored in the repository is useful. This finding also justifies the need to continue operating the National Digital Science Library (NDSL). Third, the twenty-six subject areas identified were categorised into three use groups: heavy, medium and light. The shares of deposited and downloaded papers were similar within each group, showing that the NDSL operational strategy adequately reflects the needs of researchers. Lastly, in general, the older the publications, the lower the use ratio. The rate of decrease was modelled as linear and the lifespan of the articles currently deposited was estimated at thirteen years. Thus, the deposited papers should be managed for at least thirteen years in order to support researchers’ academic activities.

The analysis of the current use status of the Institute's repository of Korean scholarly papers showed that the papers deposited in the repository are being well used. This finding means that the current operational strategy is working well and suggests the need to acquire scholarly articles more proactively. In addition, the economic validity of the repository can be evaluated based on the modified historical cost method, as follows.

First, cleansed data were used in the study, which allows one to assume that problems such as data quality and duplication had been resolved. Therefore, from the perspective of the benefit to cost ratio (BCR), the value of scholarly information stored in the Institute's repository of Korean scholarly papers can be assessed by defining benefit as the total number of uses per paper and total investment cost as the sum of the costs for deposit (acquisition) and storage per paper.

By assigning a value of one to acquisition cost and assuming that the annual management cost is 10% (a conventional estimate used in Korea) with a decrease rate of 50% per year on the basis of Kryder’s Law (Rosenthal, et al., 2012), the cost per paper is estimated to be 1.2. As shown in Table 1, a downloaded paper was downloaded an average of 26.27 times, and approximately 86% of the deposited papers were downloaded. Therefore, the average number of downloads per deposited paper is 26.27 × 0.86 = 22.6; accordingly, B/C = 22.6/1.2 = 18.8. This indicates that the acquisition and operation of the repository is estimated to have a benefit approximately nineteen times greater than the investment.
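The arithmetic behind the benefit-cost ratio can be laid out step by step. The Python sketch below is one reading that reproduces the reported cost of 1.2: it assumes, as an interpretation not spelled out in the text, that the management cost starts at 10% of the acquisition cost and halves every year, so that the lifetime management cost is the sum of a geometric series. Variable names are illustrative.

```python
# Cost side (in units of the acquisition cost of one paper)
acquisition_cost = 1.0
first_year_management = 0.10   # 10% of the acquisition cost
yearly_decay = 0.5             # assumed: management cost halves each year

# Geometric series 0.10 + 0.05 + 0.025 + ... = 0.10 / (1 - 0.5)
lifetime_management = first_year_management / (1 - yearly_decay)
cost_per_paper = acquisition_cost + lifetime_management

# Benefit side: downloads per deposited paper
mean_downloads_per_downloaded_paper = 26.27
share_of_papers_downloaded = 0.86
benefit_per_paper = (mean_downloads_per_downloaded_paper
                     * share_of_papers_downloaded)

bc_ratio = benefit_per_paper / cost_per_paper

print(f"cost per paper    = {cost_per_paper:.1f}")
print(f"benefit per paper = {benefit_per_paper:.1f}")
print(f"B/C               = {bc_ratio:.1f}")
```

Truncating the series at the thirteen-year lifespan instead of summing it indefinitely changes the cost only in the fifth decimal place, so the ratio is insensitive to that choice.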

Conclusion

In this study, the current use status of the Institute's repository of Korean scholarly papers was examined to provide basic data for analysing its economic validity and for developing an operational strategy for its continued operation. Specifically, the status of the papers deposited in the repository as of 2019 and the use of deposited articles from 2013 through 2019 were analysed from a variety of viewpoints (including by subject area and by year).

The analysis found that approximately 86% of all deposited papers have been used and that each paper was downloaded twenty-six times on average. Approximately 50,000 papers were deposited per year and the annual number of downloads was 3.5 million, a number seventy times higher than the volume of deposited papers, suggesting that the repository has been actively used. Regarding the economic validity of the repository, the benefit to cost ratio was estimated to be 18.8 on the basis of the modified historical cost method, demonstrating that the benefit far exceeded the investment.

This research was initiated to evaluate the economic value of the Institute's repository of Korean scholarly papers. However, given the intrinsic attributes of the research, the scope for accurate cost analysis was limited, and the valuation was performed in a simplified manner because holistic methods for evaluating the economic value of scholarly papers are lacking.

Hence, this study focused mainly on quantitative analysis of the use of deposited papers. However, it is important to develop an effective holistic methodology for evaluating the development and operation of institutional repositories of scholarly papers, one that considers both economic and social values and includes qualitative analysis of users' benefits or use scenarios. It is also necessary to consider the characteristics of the domestic academic situation in each subject area.

Acknowledgement

This research was supported by the Korea Institute of Science and Technology Information.

About the authors

Jayhoon Kim is a Principal Researcher at the Korea Institute of Science and Technology Information, Daejeon, Korea. He is a Ph.D. candidate in Library and Information Science at Chungnam National University. His research focuses on digital curation of scholarly information. He can be contacted at jay.kim@kisti.re.kr
Su-yeong Bae is a Research Engineer at the Korea Institute of Science and Technology Information, Daejeon, Korea. He received his MS degree in Computer Engineering from Hongik University. His research focuses on scholarly information service. He can be contacted at sybae@kisti.re.kr
Hyekyong Hwang is a Principal Researcher and Director of the Content Curation Center at the Korea Institute of Science and Technology Information, Daejeon, Korea. She received her Ph.D. degree in Library and Information Science from Yonsei University, Korea. Her research areas are Content Curation of the Scholarly Communication, Open Access, Citizen Science, and AI Services. She can be contacted at hkhwang@kisti.re.kr
Sung-Seok Ko (Corresponding author) is a Professor in the Department of Industrial Engineering, Konkuk University, Seoul, Korea. He received his Ph.D. degree in Industrial and Systems Engineering from Georgia Institute of Technology, USA. His research interests include Operations Research, Stochastic Process and Data Science. He can be contacted at ssko@konkuk.ac.kr

References


How to cite this paper

Kim, J., Bae, S-y., Hwang, H., & Ko, S-S. (2021). An analysis of use statistics of electronic papers in a Korean scholarly information repository. Information Research, 26(1), paper 892. Retrieved from http://InformationR.net/ir/26-1/paper892.html (Archived by the Internet Archive at https://bit.ly/2OOXT55) https://doi.org/10.47989/irpaper892

Appendices


Table 5: Subject full name
Subject Subject full name
Agriculture Agriculture
Animals Animals (zoology)
Arts Arts & sports, entertainment
Astronomy Astronomy
Biology Biology & life sciences
Building Building & construction
Chemical Chemical engineering
Chemistry Chemistry
Computers Computers, Internet & systems
Earth Earth sciences & geology
Engineering Engineering
Fossils Fossils & prehistoric life
General General reference, library & information science
History History & geography
Home Home & family management
Language Language and literature
Management Management & public relations
Manufacturing Manufacturing (specific products)
Mathematics Mathematics
Medicine Medicine
Philosophy Philosophy, psychology, religion
Physics Physics
Plants Plants (botany)
Science Science
Social Social sciences
Technology Technology


Table 6: Number of articles in each subject area by year of deposit
Subject ~ 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 Total
Agriculture 21,048 2,223 4,781 39,245 8,745 4,420 4,807 4,354 8,399 7,067 3,554 2,473 2,317 113,433
Animals 2,953 217 422 553 349 384 284 421 142 249 255 153 274 6,656
Arts 4,698 549 1,126 13,087 1,156 1,035 619 1,018 790 2,623 954 809 1,493 29,957
Astronomy 3,806 394 1,662 1,480 499 615 759 502 508 956 581 365 119 12,246
Biology 12,332 1,156 2,528 2,842 2,070 1,252 930 1,189 826 1,302 1,134 954 985 29,500
Building 3,281 665 1,066 6,267 1,299 1,535 1,522 1,527 1,447 1,567 1,289 1,003 1,037 23,505
Chemical 19,959 2,323 4,225 18,795 2,851 2,561 2,465 3,697 2,107 3,872 2,381 1,993 2,213 69,442
Chemistry 5,413 973 1,322 3,387 7,463 1,447 1,294 1,273 1,971 406 380 595 236 26,160
Computers 20,474 2,515 5,218 6,748 3,359 4,484 3,240 4,084 2,526 4,596 3,860 3,347 2,791 67,242
Earth 3,604 626 937 511 678 650 765 676 1,247 807 553 548 512 12,114
Engineering 88,436 10,165 17,809 41,076 24,414 13,235 13,275 14,856 11,280 17,357 12,580 9,391 7,795 281,669
Fossils 0 0 0 0 0 0 0 0 60 370 78 108 96 712
General 2,164 421 669 14,645 550 318 270 525 235 594 386 370 876 22,023
History 1,226 95 1,246 646 271 275 951 302 294 336 229 142 130 6,143
Home 16,079 1,075 2,954 3,672 1,494 1,592 1,341 1,487 958 1,619 1,313 750 520 34,854
Language 1,611 81 170 81 457 88 184 84 63 116 48 44 48 3,075
Management 6,789 1,322 2,214 1,422 2,235 1,246 1,129 995 835 2,332 1,254 924 1,102 23,799
Manufacturing 11,301 1,333 3,770 10,658 2,140 3,021 2,307 1,950 1,232 2,305 1,591 1,167 1,053 43,828
Mathematics 8,884 1,266 1,704 1,100 1,306 1,038 1,187 1,693 2,654 1,671 1,657 1,428 1,845 27,433
Medicine 47,409 10,323 15,435 33,901 9,903 8,333 7,524 9,255 8,371 12,792 7,137 5,049 6,876 182,308
Philosophy 561 34 219 88 128 122 69 49 99 115 98 76 57 1,715
Physics 3,734 284 471 1,644 270 267 247 283 193 299 252 176 198 8,318
Plants 4,175 219 387 267 401 393 359 272 369 620 332 427 470 8,691
Science 10 11 1,235 691 114 122 376 146 114 307 181 128 92 3,527
Social 14,588 2,018 6,076 22,291 14,378 3,478 2,795 3,385 3,357 6,035 3,183 4,137 3,026 88,747
Technology 1,093 150 261 4,950 16,858 456 404 338 438 606 247 1,440 211 27,452
Total 305,628 40,438 77,907 230,047 103,388 52,367 49,103 54,361 50,515 70,919 45,507 37,997 36,372 1,154,549


Table 7: Number of deposited articles in each subject area by year of publication
Subject ~ 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 Total
Agriculture 70,813 4,423 4,066 4,411 4,467 4,157 3,906 3,814 3,502 3,096 2,826 2,379 1,611 113,471
Animals 3,746 275 312 306 314 351 259 318 182 218 128 158 89 6,656
Arts 18,311 1,128 1,019 1,144 1,176 1,123 1,039 1,020 930 862 880 799 526 29,957
Astronomy 5,786 574 720 542 577 817 471 512 717 540 593 315 86 12,250
Biology 17,145 997 1,076 1,288 1,283 1,149 890 1,000 960 1,048 891 970 710 29,407
Building 9,870 1,191 859 1,368 1,689 1,336 1,047 1,336 1,040 957 1,056 994 764 23,507
Chemical 40,850 2,702 2,966 2,901 2,797 2,762 2,298 2,640 2,165 2,478 1,953 1,857 1,122 69,491
Chemistry 14,846 1,094 1,501 1,472 1,483 1,508 1,273 1,150 404 413 240 587 189 26,160
Computers 27,185 3,168 3,504 3,532 3,490 3,624 3,565 3,570 3,385 3,515 3,592 3,050 2,182 67,362
Earth 5,889 560 487 502 523 548 552 590 516 514 531 524 378 12,114
Engineering 143,792 12,042 12,403 13,121 13,107 13,097 12,555 12,906 11,976 12,219 10,221 8,715 5,715 281,869
Fossils 0 46 49 42 45 33 42 34 73 84 79 115 70 712
General 17,087 588 482 408 426 421 376 388 360 442 398 369 278 22,023
History 2,926 365 392 320 325 311 306 309 265 239 153 149 91 6,151
Home 19,529 1,703 1,778 1,666 1,497 1,409 1,309 1,321 1,294 1,214 1,042 680 412 34,854
Language 2,019 101 158 127 129 103 68 84 100 62 53 47 24 3,075
Management 11,206 972 1,033 1,054 1,067 1,242 1,109 1,180 1,138 1,145 1,065 898 713 23,822
Manufacturing 25,424 1,884 1,707 1,649 1,743 1,736 1,682 1,869 1,689 1,447 1,114 1,106 791 43,841
Mathematics 12,632 1,128 1,230 1,237 1,183 1,036 1,168 1,149 1,238 1,248 1,466 1,509 1,267 27,491
Medicine 104,940 7,262 7,006 7,038 6,972 7,372 6,966 7,309 6,831 6,478 4,997 4,939 3,991 182,101
Philosophy 725 90 87 117 99 88 79 81 74 79 82 75 39 1,715
Physics 5,318 316 318 271 254 274 240 290 236 255 220 186 140 8,318
Plants 4,747 312 362 333 375 385 294 292 260 381 324 341 269 8,675
Science 1,643 181 175 165 169 190 166 145 178 164 165 124 62 3,527
Social 52,296 3,850 3,684 3,322 3,554 3,494 3,132 3,202 2,585 2,666 2,587 2,526 1,897 88,795
Technology 22,406 947 495 525 681 399 458 411 318 252 254 205 101 27,452
Total 641,131 47,899 47,869 48,861 49,425 48,965 45,250 46,920 42,416 42,016 36,910 33,617 23,517 1,154,796


Table 8: Number of articles downloaded, by year of publication (rows) and year of download (columns)
Publication year 2013 2014 2015 2016 2017 2018 2019
~ 2007 330,184 350,596 313,362 414,519 267,845 260,056 256,773
2008 31,240 31,606 27,948 34,950 24,491 23,557 24,429
2009 32,266 32,516 28,728 35,774 25,070 24,111 25,250
2010 34,421 35,211 31,668 38,697 26,834 25,866 27,168
2011 35,860 35,992 32,247 39,738 27,623 26,749 27,659
2012 37,884 38,070 33,679 42,318 28,137 27,044 27,242
2013 16,922 35,676 32,613 41,055 28,076 26,697 26,517
2014 - 22,808 33,708 41,705 29,334 27,627 27,436
2015 - - 13,693 40,680 28,025 26,368 26,276
2016 - - - 22,993 28,499 31,376 25,845
2017 - - - - 16,106 26,273 24,397
2018 - - - - - 10,343 20,413
2019 - - - - - - 11,458
Total 518,777 582,475 547,646 752,429 530,040 536,067 550,863


Table 9: Number of downloads, by year of publication (rows) and year of download (columns)
Publication year 2013 2014 2015 2016 2017 2018 2019
~ 2007 1,651,968 1,650,208 1,438,021 1,562,802 1,106,106 1,190,678 1,220,344
2008 265,423 235,737 180,322 181,331 133,517 140,258 143,452
2009 306,449 266,183 193,681 187,956 137,771 143,742 156,207
2010 413,542 342,658 244,966 219,404 154,967 161,171 176,295
2011 540,514 414,261 282,368 242,929 166,755 171,919 185,519
2012 775,769 553,519 354,334 289,999 187,297 186,720 197,336
2013 245,474 727,886 434,797 326,353 195,773 195,745 206,652
2014 - 322,193 548,561 380,177 218,628 214,444 224,528
2015 - - 179,019 351,403 231,353 208,643 218,254
2016 - - - 160,560 300,144 266,684 243,099
2017 - - - - 126,931 283,524 268,429
2018 - - - - - 57,553 196,597
2019 - - - - - - 76,840
Total 4,199,139 4,512,645 3,856,069 3,902,914 2,959,242 3,221,081 3,513,552
