published quarterly by the university of borås, sweden

vol. 27 no. 1, March, 2022



Bibliographic coupling: a main path analysis from 1963 to 2020


Tsu-Jui Ma, Gwo-Guang Lee, John S. Liu, Rae Lan, and Jung-Hsiu Weng.


Introduction. This paper applies a unique citation-based method to analyse the development trajectory of bibliographic coupling research from 1963 to 2020.
Method. Main path analysis is applied to a citation network created based on bibliographic data retrieved from the Science Citation Index Expanded and Social Sciences Citation Index of the Web of Science.
Analysis. The analysis comprised two major phases: basic bibliometrics (publication and citation trends, highly cited authors and core journals) and main path analysis.
Results. A total of 223 papers on bibliographic coupling by 461 authors were identified. The papers were cited 5,575 times across ninety-three journals. The main paths, consisting of twenty-six documents, were further divided into emergence, opening, and consolidation periods on the basis of their publication date.
Conclusions. This paper is the first to present the development trajectory of bibliographic coupling research. It not only emphasises how current state-of-the-art bibliographic coupling developed but also highlights critical studies in the field.

DOI: https://doi.org/10.47989/irpaper918


Introduction

In academic research, documents cite other published documents such as journal articles, books, research reports, news sources and online sources. This practice is referred to as citation or direct citation, to be more specific. Since the 1960s, numerous studies on citations have been conducted, with their scopes ranging from simple citation counts to more complex representations of citations such as bibliographic coupling, co-citation and citation networks (Garfield, 1988; Hummon and Doreian, 1989; Liu, 1993).

Bibliographic coupling occurs when two documents cite a common third document (Kessler, 1963a). It is frequently used as a measure of similarity among documents (Small, 1973; Egghe and Rousseau, 2002; Cobo et al., 2011; Hjørland, 2013; Zupic and Čater, 2015; Aria and Cuccurullo, 2017). Co-citation occurs when two documents are cited together by a common third document (Small, 1973). Bibliographic coupling and co-citation are often confused or wrongly equated (Garfield, 1988; Glänzel and Czerwon, 1996), even though bibliographic coupling (Kessler, 1963a) was proposed a decade earlier than co-citation (the notion of co-citation was introduced independently by Marshakova-Shaikevich (1973) and Small (1973)).

Previous research has highlighted the differences between bibliographic coupling and co-citation. First, bibliographic coupling analyses the citing documents, whereas co-citation analyses the cited documents (Small, 1973; Weinberg, 1974; Bichteler and Eaton III, 1980; Garfield, 1988; Shibata et al., 2009; Cobo et al., 2011; Hjørland, 2013; Vogel and Güttel, 2013; Fujita et al., 2014; Zupic and Čater, 2015; Aria and Cuccurullo, 2017). Second, bibliographic coupling is retrospective and commonly deemed static, whereas co-citation is prospective and dynamic (Weinberg, 1974; Bichteler and Eaton III, 1980; Hjørland, 2013; Vogel and Güttel, 2013; Zupic and Čater, 2015; Aria and Cuccurullo, 2017). In other words, the bibliographic coupling between two given documents does not change over time, but their co-citation does. Nevertheless, it should be noted that the number of documents with which a fixed document is bibliographically coupled increases over time. This is an often forgotten dynamic aspect of bibliographic coupling and is one of the reasons why bibliographic coupling is useful for document clustering. In recent years, an increasing amount of research has been conducted on bibliographic coupling. All related research can be traced back to the seminal paper by Kessler (1963a). The present study focuses on bibliographic coupling and traces its development trajectory.

Literature review

The concept of bibliographic coupling

The term bibliographic coupling was first introduced by Kessler (1963a). Most scholars thereafter followed Kessler's usage, although a few notable papers on bibliographic coupling have used the term bibliometric coupling (e.g., Hummon and Doreian, 1989; Thelwall and Wilkinson, 2004; Nicolaisen and Frandsen, 2015; Suominen et al., 2019). Yet the term bibliographic coupling is more commonly used today.

The concept of bibliographic coupling relates to two documents when they both cite a common document (Kessler, 1963a; Shibata et al., 2009; Fujita et al., 2014). For instance, Figure 1 shows that if both documents A and B cite document C, there is bibliographic coupling between documents A and B. The arrow indicates direct citation from the new and citing document (arrow end) to the old and cited document (arrowhead).


Figure 1: Concept of bibliographic coupling (adapted from Garfield, 1988; Shibata et al., 2009; Vogel and Güttel, 2013; Fujita et al., 2014; Zupic and Čater, 2015)

The coupling strength of bibliographic coupling refers to the number of cited documents that two target documents have in common, where a higher coupling strength indicates more references shared by the two target documents (Kessler, 1963a; Vladutz and Cook, 1984). Coupling strength is often regarded as an indicator of document-document (semantic) similarity (Kessler, 1963a; Ahlgren and Jarneving, 2008; Ahlgren and Colliander, 2009). In other words, it measures the similarity of the subject matter of the two target documents: the larger the number of common documents cited by the two target documents, the greater their document-document similarity (Kessler, 1963a; Vladutz and Cook, 1984; Ahlgren and Jarneving, 2008; Ahlgren and Colliander, 2009). Coupling strength does not change over time because bibliographic coupling is retrospective and static (Kessler, 1963a; Martyn, 1964; Bichteler and Eaton III, 1980; Vladutz and Cook, 1984; Vogel and Güttel, 2013; Zupic and Čater, 2015; Aria and Cuccurullo, 2017).
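The definition above can be illustrated with a minimal Python sketch. The reference lists and document labels (A, B, G, H and so on) are hypothetical and serve only to show that the coupling strength of a fixed pair stays constant while the set of documents coupled with a given document can grow as later papers appear.

```python
def coupling_strength(refs_a, refs_b):
    """Number of cited documents shared by two reference lists."""
    return len(set(refs_a) & set(refs_b))


def coupled_partners(doc, references):
    """Documents that share at least one reference with `doc`."""
    return sorted(
        other
        for other in references
        if other != doc
        and coupling_strength(references[doc], references[other]) > 0
    )


# Hypothetical data: citing document -> list of cited documents.
references = {
    "A": ["C", "D", "E"],
    "B": ["C", "D", "F"],
    "G": ["F"],
}

strength_before = coupling_strength(references["A"], references["B"])  # C and D are shared
partners_before = coupled_partners("A", references)

# A later paper H is published and cites C; the reference lists of A and B are unchanged.
references["H"] = ["C", "X"]

strength_after = coupling_strength(references["A"], references["B"])
partners_after = coupled_partners("A", references)

print(strength_before, strength_after)  # strength of the pair (A, B) is unchanged
print(partners_before, partners_after)  # A gains a new coupled partner
```

In this toy data the strength between A and B is 2 before and after H is added, whereas the list of documents coupled with A grows from B alone to B and H.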

Bibliographic coupling can be used to measure the similarity between units other than documents; for example, author-author or journal-journal relationships (Egghe and Rousseau, 2002; Cobo et al., 2011; Zupic and Čater, 2015; Aria and Cuccurullo, 2017). Examples of units of analysis for bibliographic coupling are documents (Kessler, 1963a), authors (Zhao and Strotmann, 2008) and journals (Small and Koenig, 1977).

Previous reviews on bibliographic coupling

This subsection discusses the previous reviews on bibliographic coupling. Weinberg (1974) published the first review on bibliographic coupling, which covered the theory and practice of bibliographic coupling from 1963 to 1973. The article is significant because it elaborates on the development of bibliographic coupling in its first decade, from Kessler's (1963a) introduction of bibliographic coupling to the next milestone, Small's (1973) introduction of co-citation. Liu (1993) summarised the progress of citation studies between the early 1960s and early 1990s, including the role of citation, analytical citation tools (e.g., Science Citation Index), techniques for citation analysis (e.g., bibliographic coupling and co-citation), and various topics on citation research (e.g., citation concepts, functions, motivation, and quality). That study highlighted the essential role of bibliographic coupling in the development of citation studies. Recent years have seen a shift in the contents of reviews on bibliographic coupling. Specifically, by combining bibliographic coupling with other methods (e.g., co-citation) for citation analysis, these reviews (e.g., Vogel and Güttel, 2013; Merigó et al., 2017; Wang and Tang, 2018; Dominko and Verbič, 2019; Shukla et al., 2020) have focused on various academic topics (e.g., the dynamic capability view in strategic management, market mechanisms for carbon emission reduction and the economics of subjective well-being) or discussed how bibliographic coupling was embraced in specified academic journals (e.g., International Journal of Intelligent Systems and Computer Methods and Programs in Biomedicine).

In summary, previous reviews on bibliographic coupling served their purposes in their time. As bibliographic coupling research continues to transform gradually, a review with a deeper historical perspective is necessary to reflect on its development. The current study revisits the development of bibliographic coupling from its introduction by Kessler (1963a) onward by employing bibliometric analysis and main path analysis.

Research method

Data and bibliometric analysis

The term bibliometrics was first coined by Otlet in 1934 (Rousseau, 2014). Bibliometric analysis (also known as bibliometrics) refers to the application of mathematical and statistical methods to identify patterns in bibliographic data and thereby provide useful information on documents, specifically regarding publication, citations, and authorship (Pritchard, 1969; Reitz, 2014).

In this paper, the Web of Science was adopted as the data source from which scholarly papers and citations were gathered. Two citation indexes within the Web of Science were selected, namely the Science Citation Index Expanded and Social Sciences Citation Index. The papers and citations from the Science Citation Index Expanded or Social Sciences Citation Index were assumed to involve relevant scholarly information.

This study retrieved data on March 29-31, 2020 (Web of Science data updated on March 27, 2020). Given that the concept of bibliographic coupling was proposed by Kessler (1963a), the data set contains bibliographic information from 1963 to 2020. A total of 237 documents containing the terms bibliographic coupling, bibliographic couplings, bibliometric coupling or bibliometric couplings in their title, abstract or keywords were identified and retrieved. After restricting the Web of Science document types to articles and reviews and excluding early access records, the total number of papers was reduced to 223, and the bibliographic and citation information of these papers was used for analysis.

Main path analysis

Concept of main path analysis. Main path analysis was first introduced by Hummon and Doreian (1989). It uses the relations among the documents in a citation network to trace the main development of a topic. In general, documents in a citation network refer to scholarly papers (e.g., Hummon and Doreian, 1989; Batagelj, 2003; Liu and Lu, 2012), patents (e.g., Batagelj, 2003; Verspagen, 2007) or legal cases (e.g., Liu et al., 2014). The present study focused on scholarly papers.

A citation network is composed of many citation chains, each of which is a sequence of documents connected by citation relationships. In a citation chain, starting nodes are called sources, which are the origins of knowledge and do not cite any documents, and ending nodes are referred to as sinks, which are the terminals of knowledge diffusion and are not cited by any document. The term intermediates refers to documents that cite other documents and are also cited by other documents (Liu and Lu, 2012; Ma and Liu, 2016). The results of main path analysis are presented in the form of citation chains. In the main path drawing, the line thickness is proportional to the traversal weight, reflecting the importance of the citation relation. In other words, thicker lines play more significant roles in knowledge diffusion (Liu and Lu, 2012; Chen et al., 2013).
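As a minimal illustration of these three roles, the following Python sketch classifies the nodes of a small, hypothetical citation network. Each link is written as a (cited, citing) pair so that it follows the direction of knowledge flow; the node names and the function `classify_nodes` are assumptions made for this example.

```python
def classify_nodes(links):
    """Classify nodes of a citation network given (cited, citing) pairs."""
    cited = {u for u, _ in links}   # documents that receive at least one citation
    citing = {v for _, v in links}  # documents that cite at least one document
    return {
        "sources": sorted(cited - citing),        # cited, but cite nothing
        "sinks": sorted(citing - cited),          # cite others, but are never cited
        "intermediates": sorted(cited & citing),  # both cite and are cited
    }


# Hypothetical network: knowledge flows from A through B and C to D, then to E.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
roles = classify_nodes(links)
print(roles)
```

Here A is the only source, E the only sink, and B, C and D are intermediates.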

Procedure of main path analysis. Main path analysis involves two steps (Hummon and Doreian, 1989; Batagelj, 2003; Liu and Lu, 2012; Liu et al., 2019). The first step assigns traversal weight, which measures the importance of a citation link. In the second step, major paths are searched through the implementation of a proper search algorithm. The resulting main paths are considered to be the representative paths in the citation network.

Hummon and Doreian (1989) proposed three alternative traversal weights: node pair projection count, search path link count and search path node pair. Batagelj (2003) introduced efficient algorithms for computing search path link count and search path node pair. In addition, he proposed a new traversal weight, search path count, and suggested that it is a good option because of its favourable properties (Batagelj, 2003; Liu and Lu, 2012; Liu et al., 2019, 2020). Nevertheless, Liu et al. suggested adopting search path link count since it regards every node (article) as an independent knowledge source. Compared with the other traversal weights, it fits better with the academic knowledge diffusion scenario. Therefore, this study adopted search path link count as the traversal weight.
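The following Python sketch shows one common way to compute search path link count on a small acyclic citation network. It assumes the usual formulation in which the weight of a link is the number of paths that reach the link's tail from any upstream node (the tail itself included) multiplied by the number of paths from the link's head to any sink. The network, the function name `splc_weights` and this formulation are assumptions of the example; the study itself used Pajek, not this code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def splc_weights(links):
    """Search path link count for (cited, citing) links of an acyclic network."""
    preds, succs = {}, {}
    for u, v in links:
        succs.setdefault(u, []).append(v)
        succs.setdefault(v, [])
        preds.setdefault(v, []).append(u)
        preds.setdefault(u, [])

    # Order nodes so that every cited document precedes the documents citing it.
    order = list(TopologicalSorter(preds).static_order())

    # Paths ending at n that start at n itself or at any of its ancestors.
    upstream = {}
    for n in order:
        upstream[n] = 1 + sum(upstream[p] for p in preds[n])

    # Paths from n to any sink (a sink counts as one trivial path).
    downstream = {}
    for n in reversed(order):
        downstream[n] = sum(downstream[s] for s in succs[n]) if succs[n] else 1

    return {(u, v): upstream[u] * downstream[v] for u, v in links}


# Hypothetical network with one source (A) and one sink (E).
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
weights = splc_weights(links)
print(weights)
```

In this toy network the link from D to E receives the largest weight (5), because every path that starts at A, B, C or D must traverse it to reach the sink.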

Hummon and Doreian proposed the use of a priority-first search algorithm after a traversal weight has been assigned to each citation link, whereby one starts from a source and always selects the next citation link that has the locally largest traversal weight until a sink is reached (Verspagen, 2007; Liu et al., 2019, 2020). The algorithm was referred to by Liu and Lu (2012) as local search. Batagelj (2003) alternatively recommended the critical path search algorithm, which ranks citation chains according to their overall traversal weights (Liu et al., 2019, 2020). Liu and Lu (2012) termed such an approach global search. They found that neither local search nor global search guarantees the inclusion of the link with the largest traversal weight. Therefore, they proposed the use of a key-route search algorithm, which starts with a seed link, usually the link with the largest traversal weight, and then searches forward until a sink is reached while simultaneously searching backward until a source is reached (Liu et al., 2019). Notably, in the key-route search algorithm, one can apply either local or global search when searching forward or backward (Liu and Lu, 2012; Liu et al., 2019). This algorithm guarantees that the top citation link is included in the resulting main paths since it serves as the seed link. Furthermore, one can specify as many seed links as desired (Liu and Lu, 2012; Chen et al., 2013; Liu et al., 2019).
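The difference between local search and key-route search can be sketched in a few lines of Python. The traversal weights below are hypothetical and chosen so that local search misses the heaviest link. The sketch is simplified: it uses a single seed link, applies local search in both directions, and resolves ties by keeping only one candidate, whereas a full implementation would follow all tied links.

```python
def local_search(weights, start, forward=True):
    """Greedy walk that always follows the adjacent link with the largest weight."""
    path = [start]
    while True:
        node = path[-1]
        # Outgoing links when searching forward, incoming links when searching backward.
        end = 0 if forward else 1
        candidates = {link: w for link, w in weights.items() if link[end] == node}
        if not candidates:
            return path  # reached a sink (forward) or a source (backward)
        best = max(candidates, key=candidates.get)
        path.append(best[1] if forward else best[0])


def key_route_search(weights):
    """Key-route search with one seed link: the link with the largest weight."""
    seed = max(weights, key=weights.get)
    backward = local_search(weights, seed[0], forward=False)  # tail back to a source
    forward = local_search(weights, seed[1], forward=True)    # head onward to a sink
    return backward[::-1] + forward


# Hypothetical traversal weights on (cited, citing) links.
weights = {
    ("A", "B"): 3, ("A", "C"): 4,
    ("B", "D"): 6, ("C", "D"): 2,
    ("D", "E"): 5, ("D", "F"): 4,
}

local_path = local_search(weights, "A")
key_route_path = key_route_search(weights)
print(local_path)      # follows A -> C first and therefore skips the link B -> D
print(key_route_path)  # built around the seed link B -> D
```

With these weights, local search from the source A yields A, C, D, E and omits the heaviest link (B to D), while key-route search yields A, B, D, E and includes it by construction.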

Efficient algorithms for finding traversal weights and search algorithms are implemented in Pajek, a program for the analysis of large networks (Batagelj and Mrvar, 1998; Batagelj, 2003; de Nooy et al., 2018). This study used Pajek to conduct the necessary main path analysis.

Research results

Basic bibliometrics of bibliographic coupling

Publication and citation trends of bibliographic coupling. The publication statistics of the 223 identified papers are shown in Figure 2. The trends of publication can be divided into four stages. Stage one marks a period of forty-nine years, from the introduction of the term bibliographic coupling by Kessler (1963a) to 2011, during which only a few papers were published each year. Stage two (2012-2016) witnessed a sharp increase in the number of publications, ranging from thirteen papers (5.830%) to eighteen papers (8.072%) per year. In stage three, from 2017 to 2019, more than twenty papers were published yearly (from twenty-five papers [11.211%] to thirty-nine papers [17.489%]). However, only seven papers (3.139%) were published in 2020 (up until March) when stage four began.


Figure 2: Publication trends of bibliographic coupling (Source: Web of Science)

Regarding citation trends (Figure 3), the total citation count grew slowly in the early years but has increased rapidly since 2011, reaching 5,575 by March 2020. From 1963 to 1986 (twenty-four years), yearly citation counts were low, ranging from 2 (0.036%) to 14 (0.251%). The yearly citation counts in the period between 1987 and 2010 (twenty-four years) ranged from 11 (0.197%) to 95 (1.704%). The citation count exceeded 100 from 2011 to 2012, ranging from 121 (2.170%) to 134 (2.404%), and exceeded 200 in the period from 2013 to 2014, ranging from 253 (4.538%) to 259 (4.646%). This increasing trend continued in the period from 2015 to 2016, ranging from 393 (7.049%) to 411 (7.372%). The yearly citation count then exceeded 500 from 2017 to 2018 (586 [10.511%] up to 792 [14.206%]). Unexpectedly, a new record was reached in 2019 when the citation count exceeded 1,000 (1,481 times [26.565%]). As for 2020, because data were available for only three months, the citation count was only 213 (3.821%).


Figure 3: Citation trends of bibliographic coupling (Source: Web of Science)

Highly cited authors of bibliographic coupling. In this study, a total of 223 papers on bibliographic coupling by 461 authors were identified. Table 1 presents the ten most cited authors (cited a total of 3,528 times [forty-three papers published]). The top three highly cited authors, each with three papers, were Kessler (866 times), Boyack (416 times) and Klavans (416 times). Furthermore, Merigó (275 times cited) was a prolific author with seventeen papers published. Porter (three papers published), van Eck (two papers published) and Waltman (two papers published) were each cited 274 times. The remaining three authors, Li, Glänzel and Wang, were cited 252 times (two papers published), 245 times (seven papers published) and 236 times (one paper published), respectively.


Table 1: Highly cited authors of bibliographic coupling (Source: Web of Science)
| Author's name | Number of times cited | Number of papers published |
|---|---|---|
| Kessler, Maxwell Mirton | 866 | 3 |
| Boyack, Kevin W. | 416 | 3 |
| Klavans, Richard | 416 | 3 |
| Merigó, José M. | 275 | 17 |
| Porter, Alan L. | 274 | 3 |
| van Eck, Nees Jan | 274 | 2 |
| Waltman, Ludo | 274 | 2 |
| Li, Munan | 252 | 2 |
| Glänzel, Wolfgang | 245 | 7 |
| Wang, Zhong Lin | 236 | 1 |
| Total | 3,528 | 43 |

Core journals of bibliographic coupling. As presented in Table 2, the 223 bibliographic coupling papers were published in ninety-three different journals. The top three core journals published ninety-eight papers (43.946%): Scientometrics (sixty-one papers [27.354%]), Journal of Informetrics (nineteen papers [8.520%]) and Journal of the Association for Information Science and Technology (eighteen papers [8.072%]). Furthermore, two journals published seven papers (3.139%) each, one journal published six papers (2.691%), one journal published four papers (1.794%), three journals published three papers (1.345%) each, and nine journals published two papers (0.897%) each. The remaining seventy-four papers (33.184%) were published in seventy-four different journals, one paper per journal.


Table 2: Core journals of bibliographic coupling (Source: Web of Science)
| Journal title | Number of papers | % of the total |
|---|---|---|
| Scientometrics | 61 | 27.354% |
| Journal of Informetrics | 19 | 8.520% |
| Journal of the Association for Information Science and Technology* | 18 | 8.072% |
| PLOS ONE | 7 | 3.139% |
| Technological Forecasting and Social Change | 7 | 3.139% |
| Information Processing & Management** | 6 | 2.691% |
| Sustainability | 4 | 1.794% |
| IEEE Access | 3 | 1.345% |
| Information Research | 3 | 1.345% |
| Journal of Documentation | 3 | 1.345% |
| Computers & Industrial Engineering | 2 | 0.897% |
| International Journal of Management Reviews | 2 | 0.897% |
| Journal of Cleaner Production | 2 | 0.897% |
| Journal of Engineering and Technology Management | 2 | 0.897% |
| Journal of Information Science | 2 | 0.897% |
| Research Evaluation | 2 | 0.897% |
| Research Policy | 2 | 0.897% |
| Social Science Information | 2 | 0.897% |
| Turkish Journal of Electrical Engineering & Computer Sciences | 2 | 0.897% |
| All the other 74 journals combined | 74 | 33.184% |
| Total | 223 | 100.000% |
* The journal was entitled American Documentation (two papers) and changed its title to Journal of the American Society for Information Science (two papers), Journal of the American Society for Information Science and Technology (nine papers) and Journal of the Association for Information Science and Technology (five papers).
** The journal was entitled Information Storage and Retrieval (two papers) and changed its title to Information Processing & Management (four papers).
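The percentage shares reported for the core journals follow directly from the paper counts and the total of 223 papers. As a quick arithmetic check, the short Python sketch below recomputes the shares of the three core journals; the variable names are chosen for this example only.

```python
# Paper counts of the three core journals, taken from Table 2.
papers = {
    "Scientometrics": 61,
    "Journal of Informetrics": 19,
    "Journal of the Association for Information Science and Technology": 18,
}
total_papers = 223

# Share of each journal in percent, rounded to three decimals as in the table.
shares = {journal: round(100 * n / total_papers, 3) for journal, n in papers.items()}
top_three_count = sum(papers.values())
top_three_share = round(100 * top_three_count / total_papers, 3)

for journal, share in shares.items():
    print(f"{journal}: {share:.3f}%")
print(f"Top three combined: {top_three_count} papers, {top_three_share:.3f}%")
```

The recomputed values agree with the table: 27.354%, 8.520% and 8.072% for the individual journals and 43.946% for the ninety-eight papers combined.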

Main path analysis was applied to the citation network to explore the research trajectory of bibliographic coupling. The main paths of bibliographic coupling at key-route 20 are shown in Figure 4, which was drawn with Pajek. In Figure 4, each node represents a paper and carries a label that encodes the author(s) and publication year. For example, Meyer-brötzSB2017 indicates that Meyer-brötz is the first author, the initials of the second and third authors' last names are S and B, and the paper was published in 2017. Furthermore, for duplicate labels, a lowercase letter is appended at the end. For instance, the 'a' in Kessler1963a and the 'b' in Kessler1963b indicate two papers published by Kessler in 1963. The arrows indicate the direction of knowledge flow, and their thicknesses reflect the relative significance of the knowledge flow. The main paths exhibit a typical divergence-convergence pattern (Liu and Lu, 2012; Liu et al., 2019) in which VladutzC1984 (Vladutz and Cook, 1984) and BoyackK2010 (Boyack and Klavans, 2010) are two distinct divergence-convergence points. Accordingly, the paths can be divided into three different periods: the emergence period (1963-1984), the opening period (1984-2010) and the consolidation period (2010-2020).
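The labelling convention described above can be reproduced with a small Python helper. The function `make_label` is a reconstruction inferred from the labels in the text (first author's last name with only its first letter capitalised, initials of the remaining authors' last names, year, optional lowercase suffix); it is an assumption of this example, not the authors' or Pajek's actual code.

```python
def make_label(last_names, year, suffix=""):
    """Build a node label from the authors' last names and the publication year."""
    first = last_names[0].capitalize()  # e.g. "Díez-Vial" becomes "Díez-vial"
    initials = "".join(name[0].upper() for name in last_names[1:])
    return f"{first}{initials}{year}{suffix}"


# Author pairs and years as cited in the text.
labels = [
    make_label(["Vladutz", "Cook"], 1984),
    make_label(["Boyack", "Klavans"], 2010),
    make_label(["Waltman", "van Eck"], 2012),
    make_label(["Díez-Vial", "Montoro-Sánchez"], 2017),
    make_label(["Kessler"], 1963, suffix="a"),
]
print(labels)
```

Applied to the examples in the text, the helper yields VladutzC1984, BoyackK2010, WaltmanV2012, Díez-vialM2017 and Kessler1963a.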


Figure 4: Main paths of bibliographic coupling (key-route 20)

The emergence period: 1963-1984. The twenty-one-year emergence period started with the source Kessler1963a and ended with VladutzC1984. This early period presents basically a one-person show by Kessler, who subsequently published Kessler1963b (Kessler, 1963b) and Kessler1965 (Kessler, 1965).

Kessler1963a first introduced bibliographic coupling as 'a single item of reference used by two papers'. In this series of studies, Kessler drew his data from the journal Physical Review: 265 papers in Vol. 97 (Kessler, 1963a), 8,186 papers from Vol. 77 to Vol. 111 (Kessler, 1963b) and 334 papers in Vol. 112 (Kessler, 1965). Kessler1963a tested the two graded criteria, whereas Kessler1963b extended the research to a larger population. Eventually, Kessler1965 compared the results of bibliographic coupling and analytic subject indexing. Nearly 20 years later, VladutzC1984 introduced a variant of Kessler's original formula and examined 9,996 publications from the 1981 annual Science Citation Index. The study applied the coupling strength measures and found that in over 85% of cases, two documents with higher coupling strength were indeed closely related by subject.

The opening period: 1984-2010. The twenty-six-year opening period diverges from VladutzC1984 to Persson1994 (Persson, 1994), Jarneving2001 (Jarneving, 2001), Jarneving2005 (Jarneving, 2005), Jarneving2007a (Jarneving, 2007a), Jarneving2007b (Jarneving, 2007b), AhlgrenJ2008 (Ahlgren and Jarneving, 2008), AhlgrenC2009 (Ahlgren and Colliander, 2009), and then converges to BoyackK2010. This intermediate period indicates the opening of a combination of bibliographic coupling with the other citation-based (or bibliometric) methods.

Persson1994 chose 209 articles from Journal of the American Society for Information Science (1986-1990) and revealed the intellectual base and research front based on document bibliographic coupling and author co-citation. In addition, Jarneving2001 selected 1,334 source publications in cardiovascular research in Science Citation Index Expanded (1999-2000). He used document bibliographic coupling and journal co-citation in combination with quantitative analysis of the titles for citation analysis. Thus, this citation analysis in clusters was based on bibliographic coupling, as well as co-citation, which was apparent in the cognitive structure. Furthermore, Jarneving2005 gathered 7,239 articles of environmental journals in Science Citation Index (1991-2000) and compared the results of the two bibliometric methods (document bibliographic coupling and document co-citation) in mapping the research front. Later, Jarneving2007a adopted a multidisciplinary approach by collecting 619,570 articles from the 2003 annual Science Citation Index. He applied a method of document bibliographic coupling in combination with the complete link cluster method for science mapping. This method provided adequate descriptions of the research front and other core documents, although its application had some tight restrictions. Similarly, Jarneving2007b compiled 268 articles in the field of organic chemistry in Science Citation Index (2002-2003) and applied the method proposed in Jarneving2007a. The results confirmed the presumptions for coherent clusters of documents in the research front.

Furthermore, AhlgrenJ2008 extracted forty-three articles from the journal Information Retrieval (2004-2006). They compared two document-document similarity approaches (bibliographic coupling and common abstract stems) as reflected in cluster solutions for science mapping. Similarly, AhlgrenC2009 used the same data as AhlgrenJ2008 and experimentally compared five document-document similarity approaches (two were text-based, one was citation-based bibliographic coupling and two were a combination of text-based and bibliographic coupling) under first-order and second-order similarities, as reflected in cluster solutions for science mapping. In 2010, a large-scale study, BoyackK2010, gathered 2,153,769 articles in the biomedical literature from MEDLINE (2004-2008). The study compared document bibliographic coupling, document co-citation and document direct citation, and suggested that the citation approach represents the research front most accurately. Among the three citation approaches, bibliographic coupling slightly outperformed co-citation when both accuracy estimates were used, and direct citation was found to be the least accurate mapping method.

The consolidation period: 2010-2020. Beginning with BoyackK2010, the consolidation period branches into two streams. The branch on the top begins with GlänzelT2011 (Glänzel and Thijs, 2011) and GlänzelT2012 (Glänzel and Thijs, 2012). It then flows through ThijsSG2013 (Thijs et al., 2013), ZupicČ2015 (Zupic and Čater, 2015), Díez-vialM2017 (Díez-Vial and Montoro-Sánchez, 2017), and finally ends at the two sinks SkuteZHD2019 (Skute et al., 2019) and Skute2019 (Skute, 2019). The branch at the bottom consists of a series of works: CollianderA2012 (Colliander and Ahlgren, 2012), WaltmanV2012 (Waltman and van Eck, 2012), BoyackK2014 (Boyack and Klavans, 2014), KlavansB2017 (Klavans and Boyack, 2017), Meyer-brötzSB2017 (Meyer-Brötz et al., 2017), YuWZZL2017 (Yu et al., 2017) and ZhangXZ2019 (Zhang et al., 2019). This latest period marks the consolidation of bibliographic coupling used in citation analysis.

For the branch at the top of Figure 4, GlänzelT2011 selected 20,726 and 31,462 documents in Web of Science, respectively, from 1999-2000 and 1999-2003 in the subject category of Public, Environment & Occupational Health, and Energy & Fuels. They used a hybrid approach in the context of document bibliographic coupling for the notion of core documents. Thus, the core documents defined with this hybrid approach can be used to represent clusters and topics at different levels of aggregation. As an extension of this study, GlänzelT2012 selected over 1,000 documents each from the category of Public, Environment & Occupational Health, Biomedical Engineering, Geography, and Obstetrics & Gynaecology in Web of Science from 1999-2003 (time window is 2004-2008). They used the same hybrid approach in the different time windows in the context of document bibliographic coupling for the notion of core documents. The core documents defined with this hybrid approach can be used to detect and label new emerging topics. Similarly, ThijsSG2013 selected 14,264 documents in the category of Public, Environment & Occupational Health from the 2006 annual Web of Science. They also used a hybrid approach (with document bibliographic coupling and textual similarity) to identify second-order similarity. Accordingly, this hybrid approach revealed that second-order similarity could provide added-value for the more general clusters. ZupicČ2015 later presented the bibliometric methods of direct citation, bibliographic coupling, co-citation, co-word and co-author. Further, it recommended a workflow for conducting science mapping studies after searching 465 articles from the journal Organizational Research Methods (2001-2014) and performed a document direct citation and document co-citation to map the intellectual structure of this journal.

Furthermore, Díez-vialM2017 found 318 articles on science parks and incubators from Social Sciences Citation Index (1990-2015). They applied document co-citation to identify foundations of topics during four periods (1996-2000, 2001-2005, 2006-2010 and 2011-2015) and used document bibliographic coupling to identify new trends of topics for the five-year period (2011-2015). Additionally, SkuteZHD2019 retrieved 435 articles in the field of university-industry collaborations (Web of Science, 2011-2016). They employed document bibliographic coupling and document co-citation to map this field, which could be systematically clustered into three interconnected levels: individual, organisational and institutional. Skute2019 identified 587 articles on the topic of academic entrepreneurship in Web of Science from 2008-2017 and utilised document bibliographic coupling to map this topic, yielding four interconnected clusters: the anatomy of entrepreneurial university, university spinoff development and technology commercialisation, the identity of academic entrepreneurs and knowledge transfer and regional economic impacts.

Regarding the branch at the bottom of Figure 4, CollianderA2012 retrieved 58,885 articles in a scientific context of Abridged Index Medicus (2008-2009). They used document bibliographic coupling to measure document-document similarity and found that the second-order similarity consistently outperforms the first-order similarity. WaltmanV2012 conducted a study of 10.2 million publications in the fields of sciences and social sciences of Web of Science (2001-2010). They introduced a new methodology (based on direct citation relations between documents) for constructing a publication-level classification system of science, and they found that the primary limitation lies in its sole reliance on direct citation. In other words, this methodology may have more comprehensive applications in bibliographic coupling relations. Similarly, BoyackK2014 selected 19,804,633 documents from Scopus (1996-2011). They used a document-level model and map of science using a combined document co-citation and document bibliographic coupling process. This combined dynamic global model could be used to answer planning-related questions such as those pertaining to the identification of emerging topics in science.

In the same vein, KlavansB2017 conducted a large-scale study by selecting 2,044,538, 18,095,283, and 48,533,301 documents from Scopus (2010, 2011-2013 and 1996-2012, respectively) to identify which type of citation analysis generates the most accurate taxonomy of scientific and technical information. They compared document clustering using bibliographic coupling, co-citation and direct citation, and revealed that direct citation provided a more accurate representation than either bibliographic coupling or co-citation. Additionally, Meyer-brötzSB2017 selected 8,577 articles in the top ten journals of the Engineering Manufacturing category from the Journal Citation Reports (2011-2015). They employed an experimental method for evaluating parameter settings (effects of first- and second-order similarity, edge cutting, and weighting factors) in the calculation of hybrid similarities. Notably, this paper was the first to demonstrate the effects of several parameters in the context of hybrid similarities (the combination of citation-based [e.g., bibliographic coupling] and textual similarities) at the document level. YuWZZL2017 chose 7,303 papers on the topic of data envelopment analysis from Web of Science (1980-2017). To identify research topics, they proposed the Louvain model, which is a hybrid self-optimised clustering model based on citation links (document bibliographic coupling and document co-citation links in Amsler network) and textual features (statistical and topological features). This Louvain model could acquire realistic and valid clustering results. Finally, ZhangXZ2019 collected 10,996 documents in the field of oncology from PubMed (2007-2016). They proposed an extended citation model (composed of four networks: direct citation, bibliographic coupling, co-citation and textual similarity) for scientific document clustering and demonstrated that it could acquire more realistic clustering results than models of bibliographic coupling and textual similarity.

Discussion

Since the 1960s, numerous studies on citations have been conducted, with early research focused on bibliographic coupling (Garfield, 1988; Hummon and Doreian, 1989; Liu, 1993). No extant literature on bibliographic coupling, however, reviews its development using a rigorous, mathematically based methodology. Therefore, this paper traces the field back to the seminal paper by Kessler (1963a) and reviews its development trajectory by applying a state-of-the-art method, main path analysis.
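For readers unfamiliar with the method, the idea behind main path analysis can be sketched on a toy citation network. The sketch below rests on assumptions that are not restated in this section: it uses search path count (SPC) as the traversal weight and a global search that selects the source-to-sink path with the largest total weight. The network and node names are hypothetical, and the settings actually used in this study may differ.

```python
from functools import lru_cache

# Hypothetical toy network: edges point from a cited paper to the papers citing it,
# so that paths follow the direction of knowledge flow.
succ = {
    "S": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": [],
    "E": [],
}
pred = {node: [] for node in succ}
for u, vs in succ.items():
    for v in vs:
        pred[v].append(u)

@lru_cache(maxsize=None)
def paths_in(node):
    """Number of paths from any source (a paper with no predecessors) to node."""
    return 1 if not pred[node] else sum(paths_in(p) for p in pred[node])

@lru_cache(maxsize=None)
def paths_out(node):
    """Number of paths from node to any sink (a paper with no successors)."""
    return 1 if not succ[node] else sum(paths_out(s) for s in succ[node])

def spc(u, v):
    """Search path count: number of source-to-sink paths passing through edge (u, v)."""
    return paths_in(u) * paths_out(v)

@lru_cache(maxsize=None)
def best_from(node):
    """Return (total SPC, path) for the heaviest path from node to a sink."""
    if not succ[node]:
        return 0, (node,)
    options = []
    for s in succ[node]:
        total, path = best_from(s)
        options.append((spc(node, s) + total, (node,) + path))
    return max(options)

sources = [node for node in succ if not pred[node]]
total, main_path = max(best_from(s) for s in sources)
```

A global search is used in the sketch because a purely local search, which picks the heaviest outgoing edge at each step, frequently meets ties and would then have to follow several branches at once.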

Although methodologically interesting, this study has limitations in its data, which were collected from citation indexes (the Science Citation Index Expanded and Social Sciences Citation Index) and therefore may not include all the relevant works. In addition, main path analysis is known to be somewhat imprecise for recent literature, as recent articles have yet to receive the citations they deserve. For example, Shen et al. (2019) seem to have made an interesting methodological contribution but are not on the main paths. Borrowing the algorithmic idea of TF-IDF (term frequency-inverse document frequency), they propose computing bibliographic coupling strength by weighting the number of references and their frequency in the database. Such a fundamental contribution would likely appear on the main paths were our analysis to be conducted some time later.
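The intuition behind such a weighting can be shown with a small sketch. This is not the formula of Shen et al. (2019), which is not reproduced in this paper. It is an illustrative assumption that each shared reference contributes an inverse-document-frequency weight, log(N / df), where N is the number of documents in the database and df is the number of documents citing that reference. The function name and the toy figures are hypothetical.

```python
import math

def weighted_coupling(refs_a, refs_b, doc_freq, n_docs):
    """Sum an IDF-style weight over the references shared by two documents.

    doc_freq maps each reference to the number of documents citing it;
    n_docs is the total number of documents in the database.
    """
    shared = set(refs_a) & set(refs_b)
    return sum(math.log(n_docs / doc_freq[ref]) for ref in shared)

# Hypothetical database of 100 documents: one reference is cited by half of
# them, the other by only two.
doc_freq = {"classic": 50, "niche": 2}
n_docs = 100

common_pair = weighted_coupling(["classic"], ["classic"], doc_freq, n_docs)
rare_pair = weighted_coupling(["niche"], ["niche"], doc_freq, n_docs)
```

Under the plain count of shared references, both pairs would have a coupling strength of one. With the weighting assumed here, the pair sharing the rarely cited reference is treated as more strongly coupled than the pair sharing a reference that half of the database cites.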

As shown in Figure 1, a marked shift in publication trends occurred around 2012. Forty-nine years after Kessler first introduced bibliographic coupling, the average yearly number of publications rose from between zero and five to more than ten. In parallel, as shown in Figure 2, citation counts also exhibit a noticeable upward trend. Beginning in 2011, the average yearly citation count followed the same pattern, rising from two digits to hundreds. This evidence indicates an apparent expansion of publications and citations on bibliographic coupling in 2011-2012.

According to Table 1, Kessler, Boyack and Klavans were the three most highly cited authors. Together they published six of the twenty-six papers on the main paths (three by Kessler and three by Boyack and Klavans). This suggests that the works of these three authors play a vital role in bibliographic coupling research.

As mentioned, ninety-eight papers were published in the top three core journals: Scientometrics (sixty-one), Journal of Informetrics (nineteen) and Journal of the Association for Information Science and Technology (eighteen) (Table 2). Among the twenty-six papers on the main paths (Figure 4), twenty were published in these three journals (ten, three and seven papers, respectively). Taken together, these findings support the view that these three are the core journals for bibliographic coupling research.

In Figure 4, two research streams extend from BoyackK2010. There is no clear-cut difference in the early development of the two streams, as both focus on document clustering or classification. However, the recent studies in the two streams discuss very different topics. The upper stream focuses on applying bibliometric methods, including bibliographic coupling, to conduct literature reviews, while the lower stream continues the discussion of document clustering methodologies. For example, ZupicČ2015, Díez-vialM2017, SkuteZHD2019, and Skute2019 in the upper stream apply bibliographic coupling to review literature in the management and innovation fields, while ZhangXZ2019, YuWZZL2017, Meyer-brötzSB2017, etc. in the lower stream involve discussions on similarity and clustering.

Conclusions

To summarise, the contents of the twenty-six nodes (documents) on the main paths indicate that bibliographic coupling research can be divided into three periods: the emergence period (1963-1984), the opening period (1984-2010) and the consolidation period (2010-2020). In the emergence period, bibliographic coupling research was initiated solely by Kessler in 1963 and 1965. In the opening period, bibliographic coupling research was conducted in combination with other citation analysis methods. In the consolidation period, bibliographic coupling research was consolidated with other citation-based methods. Furthermore, three key papers were identified, namely the source (Kessler1963a) and two distinct divergence-convergence points (VladutzC1984 and BoyackK2010). Kessler1963a introduced bibliographic coupling. VladutzC1984 applied coupling strength measures and found that two documents with higher coupling strength can justifiably be regarded as closely related in subject. BoyackK2010 compared bibliographic coupling, co-citation and direct citation to determine which citation approach most accurately represents the research front. Among the three citation approaches, bibliographic coupling slightly outperformed co-citation; direct citation was found to be the least accurate mapping method.

This study is the first to present the main paths of bibliographic coupling research in addition to basic bibliographic analyses of authors and journals. The results of the main path analysis indicate that consolidated analysis based on various citation structures and textual similarity will be a trend in future research.

Acknowledgements

The authors thank the reviewers (for their helpful comments on the original paper) and copy-editors (for their helpful work in enabling the authors to satisfy the style requirements of the journal).

About the authors

Tsu-Jui Ma, corresponding author, is a senior librarian at the library of National Taipei University. He received his PhD candidacy from the Graduate Institute of Management, National Taiwan University of Science and Technology. His contact address is matsujui@live.com
Gwo-Guang Lee is a professor of Information Management at National Taiwan University of Science and Technology. His contact address is lgg@cs.ntust.edu.tw
John S. Liu is a professor of Technology Management at National Taiwan University of Science and Technology. His contact address is johnliu@mail.ntust.edu.tw
Rae Lan is an associate professor of Foreign Languages and Applied Linguistics at National Taipei University. Her contact address is raelan@gm.ntpu.edu.tw
Jung-Hsiu Weng is a postgraduate of Library, Information and Archival Studies, National Chengchi University. Her contact address is angel19981020@gmail.com

References



How to cite this paper

Ma, T-J., Lee, G-G., Liu, J.S., Lan, R., & Weng, J-H. (2022). Bibliographic coupling: a main path analysis from 1963 to 2020. Information Research, 27(1), paper 918. http://InformationR.net/ir/27-1/paper918.html. (Archived by the Internet Archive at https://bit.ly/3MBzglX) https://doi.org/10.47989/irpaper918
