published quarterly by the university of borås, sweden

vol. 26 no. 4, December, 2021



Mapping science: tools for bibliometric and altmetric studies


Veslava Osinska and Radoslaw Klimas


Introduction. The study investigates whether online attention, generated on social media or through video tutorials, affects the popularity of science mapping tools in the research community.
Method. We collected data from the Web of Science, Scopus, YouTube, Facebook, Twitter, and Instagram, using web-scraping tools. Bibliometrics, altmetrics and webometrics were applied to process the data and to analyse Gephi, Sci2 Tool, VOSviewer, Pajek, CiteSpace and HistCite.
Analysis. Statistical and network analyses, and YouTube analytics, were used. The tools’ interfaces were assessed in the preliminary stage of the comparison. The results were plotted on charts and graphs, and compared.
Results. Social media and video tutorials had minimal influence on the popularity of different tools, as reflected by the number of papers within the Web of Science and Scopus where they featured. However, the small but constant growth of publications mentioning Gephi could be a result of Twitter promotion and a high number of video tutorials. The authors proposed four directions for further comparisons of science mapping software.
Conclusions. This work shows that biblio- and scientometricians are not influenced by social media visibility or accessibility of video tutorials. Future research on this topic could focus on evaluating the tools, their features and usability, or the availability of workshops.

DOI: https://doi.org/10.47989/irpaper909


Introduction

Science mapping or science visualisation is a quickly developing practice among researchers. It is used not only in widely defined information science from which it sprung, but also in other disciplines. Recent studies report that in the last few years, science mapping has gained more attention in scientific fields beyond information science (He, et al., 2019). The increasing popularity of science mapping creates a demand for new specialised software. Understood as a ‘generic process of domain analysis and visualisation’ (Chen, 2017a), this field of study should provide insight into the visual distribution of scientific data, as well as the groundwork for arguments. Science mapping software facilitates analysis, statistical measures and spectacular visual layouts, illuminating the specificity and nature of scientific knowledge. It can be expected that specialised software will be useful to researchers regardless of their disciplines, experience and seniority.

Bibliometric studies require varied and constantly developing tools for science mapping. We may wonder what the reasons are behind the researchers’ choice of tools. The international accessibility of tools also raises questions regarding the distribution of users according to nationality, and the impact of the language used in video tutorials. A study of science mapping should examine the teaching materials related to selected tools. It should also consider the contribution of software designers and users to the promotion of a specific tool.

Thanks to the growth of the Internet and social media, scholars may easily share their work and track the publications of their colleagues, as well as lead discussions, disseminate their own specific knowledge and propagate the results of their research (Levin, et al., 2016). Open Science involves sharing not only research findings and data but also the tools created and the program code written to accomplish specific tasks. Therefore, the visibility of a given tool and the opinions shared online may determine the tool’s further popularity and its position in the research community. This is also determined by the tool’s usability. Thus, the availability and quality of relevant online tutorials are also significant.

Nowadays, it is more convenient to watch a video tutorial, which allows the viewer to quickly examine the presented tool, than to read a massive manual, available as a large PDF file. To better understand this phenomenon, many scholars study the patterns of tool usage, and the effectiveness of interactive videos and illustrated textbooks (Merkt, et al., 2011). Attractive video tutorials not only help a researcher to make a decision, but also educate them on the often complex functions of a given tool. High-quality free tutorials allow the software designers to find a larger audience, popularising the tool among less experienced users. A thoughtful positioning strategy can make a significant contribution to the tool’s success. Studies show that the number of videos on a given topic correlates with its popularity and future potential (Munaro, et al., 2020).

The authors of this article define users’ activity on social media in terms of interactions with video tutorials, and analyse it with the use of altmetrics. This article aims to compare certain tools in terms of bibliometrics and altmetrics. Additionally, it examines the relationship between the online visibility of a given tool (representation in social media and the accessibility of tutorials) and usage (reflected by the number of publications mentioning the tool).

Social media and websites can be used to promote the tool and to communicate with the users. The authors hypothesise that social media activity and accessibility of video tutorials may increase the tool’s popularity and usage, reflected in scientific publications. The authors described analysing the tutorials’ space as tutmetrics. The bibliographic data collected from the Web of Science and Scopus, as well as YouTube, Facebook, Twitter, and Instagram statistics, were used to test this hypothesis for the most significant scientometric tools of the last decade, such as Gephi (Bastian et al., 2009), VOSviewer or Pajek. This is the first study of the effect of online attention on usage.

Science mapping terms and tools in related studies

The science of science studies in the scientometric approach draws not only on quantitative research, but also on the qualitative analyses of scientific landscapes (Chen, 2004). Science mapping may depict fields of research focused on specific issues, or relations between scientific groups or individuals. This practice appears under different names. The most popular is science maps, which focuses on the end result, with science mapping as a synonym, encompassing the entire process of data acquisition, analysis, and visualisation, including all the stages of data processing. Scholars agree that the aim of science mapping is to analyse a domain of scientific knowledge that is reflected in an aggregated collection of intellectual contributions from members of a scientific community, or more precisely defined specialities (Chen, 2017a). Chen (2004) has coined a placeholder term for science mapping, knowledge-domain visualisation. Scholars of knowledge organization studies also refer to this practice as domain analysis (Smiraglia, 2015).

Eugene Garfield, the most prominent bibliometrician, who introduced the idea of a citation index for measuring science, was one of the first to explore visual representations of scientific research (Garfield, 1955; 1964). In his revolutionary work, he analysed citation data to write the history of science. He traced the discovery of DNA following the publication of Isaac Asimov’s book The genetic code, and analysed the citation linkages among forty related works. His paper presented the development of the concept of the DNA structure on a chronological chart. The chart showed that a reference system does not have to be a one-dimensional linear diagram; instead, it might be coded on a plane with more than three variables: time (sequence), domain category (square corner), Asimov’s network metrics (colour) and cited name (hub) (Allgood, et al., 2007). The practice of mapping the connections between key papers, which allows for a reconstruction of the intellectual history of a significant discovery, was called historiography (Garfield, 1964). The practice was further applied and developed by the designers of the HistCite application (which can be downloaded from this website, to be used at different stages of scientometric analysis).

Most of the later scientometric studies established relations between different papers by co-citation analysis, which used statistical techniques such as factor analysis, multi-dimensional scaling, cluster analysis and Kohonen networks (Osinska and Bala, 2015; van Eck and Waltman, 2010; Muñoz-Muñoz and Mirón-Valdivieso, 2017; White, 1998). In his work on citation patterns, Garfield (1994) also coined terms referring to visual scientometrics such as scientography and scientographs. Despite the fact that the term ‘scientography’ is the most appropriate description of ‘graphing of science,’ it is not widely used.

Gradually, the amount of bibliographic data analysed in a single article increased. Co-citations, co-words and co-authors have been treated as nodes and links in constructing network visualisations. Next, network science, open-data initiatives and network analysis software were developed in response to the interest in science mapping; an interdisciplinary research field emerged on the borders of scientometrics, graph theory and information visualisation.

Börner and colleagues founded the Web service Places and Spaces to popularise science mapping and promote maps with the widest influence (Allgood, et al., 2007). She described the practice of science mapping in her Atlas of science (Börner, 2010). The readers were introduced to the issues in science mapping, as well as to basic scientometric laws, paradigms, and scientometrics’ relation with scientific disciplines. Börner presented the significant developments in science mapping on a timeline, starting with the late 1950s. Until the 1970s, visualisations of relationships between publications and their authors were made manually or semi-manually, with drafting tools and millimetre paper. It should be mentioned here that basic skills such as graph construction were a required element of high school education.

The timeline presented several layers of milestone events in the history of science mapping (Börner, 2010, pp. 26-47). Börner identified these layers as algorithms, visualisations, tools, and books. The first layer allows the viewers to track the history of emerging algorithms and techniques that facilitated the reduction of input data and their presentation. Examples such as the Kamada-Kawai algorithm, pathfinder network scaling, latent semantic analysis, and self-organizing maps constitute one of the historical sequences, known today as the basis for graphic layouts (Chen, 2004). The timeline also shows key publications which established the main directions in science mapping. We see the importance of visualisation when we refer to the tools which implemented these algorithms. The majority of software is open-source and free because it was produced at research labs for non-commercial use. The names of tools we use nowadays appeared in scholarship in the second half of the 1990s. Pajek, the first free software for network analysis and visualisation, was designed in 1996 at Ljubljana University in Slovenia; it is one of the tools considered in this study.

Since 2000, the timeline has extended significantly because a wide variety of mapping software was introduced. These new programs are designed for the analysis of large and complex networks: UCINET, TouchGraph (both currently paid applications), visone, and SoNIA. Other programs are oriented towards dynamics analysis (DyNet, NetVis, and CiteSpace) and changes in time (HistCite), or geospatial and temporal analysis (Geotime). A variety of tools allows for the analysis of research trends, using bibliography and other document data for a given period. Thanks to the development of graph modelling, some researchers call this practice bibliographic netography (Börner, 2010; van Eck and Waltman, 2010). This generation of software, which specialises in social network analysis, has wide applicability; it can be applied, for example, to Facebook friend groups.

The evolution of visualisation interfaces followed the development of information and communications technologies as well as application software. Since the era of personal computers which began in the 1970s, computer-made maps have been appearing in scientific publications. The first interactive visual maps emerged in the 1980s. Since the 1990s, 3D graphics have been increasingly popular with the users; followed by 3D visualisation projects (for example, IN-SPIRE in 1994) entering academic writing (Börner, 2010).

Science mapping and visualisation depend on the accessibility of tools and the users’ proficiency, as well as their familiarity with bibliographic databases, statistics and visualisation techniques, which together determine whether the analysis will be successful. Therefore, a consistent characterisation of all mapping software would be desirable. Young researchers are often unfamiliar with specific scientometric practices and their history. Since 2011, library and information science scholars have been conducting comparative studies of science mapping and visualisation tools (Börner, et al., 2010; Cobo, et al., 2011; Pradhan, 2017; Bankar and Lihitkar, 2019). Mingers and Leydesdorff (2015) relied on those findings in constructing their own scientometric framework. Chen (2004, 2017b, 2017a), the designer of the popular CiteSpace application and an interface expert, has reviewed science-mapping scholarship and software in a number of books and papers.

Authors of previous studies on science mapping selected various tools, considered to be representative. Thus, the earliest list of science mapping software includes the most widely used programs (Börner, 2010; Cobo, et al., 2011): BibExcel, CiteSpace, In-SPIRE, and others. The longest and most complete set lists twenty software applications, including the R Bibliometrix package for scientists with programming skills (Pradhan, 2017). In contrast, Chen’s (2017b) study focused on only three applications: his own CiteSpace, VOSviewer and CitNetExplorer. Developers presented CitNetExplorer as an extension module of VOSviewer (van Eck and Waltman, 2014), and therefore, this article treats them as one unit. These last two programs, designed by van Eck and Waltman, are particularly significant for bibliometrics today.

The definitive list of science mapping software was composed based on the following criteria:

Therefore, the final set included seven programs, listed in chronological order: Pajek (1996), CiteSpace (1999), HistCite (2004), Gephi (2006), Sci2 Tool (2009), VOSviewer (2010) and CitNetExplorer (2017). It is worth mentioning that the selected tools are equipped with bibliographic data-processing modules, required of specialised software today, which should perform many tasks. What makes the selected list different from those used in previous works is the exclusion of BibExcel, an older generation (created on the DOS operating system but updated on the Windows platform) portable program for the advanced processing of multivariate data derived from global databases such as the Web of Science and Scopus.

Methods

The rapid growth of information and communication technologies means that programs quickly become outdated. Academic software evolves to match the needs of the scientific community. This justifies the authors’ decision to repeat a comparative study of the specialised software.

This article introduces a new hybrid method. It is believed that combining the following four different approaches will allow for a more precise evaluation of the considered software:

  1. Literature review to trace the usage of a given tool. The usage can be determined by the number of publications featuring the tool and their citation patterns, analysed with the use of bibliometrics.
  2. Analysis of the accessibility and viewership of the tutorials published on social media, particularly on YouTube. This is a novel approach in the evaluation of specified software/method popularity, as online teaching is a relatively recent phenomenon.
  3. Analysis of the tools’ online visibility and their promotion on various social media platforms. The public’s interest in a given program can be quantified by current users’ online activity, analysed with the use of altmetrics.
  4. A comparison of features and functionalities. The tools are designed for different purposes and as such, they perform different functions.

Bibliometric analysis

The Web of Science and Scopus databases contain data regarding papers, which reflect the usage of science mapping and visualisation tools. It was assumed that the particular software used for the bibliometric research would be mentioned not only in the metadata, such as title, abstract, or keywords, but would also leave a footprint in the references. Indeed, the researchers might mention the tool in the full text of the paper, for example, in the section on the methodology, or in the captions under figures. Because the databases do not provide access to the full text of the publication dataset, the research was limited to the metadata and the contents of the references. The aim was to find all papers mentioning the tools listed above, and then to conduct a network analysis and topic analysis of the data found. This would show which programs were more popular and used more often in research.

Video tutorial statistics

The video statistics of software tutorials (official and unofficial) can be sources of valuable information. In order to analyse these, a table of 404 YouTube video tutorials was prepared, with the columns as follows:

  1. Program name.
  2. The name of the video.
  3. Year of uploading.
  4. YouTube link.
  5. View count.
  6. Language.

YouTube was selected because of its popularity and accessibility. The videos were selected manually, with the names of the tools used as search terms. The relevant items were compiled into a playlist, which was then converted by a YouTube playlist scraper into a table containing the first four data elements listed above (items 1 to 4). View counts (as of 29 April 2020) were retrieved manually, as was the information regarding language, which was determined from the title and/or the content of the video.
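The table structure described above can be sketched as a simple record type with a per-year tally. This is a minimal illustration, not the authors' procedure: the class name `VideoRecord`, the function `uploads_per_year` and the sample rows are hypothetical, and the rows are placeholders rather than data from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRecord:
    program: str   # 1. program name
    title: str     # 2. name of the video
    year: int      # 3. year of uploading
    url: str       # 4. YouTube link
    views: int     # 5. view count (entered manually in the study)
    language: str  # 6. language (entered manually in the study)


def uploads_per_year(records, first_year=2010, last_year=2019):
    """Count uploads per (program, year) within an inclusive year window."""
    counts = Counter()
    for rec in records:
        if first_year <= rec.year <= last_year:
            counts[(rec.program, rec.year)] += 1
    return counts


# Placeholder rows for illustration only; not taken from the study's dataset.
sample = [
    VideoRecord("Gephi", "Example tutorial A", 2012, "https://example.org/a", 100, "English"),
    VideoRecord("Gephi", "Example tutorial B", 2012, "https://example.org/b", 50, "Spanish"),
    VideoRecord("Pajek", "Example tutorial C", 2009, "https://example.org/c", 10, "English"),
]

counts = uploads_per_year(sample)
print(counts[("Gephi", 2012)])  # 2
print(counts[("Pajek", 2009)])  # 0, because 2009 falls outside the window
```

The same tally, grouped by language instead of program, would give a year-by-language breakdown.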

Altmetric analysis

Altmetrics is an alternative form of measuring academic reputation, which accounts for online research outputs, such as social media, online news media, or online reference managers (Priem, et al., 2011; Mcfedries, 2012). Previous research results (Maricato, et al., 2018) indicate that the impact measured using altmetrics gives an accurate image of scientific relationships, like bibliometrics or scientometrics.

Other studies (Ryan, et al., 2016, Ryan, et al., 2019) show that information practices of academics are often replicated in informal online environments such as social media. Because of this, Facebook, Twitter and Instagram were carefully examined for posting frequency. The goal was to determine whether the developers of a given tool promoted it and how often the relevant posts were published. This kind of self-promotion is a significant activity that ensures communication between users and developers, and shows the developer’s involvement in the tool’s development and perception.

Additionally, official web pages were examined to define the promoted image of the tools. All online activity affects a tool’s popularity and influences the public’s perception of it. The number of posts was counted manually for Facebook, and with the help of Foller.me for Twitter. Foller.me is a free tool for the automatic measurement of selected social pages, focusing on specific statistics, such as the number of posts.

Comparison of features

Basic information about the abovementioned tools is shown in Table 1.


Table 1: Basic information regarding popular scientometric tools
Name | Year | Author/s | Associated institution | Webpage | Operating system
VOSviewer | 2010 | Nees Jan van Eck, Ludo Waltman | Leiden University (The Netherlands) | www.vosviewer.com | MS Windows, Mac OS X, other
CitNetExplorer | 2017 | as above | as above | www.citnetexplorer.nl | as above
CiteSpace | 1999 | Chaomei Chen | Drexel University (USA) | cluster.ischool.drexel.edu/~cchen/citespace/ | MS Windows, Mac OS X
Science of Science (Sci2 Tool) | 2009 | Cyberinfrastructure for Network Science Centre | Indiana University (USA) | Sci2.cns.iu.edu | MS Windows, Mac OS X
Gephi | 2006 | Mathieu Bastian, Mathieu Jacomy, and community | none | gephi.org/ | MS Windows, Mac OS X, Linux
Pajek | 1996 | Vladimir Batagejl, Andrej Mrvar | University of Ljubljana (Slovenia) | vlado.fmf.uni-lj.si/pub/networks/pajek | MS Windows, Mac OS X, Linux
HistCite | 2004 | Eugene Garfield | none | histcite.software.informer.com | MS Windows

Two tools, Sci2 and HistCite, were not updated within the last year, which may indicate that the developers have ceased working on them. Information about the operating system that the software runs on was taken from the manual or instructions. Visual scientometric tools had different lifespans, and the numbers of users they attracted likely changed over time. Many users could have switched to other tools, which offered features better suited to their needs.

Science mapping is a complex process that consists of several phases: 1) data gathering, 2) data processing (namely, conversion to the required format), 3) mapping into the output space using selected clustering or aggregation algorithms (visualisation), 4) data analysis and modelling through a changing visualisation layout, filtering and so on, 5) graphic design, such as manipulating visual variables (size, colour, shape, patterns), and 6) the validation and interpretation of a given pattern (Börner, et al., 2003; Osinska and Malak, 2014). It should not be surprising that tool designers try to support all these tasks in one application, and the integration of multiple functions can be observed in contemporary software. However, some differences in the scope of functionality between the above-mentioned tools were detected and are presented in the following table.


Table 2: Functions, data formats and documentation of popular scientometric tools
Name | Main data functions | Input data formats/sources | Output data formats | Visual layouts | Documentation
VOSviewer, CitNetExplorer | processing, visualisation, analysis, reporting | CSV, TXT, GML, NET, MAT, CLU, VEC, RIS, ENW, WoS plaintext, Scopus, PubMed, Dimensions, RefWorks, Microsoft Academic, EuropePMC, Semantic Scholar, OCC, COCI, Wikidata | CSV, TXT, GML, NET, CLU, VEC, PNG, SVG, JPG, PDF | 3 | attached free ebook, linked free video tutorial
CiteSpace | gathering, processing, visualisation, analysis, graphic design, reporting (also in the form of movie) | CSV, XML, PDF, TXT, WoS plaintext, Scopus, PubMed, CrossRef, Dimensions, ADS, arXiv, CNKI, CCSSI, Derwent, NSF, Proquest | NET, TSV, PNG, VIZ, LAYOUT, HTML, CSV, PAJ, LIST, GraphML | 6 | linked paid ebook, link to unavailable file of manual
Science of Science (Sci2 Tool) | gathering, processing, visualisation, analysis, reporting | GraphML, XML, NET, NWB, WoS plaintext, BIB, ENW, Scopus, NSF, MAT, EDGE, CSV | XML, GraphML, MAT, NET, NWB, JPG, PDF, PS | 19 | linked free ebooks and linked free online manual
Gephi | visualisation, filtering, analysis, graphic design, reporting (also in the form of movie) | CSV, TSV, EDGES, XLS, XLSX, GraphML, GEXF, GDF, NET, DL, DOT, TPL, VNA, GV, TGF, GEPHI | SVG, PDF, PNG, CSV, TSV, EDGES, GraphML, GEXF, GDF, NET, DL, VNA, GEPHI | 12 | linked free online manual, linked free video tutorials
Pajek | processing, visualisation, analysis, graphic design, reporting (also in the form of movie) | NET, MAT, VGR, GED, DAT, BS, MAC, MOL, CLU, VEC, PER, CLS, HIE | EPS, PS, SVG, JPEG, BMP, VOSViewer, X3D, Kinemages, VRML, MOL, R, SPSS, TXT, XLS, GEDCOM | 9 | attached ebook
HistCite | processing, analysis, visualisation, reporting | WoS plaintext, TXT, HCI | NET, PAJ, DOT, PS | 1 | built-in manual

The comparison of different functionalities shows that tools are sometimes compatible with each other (through the use of common formats) and offer different methods of visualising the scientometric data.
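Compatibility through common formats can be illustrated by intersecting one tool's output formats with another tool's input formats. The sketch below is only an illustration: the format lists are copied from Table 2 for three tools, database sources are left out, and the function name `shared_formats` is hypothetical.

```python
# File formats as read from Table 2 (database sources such as Scopus are omitted).
OUTPUT_FORMATS = {
    "VOSviewer": {"CSV", "TXT", "GML", "NET", "CLU", "VEC", "PNG", "SVG", "JPG", "PDF"},
    "Gephi": {"SVG", "PDF", "PNG", "CSV", "TSV", "EDGES", "GraphML", "GEXF",
              "GDF", "NET", "DL", "VNA", "GEPHI"},
}
INPUT_FORMATS = {
    "Gephi": {"CSV", "TSV", "EDGES", "XLS", "XLSX", "GraphML", "GEXF", "GDF",
              "NET", "DL", "DOT", "TPL", "VNA", "GV", "TGF", "GEPHI"},
    "Pajek": {"NET", "MAT", "VGR", "GED", "DAT", "BS", "MAC", "MOL", "CLU",
              "VEC", "PER", "CLS", "HIE"},
}


def shared_formats(producer, consumer):
    """Formats that `producer` can write and `consumer` lists as readable."""
    return sorted(OUTPUT_FORMATS[producer] & INPUT_FORMATS[consumer])


print(shared_formats("Gephi", "Pajek"))      # ['NET']
print(shared_formats("VOSviewer", "Pajek"))  # ['CLU', 'NET', 'VEC']
print(shared_formats("VOSviewer", "Gephi"))  # ['CSV', 'NET']
```

A shared file extension is only a necessary condition for exchange; generic formats such as CSV may still differ in column layout between tools.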

Results

Bibliometric analysis of science mapping and visualisation software

The analysed applications were created at different times, at intervals of 10 or more years. Thus, it was necessary to choose one period when all the programs were being used. Eventually, 2010–2019 was chosen for the comparison. We gathered data from the Web of Science and Scopus databases by constructing a query that would yield publications which mentioned only one science-mapping tool. For example, for the Pajek application, we used the following expression for Topic (or Title-Abstract-Keywords in Scopus): ‘Pajek’ AND NOT ‘CiteSpace’ AND NOT ‘VOSViewer’ AND NOT ‘Sci2’ AND NOT ‘Gephi’ AND NOT ‘CitNetExplorer’ AND NOT ‘HistCite’. In this way, it was possible to obtain papers where only Pajek was used for science mapping and domain analysis. Table 3 presents the query results for the two databases (retrieved on 22 May 2020). As can be expected from the stronger coverage of scientific literature by Scopus in comparison with Web of Science (Wouters, et al., 2015), Scopus provided more data. An additional advantage is that this index analyses the cited references of articles with no Scopus record, which is in contrast with the related function of the Web of Science, “Cited Reference Search” (Wagner, 2015; Pranckute, 2021). Using the following query for the same Pajek software in Scopus ‘( REF ( Pajek ) AND NOT REF ( VOSviewer ) AND NOT REF ( CiteSpace ) AND NOT REF ( HistCite ) AND NOT REF ( Sci2 ) AND NOT REF ( Gephi ) ) AND PUBYEAR > 2009 AND PUBYEAR < 2020’, we received 3782 publications citing this application name at least once. The reference-based results (Table 3, column 4) are greater than those of the metadata search for every tool, and by more than an order of magnitude for Pajek and Gephi. Assuming that the references data is the most representative for revealing the differences between the tools, we used it for further comparative analyses.

The third query used an OR operator between search terms and collected data on all software tools mentioned, which was then used for further network analysis. This operation returned 10755 records. The difference between this result and the sum of the exclusive (AND NOT) query results was 736 (Scopus). This is the number of publications featuring more than one tool, either to visualise data or, possibly, to compare the tools.
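The three query types described above follow a regular pattern and could be assembled programmatically. The following is a minimal sketch that only builds strings mirroring the expressions quoted in the text; the function names are hypothetical, the exact syntax accepted by each database may differ, and, unlike the quoted Scopus example, the sketch excludes all six other tool names.

```python
TOOLS = ["Pajek", "CiteSpace", "VOSviewer", "Sci2", "Gephi", "CitNetExplorer", "HistCite"]


def exclusive_topic_query(tool, tools=TOOLS):
    """Metadata-style query: the tool's name, excluding every other tool name."""
    others = [t for t in tools if t != tool]
    return " AND NOT ".join([f"'{tool}'"] + [f"'{t}'" for t in others])


def exclusive_scopus_ref_query(tool, tools=TOOLS, after=2009, before=2020):
    """Reference-field query in the style of the Scopus expression quoted in the text."""
    others = [t for t in tools if t != tool]
    core = " AND NOT ".join([f"REF ( {tool} )"] + [f"REF ( {t} )" for t in others])
    return f"( {core} ) AND PUBYEAR > {after} AND PUBYEAR < {before}"


def any_tool_query(tools=TOOLS):
    """Sum query: publications mentioning at least one of the tools."""
    return " OR ".join(f"'{t}'" for t in tools)


print(exclusive_topic_query("Pajek"))
print(exclusive_scopus_ref_query("Pajek"))
print(any_tool_query())
```

Generating the queries from a single list keeps the exclusion terms consistent across tools, which matters because a forgotten name would let multi-tool papers leak into an exclusive count.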


Table 3: The number of publications related to software in the Web of Science and Scopus databases.
Application | Web of Science (Topic) | Scopus (TI-AB-KW) | Scopus (Ref)
Pajek | 142 | 154 | 3782
Gephi | 175 | 245 | 3781
VOSviewer | 369 | 362 | 1165
CiteSpace | 443 | 498 | 991
HistCite | 47 | 59 | 153
Sci2 Tool | 17 | 14 | 105
CitNetExplorer | 14 | 20 | 42
Total | 1207 | 1352 | 10019
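The column totals in Table 3 and the number of multi-tool publications reported in the text can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the counts from Table 3 and the OR-query result of 10755 records; the variable names are illustrative.

```python
# Counts copied from Table 3: (Web of Science Topic, Scopus TI-AB-KW, Scopus Ref).
TABLE_3 = {
    "Pajek": (142, 154, 3782),
    "Gephi": (175, 245, 3781),
    "VOSviewer": (369, 362, 1165),
    "CiteSpace": (443, 498, 991),
    "HistCite": (47, 59, 153),
    "Sci2 Tool": (17, 14, 105),
    "CitNetExplorer": (14, 20, 42),
}
OR_QUERY_RECORDS = 10755  # records returned by the OR query, as reported in the text

# Column totals of the single-tool (AND NOT) queries.
totals = tuple(sum(column) for column in zip(*TABLE_3.values()))

# Publications mentioning more than one tool (Scopus references).
multi_tool = OR_QUERY_RECORDS - totals[2]

# How many times larger the reference-based count is than the Scopus metadata count.
ref_to_metadata = {tool: round(ref / meta, 1) for tool, (_, meta, ref) in TABLE_3.items()}

print(totals)                    # (1207, 1352, 10019)
print(multi_tool)                # 736
print(ref_to_metadata["Pajek"])  # 24.6
```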

In Table 3, the Scopus reference results reveal four leaders: Pajek, Gephi, VOSviewer and CiteSpace. HistCite and Sci2 Tool citations clearly lag behind this group. This made it necessary to visualise the results of the publications’ dynamics on two charts (Figure 1). The number of records for VOSviewer has increased exponentially in the last five years. The same can be said of CiteSpace. Gephi, which has gathered a large community of users since 2010, shows constant and the most significant growth. By contrast, the number of articles referencing Pajek’s usage is characterised by stable dynamics with no major changes, which confirms that this traditional software has its own user groups.

Figure 1: Dynamics of publications regarding the seven tools since 2010 by Scopus data.

It is worth noting that the programs originate in, or maintain ties with, different countries: Pajek (Slovenia); CiteSpace (USA, China); HistCite and Sci2 Tool (USA); VOSviewer and CitNetExplorer (Netherlands); and Gephi (France). The publications from the country of the developer(s) were analysed specifically to determine whether the software remained local or reached the global community of researchers. The top records showed that more than 10% of the publications were authored by scholars connected with China and the US: they mentioned CiteSpace (84% and 9%), VOSviewer (24% and 12%), Gephi (13% and 13%), Pajek (23% and 17%) and HistCite (36% and 28%). The Sci2 Tool was not used by Chinese researchers who authored the publications in this sample; bibliographic data led to the US, Germany, the Netherlands and England (47%, 24%, 12% and 11%, respectively). Other countries featuring prominently in the metadata were Spain (VOSviewer, Pajek); Brazil and India (Gephi, Pajek); Slovenia (Pajek); and Russia (HistCite).

We also investigated the dissemination of knowledge regarding the programs, taking into account the source of information. Usually, software developers first publish papers reporting on the new product. Accordingly, we strived to find whether the most cited work related to a given software application was published by one of the designers. The results of this investigation, which was based on the data from the Web of Science database, are shown in Table 4. We can see that the developers were often responsible for the popularity of their software. However, in the case of Pajek, Leydesdorff (2006), as well as Batagejl and Mrvar (2002), promoted the application.


Table 4: Most cited paper(s) regarding the seven tools in the Scopus database.
Application | Cited paper | Citations count | Ratio to Total, %
Pajek | 1) Batagejl, V., and Mrvar, A. (2002). | 135 | 10
Pajek | 2) Leydesdorff, L., and Vaughan, L. (2006). | 261 | 12
CiteSpace | Chen, C. (2006). | 1527 | 27
HistCite | 1) Garfield, E. (2004). | 169 | 17
HistCite | 2) Garfield, E. (2009). | 177 | 15
VOSviewer | van Eck, N.J., and Waltman, L. (2010). | 2596 | 42
CitNetExplorer | van Eck, N.J., and Waltman, L. (2014). | 190 | 67
Sci2 Tool | Light, R.P., Polley, D.E., and Börner, K. (2014). | 30 | 24
Gephi | Jacomy, M., Venturini, T., Heymann, S., and Bastian, M. (2014). | 902 | 41

The study also identified the disciplines or research areas where the science-mapping software was applied. This article follows the Web of Science’s classification of disciplines. The seven disciplines where the seven programs were most often used formed the basis for the matrix (Table 5). The percentage values indicate the ratio of publications related to a selected discipline to the total number of publications where a given tool was used.


Table 5: Disciplinary distribution of publications featuring the seven programs.
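Application | Information and Library science | Computer science | Environmental sciences Ecology | Engineering | Business economics | Social sciences other topics | Chemistry
Pajek | 25% | 38% | 1% | 8% | 6% | 9% | 1%
CiteSpace | 11% | 24% | 17% | 16% | 14% | 16% | 1%
HistCite | 47% | 35% | 4% | 0% | 8% | 4% | 1%
VOSviewer | 18% | 16% | 12% | 10% | 16% | 5% | 0%
Sci2 Tool | 30% | 6% | 0% | 22% | 0% | 0% | 41%
Gephi | 10% | 31% | 1% | 12% | 7% | 13% | 1%
CitNetExplorer | 28% | 28% | 0% | 0% | 7% | 7% | 0%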

We can see that information science (library and information studies) and computer science dominated. Sci2 Tool is an exception, as it featured prominently in chemistry publications and quite strongly in engineering. It should be noted that social sciences include educational research.

The sum query (based on logical OR) included all tools studied. The metadata (i.e., the titles, keywords, and abstracts) were analysed for co-words by setting a co-occurrence threshold (the minimal number of pairs of words taken into account) at 10. The final network shows two separate clusters (Figure 2). In the left cluster, we can find Pajek and Gephi, and in the right cluster, CiteSpace, VOSviewer (and CitNetExplorer) and HistCite. The latter includes typical bibliometric software applications and needs that may be identified by terms and expressions occurring around the cluster (Figure 2 presents the most relevant part of the network). These are citation analysis, bibliography-based studies, and visualisations. The former cluster represented primarily researchers specialising in social network analyses, algorithms, and studies closer to computer sciences. The Sci2 Tool, which appeared in two forms (‘SCI’ and ‘SCI2’), was close to both clusters.

Figure 2: Network of co-words, characterising publications mentioning the seven tools
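
The thresholding step of a co-word analysis can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study: the tokenizer, the stop-word list and the sample records are assumptions, and the clustering and layout that a mapping tool would add are left out.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list (an assumption, not the study's list)
STOPWORDS = {"a", "an", "and", "of", "the", "in", "for", "with", "to", "on"}

def tokenize(text):
    """Return the set of lower-cased terms in one metadata record."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def coword_edges(records, threshold=10):
    """Count how often each pair of terms shares a record; keep frequent pairs."""
    pair_counts = Counter()
    for text in records:
        terms = sorted(tokenize(text))  # sorted so each pair has one canonical order
        pair_counts.update(combinations(terms, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= threshold}

# Hypothetical records; a low threshold is used so the toy data yields edges
records = ["Citation analysis with CiteSpace"] * 3 + ["Network analysis in Gephi"] * 2
edges = coword_edges(records, threshold=3)
print(edges)
```

With the threshold of 10 reported above, only pairs of words that co-occur in at least ten records would remain as edges of the network.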

Availability of video tutorials

The analysis of video tutorials was limited to videos uploaded between 2010 and 2019; 339 videos fit the criteria. The analysis was based on metadata such as title, year of uploading, view count and language. The year of uploading allowed us to track the patterns of uploading videos focused on specific tools and to analyse how they changed over time. Figure 3 shows the annual number of uploaded videos for every program studied. There were no tutorials concerning CitNetExplorer. Only the data for Gephi are continuous; the rest come in discrete segments.


Figure 3: Dynamics of the yearly numbers of uploaded videos concerning the seven tools (2010–2019)

We can conclude that Gephi was popular enough for video tutorials to be uploaded every year, while other programs went without new tutorials for periods of a year or longer. For example, no new video tutorials on Sci2 Tool were uploaded between 2012 and 2017. We can also see that the number of videos focused on VOSviewer rose rapidly from 2018 until the end of the decade. Overall, Gephi was the subject of more tutorials than the rest of the programs combined. The significant decrease in the numbers from 2013 to 2015 may raise some questions. According to the information on Gephi’s history available on GitHub (Levallois, 2017), the program was not updated between 2012 and 2015. The project had been developed chiefly by volunteers, who at the time were PhD students. It is likely that they devoted these three years to their academic careers.


Table 6: The numbers of uploaded video tutorials according to language (2010–2019).
Language 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 TOTAL
TOTAL 9 12 14 37 24 24 45 47 48 79 339
English 9 11 9 18 20 16 23 30 25 22 183
Portuguese   634510 101654
Spanish    16131282345
Chinese    3  112  420
German   1     3127
French   6  1   7
Arabic        24 6
Indonesian         55
Korean  1      214
Turkish        21  3
Persian          22
Hebrew        1  1
Catalan    1      1
Russian       1   1

The next issue is the linguistic diversity of the uploaded video tutorials. Table 6 shows that English tutorials constitute the majority (183 of 339, or about 54%) of all videos studied. We can also analyse the changes in the numbers of videos uploaded over time according to language. Again, English tutorials are a staple: their yearly numbers rose, with fluctuations, from 9 in 2010 to a peak of 30 in 2017.

However, by the end of the decade, the number of Portuguese and Spanish materials had increased. The sample also featured small numbers of videos in other languages (Persian, Turkish, Korean) and individual videos in others (Russian, Catalan, and Hebrew). It is worth noting that the language of the uploaded video constitutes information not only about the uploader, but also about their expectations regarding the audience.
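
The changing weight of English can be made explicit with a short Python sketch that uses only the TOTAL and English rows of Table 6. The variable names are ours, and the rounding to whole percentages is an assumption made for readability.

```python
years = list(range(2010, 2020))
# Yearly counts read from the TOTAL and English rows of Table 6
total = [9, 12, 14, 37, 24, 24, 45, 47, 48, 79]
english = [9, 11, 9, 18, 20, 16, 23, 30, 25, 22]

# Share of English-language tutorials among all uploads, per year (in %)
english_share = {y: round(100 * e / t) for y, e, t in zip(years, english, total)}
overall_share = round(100 * sum(english) / sum(total))

for year in years:
    print(year, f"{english_share[year]}%")
print("overall", f"{overall_share}%")
```

Read this way, the absolute number of English tutorials grows over most of the decade while their share of yearly uploads falls, which is consistent with the rise of Portuguese and Spanish materials noted above.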

The view counts of different videos were analysed to assess their popularity. Figure 4 shows the distribution of view counts per program. The most popular tool (at the time when the data were collected) was Gephi (79%); the least popular was Sci2 Tool (with less than 1% of the views). Taking into account that Gephi tutorials were uploaded onto YouTube before any others, this is not surprising. The average yearly number of views of video tutorials devoted to Gephi is close to 150,000. Unfortunately, YouTube channels do not provide data on the view count dynamics.


Figure 4: Total views of video tutorials per tool (2010–2019)

The names of YouTube videos can be as informative as the titles of academic articles. Because of the small number of items (339), text statistics were applied to the entire dataset, and the frequencies of title words were calculated. Table 7 presents the results of this analysis for the 10 most frequent terms. The most common terms in the titles were the names of the tools, such as Gephi, Pajek, VOSviewer and HistCite, as well as the words ‘tutorial’, ‘network’, ‘analysis’, ‘data’, ‘using’ and ‘how’.

For the text analysis, we present a network built from short texts, an approach often used in science mapping (Leydesdorff and Nerghes, 2017). A network of title terms was constructed by means of co-word analysis, and measured. Betweenness centrality, which is based on the shortest paths in a graph, shows which nodes are the most pivotal for information flow in a network (Barabási, 2016). The four most essential words in our network are Gephi (0.874), VOSviewer (0.124), Pajek (0.070) and CiteSpace (0.054). The high values for Gephi and VOSviewer indicate that they featured most often in the names of video tutorials. They were located inside the network of terms and were the most influential for the overall information flow.
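
To illustrate how betweenness centrality is obtained, the following Python sketch implements Brandes' algorithm for an unweighted, undirected graph and normalises the scores to the range 0 to 1. It is not the software used in the study, and the small term network is hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Turn an undirected edge list into {node: set(neighbours)}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def betweenness_centrality(adjacency):
    """Normalised betweenness for an unweighted, undirected graph (Brandes)."""
    nodes = list(adjacency)
    centrality = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search, counting shortest paths (sigma) from the source
        order = []
        predecessors = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    n = len(nodes)
    if n > 2:
        # Each unordered pair was counted twice, so dividing by (n-1)(n-2)
        # both halves the raw score and normalises it
        centrality = {v: c / ((n - 1) * (n - 2)) for v, c in centrality.items()}
    return centrality

# Hypothetical title-term network
edges = [("gephi", "tutorial"), ("gephi", "network"),
         ("gephi", "data"), ("network", "analysis")]
scores = betweenness_centrality(build_adjacency(edges))
print(scores)
```

In this toy network, five of the six pairs of other terms are connected only through ‘gephi’, so it receives the highest score, in the same way that Gephi dominates the network of tutorial titles.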


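Table 7: Video tutorial name word frequencies for the top ten records

| Term | Frequency | % |
| --- | --- | --- |
| Gephi | 179 | 7.99% |
| tutorial | 49 | 2.19% |
| network | 38 | 1.70% |
| HistCite | 31 | 1.38% |
| Pajek | 31 | 1.38% |
| VOSviewer | 29 | 1.30% |
| Data | 23 | 1.03% |
| analysis | 22 | 0.98% |
| Using | 21 | 0.94% |
| How | 20 | 0.89% |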
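
A frequency table of this kind can be produced with a few lines of Python. The sketch below is a minimal illustration with hypothetical titles; the tokenisation rule (lower-casing and splitting on non-word characters) is an assumption, as the study does not describe its own. The percentages in Table 7 are consistent with a total of roughly 2,240 title words (179 is about 7.99% of 2,240).

```python
import re
from collections import Counter

def title_term_frequencies(titles, top=10):
    """Return (term, count, percent of all title tokens) for the top terms."""
    tokens = [t for title in titles for t in re.findall(r"\w+", title.lower())]
    counts = Counter(tokens)
    total = len(tokens)
    return [(term, n, round(100 * n / total, 2)) for term, n in counts.most_common(top)]

# Hypothetical video titles
titles = ["Gephi tutorial", "Gephi network analysis", "How to use Gephi"]
rows = title_term_frequencies(titles, top=3)
for term, n, pct in rows:
    print(term, n, f"{pct}%")
```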

Online visibility of the science mapping software

As commercial products, science mapping software needs to be promoted. It has become a standard of online promotion to advertise the product on social media and to encourage public discussion. Social media activity can be considered in terms of altmetrics, i.e. as a measure of the users’ interest in scientific research and tools. For the purposes of the comparison, we identified four primary activities that allow us to assess a tool’s online visibility, which may reflect its popularity. These activities are:

  1. Official Facebook posts
  2. Official Twitter posts
  3. All Instagram posts with hashtags
  4. An official webpage, if available

Only two tools had official fanpages on Facebook: Gephi and CiteSpace. Their activity in terms of publishing posts differed. Since 2010, the Gephi fanpage has published only twelve posts, with the last one in 2014. On the other hand, the CiteSpace fanpage has published tens of posts yearly since 2011, coming to a total of 203 posts.

The activity on Twitter followed similar patterns. The only programs with their own accounts were Gephi and CiteSpace, but the activity there was significantly higher. Gephi had sent 1673 tweets since 2009 and CiteSpace had sent 470 tweets since 2010. It is worth noting that tweets were sent regularly. The other tools did not have dedicated Twitter accounts. The site did not make available information regarding the number of posts with a specific hashtag. Future research, using paid tools such as TweetBinder, may examine data of this type.

There were no official accounts on Instagram, but the website makes statistics available to all users. Table 8 shows the number of posts featuring a hashtag with the name of the tool, such as #histcite or #sci2, as of June 2020. CitNetExplorer, as the newest application, had no mentions on Instagram.


Table 8: Data gathered from social networks

| Application | Facebook posts | Twitter posts | Instagram hashtag posts |
| --- | --- | --- | --- |
| CiteSpace | 32 | 470 | 30 (#citespace) |
| Gephi | 203 | 1673 | 504 (#gephi) |
| Pajek | 0 | 0 | 1015 (#pajek) |
| HistCite | 0 | 0 | 2 (#histcite) |
| Sci2 Tool | 0 | 0 | 256 (#sci2) |
| VOSviewer | 0 | 0 | 39 (#vosviewer) |

Unfortunately, browsing the results showed that not all the posts were relevant to our study: for example, posts with the hashtag #pajek also contained images of spiders (the word pajek means 'spider' in Slovenian), so the data would have to be verified (manually or automatically) to eliminate such irrelevant posts.
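
One simple form of automatic verification is to keep only the posts whose captions also mention a term from the tool's subject area. The Python sketch below illustrates the idea; the post structure, the captions and the list of context terms are hypothetical, and a filter of this kind would still need manual spot checks.

```python
def filter_relevant(posts, context_terms):
    """Keep posts whose caption mentions at least one context term."""
    terms = [t.lower() for t in context_terms]
    return [p for p in posts if any(t in p["caption"].lower() for t in terms)]

# Hypothetical posts carrying the same hashtag
posts = [
    {"id": 1, "caption": "Co-authorship network drawn in #pajek"},
    {"id": 2, "caption": "Found this spider in the garden #pajek"},
]
relevant = filter_relevant(posts, ["network", "graph", "bibliometric"])
print([p["id"] for p in relevant])
```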

The availability of the tools’ official websites was discussed above in relation to Table 1. Most of these are specialised pages with full descriptions, a download section and links to manuals. The only tool without a website was HistCite, but a link to download the previous version was available on the Software Informer website, while the Clarivate site states that it is no longer officially supported.

Summary and discussion

The development of science mapping tools is discussed with increasing frequency in biblio- and scientometric scholarship. Global bibliographic databases allow scholars to compare different tools. From 2013 onwards, the numbers of both publications and citations related to this specialised software have increased (He, et al., 2019; Lou, et al., 2020). A kinescope-like area chart, used in this study, clearly shows how these numbers changed.

The authors selected seven tools intended for scholars. The tools were compared in terms of bibliometrics, altmetrics, and online presence. In the first part (Figure 1), a significant increase can be seen in the number of articles concerning two of the programs: VOSviewer and CiteSpace, as well as moderate growth for Gephi and Pajek. The remaining programs, Sci2 Tool and HistCite, were used by a smaller number of authors. We cannot definitively characterise the usage of the more recent CitNetExplorer.

The altmetric analysis of users’ online activity took account of the alternative scientometric material. It encompassed tutorials concerning the tools discussed, as well as their digital footprint on social media: Facebook, Twitter, Instagram. We can formulate several observations regarding the tutorials:

The results of bibliometric and altmetric analyses may show whether the popularity of the programs among the researchers, reflected by the mentions in publications, corresponds with the popularity of related tutorials. The number of papers did not increase after the tutorials were uploaded. This means that they might have no direct impact on the researchers’ decisions regarding the selection of their tools. However, in the case of VOSviewer, the rapid increase in the number of tutorials uploaded might be a result of the increased number of publications (for the years 2017–2019).

The online presence had no discernible effect on the popularity as reflected by publications. For example, VOSviewer featured in a high number of publications despite the lack of an official social media presence. A webpage might have promoted the tool among the broader audience, but the academics’ decisions are influenced by different factors. However, it is worth noting that:

The most cited papers concerned with the tools studied were written by the developers, which shows that they effectively disseminated information regarding their products in the scientific community. The range and dissemination of learning materials depend on both the authors’ and the users’ activity on YouTube and other social media. In theory, preparing tutorials and strong social media-driven communication with users show that the developer is involved in developing and creating a public image, which the users will appreciate. However, the results of the study showed that scholars for whom the programs are intended have different needs.

The analysed tools represent different generations of software, with different user interfaces, features, and purposes, but they are all related to bibliographic data analysis and visualisation. The co-word analysis of bibliographic data of publications featuring the programs showed that they are used for two primary purposes: typical bibliometric analysis (VOSviewer, CiteSpace, CitNetExplorer and HistCite) and social network analysis (Gephi, Pajek and Sci2 Tool). The results also revealed that both CiteSpace and VOSviewer are used outside library and information science.

All programs (excluding Sci2 Tool) featured internationally, although the publication authorship was dominated by Chinese and US researchers. We can observe the growth of the scientific community writing in English, Spanish, and Portuguese. However, we could not identify any specific country from these data; these languages could be used as a first or second language in multiple countries. English dominates the video tutorial space, where fourteen languages in total were present.

Conclusions

The primary aim of this article was to compare the presence of seven visual scientometric tools in scholarship and on social media, taking into account their digital footprint and the relevant video tutorials.

We have found that the relationship between the tool’s online presence (including the accessibility of video tutorials), and the usage, measured by bibliometrics, was minimal. Thus, the authors’ hypothesis that online visibility affects the tool’s popularity with scholars was proved only partially true. There must be other factors determining the popularity of science mapping software in the scientific community.

To compare the programs, the authors combined four methods: 1) literature review, i.e. bibliometrics; 2) an analysis of the accessibility and usage of video tutorials, i.e. tutmetrics; 3) an analysis of social media and Internet promotion/visibility, i.e. altmetrics; and 4) a comparison of features. The data were plotted and visualised. It is worth noting that the authors proposed the term tutmetrics for this purpose and used it in previous research (Osinska and Klimas, 2021). This concept may contribute to the study of the users of science mapping software, and their information needs. The results of such an analysis can be especially useful to software creators, who might consider them as guidelines for further projects.

Analysing the video tutorials space (N=339) identified fourteen languages, dominated by English. Various languages of the video tutorials reflected the geospatial evolution of the community using scientific software. It seems that there is a need for a more in-depth study of the educational materials available online and their impact on the scholars’ selection of tools.

Software developers can popularise their products by publishing in global databases. Bibliometric analysis demonstrated that this strategy is effective, as the developers’ publications had the most citations. However, some tools were used in scientific papers despite the lack of promotion. This means that there are factors not considered in current research, which have a significant impact on the selection between the tools. Nevertheless, research combining bibliometric study with other methods originating in web data, and scholars’ communication and behaviour on specialised networking services, should be continued and developed.

Future research will focus on further assessments of programs intended for scholars, their features as well as their usability, and availability of workshops. The authors also plan to analyse discussions of the mentioned software conducted on social networking sites. The development of a program for scientists is a complex and difficult process, but it can be successful if the developers pay proper attention and prepare well.

About the authors

Veslava Osinska is an Associate Professor at the Institute of Information and Communication Research of Nicolaus Copernicus University in Torun, Poland. Her research interest is in science visualisation. She is a member of the Polish chapter of ISKO.
Radoslaw Klimas is a PhD student of the Doctoral School Academia Rerum Socialium at Nicolaus Copernicus University in Torun. He is interested in science mapping.

References



How to cite this paper

Osinska, V., and Klimas, R. (2021). Mapping science: tools for bibliometric and altmetric studies. Information Research, 26(4), paper 909. Retrieved from http://InformationR.net/ir/26-4/paper909.html (Archived by the Internet Archive at https://bit.ly/3oWTkVW) https://doi.org/10.47989/irpaper909
