published quarterly by the university of borås, sweden

vol. 24 no. 3, September, 2019

Proceedings of RAILS - Research Applications, Information and Library Studies, 2018:
Faculty of Information Technology, Monash University, Australia, 28-30 November 2018.

Minding the gap: investigating the alignment of information organization research and practice

Philip Hider, Hollie White, and Hamid R. Jamali.

Introduction. The issues that practitioners want researched and those that are studied by researchers are often considered not to align very well. This paper investigates the extent to which a gap between research and practice exists in the field of information or knowledge organization, using a novel index of topical overlap between research and practice discourses.
Method. The degree of alignment was measured by comparing samples of research-oriented and practice-oriented discourse published between 2013 and 2017. Information organization research was represented by scholarly articles, Information organization practice by professional blogs, webinars and conferences.
Analysis. The texts were analysed using software which identified the most frequently used terms; the number of top terms, following deletion of generic terms and normalisation, that overlapped between corpora constituted the index of topical overlap.
Results. The number of overlapping terms between information organization research and practice corpora was about halfway between the number of overlapping terms between highly similar and unrelated corpora.
Conclusions. The results suggest a fair degree of alignment between information organization research and practice. The index used needs further testing, but appears to be a promising, unobtrusive tool for comparing the degree of alignment between research and practice in different fields.


'Gap n. -- Any opening or breach in an otherwise continuous object; a chasm or hiatus.' (OED Online)

The difference between what practitioners do and what researchers investigate is often called a gap (Booth, 2003; Davis et al., 2013; Gummesson, 2014; Haddow and Klobas, 2004; McFarlene, Kahali and Johnson, 2014; Ponti, 2008; Pullins et al., 2017; Rubins, 2014; Weston and Bain, 2015). The use of the term 'gap' implies the perception that research and practice within the same field should be like a 'continuous object', something united and connected. A gap suggests that they are not connected and that there is a divergence or divide. Previous discussions by McFarlene, Kahali and Johnson (2014), Yates (2015) and Nguyen and Hider (2018) have noted that applied fields are more likely to experience this perceived gap. Research investigating these phenomena is found in social work (Davis et al., 2013; Rubins, 2014), business (Gummesson, 2014; Pullins, Timmonen, Kaski and Holopainen, 2017), health care (McFarlene, Kahali and Johnson, 2014; Yates, 2015), education (Weston and Bain, 2015) and library and information science (Haddow and Klobas, 2004; Nguyen and Hider, 2018).

While many researchers emphasize the gap between research and practice, others emphasize ways of bridging the gap and seek to identify those areas and researchers who are more successful in doing so. For instance, Bastow, Dunleavy, Tinkler and Aguilera (2014) discuss the connection between research and practice by examining the impact of research across the social sciences. Very little research, however, has focused on the relationship between research in smaller fields or subfields and their corresponding larger communities of practice. Existing approaches for measuring relationships tend either to be based on perception or assume that distinctions between the academy and practice are clear-cut, which is not the case for fields like librarianship and information science. Yet are these gaps between practice and research perception or reality? To what extent is there truly division or alignment between research and practice?

This paper examines the extent to which research and professional discourse are aligned within the subdomain of knowledge or information organization (hereafter information organization). We define information organization here as that part of librarianship and information science concerned with such activities as cataloguing, classification, controlled vocabulary, ontology, indexing and social tagging, though in the study we report the emphasis will be on cataloguing and metadata librarianship. The alignment, or overlap, between information organization research and practice is considered a precondition for research impact (and indeed, for practice to impact on research).

This paper begins by reviewing literature about the relationship between librarianship and information science research and practice, as well as the perception of that relationship. To expand on the discourse surrounding the topic, a keyword analysis study was conducted. Using purposive sampling techniques, information organization research article abstracts were compared with descriptions of information organization-focused practitioner continuing education and professional blogs from a five-year period (2013-2017). Results and discussion report normalized term frequencies as well as co-word analysis using VOSviewer software; contextualize findings that indicate the extent of alignment between information organization research and practice; and highlight the topics that most align. The potential of the methodology for future research is also outlined.

Literature review

Many terms are used to describe the relationship between library and information science research and practice. These terms include gap, divide, alignment and overlap: all of these words sit on a terminological continuum, showing either a connection or a break between research and practice within the library and information science domain. It is unclear whether library and information science research and practice are aligned or unaligned. Previous studies have shown that the extent of the perceived divide or alignment varies by country and domain (Powell, Baker and Mika, 2002; Pymm and Hider, 2008; Schlögl and Stock, 2008).

Australian academics and practitioners in particular have been interested in evaluating this topic. Haddow and Klobas (2004) found 11 types of gaps between research and practice in regard to communication. These gaps include knowledge, culture, motivation, relevance, immediacy, publication, reading, terminology, activity, education and temporal. Pymm and Hider's (2008) research found that senior library staff saw value in consulting research articles, but this is in contrast to Haddow's (2001) earlier research that found newsletter publications to be preferable to practitioners (though in this case both senior and less senior ones). In 2016 the Australian Library and Information Association (ALIA) and Charles Sturt University created the Relevance 2020 'series of research events' (Nguyen, 2017, p. 3) with 'the main purpose of connecting academics, researchers and practitioners in order to help align future research projects and activities in the Australian library and information science profession' (Nguyen, 2017, p. 4). Nguyen and Hider (2018) report more detailed findings from the focus groups conducted during the Relevance 2020 series. Commenting on the library and information science community, Nguyen and Hider (2018, p. 5) state that, 'it would appear that little LIS research is used to address practical issues' and that,

research, which tends to be carried out in academia, does not always originate from practice, nor necessarily solve problems in, or even guide practice. A lack of relevance may be compounded by a perception amongst some practitioners that research is more the domain of the ivory tower and not something that could help them much in their professional activities. (p. 3)

Jamali's (2018) research also confirms the concerns highlighted in Relevance 2020, as well as Nguyen and Hider's (2018) evaluation from that project. Jamali (2018) interviewed seven practitioners and concluded (albeit from a small sample size) that academic-led research was often problematic, stating that, 'academic research lacks practical implications as their research problems do not originate from practice' (p. 8). His research found that Australian information professionals do not, on the whole, believe that research conducted by academics in Australian library and information science programmes is relevant to information practice.

Most of the research studies conducted about this topic use qualitative (interview or focus group) data collection methodologies, focused on practitioner perceptions of the relationship between research and practice. They thus tend to be obtrusive and contain the associated risk of self-reporting biases. Practitioners may be keen to report their view of the value of the research as much as their actual use of it. Conversely, academics may be keen to promote the impact credentials of their research. In order to test the reality of these perceptions, further research is needed using different methodologies.

The knowledge and information organization domain has a rich history of investigating research trends (Dahlberg, 1997; Hjørland and Albrechtsen, 1999; McIlwaine, 2003; López-Huertas, 2008; Smiraglia, 2012). Saumure and Shiri (2008) found that over the last half century knowledge organization research topics have shifted from an emphasis on indexing and abstracting to a focus on classification and cataloguing. Saumure and Shiri see this shift as having been motivated by the advent of the internet. While knowledge organization principles remain key in both the pre- and post-Web periods, metadata has become a major theme more recently.

Choi and Lee (2016) conducted a study of user-focused studies in the area of knowledge organization using metadata. Looking at a ten-year span of articles and dissertations published from 2005 to 2014, they used text analysis software (WordStat) to perform a quantitative co-word analysis in order to create topic clusters and identify major themes in recent information organization research. The research reported here extends this methodology in order to compare the themes and topics discussed in recent information organization research and practice.

Research design

Influenced by Choi and Lee's (2016) work, this study compared two samples of English-language content from the five-year period 2013-2017, one intended to reflect issues discussed by information organization researchers and the other issues discussed by information organization practitioners. For the purposes of this research, information organization was defined a little narrowly, in terms of cataloguing and metadata librarianship, as it was in this area that published content in both the research and practitioner spheres was relatively abundant. The sample of information organization research discourse was derived from two groups of articles. The first comprised all the peer-reviewed articles (excluding editorials, book reviews, reports, etc.) published during the reference period in the two main English-language research journals of the information organization field, Cataloging & Classification Quarterly and Journal of Library Metadata. The second comprised those peer-reviewed articles deemed by the authors to cover information organization topics and published during the reference period in two other major English-language journals for information organization research, Library Resources and Technical Services and Technical Services Quarterly. These four journals are those identified by Terrill (2016) as the top journals for information organization research, by number of articles.

The numbers of articles identified for the sample from each journal are set out in Table 1. Bibliographic information about the articles was obtained from the Scopus database; the titles and abstracts of the articles were used for the analysis.

Table 1: Journals used to create research corpus
Journal | Number of articles
Cataloging & Classification Quarterly | 197
Journal of Library Metadata | 74
Library Resources and Technical Services | 26
Technical Services Quarterly | 24

The sample of information organization practitioner discourse comprised datasets from three different types of source: professional blogs, Webinars and conference sessions. The blogs were selected from those listed on the Planet Cataloging site, which has, for many years, been aggregating English-language blogs for professional audiences with an interest in library cataloguing and metadata. Those blogs still accessible in November 2018 and that were active during some or all of the period 2013-2017, and that were mainly about cataloguing and metadata, were chosen for data collection and are listed below.

All the posts published on these blogs in the reference period were extracted, except in the case of some that were clearly off topic (i.e. not directly about cataloguing or metadata). Both titles and the bodies of 352 posts were included in the dataset.

For the Webinar and conference session datasets, a list of organizations based in North America and known to be active in the provision of continuing education for information organization professionals (e.g. American Library Association groups, such as the Association for Library Collections and Technical Services; the Medical Libraries Association; and the Special Libraries Association) was compiled. Their Websites were then searched for details of Webinars and conference sessions, held during the reference period, on what were promoted as topics of interest to information organization professionals. The titles and descriptions of forty-four Webinars and fifty-one conference sessions were extracted and collated to form, respectively, the second and third datasets representing practitioner discourse.

The samples of content (i.e. corpora) were subjected to co-word analysis using VOSviewer software, version 1.6.7, one of the leading applications for visualising the inter-relationships between texts, authors, journals, and so forth. A stop list was applied to remove generic terms (such as the names of months, countries and words such as article); terms were then identified and normalised using text analysis; the most frequent in each corpus were then listed and related to each other using clustering algorithms. The counting method was binary, i.e. the presence of a term in each record counted as 1 (regardless of frequency in each record). The same number of the most frequent terms in each corpus was analysed to cancel out differences in sample size. Comparison of the overlap among the corpora and comparison with baseline measures indicated the amount and nature of topic alignment between the corpora.
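The counting and comparison steps described above can be illustrated with a minimal Python sketch. It is not the VOSviewer procedure used in the study: the tokenisation is a crude stand-in for VOSviewer's term identification, and the stop list, the function names (record_terms, top_terms, overlap_index) and the sample records are all hypothetical. It shows only the two ideas the text relies on, namely binary counting per record and overlap between equally sized lists of top terms.

```python
import re
from collections import Counter

# Hypothetical stop list; the study's actual list is not reproduced here.
STOP_TERMS = {"article", "january", "australia",
              "the", "of", "and", "a", "in", "for", "to"}


def record_terms(text):
    """Return the set of lower-cased terms present in one record."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_TERMS}


def top_terms(records, n):
    """Binary counting: a record adds at most 1 to each term's count."""
    counts = Counter()
    for text in records:
        counts.update(record_terms(text))
    # Rank by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def overlap_index(records_a, records_b, n=100):
    """Number of terms shared by the top-n lists of two corpora."""
    return len(set(top_terms(records_a, n)) & set(top_terms(records_b, n)))


# Invented sample records, for illustration only.
research = [
    "RDA implementation and RDA training for cataloguing staff",
    "Linked data and RDA in the library catalogue",
]
practice = [
    "Webinar on linked data workflow",
    "Dewey number building: a workflow for linked data",
]
print(overlap_index(research, practice, n=5))
```

Taking the same n from each corpus, as in the study, means that a larger corpus does not contribute a longer list of candidate terms to the comparison.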

Results and discussion

Results and discussion from this study are presented in four sections: topics in practice; topics in research; corpora comparisons, and terminology comparisons.

Topics in practice

The three practice-oriented content sets were combined in two different ways. In the first way, we combined the content of the three sets and analysed it as a single corpus. In the second way, we separately analysed the terms in each set and then aggregated the top terms of the three corpora. Both methods produced the same result, in terms of the top terms, as well as the number of them, that were in common with those of the research-oriented corpus. Figure 1 shows a map of the most frequent terms in the combined practice-oriented content (blog, Webinar and conference corpora). Colours show clusters of terms that have more association (i.e. more frequently co-occur). The farther two terms are from one another, the less likely they are to appear together in a unit of analysis (e.g. a blog post).
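The notion of co-occurrence behind such a map can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reproduction of VOSviewer, which applies its own normalisation, clustering and layout; the function name cooccurrence and the three sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Count, for each pair of terms, how many records contain both."""
    pairs = Counter()
    for terms in records:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


# Each unit of analysis (e.g. a blog post) is reduced to its set of terms.
posts = [
    {"ddc", "number", "notation"},
    {"ddc", "number"},
    {"rda", "linked data"},
]
links = cooccurrence(posts)
print(links[("ddc", "number")])
print(links[("ddc", "rda")])
```

Pairs with high counts would sit close together on a map of this kind, while pairs that never share a record, such as ddc and rda in this invented sample, would have no link at all.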

Figure 1's visualisation shows three distinct clusters (green, red and blue) of topics found in practice-oriented content. The green cluster focuses on general classification topics, with terms like note, class and a variety of terms with the word number. Red content covers a wide range of concepts related mainly to cataloguing and metadata, including terms such as cataloguing, data, RDA, vocabularies and linked data. The blue cluster focuses on Dewey Decimal Classification-specific terms, such as DDC, Dewey and Dewey number. (One of the more prolific blogs was the Dewey blog.) Links can be seen between all three main clusters, with some overlap between the blue and green clusters.

Figure 1: Clusters of terms in practice corpus (only terms with a frequency of 10 or more)

The list of the 20 most frequent terms from the practice-oriented content (combined and analysed as a single corpus) is presented in Table 2. The most frequent term is number by quite some distance.

Table 2: Top terms in practice-oriented content
Term | Frequency
number | 177
note | 107
DDC | 91
class | 83
table | 66
instruction | 62
database | 60
Dewey | 58
WebDewey | 58
entry | 54
history | 50
notation | 49
country | 45
session | 45
subdivision | 45
literature | 43
opportunity | 43
schedule | 36
workflow | 35
Bibframe | 33

Topics in research

The topic map for the research-oriented content shows a more diverse range of topics with a greater number of clusters. Five clusters (blue, purple, red, yellow and green) were identified from the research-oriented content, as seen in Figure 2. The blue cluster focuses on cataloguing topics related to bibliographic description, including terms like bibliographic description, ISBD and semantic web. The purple cluster represents cataloguing topics related to the Functional Requirements for Bibliographic Records and related bibliographic models, including terms like functional requirements, FRBR, user tasks and bibliographic records. The red cluster represents topics related to library practice, with terms such as academic libraries, collection, project and cataloguer. The yellow cluster covers the cataloguing code, Resource Description and Access, including terms like RDA, access and implementation. Finally, a small green cluster represents vocabulary control, with terms including vocabularies, subject headings and linked data.

Figure 2: Clusters of terms in research corpus (only terms with a frequency of 10 or more)

Term frequency results show that Resource Description and Access-focused terms are most frequent; the terms RDA and resource description are the top two terms by some margin. Table 3 shows the top twenty terms based on term frequency from the research-oriented content.

Table 3: Top terms in research-oriented content
Term | Frequency
RDA | 62
resource description | 57
staff | 19
code | 18
rule | 18
schema | 18
bibliographic description | 17
department | 17
functional requirement | 17
entity | 15
web | 15
international standard | 14
linked data | 14
strategy | 14
training | 14
open data | 13
classification | 12
FRBR | 12
semantic web | 12
transition | 12

Corpora comparisons

To evaluate the extent of overlap or alignment between research and practitioner-based outputs, a series of comparisons between pairs of corpora was conducted. The first comparison examined the overlap between the top 100 terms from the information organization research corpus and those from a journal assumed to have little subject commonality with information organization research, namely the Journal of Parasitology. Only three terms were common to the two sets of 100: combination, difference and evidence. None of these terms is inherently specific to information organization. In any case, three was thus set as the lower limit for this index of topical inter-corpora alignment.

The second comparison compared the top (100) terms from the information organization research corpus with those from another information science journal that occasionally covers some information organization topics, namely Information Processing and Management. This second measurement identified six common terms. None were inherently information organization-specific, however.

The third comparison gauged the number of common terms, out of the top 100 terms, that two corpora of very similar content might share. To conduct this measurement, the individual datasets from the two main information organization journals, Cataloging & Classification Quarterly and the Journal of Library Metadata, were used. This produced fourteen common terms. Due to the very similar scope of the journals, fourteen was considered the upper limit of this index, at least for information organization discourses.

The fourth comparison evaluated the overlap between the information organization research corpus and the combined practice corpus. The resulting overlap was nine common terms which, as noted earlier, is about halfway between the lower (three) and upper (fourteen) limit benchmarks.

The fifth, sixth and seventh comparisons evaluated the overlap between the research corpus and each of the three practice corpora (blogs, Webinars and conference sessions). Overlap was greatest with the Webinar corpus: fourteen common terms, the same as the upper limit established in the third comparison. This overlap suggests that information organization-related Webinars tend to be a little more aligned with information organization research than information organization-related conference sessions and blogs. All comparisons are shown in Table 4.

Table 4: Common terms (overlap) between pairs of corpora
Corpora | Number of common terms in top 100
Articles vs Webinars (F) | 14
Cataloging & Classification Quarterly vs Journal of Library Metadata articles (C) | 14
Articles vs conferences (G) | 12
Articles vs practice sources combined (D) | 9
Articles vs blogs (E) | 8
Articles vs Information Processing and Management (B) | 6
Articles vs Journal of Parasitology (A) | 3
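One way to read the figures in Table 4 against the two benchmarks is to rescale each overlap count so that the lower limit maps to 0 and the upper limit to 1. The study itself does not report such a rescaling; the sketch below, including the function name relative_alignment, is offered only as an illustration of the arithmetic behind the phrase 'about halfway'.

```python
LOWER = 3   # Articles vs Journal of Parasitology (A)
UPPER = 14  # Cataloging & Classification Quarterly vs Journal of Library Metadata (C)

# Common terms in the top 100, as reported in Table 4.
overlaps = {
    "Information Processing and Management (B)": 6,
    "practice sources combined (D)": 9,
    "blogs (E)": 8,
    "Webinars (F)": 14,
    "conferences (G)": 12,
}


def relative_alignment(common_terms, lower=LOWER, upper=UPPER):
    """Position of an overlap count between the lower and upper benchmarks."""
    return (common_terms - lower) / (upper - lower)


for corpus, common in overlaps.items():
    print(f"Articles vs {corpus}: {relative_alignment(common):.2f}")
```

On this scale the combined practice corpus sits at (9 - 3) / (14 - 3) = 6/11, or about 0.55, slightly above the midpoint of the two benchmarks.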

Terminology comparisons

Terminology comparisons were also conducted to analyse the nature of the overlap (and differences) between research and practitioner-based outputs. The nine top terms shared by the information organization research and combined practice corpora, as seen in Table 5, are clearly related to particular topic areas, such as linked data and FRBR. Anecdotal evidence would also suggest that linked open data and the Semantic Web are areas of particular interest to both academics and practitioners in the field.

Table 5: Common terms in information organization research and practice and their frequencies
Term | Frequency in practice | Frequency in research
country | 45 | 8
discipline | 17 | 6
linked data | 22 | 14
manifestation | 13 | 6
note | 107 | 7
open data | 21 | 13
rdf | 31 | 8
resource description | 11 | 57
semantic web | 17 | 12

Table 6 presents the top twenty most frequent terms in each of the four information organization corpora (articles, blogs, conference sessions and Webinars). The data shows the relative significance of different topics in each set. For instance, while RDA is the top term in research articles, it is not present in the top twenty terms in any of the three practice sets.

Table 6: Top 20 most frequent terms in different sets with their frequency
Articles: term | Freq | Blogs: term | Freq | Conferences: term | Freq | Webinars: term | Freq
rda | 62 | number | 173 | cataloging | 13 | session | 14
resource description | 57 | note | 107 | information | 7 | workflow | 10
staff | 19 | ddc | 90 | vocabulary | 7 | series | 8
code | 18 | class | 83 | work | 7 | Bibframe | 7
rule | 18 | table | 65 | workflow | 7 | data | 7
schema | 18 | database | 59 | cataloger | 6 | issue | 6
bibliographic description | 17 | dewey | 58 | opportunity | 6 | metadata | 6
department | 17 | instruction | 57 | tool | 6 | resource | 6
functional requirement | 17 | webdewey | 57 | alcts camms | 5 | element | 5
entity | 15 | entry | 54 | change | 5 | instruction | 5
web | 15 | notation | 49 | content | 5 | principle | 5
international standard | 14 | history | 47 | discovery | 5 | project | 5
linked data | 14 | country | 44 | implementation | 5 | world | 5
strategy | 14 | subdivision | 43 | resource | 5 | access | 4
training | 14 | literature | 40 | system | 5 | attendee | 4
open data | 13 | opportunity | 35 | taxonomy | 5 | code | 4
classification | 12 | schedule | 35 | challenge | 4 | congress | 4
frbr | 12 | base number | 30 | creation | 4 | focus | 4
semantic web | 12 | blog post | 29 | institution | 4 | implementation | 4
transition | 12 | bibliographic data | 28 | presenter | 4 | number | 4

Term comparisons show similar results to those found by Saumure and Shiri (2008). The top terms from the various corpora likewise suggest that both researchers and practitioners share a considerable interest in aspects of descriptive cataloguing, though whether aspects of the topic are the same is not so clear.


The overlap of top terms between the information organization research and practice corpora indicates that there is a fair amount of alignment between research reported in key information organization research journals and major forums for professional information organization discussion. Whether this alignment is due to information organization research specifically engaging with important issues in information organization practice and, conversely, information organization practitioners noting and acting upon information organization research, is another matter, and the next step in this investigation would be to drill down in those broad areas that have been identified as aligned, to gauge levels of engagement and impact. In other words, we have identified certain overlaps and associations, but not yet the nature of these research-practice relationships: to what extent are they causal and in which direction(s)? Does research lead practice or vice versa? It may depend on the particular topic area. It may also change over time. On the other hand, we might discover that while researchers and practitioners are both preoccupied with certain subjects, they are not concerned with the same aspects of these subjects. For example, RDA may be discussed in terms of its usability by practitioners, but in terms of its theoretical merit, say, in the research literature.

The perception of a gap, at least in this subdomain of librarianship and information science, is not substantiated by the findings of this study. In some ways, it is not surprising that there is a fair degree of overlap between research and practice in a field such as information organization. After all, practitioners often contribute to journals such as Cataloging & Classification Quarterly and Journal of Library Metadata, while information organization academics often have a background in practice. It would be interesting, however, to see if other fields where this is also the case, including other subdomains of library and information science, produce similar levels of alignment using this study's index. It would likewise be interesting to ascertain the actual level of practitioner contribution to research in information organization and other fields, for example by examining author affiliations.

The methodology we employed for this study is, we believe, a new approach to measuring alignment between research and practice. The unobtrusiveness of the data collection has clear advantages, but the approach is not without its limitations. Automatic text analysis can be hampered by the vagaries of language, with words de-contextualised; it should also be borne in mind that journal articles and blog posts, for example, tend to be written in different styles. An alternative approach would be to manually index the content of the texts, though of course this would require considerable amounts of time and resources; any controlled vocabulary used might also be biased toward research or practice. Larger corpora would have enabled longer lists of frequently used terms to be analysed, generating a more calibrated measure of overlap. The corpora themselves may not be fully representative of the discourses as a whole, particularly in the case of practitioner discourse. In this study, only a small number of blogs were used. The reference period studied was reasonably lengthy in the context of developments in the information organization field, but it should also be noted that the alignment between research and practice is likely to be somewhat dialectical and not totally synchronous.

Automatic text analysis produces an overview of textual and conceptual overlap, rather than a detailed picture: as such it is a good starting point for a more nuanced, manual analysis, perhaps triangulated with interpretations from authors and readers. The application of such methods may reveal a complex interplay between research and practice discourse, as well as the extent to which researchers and practitioners are aware of each other's preoccupations and are helping to address them.

About the authors

Philip Hider is professor and head of the School of Information Studies at Charles Sturt University (CSU), Australia. He received his PhD from City University, London and has worked at CSU since 2003. His research interests centre mainly around information organisation and librarianship education. The second edition of his text, Information Resource Description, was published in 2018. He can be contacted at phider@csu.edu.au.
Hollie White is lecturer in Libraries, Archives, Records and Information Science (LARIS) in the School of Media, Creative Arts and Social Inquiry (MCASI) at Curtin University in Perth, Australia. She received her PhD in Library and Information Science at the University of North Carolina at Chapel Hill. Her research and teaching interests include metadata, information organization, institutional repositories, library assessment and cross-cultural knowledge organization. She can be contacted at hollie.white@curtin.edu.au.
Hamid R. Jamali is a senior lecturer at the School of Information Studies at Charles Sturt University, Australia. He received his PhD in information science from University College London in 2008 and his research interests are in the broad areas of scholarly communication and bibliometrics. He can be contacted at h.jamali@gmail.com.


How to cite this paper

Hider, P., White, H. & Jamali, H. R. (2019). Minding the gap: investigating the alignment of information organization research and practice. Information Research, 24(3), paper rails1802. Retrieved from http://InformationR.net/ir/24-3/rails/rails1802.html (Archived by the Internet Archive at https://web.archive.org/web/20190818103734/http://informationr.net/ir/24-3/rails/rails1802.html)
