vol. 16 no. 3, September, 2011

 

An examination of semantic relationships between professionally assigned metadata and user-generated tags for popular literature in complementary and alternative medicine


Hemalata Iyer and Lucy Bungo
University at Albany, State University of New York, Department of Information Studies, College of Computing and Information, 135 Western Avenue, Albany, NY, USA, 12222


Abstract
Introduction. This paper examines the semantic relationship between user tags and the assigned subject headings of popular literature in the domain of complementary and alternative medicine.
Method. Forty books in the domain were drawn from the LibraryThing database. These were qualitatively analysed for the semantic matches between user tags and subject headings. An adapted form of the Unified Medical Language System, Current Relations in the Semantic Network served as a framework for analysis.
Analysis. User tags were compared with subject headings for terminological matches on a book by book basis. The tags were grouped into tag categories, and tag categories were mapped on to the subject headings. Those that were not related to the subject headings were analysed for patterns.
Results. Less than 1% of tags matched terminologically. Results indicated 46% semantic matches and 54% non-matches. Frequently occurring patterns among non-matches were personal, genre or form, location, time period and belief systems. Of the semantic matches, frequently occurring relationships were physical, functional, and conceptual relationships.
Conclusions. The tag categories provide information beyond that of the subject headings; they describe, evaluate, and may assist readers in choosing materials. This study contributes towards an understanding of the dominant relationship types in this domain and this can feed into developing ontologies and knowledge structures.


Introduction

Recently, the widespread use of digital information and communication technologies has generated an unprecedented amount of Web content through Weblogs, wikis and other social tools. The Web 2.0 technologies have enabled collaborative approaches to different endeavours, such as 'folksonomies' and social tagging. The last few years have seen the rise of these technologies and the research focusing on them.

'Folksonomies' are a user-centred technique for knowledge organization. The term was coined by Thomas Vander Wal; it is a portmanteau of the two words 'folks' and 'taxonomy' (Vander Wal 2007). Several social bookmarking services are available on the Web to build these folksonomies. The more popular amongst these are delicious.com, in which users bookmark and tag Web pages; Flickr, where photographs and short videos can be uploaded and tagged by users; Connotea and CiteULike, which are services for scholarly articles and Websites; and LibraryThing, where users can catalogue and tag both personal book collections and those within a library. Users are categorizing their own and others' collections and building bottom-up classification systems. Today, users are organizing their own digital collections by adding metadata.

Two approaches to organizing and providing access to resources and for building vocabulary tools are discussed in the literature: the bottom-up and the top-down approach. The top-down approach envisages first creating a structure and the control of the terminology. The traditional classification systems, such as Dewey Decimal and Library of Congress, thesauri and other controlled vocabulary tools adopt this approach. The bottom-up approach begins at the grassroots and the social tagging systems offer a platform for this collaborative effort. The advantage of this method is that it aggregates the opinions of many, and is more likely to provide users with what they want than a top-down method.

While the information profession has long employed controlled vocabularies to provide access to information resources, user tagging and folksonomies are recent approaches. In contrast to a folksonomy, a controlled vocabulary is an established list of standardized terminology intended to be used in the indexing, organization and retrieval of information. The National Institutes of Health defines a controlled vocabulary as a 'system of terms involving e.g. definitions, hierarchical structure, and cross-references that is used to index and retrieve a body of literature in bibliographic, factual, or other databases' (National Institutes of Health 2004). A subject heading list is one kind of controlled vocabulary; the Library of Congress Subject Headings and the Medical Subject Headings are examples of such lists. This contrasts with natural language terms, where no restrictions are imposed. Social tagging allows users to employ natural language as metadata, and this freedom results in the use of multiple terms that represent the same concept in many different ways. While controlled vocabularies reduce the ambiguity inherent in natural languages, they are not without problems. Using the same terminology throughout a database ensures consistency and precision and helps the user find relevant information. The disadvantage is the rigidity of the approach, which may not represent the multiple viewpoints of users.

Literature Review

Both controlled vocabularies and folksonomies are tools that aid the user in search and retrieval, but the sources of vocabularies are very different. The pros and cons of controlled vocabularies and folksonomies are discussed extensively in the literature and only a brief summation is presented. Folksonomies are low-cost, inclusive and current. They are not restricted to binary relationships and also encourage users to interact more with the material. They help to build communities of users with common interests and goals (Mathes 2004; Kroski 2005). The users are motivated to contribute to the community; the knowledge bases constructed by the masses are the impetus for their contributions. However, not all users are aware of the social impact of their contributions; they tag items to meet their own needs and their contributions can actually undermine the constructed knowledge base as a whole (Marlow et al. 2006). In addition, folksonomies lack terminological precision. As such, searches often result in low precision and low recall. The lack of synonym control and the presence of meaningless terms also complicate the search and retrieval process. At the organizational level they are essentially flat, non-hierarchical structures, which can create semantic ambiguity and confusion (Merholz 2004; Shirky 2005). As a filtering or searching device these issues present difficulties for users.

Heckner et al. tested the tag usage in Connotea using a category model for linguistic and functional aspects of tag usage as well as to determine the relationship between tags and the document text. One of the outcomes of the study indicated that 46% of the tags were not found in the document. 'This shows that users' tags considerably add to the lexical space of the tagged resource' (Heckner et al. 2007). Their research indicated that in an academic resource there were more general tags and few non-content related tags when compared to Websites such as Flickr and Delicious.

Controlled vocabularies, on the other hand, through the process of controlling synonyms and homonyms and embedding mandated terms in their semantic context (through the broader, narrower and related terms) create a navigable structure that can increase recall and precision when users perform searches. However, the controlled vocabularies use a formal, standardised form of natural language and it may be difficult for searchers to learn and use this terminology to their best advantage. Despite these differences, both controlled vocabularies and social tagging have much to offer to the study of information access.

Relationship between folksonomies and controlled vocabularies

Chan (2009) suggests that subject indexing and social tagging are two different approaches to information storage and retrieval and recognizes the potential for social tags to enhance both the controlled vocabularies and subject access. This enrichment occurs because the social tags provide an understanding of user perspective and searching behaviour and also provide a source for adding terms to controlled vocabularies. In addition, librarians have used folksonomies in bibliographic instruction to teach users the benefits of controlled vocabularies and their structures (Maggio et al. 2009). To leverage the benefits of each system, efforts are being made to explore methods of combining folksonomy tags with theories from knowledge organization systems.

Much current research is focused on mechanisms by which tags can be improved and more useful formal semantics can be derived from simple community tagging systems – that is, how ontologies can be derived from folksonomies. The aim is to optimize the trade-off between the simplicity and freedom of community tagging and the benefits to search engines of hierarchical structured vocabularies' (Shirky 2005).

In order to fully utilize both of the approaches, the user tags that are generated from these tagging systems need to be organized in some meaningful fashion.

One approach to the establishment of meaningful structures is by examining the types of semantic relationships between concepts. Kipp (2006) suggests that the relationship among tags can be defined as associative relationships, but that these relationships among tags are 'valuable sources for more fine-grained semantic relations in both folksonomies and KOS' [knowledge organization systems] (Peters and Weller). Several studies have examined the typology of semantic relations that exist between concepts in specific domains, though not in the context of social tagging. Bean and Green (2001) present types of relationships in various thesauri, classification schemes and relationships among knowledge structures. Neelameghan (2001) identifies lateral relationships in multicultural and multilingual databases in the spiritual and religious domains. A larger and more comprehensive inventory of subject relationships is presented in the Appendix of the Final report of the Subcommittee on Subject Relationships/Reference Structures to the Subject Analysis Committee of the Association for Library Collections and Technical Services, Cataloging & Classification Section (Michel 1996).

An article by Peters and Weller (2008) gives an overview of different approaches to re-organizing and editing tags. 'Fertilizing folksonomies with existing KOS is a promising approach to enable semantic enrichment' (Peters and Weller 2008). Angeletou et al. (2007) found that synonymy, broader, narrower and related term relationships existed amongst tags. It was also indicated that these relationship types can be used for further refining and expanding knowledge organization systems or ontologies. The aim of Veres's study was to uncover the structural properties of tag sets as they relate to Websites. The Open Directory Project and the Yahoo Directory were compared for the kinds of classification within the user tags. This has implications for the connection between folksonomy and ontology.

There have been studies comparing the Library of Congress Subject Headings with the tags provided by users in various online tagging programmes and communities. Yi and Chan (2009) reviewed over 4,000 articles that had been tagged using the Delicious bookmarking system. The tags that had been assigned to these Websites were compared to the Library of Congress Subject Headings to examine the term to term matches between the words in the subject heading list and the tags. It was found that of those tags used twice or more, 61% matched the subject headings. An additional 10% were possible matches, such as the tag 'css' for the subject heading 'Cascading Style Sheet' (Yi and Chan 2009). In a later study, Yi examined the tags on a sample of the most popular books on LibraryThing and their relation to the assigned Library of Congress subject headings. Yi used the similarity measures to predict the correct subject headings from user-assigned tags. The most effective tags were the top five. This study demonstrated that the similarity measures can predict the correct subject headings.

Heymann and Garcia-Molina (2009) conducted another study in which they also compared term to term matches and semantic matches between Library of Congress Subject Headings and tags assigned to books in LibraryThing. Similarly to the previously discussed studies, they also found that there was a high number of matches between the tags and the subject headings. However, they also found that the tags, while there were terminological matches, were often applied to various items that in actuality had little in common. This present study examines the tags and subject headings both at semantic and terminological levels, with the primary focus on the semantic level analysis.

In this study, individual books form the base unit of analysis. Subject headings and tags of individual books were compared to one another on a book-by-book basis rather than comparing all subject headings in the sample with all tags in the sample in aggregate. Examining the subject headings and tags assigned to a single book provides context to the analysis, which is lost when using an aggregation of tags and subject headings assigned to a large, diverse sample of items.

Complementary and alternative medicine

Complementary and alternative medicine was the field chosen for this examination, because of its increased importance, use and popularity among the public at large. The National Center for Complementary and Alternative Medicine (NCCAM) defines complementary medicine as 'a group of diverse medical and health care systems, practices and products that are not presently considered to be part of conventional medicine. Complementary medicine is used with conventional medicine and alternative medicine is used in place of conventional medicine' (NCCAM 2010). Complementary and alternative medicine includes various therapies such as chiropractic, acupuncture and massage, as well as preventive and self-help measures such as nutrition and diet, herbal supplements, use of magnets and meditation. Some specific examples include:

There is an increasing use and awareness of these methods as shown by the 2007 National Health Interview Survey, which showed that roughly 38% of adult Americans use complementary and alternative therapies (NCCAM 2010). One of the most often used types is dietary supplements. In one study, it was found that natural products (non-vitamin, non-mineral supplements) were most commonly used for children and adults. Many therapies such as meditation, massage and yoga are also quite popular with the general public (Nahin et al. 2010). These popular treatments have brought attention to the field; conventional medicine is also slowly becoming aware of the increased use of complementary and alternative therapies. While the public waits for conventional medicine to fully integrate with these therapies in a comprehensive health care system, many are turning to popular literature on the subject (Ventola 2010). Therefore, librarians have a central role in providing access to this information and to the increasing scholarly literature in the field. 'A CAM librarian's role is unique; many specialize in specific areas of CAM and opportunities exist for librarians to partner with CAM groups. CAM information professionals' major roles involve information access and retrieval and education' (Crumley 2006: 81).

One form of access that librarians provide is subject access and, for book-type materials, they often use the Library of Congress Subject Headings. With Web 2.0, user tags have also become very popular. While there is prolific research output in the area of user tags and subject headings, there have been no studies that examine the tags and the headings for popular book type materials in complementary and alternative medicine. Research and discussion on this topic is significant because of the growing awareness and interest in this field and because there are increasing numbers of users looking for such information. It is important to have an understanding of how social tagging efforts and traditional subject access work together in this domain.

Methods

The purpose of this study is to understand patterns in the user tags and examine how tags and tag categories relate to and supplement the subject headings assigned by information professionals to popular materials within complementary and alternative medicine. This study will provide some qualitative information on the issues and benefits of tags and subject headings, so that a large-scale research study of the semantic relationships between subject headings and social tags can be undertaken in the future.

Forty books on complementary and alternative medicine were analysed. Books were selected by searching the LibraryThing database. A search was made for the tag 'alternative medicine'. The top twenty books that listed this term most frequently as a tag were chosen for the study. The same was done for books using the term 'complementary medicine'. Again, the top twenty books were chosen. These books were of a popular, general nature and were intended for the public at large. To avoid repetition of books in the sample, a book that had been previously listed in the alternative medicine group was not repeated in the complementary medicine group. The sample included a wide variety of books on different treatments and different opinions of alternative medicine, as well as case studies. A limitation of this study is the sample size of only forty books, but as this research is intended to be an exploratory study, the sample was considered adequate. The data were collected during January and February of 2010.

The MARC records for these books were obtained from the OCLC Connexion database and the records were used to create a Microsoft Access database containing the user tags, subject headings and subject heading subdivisions. The database enabled the subject headings to be sorted by type of subdivisions ($x, y, z and v of MARC sub-field codes). This was done to help understand the type of subject headings and subdivisions assigned to these books. This database could also present the tags and the corresponding subject headings of individual books. The total number of tags in the sample of forty books was 2,074. Table 1 presents data on the mean number of subject headings and the mean number of tags.


Table 1: Complementary and alternative medicine: average numbers of subject headings and tags
| | Alternative medicine | Complementary medicine | Combined |
|---|---|---|---|
| Total number of subject headings | 91 | 71 | 162 |
| Mean number of subject headings | 4.3 | 3.6 | 4 |
| Total number of tags | 1558 | 516 | 2074 |
| Mean number of tags | 74.2 | 25.8 | 50.6 |

First, the individual tags were compared with the assigned subject headings to determine the types of matches. Some of the chosen types of matches (full, partial, none) were based on previous studies that examined keyword and title word matches with controlled vocabularies (Strader 2009; Carlyle 1989; Voorbij 1998). Table 2 defines the types of matches used.


Table 2: Types of matches
| Type of match | Definition |
|---|---|
| Full | The tags matched exactly or all of the exact words were used. Example: Alternative medicine (SH) and 'alternative medicine' or 'medicine alternative'. |
| Partial | (a) One or more of the exact words within the subject heading were used. May or may not include additional words that are not part of the subject heading. Example: Holistic health (SH) and 'health', 'holistic', or 'holistic medicine'. |
| | (b) The tag was covered by more than one subject heading. Example: 'complementary medicine' and Complementary therapies (SH) and Alternative medicine (SH). |
| | (c) Language and spelling variations, such as the exact term in another language or a clear abbreviation of the exact terms. Example: Alternative medicine (SH) and 'alternativmedicin', 'medicin' or 'altmedicine'. |
| None | (a) Tags that included none of the words in the subject heading. |
| | (b) Conceptual variations. These are related concepts that are two separate entities, but are closely related. Examples: Vegetarian (SH) and 'vegetarianism' as well as Herbs (SH) and 'herbal'. This was done because the true meanings of the tags and subject headings differ. |

Each subject heading was compared independently with the tags assigned to that book. If a book had three subject headings and thirty tags, then there was a total of ninety subject heading to tag comparisons. Thus, the total number for the complementary medicine counts is 2,963 and for alternative medicine 2,758. These totals were used for determining the percentages of the matches.
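The purely terminological part of this comparison can be sketched in code. The following Python is a minimal illustration and not the procedure used in the study, which was carried out manually. The helper names `words`, `classify_match` and `compare_book` and the word-set comparison are assumptions made for the sketch. Only the Full, Partial (a) and None (a) rules of Table 2 are covered, since Partial (b), Partial (c) and the conceptual variations in None (b) rest on human judgement.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case a heading or tag and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def classify_match(subject_heading: str, tag: str) -> str:
    """Classify one tag against one subject heading (hypothetical helper)."""
    sh_words, tag_words = words(subject_heading), words(tag)
    if tag_words and tag_words == sh_words:
        return "full"      # same words, in any order
    if tag_words & sh_words:
        return "partial"   # at least one exact word is shared
    return "none"          # no word in common


def compare_book(subject_headings: list[str], tags: list[str]) -> list[tuple[str, str, str]]:
    """Compare every subject heading of one book with every tag of that book."""
    return [
        (sh, tag, classify_match(sh, tag))
        for sh in subject_headings
        for tag in tags
    ]
```

Under these assumptions, a book with three subject headings and thirty tags yields ninety rows from `compare_book`, one per subject heading to tag comparison. A pair such as Herbs (SH) and 'herbal' falls under "none" because no exact word is shared, which agrees with how Table 2 treats conceptual variations.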

The tags were then examined for conceptual meaning. To do this, tag categories were created. These categories were labelled and defined and the definitions applied to a group of tags that had a form of commonality. For example, the tags bones, stomach and heart were grouped into a tag category called Body parts, with the added statement that this group contained those tags that named specific areas or organs of the human body. This assisted the researchers in developing the tag categories consistently and ensured that the conceptual meaning was retained for comparison with subject headings.

Tag categories were created because it is not effective to compare individual tags semantically with subject headings. Individual tags do not lend themselves to semantic analysis because they vary so widely, they do not have context when they stand alone and there are simply so many of them. When grouped by conceptual similarity, rather than alphabetically or by frequency as in tag clouds, the context becomes richer and more meaningful. Also, the tags were categorized for each book and this provided an implicit context by keeping all of the tags for each book separate from the tags of other books. This context and understanding of the tags and their meanings thus facilitated the comparison of the subject headings with the tags and allowed the determination, on a deeper level, of the types of semantic relationships represented.

In the process of assigning the tags to the categories, they naturally resulted in mutually exclusive categories for the individual books. However, a tag that appeared for multiple books may have been assigned to different tag categories for different books as each book exhibited its own contextual themes.
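One way to picture this arrangement is as a nested mapping from book to tag category to tags. The sketch below is only an illustration under stated assumptions: the category Body parts and its tags come from the example above, while the book labels, the Emotions category, the tag 'love' and the function names are hypothetical placeholders.

```python
from collections import Counter

# book -> tag category -> tags. "Body parts" and its tags come from the text;
# the book labels and the "Emotions" category are hypothetical placeholders.
tag_categories = {
    "Book A": {"Body parts": {"bones", "stomach", "heart"}},
    "Book B": {"Emotions": {"heart", "love"}},
}


def overlapping_tags(categories: dict[str, set[str]]) -> set[str]:
    """Return tags assigned to more than one category within a single book."""
    counts = Counter(tag for tags in categories.values() for tag in tags)
    return {tag for tag, n in counts.items() if n > 1}


def categories_of(tag: str) -> dict[str, str]:
    """Map each book that carries the tag to the category it was given there."""
    return {
        book: name
        for book, categories in tag_categories.items()
        for name, tags in categories.items()
        if tag in tags
    }
```

Here `overlapping_tags` returns an empty set for each book when its categories are mutually exclusive, while `categories_of("heart")` shows the same tag sitting in a different category for each book.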

The process of sorting these tags into meaningful categories was done by two individuals, both with non-medical backgrounds. The individuals reflect the typical readership of these kinds of popular texts; medical practitioners are not generally the intended audience. The two individuals sorted the tags into categories, which were then compared. Where there were differences, these were discussed and a consensus reached. In cases where a consensus could not be reached and there were different points of view, an information professional was consulted and the three together reached a decision. This allowed for some degree of inter-rater reliability in the subjective decisions made by the individuals.

This resulted in a total of 375 tag categories. The next step was to map the tag categories on to the subject headings for each book. The same two individuals mapped the tags and the results were reviewed by the information professional. To guide this process, the Current Relations in the Semantic Network of the Unified Medical Language System was used (National Institutes of Health 2010). Appendix A shows the full list of these relationships. This provided a framework for identifying the types of relationships that existed between the subject headings and the tag categories and was selected because it is relevant to the biomedical field and is a higher level overview of possible relationships.

Analysis and results

The MARC records for the forty sample books were examined for their subject headings. These subject headings fell into the 6xx fields and most were drawn from the Library of Congress Subject Headings.

The Library of Congress Subject Headings allows for personal names, topical headings, geographic headings, etc. In addition, to increase specificity, a subject heading can have subdivisions, such as topical ($x), geographic ($z), chronological ($y) and form ($v). The subject headings and subdivisions were examined to determine which types are used in this domain.

In the total sample of books in complementary and alternative medicine, the subject headings primarily belonged to the MARC field 650, Topical Subject Headings. These headings covered a wide range of topics such as Complementary therapies, Evidence-based medicine, Massage therapy, Self-help groups, Diet, Spiritual healing and Medicine, ayurvedic. The headings did not contain any Corporate Name (MARC field 610) or Geographic Name headings (MARC field 651). Personal Name headings (MARC field 600) were included only twice.

With regard to subdivisions, approximately one third of the headings had subdivisions. Fifty-five percent of the subdivisions were Form subdivisions ($v); 74% of the Form subdivisions used the term 'Popular works'. Other form subdivisions included 'Encyclopedia' and 'Handbooks, manuals, etc.', 'Biography' and 'Formulae, receipts and prescriptions'. The Topical subdivision ($x) also occurred quite frequently, 39% of the time. Within this subdivision, 'Prevention' and 'Prevention and control' were used often. The general subdivisions covered a wide range of topics and included terms such as 'Cancer', 'Diet therapy', 'Evaluation', 'Methods', and 'Therapeutic Use', etc. The Geographical subdivision ($z) occurred rarely and there was no occurrence of the Chronological subdivision ($y).
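A tally of this kind can be sketched as follows. The study used a Microsoft Access database for this step, so the Python below is a stand-in under assumptions: each heading is assumed to be available as a display string with the main heading first and `$`-prefixed subfield codes for the subdivisions, and the function names are hypothetical.

```python
import re
from collections import Counter

# Subdivision subfield codes as named in the text.
SUBDIVISION_TYPES = {"v": "form", "x": "topical", "y": "chronological", "z": "geographic"}


def parse_heading(heading: str) -> tuple[str, list[tuple[str, str]]]:
    """Split 'Main heading $v Form $x Topic' into the heading and its subdivisions."""
    # re.split with one capture group yields: main, code, value, code, value, ...
    main, *rest = re.split(r"\s*\$([a-z])\s*", heading.strip())
    subdivisions = list(zip(rest[0::2], rest[1::2]))
    return main, subdivisions


def count_subdivision_types(headings: list[str]) -> Counter:
    """Tally subdivisions by type across a list of headings."""
    counts = Counter()
    for heading in headings:
        _, subdivisions = parse_heading(heading)
        counts.update(SUBDIVISION_TYPES.get(code, code) for code, _ in subdivisions)
    return counts
```

For instance, `parse_heading("Alternative medicine $v Popular works")` separates the main heading from a single form subdivision. That string is an illustrative combination of terms mentioned in the text, not a record from the sample. Dividing each count from `count_subdivision_types` by the total number of subdivisions gives shares of the kind reported above.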

The results of the individual tag and subject heading comparison are presented in Table 3.


Table 3: Breakdown of Matches
| | Complementary medicine | Alternative medicine | Combined |
|---|---|---|---|
| Full | 22 (0.74%) | 47 (1.7%) | 69 (1.21%) |
| Partial | 104 (3.51%) | 212 (7.7%) | 316 (5.52%) |
| None | 2837 (95.75%) | 2499 (90.6%) | 5336 (93.27%) |
| Total | 2963 (100%) | 2758 (100%) | 5721 (100%) |

The full matches were only 0.74% and 1.7% of the total number of comparisons, for complementary and alternative medicine respectively. The partial matches were 3.51% and 7.7% and the tags that did not match were 95.75% and 90.6%, again respectively. This showed that conceptual analysis was needed: there were so few exact and partial terminological matches in this domain that further investigation was required to determine, at the conceptual level, whether there were similarities between the tags and the subject headings.
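The percentages in Table 3 follow directly from the counts, with each match type divided by the total number of subject heading to tag comparisons in its group. The short Python sketch below reproduces that arithmetic. The counts are copied from Table 3, while the variable and function names are assumptions made for the illustration.

```python
# Counts copied from Table 3: (full, partial, none) per group.
counts = {
    "Complementary medicine": (22, 104, 2837),
    "Alternative medicine": (47, 212, 2499),
}


def shares(full: int, partial: int, none: int) -> dict[str, float]:
    """Express each match type as a percentage of all comparisons."""
    total = full + partial + none
    return {
        "full": round(100 * full / total, 2),
        "partial": round(100 * partial / total, 2),
        "none": round(100 * none / total, 2),
    }


# Column-wise sums give the combined row of the table.
combined = tuple(sum(column) for column in zip(*counts.values()))

for group, row in {**counts, "Combined": combined}.items():
    print(group, sum(row), shares(*row))
```

The sketch rounds to two decimal places throughout, so the alternative medicine figures appear with one more digit than the rounded values shown in the table.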

The sorting of the tags into tag categories resulted in 375 categories, where the tags contained within each category were related in different ways. A statement describing the meaning of each category was developed and this aided the process of semantically mapping each tag category with the subject headings. Some examples include the tag categories Total care and Item description. The Total care category was described by the statement 'terms that address both mainstream and alternative medicine to treat the whole person, not just the illnesses'. This tag category included the tags holistic, holistic health, holistic medicine and wellbeing. In the tag category Item description, the statement 'descriptions of the physical item itself' was used and this tag category included the tags hardcover and paperback. For Case 1 later in this section, the tag cloud for the item, along with the tag categories and their assigned tags, are presented in full. These tag categories were then mapped on to the subject headings through a set of semantic relationships assisted by the Current Relations in the Semantic Network chart. The Current Relations framework identifies several types of relationships and associations under the general rubric of 'associated_with'. Within this rubric, it lists the relationships physically_related_to, functionally_related_to, spatially_related_to, temporally_related_to and conceptually_related_to. Within each of these categories there are further breakdowns that can be viewed in Appendix A and the exact descriptions of these relationships as adapted for this study are provided in Appendix B.

The conceptual comparison of the tag categories and subject headings revealed that almost half of the tag categories could be mapped, in terms of semantic meaning, to the subject headings. Slightly more than half of the tag categories could not be mapped to the assigned subject headings. Those that could be mapped are described as related and those that could not be mapped are described as unrelated. Table 4 presents the overall results of the semantic relationship analysis between the tag categories and the subject headings.


Table 4: Semantic Relationships between subject headings and tag categories
| | Alternative medicine tag categories | Complementary medicine tag categories | Combined CM and AM tag categories |
|---|---|---|---|
| Related | 110 (47.6%) | 85 (43.8%) | 195 (45.9%) |
| Unrelated | 121 (52.4%) | 109 (56.2%) | 230 (54.1%) |
| Total | 231* | 194* | 425* |

* Note that some tag categories could relate to more than one subject heading. These totals include those repeated tag categories.

The following sections examine and present the results of the related and unrelated subject headings and tag categories and provide case studies of the items analysed.

Subject headings and tag categories: related

The tag categories that could be mapped to the subject headings accounted for 46% of the total mappings. Table 5 presents the distribution of the types of semantic relationships among these related mappings. Appendix A lists the relationships and Appendix B explains how the relationships were defined and applied to the tags and tag categories.


Table 5: Distribution of semantic relationships by type
| Category type | Complementary | Alternative | Combined |
|---|---|---|---|
| Physically_related_to | 20 (23.5%) | 37 (33.6%) | 57 (29.2%) |
| Spatially_related_to | 1 (1.2%) | 2 (1.8%) | 3 (1.55%) |
| Functionally_related_to | 37 (43.5%) | 24 (21.8%) | 61 (31.3%) |
| Temporally_related_to | 3 (3.5%) | 0 (0%) | 3 (1.55%) |
| Conceptually_related_to | 17 (20%) | 30 (27.3%) | 47 (24.1%) |
| Isa (conceptually equivalent) | 7 (8.3%) | 17 (15.5%) | 24 (12.3%) |
| Total | 85 | 110 | 195 |

The most frequently occurring semantic relationship types were functionally_related_to, physically_related_to and conceptually_related_to.

Across complementary and alternative medicine combined, 29.2% of the related tag categories fell into the physically_related_to relationship. Within this, the sub-relationships (part_of), (branch_of) and (ingredient_of) occurred. (Part_of) occurred most often, both within this relationship type and across all the relationship types as a whole.

The functionally_related_to relationship in both complementary medicine and alternative medicine included the subsets (treats), (interacts_with), (produces), (causes), (carries_out), (practices), (occurs_in (process_of)) and (users). Much of the data was evenly distributed among these subsets. However, the subsets (result_of) and (manifestation_of) had a much higher representation, which suggests that cause-effect connections are strongly present in the tags, as are expressions and examples of concepts.

Conceptually_related_to had many subsets and also had a high number of occurrences. It included the subsets (evaluation_of), (analyses (assesses_effect_of)), (property_of), (method_of), (conceptual_part_of) and (issue_in). The most frequent sub-relationships were (property_of) and (issue_in) and these appeared with high frequency in both. The (conceptual_part_of) also occurred frequently, but only in alternative medicine.

The occurrence of (spatially_related_to) was very minimal and primarily occurred in the context of (location_of), which often referred to the country of origin or practice. (Temporally_related_to) was not present in alternative medicine and in complementary medicine it occurred only three times.
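The grouping of sub-relationships under their top-level relationship types can be sketched as a simple lookup followed by a tally. The Python below is an illustration under assumptions: the lookup contains only some of the sub-relationship names mentioned in this section and is a partial stand-in for the full list in Appendix A, and the function names and the triple layout (tag category, subject heading, sub-relationship) are hypothetical.

```python
from collections import Counter

# Sub-relationships named in the text, grouped under their top-level relation.
# This is only a partial stand-in for the full list in Appendix A.
RELATION_PARENT = {
    "part_of": "physically_related_to",
    "branch_of": "physically_related_to",
    "ingredient_of": "physically_related_to",
    "treats": "functionally_related_to",
    "result_of": "functionally_related_to",
    "manifestation_of": "functionally_related_to",
    "property_of": "conceptually_related_to",
    "issue_in": "conceptually_related_to",
    "location_of": "spatially_related_to",
}


def tally(mappings: list[tuple[str, str, str]]) -> Counter:
    """Count (tag category, subject heading, sub-relationship) triples by top-level relation."""
    return Counter(RELATION_PARENT[relation] for _, _, relation in mappings)


def distribution(counts: Counter) -> dict[str, float]:
    """Express each relation's count as a percentage of all counted mappings."""
    total = sum(counts.values())
    return {relation: round(100 * n / total, 1) for relation, n in counts.items()}
```

Applied to the combined counts in Table 5, `distribution` divides each relationship count by the 195 related mappings. Any triples passed to `tally` in a trial run would be hypothetical examples rather than data from the study.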

Subject headings and tag categorizations: unrelated

There were 121 unrelated tag categories in alternative medicine and 109 in complementary medicine; together, 54.1% of all tag categories were unrelated to the subject headings. Further analysis of these unrelated tag categories was undertaken to identify patterns and to understand what value they could add. The process of determining these patterns is outlined here using an example. The unrelated tag categories were aggregated into one group and examined for similarities. For instance, several books had tags that appeared to be call numbers or shelf labels. During the initial analysis, these tags were grouped into the tag categories of place or item location, and it was determined that these tag categories had nothing to do with the subject headings assigned to the book. Secondary analysis of all of the unrelated tag categories as a whole showed that several books had tag categories such as place or item location; this recurring phenomenon, labelled a pattern, was designated the location pattern. Table 6 shows all of the patterns within the unrelated tag categories and gives a description of each pattern.
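
The 54.1% figure can be reproduced from the counts given in the text. The short Python sketch below assumes that the denominator is the sum of the 195 semantically matched tag categories (the Table 5 total) and the unrelated tag categories; this assumption is ours, although it does yield the reported percentage. The variable names are illustrative only.

```python
# Counts reported in the text: semantically matched tag categories (Table 5 total)
# and unrelated tag categories per domain.
matched = 195
unrelated = {"alternative medicine": 121, "complementary medicine": 109}

unrelated_total = sum(unrelated.values())    # 121 + 109
all_categories = matched + unrelated_total   # assumed denominator: matched + unrelated

unrelated_share = round(100 * unrelated_total / all_categories, 1)
matched_share = round(100 * matched / all_categories, 1)

print(f"unrelated: {unrelated_total} of {all_categories} = {unrelated_share}%")
print(f"matched:   {matched} of {all_categories} = {matched_share}%")
```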


Table 6: Patterns within the unrelated tag categories
Patterns | Descriptions
Belief systems | Personal viewpoints on politics, religion and culture that shape how people react. 'Cultural studies' and 'cultural criticisms' as well as 'spiritual' are examples of tags that were used.
Demographics and age | Describes the target audience, the subject, or characteristics such as age, ethnicity and sex.
Genre or form | Refers to the presentation style of the book's content, such as 'essays', 'memoirs', 'stories', etc. and also descriptions of the book such as 'hardcover'.
Time period | Indicates periods of time, such as when the book was read, stored, bought, etc. It may also refer to the year of publication.
Locations | Refers to the item's place within the library or within a personal collection. It can also refer to a country that does not have any association with the subject headings; this may be the country of publication or where the story is set.
Personal | These are tags that have meaning only to their creators. Some include the status of the book (read, unread, read later, etc.). Some also include personal names. Also includes the labels of Tasks: things that indicate what readers are doing or are planning on doing.
Content terms | Terms that are generally related to the domain, but do not relate directly to the subject headings. In this study, these terms relate to the field of complementary and alternative medicine in general, but do not relate to the subject headings assigned to their book.
Undecipherable | Individual tags that have no meaning to anyone but their creator. However, these differ from personal tags in that personal tags are recognizable words and phrases.
Scientific method | Terms that pertain to scientific experimentation and the materials necessary to carry out scientific research.
Assessment | Evaluative terms that indicate the judgments passed on the content of the book. Some of these are also affective terms that pertain to the expression of emotions.
Authors | Names of the people who created the work.

The most frequently occurring patterns were personal, genre or form and location. Time period and belief systems also occurred very frequently. The unrelated tag categories demonstrate the added value of using tags: they supplement the existing subject headings. For instance, the tag categories in the assessment pattern can assist readers in evaluating the relevance of an item to their purposes.

Another example is content terms. One book, about aromatherapy and energy therapy, has only the subject heading 'Flowers'. Medical tags and tag categories do not relate to this subject heading, but do provide additional information. These tags included 'energy therapy' and 'spiritual healing' and tag categories include 'complementary and alternative medicine', 'mental health' and 'medicine'. They could have been used as subject headings to represent the book but were not, thus the tags fill the gap by providing additional information about the content of the book. This information is very valuable in assisting the user to find the information that they are seeking; had the user searched a traditional catalogue for 'energy therapy' or 'complementary medicine' the user would not have found this item, but if the user instead searched the tags for this item it would have been returned.
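
The retrieval difference described above can be illustrated with a small Python sketch. The record holds the subject heading, tags and tag categories quoted in the text; the field names, the merging of tags and tag categories into one field, and the simple word-overlap search are assumptions made for illustration and do not represent how LibraryThing or a library catalogue actually searches.

```python
# One illustrative record built from the example in the text.
# The field names are hypothetical; tags and tag categories are merged into one field.
RECORD = {
    "subject_headings": ["Flowers"],
    "tags_and_categories": [
        "energy therapy",
        "spiritual healing",
        "complementary and alternative medicine",
        "mental health",
        "medicine",
    ],
}


def found_in(record, field, query):
    """Return True if every word of the query occurs in some single value of the field."""
    query_words = set(query.lower().split())
    return any(query_words <= set(value.lower().split()) for value in record[field])


for query in ("energy therapy", "complementary medicine"):
    print(query,
          "| subject headings:", found_in(RECORD, "subject_headings", query),
          "| tags:", found_in(RECORD, "tags_and_categories", query))
```

For both queries the subject-heading search misses the book, while the tag search finds it.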

The next section provides three detailed examples, drawn from specific books, of the subject heading and tag category connections.

Case 1

The book in question is a collection of stories from people suffering from critical illnesses and from the author, a Crohn's disease sufferer. It can be found at http://www.librarything.com/work/42829. This book concerns how these individuals dealt with spiritual and medical crises. The subject headings assigned to this book are 'Meditations' and 'Physicians – United States – Biography'. Additional name subject headings in the 600 field include 'Remen, Rachel Naomi' and 'Remen, Rachel Naomi – Philosophy'. The scope note for the heading 'Meditations' in the 32nd edition of the Library of Congress Subject Headings states: 'Here are entered collections of thoughts on spiritual truths for use in meditation. Works on mental prayer as a method of promoting the spiritual life are entered under Meditation' (Library of Congress 2010).

The subject headings, while they indicate the broad theme, do not indicate the specific concepts of the book. Tags in the category Uplifting come closer to describing what the book is about. The book is not a collection of prayers or meditations, but of personal experiences. While intended to help others facing difficulties, it is not intended to promote a spiritual life. It is meant to be uplifting and inspirational, but the subject headings 'Meditations' and 'Physicians – United States – Biography', although relevant, do not address all the details. The tags that are associated with this book do reflect this difference. The tags reflect people's reactions. The tag category Uplifting includes tags that lift the spirit or mind to a level of happiness. The subject headings have tried to describe the book objectively, but the tags show how people feel about the book. The impact of the book is as important as its subject.

Information about the genre helps readers know what to expect in terms of content style. Hence the tags biography, essays, memoir, non-fiction and self-help were sorted into the tag category Genre. The subject heading could include the genre of the book but it is often limited to one description. For this book the heading 'Biography' is included as a subheading of 'Physicians', while the users gave many descriptions: biography, essays, memoir, non-fiction and stories. This is also helpful when books cross multiple lines into different genres.

The category of Demographics also helps readers to determine who and what the book is about, in a more specific way than the subject headings. Adult, Jewish and women were sorted into this category. Jewish was placed in this category and not the Religion category as this describes a person and is not referring to the religion. The tags describe the people who have contributed their stories to the work.

Communication, or tags that are concerned with how people pass on information, included both tags of a personal nature (read and unread) and the tags stories and storytelling. Tags of a personal nature help individual users but not the general public.

The tag category Understanding alludes to personal understanding and to how people search for and acquire it. The tags meditation, philosophy and wisdom were sorted into this category.

Judaism, religion, spirit, spiritual and spirituality were included in the Religion tag category, which was defined as the belief in a higher power or other methods of mental transcendence.

The category Medicine included tags which had to do with the field of medicine and its effect, including complementary medicine, medical, medicine and psychology.

Tags concerned with Disease and its eradication include cancer, health, life and healing. This sorting is the result of comprehensive discussion and consensus between the two researchers and the information professional.
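
The sorting described for Case 1 can be written down as a simple mapping from tag categories to tags. The Python sketch below uses only the categories and tags named in the text (the Uplifting category is omitted because its member tags are not listed); the function names and the 'uncategorised' fallback are hypothetical additions for illustration.

```python
# Tag categories for the Case 1 book, with member tags as listed in the text.
CASE_1_CATEGORIES = {
    "Genre": ["biography", "essays", "memoir", "non-fiction", "self-help"],
    "Demographics": ["adult", "Jewish", "women"],
    "Communication": ["read", "unread", "stories", "storytelling"],
    "Understanding": ["meditation", "philosophy", "wisdom"],
    "Religion": ["Judaism", "religion", "spirit", "spiritual", "spirituality"],
    "Medicine": ["complementary medicine", "medical", "medicine", "psychology"],
    "Disease": ["cancer", "health", "life", "healing"],
}


def build_tag_index(categories):
    """Invert category -> tags into a lower-cased tag -> category lookup."""
    index = {}
    for category, tags in categories.items():
        for tag in tags:
            index[tag.lower()] = category
    return index


TAG_INDEX = build_tag_index(CASE_1_CATEGORIES)


def category_of(tag):
    """Return the tag category for a tag, or 'uncategorised' if it was not sorted."""
    return TAG_INDEX.get(tag.lower(), "uncategorised")


print(category_of("Jewish"))   # Demographics, not Religion, as explained in the text
print(category_of("memoir"))
```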

The tag cloud for this work is presented below:

Figure 1: Tag cloud representing the book from Case 1


Case 2

This book is a critique of the self-help movement in America today. It can be found at http://www.librarything.com/work/560631. The subject headings are 'Self-help groups', 'Self-actualization (Psychology)', and 'Recovery movement'. These subject headings could go into more detail by adding the subheading 'Criticism' or 'Evaluation' to any of the main headings. However, the professional cataloguers chose not to do so. The tags have an important place in showing the critical viewpoint of the author. This is crucial, because anyone researching the self-help movement for critical evaluation would not be able to identify this viewpoint from the subject headings, and anyone looking for methods of self-help to follow may not find this item useful when it is returned in a list of search results. This is a case where the evaluative aspects are covered by the tags in the category Scepticism and not by the subject headings.

Politics and culture are also reflected in the tags and these affect how a user may evaluate an item. These sorts of belief systems can influence how the information is perceived. This book has both supporters and detractors; some tags indicate the author's altruism and some indicate the opposite. The tags reflect these reactions and judgments in a way that objective subject headings do not express.

Time period is also reflected. Physical books are often less ephemeral than other information types such as Websites; in physical form, they are more permanent. Not everyone is going to read a physical book at the same time, and the book is likely to change hands. Some tags, despite their personal nature, indicate when the book was read, bought, sold, or stored and show that the book is still conveying its information to users. Even though these tags may not be useful for access purposes, they still convey useful information: they show that the book is still relevant and useful to many readers.

Case 3

This example can be located at http://www.librarything.com/work/306181. The author presents techniques for using one's positive energy to overcome fatigue. The subject headings are 'Energy medicine' and 'Mental healing'. The book offers different strategies to boost energy, improve relationships and combat the things that drain energy. Part of this is keeping fit, and the book describes how to do it by oneself. This is change performed by the individual; working on all of the things in the Fitness tag category (such as bodywork, strength, and fitness) leads to the achievement, courage, healing, and growth that are found in the Personal growth category.

And since this is done on one's own, self-development, self-help, and self-improvement are also included. These tags, when taken together, provide a more detailed summary of what the book is about. It is a self-help book intended to help readers learn to deal with the stress and trials of everyday life through the application of positive thoughts and actions; 'Energy medicine' and 'Mental healing' are appropriate headings, but the unrelated tag categories provide a different level of specificity that readers can use for their own purposes.

Discussion

In particular, social tagging seems to be especially important in popular literature pertaining to complementary and alternative medicine, where it appears that a large portion of the readership comes from the general public, who are likely to be unfamiliar with the controlled vocabulary phrases and medical terminology. From the tag analysis, it is clear that there is a variety of ways that the public describes complementary and alternative medicine. It is also clear from the literature that many people are interested in and are actively pursuing information on the subject. In general, they are searching for information on these methods to treat medical issues and prevent other issues from arising. They also appear to be looking for pain management techniques and natural methods of medical treatment. As such, it is critical to develop improved access to popular materials in this area.

Looking at the catalogue records, access is provided mainly by topical subject headings (650 field of the MARC record). The subject headings tend to be very broad, general and repetitive across the collection. Examples include headings such as 'Complementary therapies', 'Alternative medicine', and 'Mental healing'. Subject headings, although they provide appropriate access, may not serve the purpose of effectively organizing, filtering and retrieving books of such a nature. They may not adequately create distinct groups or discriminate between specific topics. Subdivisions for genre and target audience are available within the subject headings, but they were rarely applied to any of the books in the sample.

The overwhelming percentage of tags that did not match the subject headings shows that the tags contribute a plethora of unique terms for the user search and retrieval. This terminological analysis shows that tags enhance access to the material in ways that assigned subject headings may not. Tags complement and augment the subject headings in providing improved access to collections.

Since the subject headings were very broad, the terminological matches between subject headings and tags were minimal. Many of the tags that matched the subject headings were general tags like 'alternative' or 'health'. Full matches were much rarer than partial matches, because a tag is much less likely to contain all of the words of a subject heading than to share a single word with it. Partial matches are also more likely because a term can be repeated and combined freely with any other words in the vast and varied English vocabulary.
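
The distinction between full and partial terminological matches can be sketched in Python. The rule used here is one possible reading of the description above, not the study's documented procedure: a match is 'full' when the tag contains every word of the subject heading and 'partial' when the two share at least one word. The function names are hypothetical, and no stop-word handling is attempted.

```python
import re


def words(text):
    """Lower-case a phrase and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def terminological_match(tag, heading):
    """Classify a tag against a subject heading as 'full', 'partial' or 'none'."""
    tag_words, heading_words = words(tag), words(heading)
    if not tag_words or not heading_words:
        return "none"
    if heading_words <= tag_words:      # the tag contains every word of the heading
        return "full"
    if tag_words & heading_words:       # at least one shared word
        return "partial"
    return "none"


for tag, heading in [
    ("herbs", "Herbs -- Therapeutic use"),
    ("alternative", "Alternative medicine"),
    ("mental healing", "Mental healing"),
    ("energy therapy", "Flowers"),
]:
    print(f"{tag!r} vs {heading!r}: {terminological_match(tag, heading)}")
```

Under this rule a short, general tag such as 'alternative' can only ever be a partial match for a multi-word heading, which is consistent with full matches being the rarer case.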

Although the terminological matches between tags and subject headings were minimal, it is clear that there were semantic similarities between the tags and the subject headings and that these tags helped expand the central idea represented by the subject heading. For example, the tag Herbs matched the subject heading 'Herbs – Therapeutic use' assigned to a particular book. While the subject heading was very broad, the tags represented multiple concepts and provided additional detail and more avenues for exploration than this single heading. The tags that provided this detail were economic botany and herbalism, herbal medicine, herbs, medicinal herbs, and medicinal plants. Each of these is a similar concept representing a different aspect of Herbs. Economic botany and herbalism can refer to the farming and sale of herbs. Herbal medicine is the use of these herbs to treat medical problems. Herbs as a tag describes the physical material used, and medicinal herbs is a subset. Medicinal plants relate to herbal medicine in that they treat medical issues, but may not actually be classified as herbs. Medicinal plants and herbs can be seen as having a quasi-generic relationship, since some medicinal plants are herbs (medicinal herbs) and some herbs are medicinal plants. The Library of Congress Subject Headings record for 'Herbs – Therapeutic use' contains only narrower terms (NT), namely 'Herbal abortifacients' and 'Herbal contraceptives', and used for (UF) terms, which include 'Herb remedies', 'Herbal medicine', and 'Medicinal herbs' (Library of Congress 2010). Although the term Herbs appears in both the tags and the subject headings, the tags distinctly encompass a broader range of topics than solely the therapeutic use of herbs. Although the terminological matches were few, the tags provided a rich source of information. Creating tag categories and semantically matching them with the subject headings, as previously detailed, allowed this information to be explored further.

Forty-six percent of the tag categories semantically matched the subject headings. This is a high proportion of matches, which indicates that many of the concepts and meanings represented in the tags in some way matched the thematic content of the book. The subject headings focus on the main subject of the document, but tags provide a more detailed and deeper representation of the content. In addition, some of the unrelated tag categories (such as Content or Evaluative terms) also contain useful information that is not reflected in the subject headings. There appears to be a tendency for users of popular works in complementary and alternative medicine to contrast complementary and alternative medicine with mainstream medicine and to view the two forms as opposites. As such, the users tag materials with terms that evaluate the content of the books as positive or negative. Scepticism and quackery are two examples, and often these terms result from contrasting the complementary and alternative medical material with that of mainstream conventional medicine.

Other results of this study indicated that a few patterns that emerged from the unrelated tag categories deserve attention because they bring added value to search and access. The most frequently used patterns were Demographics and age, Genre or form and Location. This could be because tags are often used as a method of recommending books to others; as a general rule, people who enjoy one kind of book may enjoy a book that belongs to the same demographic description or genre. For example, for the book used in Case 1, the tags included terms such as adult, Jewish, and women, and these tags alluded to the ethnic and religious background of the author. This type of unrelated pattern, called 'demographics and age', was found to add value by providing context information. This information may be meaningful to users who are searching for materials that pertain to their own demographics and backgrounds. The tags that describe these things may help readers choose which books to read next, and the location tags, such as classification numbers, help them to find the book in order to read it. Also, subject headings are intended to be an objective assessment of the content of the book, while the user-generated tags can be subjective, and the opinions they offer may help guide other users in a different and necessary fashion.

The pattern Genre or form is often used. For the book Kitchen table wisdom by Rachel Naomi Remen, the users provided a variety of tags that were different expressions of genre. The work was tagged as a biography, a memoir, a story, and as essays. Essentially, it could be all four in the minds of the readers. However, the assigned subject headings provided only one subdivision: 'Biography'. This also indicates that users' tags contribute a large number of new and unique terms and concepts not represented in the assigned subject headings. While subject headings assigned by professionals remain essential, it is also important to recognize the value of the rich set of terms contributed by the users and to leverage these for better access to complementary and alternative medicine materials.

The approach of examining the user tags and the subject headings through the set of semantic categories assisted by the Unified Medical Language System, Current Relations allowed the identification of semantic categories embedded in the user tags and not represented by the subject headings. This approach could be explored further, in examining and validating the semantic richness of the tags. Additionally, the results generated by such studies have some potential for use in the construction of ontologies, knowledge organization systems and query expansions.

Conclusions

The emphasis of complementary and alternative medicine, being holistic in nature, seems to be on prevention and overall wellbeing. This is important for the health of the general public and gives them knowledge of the options they can explore. This focus and increasing awareness of personal health makes access to health literature important and better methods of access are crucial.

In looking at the tag categories created from the user tags assigned to books, there are very few on specific diseases such as cancer or on individual treatments. Instead, the tag categories suggest spiritual or metaphysical aspects that are part of holistic health. The unrelated tags are not necessarily content terms; many represent viewpoints and opinions, and locations, genres and demographics also abound. They provide information beyond what the assigned subject headings currently address. Thus, the tags contribute valuable additional information for book-type material in complementary and alternative medicine.

The fact that 46% of tag categories are conceptually matched with the subject headings and 54% are non-matches indicates that both the subject headings assigned by professionals and the user generated tags are essential to provide effective access to the resources in complementary and alternative medicine. An organization of user tags is necessary for presenting the tags to the user for browsing, search and navigation. Many ways of organization, including that of facets, have been suggested in the literature (Ram and Wei 2010; Spiteri 2010). Presentation and organization in the user interface design are outside the scope of this study.

The Unified Medical Language System, Current Relations in the Semantic Network schema was found to be very useful and appropriate as a framework for analysis, and future use of such schemas for the analysis of semantic relationships is suggested. This study also contributes toward this methodology for analysing the semantic relationships in this domain.

Social tags and assigned subject headings are complementary and both approaches are important to provide improved access to users. This study suggests that there are important types of semantic relationships that are relevant in the organization and retrieval of popular, general materials in the field of complementary and alternative medicine and also suggests the ways that they can add value to and complement the subject headings.

Health information technicians, Web search engine professionals and health care professionals are some of the groups who could ultimately benefit from this type of analysis. Especially with the growing interest in complementary and alternative medicine, newer methods for improved access are valuable and needed. The knowledge gained from such studies can contribute towards an understanding of the relationship types that are important in developing ontologies and knowledge structures for Web-based applications.

Acknowledgements

The authors wish to thank the anonymous reviewers for their useful comments and suggestions. They also thank Hsia-Ching Chang for her assistance with creating the database for this study.

About the authors

Hemalata Iyer is an Associate Professor at the Department of Information Studies of the University at Albany, State University of New York. Her academic interests are knowledge organization and retrieval, visual resource management, access to CAM information, variable media resources, vocabulary management, metadata and human information behaviour. She can be contacted at hi651[at]albany.edu

Lucy Bungo has since graduated from the University at Albany with a Master's of Science in Information Studies. She is currently working as a Reference and User Services Librarian at Trocaire College in Buffalo, New York (USA) and also as a Reference Librarian for Niagara University in Niagara Falls, New York (USA). She can be contacted at bungol[at]trocaire.edu or lbungo[at]niagara.edu.

References
How to cite this paper

Iyer, H. & Bungo, L. (2011). "An examination of semantic relationships between professionally assigned metadata and user-generated tags for popular literature in complementary and alternative medicine" Information Research, 16(3) paper 482. [Available at http://InformationR.net/ir/16-3/paper482.html]

Appendices

Appendix A: Current Relations in the Semantic Network of the UMLS

Appendix B: Descriptions of the UMLS Current Relationships as adapted for the study

'sh' is the subject heading and 'tc' stands for tag category.

Isa: conceptual equivalents. Ex. Health (sh) and general health (tc) and Medicine, Chinese traditional (sh) and Chinese medicine (tc).

Physically_related_to (part_of): inherent part of a larger field. Ex. Alternative medicine (sh) and alternative therapies (tc) and Holistic Health (sh) and mind (tc) [the parts of the field are all related to one another as well.]

Physically_related_to (branch_of): Discipline of knowledge and its subdisciplines. Ex. Science (sh) and medicine (tc) and Holistic Medicine (sh) and alternative medicine (tc) and Gynecology (sh) and medicine (tc)

Physically_related_to (ingredient_of): the core entities that are required in order for the larger concept to exist. Ex. Energy medicine (sh) and energy (tc). Energy is required for energy medicine to work, as energy medicine involves the manipulation of energy.

Spatially_related_to (location_of): Where an entity is or an action occurs. Ex. Medicine, Ayurvedic (sh) and Eastern (tc)

Functionally_related_to (affects (manages)): One that controls, organizes, or manages an entity or situation. Ex. Shamanism (sh) and shaman (tc)

Functionally_related_to (affects (treats)): something that fixes something else. Ex. Complementary Therapies (sh) and human body (tc)

Functionally_related_to (affects (interacts_with)): two things that work together and affect one another. Ex. Shamanism (sh) and spirit (tc). Shamans, practicing Shamanism, operate in the spirit realm and interact with spirits to solve problems in the human realm.

Functionally_related_to (brings_about (produces)): The process that produces or creates an entity. Ex. Science (sh) and truth (tc). An analogy from outside the data: the air conditioner produces cold air; this is an item that is created.

Functionally_related_to (brings_about (causes)): The causes that bring about a result. Ex. Stress, Psychological (sh) and antagonism (tc). In the same analogy: when the air conditioner is turned on, it causes the air to become colder; this is a cause.

Functionally_related_to (performs (carries_out)): directions for action. Ex. Cookery (sh) and instructional (tc) and Vegetarian Cookery (sh) and instructional (tc)

Functionally_related_to (practices): The person acting in their field of specialty/trade. Ex. Physicians (sh) and medicine (tc)

Functionally_related_to (occurs_in (process_of)): something that happens in the process of doing or creating something else. Ex. Mental Healing (sh) and introspection (tc)

Functionally_related_to (users): those who actually use a thing. Ex. Gynecology (sh) and women (tc)

Functionally_related_to (manifestation_of): An expression of an entity. Ex. Errors, Scientific (sh) and doubt (tc). Health behaviour (sh) and food (tc)

Functionally_related_to (result_of): things that contribute to the end result. Ex. Mental healing (sh) and health (tc)

Temporally_related_to (co_occurs_with): things that happen at the same time. Ex. Complementary Medicine (sh) and conventional medicine (tc)

Temporally_related_to (precedes): one thing occurs before another; may be part of a sequence. Ex. Relaxation (sh) and comfort (tc)

Conceptually_related_to (evaluation_of): one thing figures out or appraises another; also related to the process of evaluation. Ex. Physicians (sh) and disease (tc). Ex. Evidence-based Medicine (sh) and critical (tc)

Conceptually_related_to (assesses_effect_of): the test or determinant of something else. Ex. Evidence-based medicine (sh) and the study of medicine (tc) and evidence-based medicine (sh) and experimentation (tc). This is the action that assesses the entity.

Conceptually_related_to (property_of): attributes of an entity or a process. Ex. Meditation (sh) and awareness (tc)

Conceptually_related_to (method_of): related to a goal and the means of accomplishing that goal. The tool that is being used. Ex. Self-care, Health (sh) and herbs (tc)

Conceptually_related_to (conceptual_part_of): two related concepts. Ex. Self-Actualization (Psychology) (sh) and scientific disciplines (tc). Ex. Herbal Medicine (sh) and natural medicine (tc)

Conceptually_related_to (issue_in): A complication or a problem in another concept. Ex. Self-Help Groups (sh) and Scepticism (tc). Breast Neoplasms (sh) and health (tc)
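
For readers who wish to reuse these relationships, for instance as input to an ontology, each example pair can be recorded as a small typed record. The Python sketch below is only an illustration: the record layout and function names are hypothetical, and the ten pairs are a selection of the examples given in this appendix, not the full study data.

```python
from collections import Counter, namedtuple

# Hypothetical record layout: one subject heading (sh) linked to one tag category (tc).
Match = namedtuple("Match", "subject_heading relation sub_relation tag_category")

# A selection of example pairs copied from Appendix B.
EXAMPLES = [
    Match("Health", "isa", None, "general health"),
    Match("Alternative medicine", "physically_related_to", "part_of", "alternative therapies"),
    Match("Science", "physically_related_to", "branch_of", "medicine"),
    Match("Energy medicine", "physically_related_to", "ingredient_of", "energy"),
    Match("Medicine, Ayurvedic", "spatially_related_to", "location_of", "Eastern"),
    Match("Shamanism", "functionally_related_to", "manages", "shaman"),
    Match("Mental healing", "functionally_related_to", "result_of", "health"),
    Match("Relaxation", "temporally_related_to", "precedes", "comfort"),
    Match("Meditation", "conceptually_related_to", "property_of", "awareness"),
    Match("Self-Help Groups", "conceptually_related_to", "issue_in", "Scepticism"),
]


def count_by_relation(matches):
    """Count how many recorded pairs fall under each top-level relationship."""
    return Counter(m.relation for m in matches)


def tag_categories_for(matches, subject_heading):
    """List (relation, sub_relation, tag category) for one subject heading."""
    wanted = subject_heading.lower()
    return [(m.relation, m.sub_relation, m.tag_category)
            for m in matches if m.subject_heading.lower() == wanted]


print(count_by_relation(EXAMPLES))
print(tag_categories_for(EXAMPLES, "Shamanism"))
```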


© the authors 2011.
Last updated: 4 August, 2011.