Proceedings of the Eighth International Conference on Conceptions of Library and Information Science, Copenhagen, Denmark, 19-22 August, 2013
Is information still relevant?
School of Information and Library Studies, University College Dublin, Belfield, Dublin 4, Ireland
Recently, we have heard a lot about data, in particular, big data. It is proclaimed that the data deluge has arrived (Borgmann, 2012). As a result, initiatives involving cyberinfrastructures, computational social science, data curation, and data preservation have been highlighted not only in the academic marketplace, but also in the public media. The devotion to big data is partly based on the assumptions that data-oriented research is more accurate and objective and hence very useful in prediction and forecasting. This optimistic outlook has obtained support from funding agencies in both public and private sectors. Questions ranging from the fundamental nature of data, to the validity of data-driven research, to best practices for preserving and curating data certainly demand immediate attention. What does the data deluge mean for information science? Will data science be the next generation of (library and) information science? Is information still relevant in the flood of big data?
It is somewhat dangerous to attempt the question, Is information still relevant? because there is not even a consensus about the definition of information in information science. In fact, the literature on the concept of information is vast and spans across many academic disciplines, for example, communication (e.g., Peters, 1988), physics (e.g., von Baeyer, 2003), and philosophy (e.g., Floridi, 2010). Within information science, there have been many conceptualizations and discussions over the decades, implicating the changing nature of the field, in terms of both research area/topic and epistemology/methodology, while some suggest that the concept of information should be deflated (e.g., Frohmann, 2004; Furner, 2004). But the naming of information science is not entirely accidental (Farkas-Conn, 1990) and it would be prudent to think clearly about the identity of the field with or without information.
In this paper, we will review the arguments for deflating information and an historical account of the choice of information in information science, followed by an examination of three conceptual constructs-'information as data', 'information as processed data', and 'information as justifiable claims'-for exploring how information may be still relevant.
Why not information?
The meaning of information has become more ambiguous over the decades in information science and in popular discourse. While some have proposed a unifying concept of information (see, most recently, Bates, 2005; see also, Hjørland, 2007, for a counter-argument), there has not been a consensus as to what information should mean or refer to in information science. The ambiguity of the meanings of information is not without consequences, however. Capurro and Hjørland (2003) have commented that although the concept of information may be a status booster for professionals, it has had 'the unfortunate consequences of raising the level of confusion in the discipline' (p. 396). In fact, when the American Society for Information Science and Technology (now Association for Information Science and Technology) celebrated its 75th anniversary in 2012, some were still asking fundamental questions such as What is information? and What is information science?, while Michael Buckland, in his acceptance speech for the Award of Merit, suggested a semantic murder of information: specifically, he suggested that attempts to define information should be withdrawn.
During the past decade, there have been critical analyses of the concept of information concerning issues such as epistemology, history, and power. In his The Modern Invention of Information, Day (2001) engages in the critical analysis of the ideological and political powers associated with the modern concept of information in relation to the European documentation movement in the 19th century, the cybernetics movement in the post-WWII period in the United States, and the recent notion of the virtual. Furner (2004) carefully examines the use of the word information in information studies and concludes that information science/studies can do without the word information because there is always a better term available, such as knowledge, truth, or meaning. Frohmann (2004) states that the 'assumption that seeking and communicating information are central to the scientific enterprise both reinforces and is reinforced by an idealization of science that privileges thinking in the service of theory construction' (p. 5) and hence calls for a rejection of 'the concept of information as theoretical kind' (p. 236). In brief, Frohmann (2004) finds the concept of information an obstruction to inquiring into social practices in relation to documentation.
It is true that the word information is often used to refer to knowledge about other things, physical or mental. Information can be anything and what information is changes depending on the locale and social setting (Ma, 2012). The situational view of information such as the conceptualization of information-as-thing (Buckland, 1991; see also, Ma, 2010) emphasizes the cultural and social contexts and practices that make information, rather than proclaiming what information is. Hjørland (2000) has lamented that 'the conceptions of information, information retrieval and information science are seriously flawed' (pp. 38-39). The denial of the concept of information is based on the very concern that the word information has been used-perhaps, can only be used-rhetorically and metaphorically in information science discourse.
Information science is not the only discipline concerned with information. For instance, information has been a dominant theme in theoretical physics. John Wheeler has conjectured that '[E]very it-every particle, every field of force, even the space-time continuum itself-derives its function, its meaning, its very existence entirely-even if in some contexts indirectly-from the apparatus-elicited answers to yes-or-no questions, binary choices, bits' (quoted in von Baeyer, 2003), along with his Really Big Questions:
How come existence? Why the quantum? A participatory universe? What makes meaning? It from bit?
And so the task is to find out what information is. The mystery of information in the physical world has been more widely discussed in the media in the past few years. For example, the February 2012 issue of Scientific American featured an article, “Is Space Digital?” (Moyer, 2012), in which the concept of information is discussed in relation to the holographic principle and black holes.
Physicists have yet to understand what information actually is. Von Baeyer (2003) explains that new terms in physics are frequently introduced by way of recipes for measurement and the term information has only been used operationally. However, uncertainty about the meaning of information does not seem to obstruct communication among theoretical physicists and their communication with the general public, for the term is discussed within a tight discursive frame, within which information may suggest entities or phenomena that need investigation. Put simply, the term is specialized even though we do not yet know what it is.
However, the term information in information science does not share the characteristics of a nomenclature: it does not bear a generally accepted definition and it does not serve as the basis and assumptions for research studies. Although many concepts of information have been proposed and discussed, a consensus has not been reached. In fact, the word information is often used metaphorically or is used to represent other things. The uncertainty of the meanings of information raises a question: why is information science labelled information science? Was information a random choice? What marks the beginning of information science?
Most would agree that the development of information science began in the post-War period-the period when we saw the publication of Vannevar Bush's As We May Think (1945, July), in which he conjectures a device, memex, where 'an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility' (Bush, 1945); and also the period when Claude Shannon's theory of information, originally published as A Mathematical Theory of Communication in the Bell System Technical Journal in 1948, coupled with the advances in electronics such as the invention of the transistor, made possible the dramatic increase in storage capacity and the speed of transmission in telecommunication.
During this period, the dissemination of scientific and technical information became a national priority in the United States. It was believed that the progress of science and technology is dependent upon the development of information systems. Consequently, collaborative efforts took place among scientists, engineers, and librarians in order to facilitate the retrieval of scientific and technical information (Farkas-Conn, 1990). The inter- or multi-disciplinary nature of information science was a natural development under the direction and support of the National Science Foundation (Hahn and Barlow, 2012) and other constituents. The American Documentation Institute became the American Society for Information Science in 1968.
Maybe because of the inter- or multi-disciplinary nature of information science, research has not been constrained to information system design for the retrieval of scientific and technical information. For instance,
[B]y the early 1960s, the field had shifted from being primarily concerned with bibliography and science information; it was now becoming a more generalized information science, defined at the Georgia Institute of Technology symposium as '[t]he science that investigates the properties and behaviour of information, the forces governing the flow of information, and the means of processing information for optimum accessibility and usability' (Farkas-Conn, 1990, p. 199).
Courses in information science became part of the library school curricula; and by the late 1980s most library schools had incorporated information science in their names (Farkas-Conn, 1990). This development implies that the term information in information science does not necessarily refer to the concept of information in Shannon's theory. As Bar-Hillel and Carnap (1953) point out, 'impatient scientists in various fields applied the terminology and the theorems of Communication Theory to fields in which the term information was used, presystematically, in a semantic sense, that is, one involving the users of these symbols' (pp. 147-148). The term information has mainly been used for representing other things, or used metaphorically, and sometimes used as an ambiguous placeholder in information science discourse. Nevertheless, the naming of information science was not entirely accidental-it was closely related to Shannon's information theory, as well as its lineage to library science and documentation.
Is data information?
Information in information science has not been conceptualized as a concept of information system, or a concept of bibliography, or a concept of documentation. Rather, it has largely been conceptualized or identified in models of communication and cognition (the Shannon-Weaver model), cybernetic epistemology (Bateson, 2000/1972), theory of evolution (Bates, 2005), to name a few. Aside from these conceptualizations of information, there have also been definitions that are not based on a model or a theory. One of the most popular definitions explains that information is data that has been processed into a meaningful form (Information, 2003):
Seen this way, information is an assemblage of data in a comprehensible form capable of communication and use; the essence of it is that meaning has been attached to the raw facts.
The Online Dictionary for Library and Information Science (2006) provides a similar explanation:
Data presented in readily comprehensible form to which meaning has been attributed within the context of its use.
Zins's (2007) survey of information scientists worldwide also shows that this view of the relationship between data and information is not uncommon. This understanding of information is similar to the sense of information in the DIKW (Data-Information-Knowledge-Wisdom) model (see, for example, Ackoff, 1989), although the model is not substantiated by a theory, nor is it supported by empirical evidence. Nevertheless, since the talk of data is becoming more prominent in information science discourse, it is time to rethink the relationship between data and information. We explore this relationship by answering the question, Is information still relevant?, with three conceptual constructs: (a) information as data, (b) information as processed data, and (c) information as justifiable claims.
Information as data
There is no conceptual difference between data and information. Both data and information are facts that can be collected, stored, organized, and retrieved. In this view, neither data nor information has to deal with interpretation or meaning, because data/information is the objective facts out there. It is assumed that the more data/information is collected, the more we understand our environment and culture. The progress of science is dependent upon the amount of data/information collected and analysed.
Many sciences depend upon large-scale data in order to construct models of the physical environment and the human body and to make accurate and useful forecasts and predictions. For example, weather forecasting is contingent on the analysis of historical climate data as well as current weather conditions. The more data we have, the better we understand the composition of the Earth, the plants, the animals and the human body. Citizen science has emerged as a way of effectively collecting data/information (see, for example, Citizen Science Alliance). In these sciences, it is believed that the objective reality is composed of data.
Therefore, there is no significant difference between the terms data and information. When we say that 'DNA is a structure that encodes biological information' (Nature Education, 2012), we can probably also say that 'DNA is a structure that encodes biological data' without explaining the differences between information and data. Most of us would interpret biological information and biological data as referring to the same thing and conveying the same meaning. In this scientific realm, it is believed that the objective world is composed of many parts and each part can be broken up into smaller parts. The understanding of the natural world, including the universe and our genetic structures, is based on our observation through sensory experience. Data/information is understood as the facts that exist in the natural environment. Human languages are used for representing and recording the facts, but will not alter the data/information embedded in the environment and our corporal bodies.
In scientific research, data/information collected is usually given a label (pre-existing or temporary) and the label may be replaced by a nomenclature, that is, an artificial linguistic device for limiting the ambiguity of meaning. Information as data implicates an empiricist theory of knowledge. It rests on the belief that the objective world is composed of an enormous amount of data and that, as such, knowledge of the universe and the understanding of life require discovering and collecting data/information.
Information as processed data
Data and information are conceptually different. Data is the objective facts that can be collected and stored; however, these facts are not information until they are processed and organized. Data is not meaningful, but information is. It is assumed that data is not understandable in its raw forms and that understanding is not possible without the artificial construction of information infrastructures and that information is the understandable/meaningful form of data. Therefore, the process of transforming data into information is a matter of utmost importance. Without the appropriate mechanisms in place, data will stay raw and as a result, no information about the physical world and the cultural and social worlds can be generated and the progress of the sciences will be impeded. The availability of data is only a prerequisite for scientific discovery and technological innovation.
Information infrastructures, including taxonomical structures, metadata schemas, programming languages, statistical techniques and software, are needed for transforming data into information. Information retrieval systems, from the Cranfield experiments to the Google search engines, depend upon the manipulation of encoded data using sophisticated algorithms. In traditional information retrieval systems, what is called data usually refers to a text-based or graphic-based document. Information infrastructures are used for facilitating the retrieval of these documents, not making sense of these documents. The process of transforming documentary data into information often involves representation, for example, the creation of bibliographic records, and organization based on certain classification systems or taxonomical structures.
The processing of big data, however, is different from that of traditional information retrieval systems. Presumably, the data points in a data set only provide a vague picture of whatever is. Unlike with a document (for example, a journal article or a webpage), we cannot create a representation based on the significant characteristics, physical or intellectual, of the data. Instead, data-processing in big data research is the process of creating simulations of objective reality, rather than a representation of it.
The purposes of information retrieval systems and big data research are thus different: the former aims to facilitate the retrieval of documents; the latter strives to create simulations of the objective world. The former allows the readers and the listeners to have their own interpretation and understanding; the latter gives you packaged facts presumed to be true and real. Nevertheless, both require similar techniques and technologies for processing data and both assume information as processed data.
Information as justifiable claims
There is no direct relationship between data and information. In this conceptual construct, information is considered as justifiable claims. What is considered information is co-determined in a cultural context. Information may be potentially useful in accomplishing a task, or may potentially make a person more knowledgeable. This is why we assume that what lies in front of our eyes, when we are browsing in a library or looking through the results of a search engine, is potentially useful for the pursuit of knowledge, or the attainment of a skill, or perhaps, entertainment. Both information and information need are temporal: once we find the information we are looking for, it may become knowledge that will stay with us as long as we live, or the information we found may become misinformation or disinformation that we will forget eventually, and so the need for information-the key to knowledge, broadly construed-may arise again.
Information may refer to a book, a speech, or a sign on a tree. But its transient nature does not stop information from being information, because information is necessarily temporal-we want or need information when we want to know about something (see also, Buckland, 2012); and this something that we want to know about changes from context to context. However, it does not mean that information is merely a meaningless placeholder because what it represents has the potential of making a person knowledgeable at a certain time and place.
Since information is temporal and since it has the potential of making a person more knowledgeable, it should be justifiable. Consider this question: Is a strand of my hair information? Well, it depends. We do not usually think of a strand of hair as information despite the fact that it tells a lot about our corporal bodies; however, it would be a piece of evidence-information-if I were a suspect in a crime. Put simply, what information refers to has to be justifiable and is thus usually very specific to time and place. Nevertheless, we have generally accepted that cultural objects such as books, magazines, and TV news are information, although we might have a very different opinion when we are asked about a specific book, magazine, or TV news channel!
Information should be justifiable. Is it useful for our pursuit of knowledge and enlightenment? Is it a hint for a scientific discovery? Does it give us the correct direction to the train station? It is up to our judgment and decision to agree or disagree with the claim. Data can be information if it has the potential to inform and if the potential is justifiable. Information is not hard facts; rather, we make information through our actions.
Is information still relevant and in what ways? In the first conceptual construct, we think of information as data, or data as information. Data/information is embedded in our physical environment and in our corporal bodies. We collect data/information through our senses and we create representations and models to help us understand the universe and other physical entities. In this conceptual construct, it is assumed that knowledge is only concerned with what we can observe through our sensory experience and that more data/information leads to better understanding of the world. What is true or real depends on what we can see. The empiricist epistemology has become widely accepted not only for the study of physical entities, but also for the study of social and human affairs for its scientific characterization. The conception of information as data and its relation to knowledge have become more popular as big data becomes available and accessible through the World Wide Web, despite the fact that big data social sciences research creates simulations from data, rather than creating representations and models based on observations. Notwithstanding the perils of empiricism in social research, the conceptual construct of information as data (or data as information) is a statement of fact. The conceptual construct is not essential for theory construction or scientific discovery and its generality gives little guidance as to how or why information should be a fundamental concept in information science.
In the second conceptual construct, information is conceptualized as processed data. In other words, information is a product of certain procedures. Its making is based on the procedures that are put into processing the raw data. There is no assumption or judgment whether data or information is good or bad, useful or not. It is believed that data will become meaningful-and hence become information-if the correct procedures are in place. This conceptual construct has encouraged the very important development of information infrastructures including hardware, software, metadata schemas and ontologies. However, the absence of human agency in this conceptualization usually leads to the information-processing model of cognition and communication (see Ma, 2012). In this view, processes and procedures are highlighted and emphasized; human and social factors, if considered, are usually approached from an instrumental perspective, say, whether the system fulfills user needs or whether the system is user-friendly. Moral and political issues, however, are seldom discussed or considered in the construction of information infrastructures. Further, standards, rules, software, databases, and many other things involved in the processing of data are developed for instrumental purposes without public consultation in regard to issues such as freedom of speech and privacy, and ultimately, what and how we may be informed. Information as processed data is a conceptual construct that highlights neither information nor data, but the procedures and mechanisms for processing data. Similar to information as data, this conceptual construct also adheres to the empiricist theory of knowledge for its dependence on data in generating and producing information and knowledge.
The last conceptual construct, information as justifiable claims, does not state a definite relationship between information and data. There is not a static form of information because what information refers to is temporal, under the condition that its informativeness is justifiable. Unlike nomenclatures in the sciences, information in information science is not strictly defined; rather, the term is used to represent other things, or knowledge about other things. Therefore, if a set of data is potentially useful for the pursuit of knowledge and if the potential is justifiable, we may call the set of data information. It does not assume that all data-or all processed data-is information, but data collected, stored, preserved, and organized is information. This should not be confused with information as processed data, however, because information is not recognized as a product of procedures, but a product of human decisions in collecting, storing, preserving, and organizing certain types of information. Not only does human agency play a major role in determining what information may refer to, but the decision should also be defensible and justifiable. This sense of information, though less of a theoretical kind, allows for practical, moral and political considerations in the construction of information and information infrastructures.
The choice of the word information was not arbitrary. It is clear that the early development of information science was closely related to Shannon's information theory (see, for example, Shaw and Davis, 1983). However, it is also clear that the concept of information in information science has not stayed close to Shannon's concept of information and that there has not been a consensus about the definition of information. The talk about information went on for many decades; recently, however, the talk has been overwhelmingly about data, in particular, big data. So this paper asks a risky question: Is information still relevant?
The three senses of information described above are not novel; however, they are not always used in sharp distinction and are usually used without considering their epistemological commitments. Surely, metaphorical or rhetorical uses of a word are not uncommon, nor are they ultimately harmful in our day-to-day chatter, not to mention that they could be beautifully constructed in poems and plays. However, an ambiguous theoretical concept could be detrimental to scientific research and professional discourse. This is because theoretical concepts, presumably, serve as the very assumptions in scientific research and the very bases of the development of standards, measurements, and best practices in a professional field. The consequences of ill-defined or ill-understood concepts could be immediate and direct, such as a false understanding of the universe or the side effects of a pharmaceutical product, while some may not be directly perceivable, such as the effects of political and commercial propaganda. An ill-defined concept of information does not provide a theoretical framework for understanding information and information-related phenomena but perhaps produces rhetorical and metaphorical uses based on certain epistemological commitments and political agendas. Hence, information is relevant in a scientific or professional discourse only if the use of the term is specialized in information science discourse.
Information science has largely been concerned with what and how we may be informed. Not surprisingly, the processes of making information accessible, including representation and organization of information as well as the understanding of user needs, and the development of evaluative techniques such as bibliometrics are well-developed research areas (Milojevic et al., 2011). Many of these activities are instrumental in nature-whether a system works or whether an evaluative measure is effective-but they do not and should not support a notion of information within an information-processing model because what and how we are informed is not a product of procedures, but a product of human action. What information is is co-determined in the construction of information infrastructures at a certain time and place.
The analysis above shows that the conceptualization of information as data states a matter of fact, whereas information as processed data highlights neither information nor data. Nevertheless, both constructs emphasize the technicalities of data or of data-processing and implicate the empiricist theory of knowledge. If we adopt either or both of these conceptions of information, information science could easily be renamed as data science, or even, computational science.
In the flood of data, information is relevant if it is understood as justifiable claims that shape and are shaped by the standards, rules, and best practices of data preservation, data curation, and other activities associated with big data. The temporality and normativity of information demand not only practical, but also epistemological, ethical and political considerations in the construction of information infrastructures.