vol. 25 no. 2, June, 2020

Book Reviews

Badia, Antonio. The information manifold. Why computers can't solve algorithmic bias and fake news. Cambridge, MA: MIT Press, 2019. xvii, 334 p. ISBN 978-0-262-04303-8. $50.00/£40.00

That 'information' is a problematic concept will be no surprise to any information scientist: the topic has been discussed many times in the literature. The meaning of the term has changed over time from a piece of advice, to a statement of criminal intelligence, to a divine message, to a mathematically defined quantity divorced from any concept of news or meaning (Oxford English Dictionary). By the time we reach the present day, it is no wonder that the term can cause so much confusion and that different disciplines have different definitions of what constitutes 'information'.

Badia's book is an attempt, if not to resolve some of the problems associated with the concept of information, then at least to clarify them. He does so in the context of the computer manipulation of 'information', with the aim of showing why computers cannot, of themselves, overcome the problem of algorithmic bias, or the problem of fake news. He begins by discussing the different contexts in which 'information' is used: the syntactic, the semantic, the pragmatic, and networked information, or 'information as communication'.

The syntactic mode of information is that most readily observed in the work of Shannon and Weaver: the author discusses their work, focusing mainly on Shannon, and the emergence of information theory, and compares it with the work of their Russian contemporary, Andrey Kolmogorov. The work of Shannon and Kolmogorov has been closely linked to computer science and telecommunications, and, as Badia notes, Shannon's work can claim to be the foundation of the modern telecommunications industry.
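The sense in which Shannon's measure is purely syntactic can be seen in a small sketch. This is an illustration in Python, not drawn from Badia's text; the function name `shannon_entropy` and the sample strings are assumptions made for the example. It estimates the average information per symbol from symbol frequencies alone, so a word and a meaningless rearrangement of its letters receive exactly the same score.

```python
from collections import Counter
from math import log2


def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits, estimated from symbol frequencies only."""
    counts = Counter(message)
    total = len(message)
    # Sort the counts so the sum is taken in the same order for any permutation of the symbols.
    return -sum((n / total) * log2(n / total) for n in sorted(counts.values()))


# Hypothetical sample strings: a meaningful word and the same letters in sorted order.
word = "information"
scrambled = "".join(sorted(word))

print(shannon_entropy(word))
print(shannon_entropy(scrambled))
```

The two printed values are identical, because the calculation never consults what the string means, only how often each symbol occurs.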

It is with information at the semantic level that we meet the definition of information that is more closely associated with information science and information management. Badia discusses this level under the heading of Information as content: semantics, possible worlds, and all that jazz. Semantics is concerned with meaning, but the problem is that 'meaning' is just about as elusive a concept as 'information':

As of today, there is no widely accepted, formal definition of "meaning". As in the case of "information", an intuitive, everyday notion of meaning has yet to be totally captured by any standard definition. Adding questions of meaning to the task of defining information makes the entire enterprise of finding a semantic theory of information considerably harder... (p. 50).

However, the author does not give up at this point, but reviews a number of theories of semantics, and the relationship between information and reality, from Bar-Hillel and Carnap's theory of 'possible worlds', through situation theory, to the relation of information to knowledge, which gets us into notions of what constitutes "truth", valid data, and similar issues. Floridi's ideas are also covered, including his notion that data must truly reflect the observed reality. We know, of course, in this pandemic situation in which we find ourselves at present, that finding data that truly reflect the reality is not always possible and that we have to rely upon something less than the absolute truth; indeed, politicians manipulate the data nightly in their briefings, to try to convince us that they are managing the situation, when it is evident that the data they use reflect only part of the reality.

The pragmatic approach to information argues that both the syntactic and the semantic approaches fail to capture the totality of the nature of information, and that the concept of information use is needed. The work of Grice is cited here and information is viewed as that which is communicated, or, as Badia puts it: that which is needed to fulfill an information need. One of the concerns of the pragmatic approach is what characteristics information needs to have in order to make an impact on the recipient of the information. Two ideas are central to the notion of measuring impact: novelty (telling me something that I didn't know before) and confirmation (verifying something that I already believed).

In the chapter on the network approach, the author focuses on the diffusion of information through networks, and the idea of ‘emergence’: the idea that new properties can emerge through the interactions among the nodes of the network. This chapter is very much informed by formal network theory and the examples of emergence seem to have little to do with the semantic and pragmatic aspects of information.

As noted earlier, all of the first part of the book is a preparation for the discussion of what computers can and cannot do. The author notes that, ‘A computer is, in the end, a machine that executes algorithms’ (p. 192) and that it is, ultimately, a machine for dealing with the syntactic nature of information. Computers can deal with meaning (at least at present) only by using some surrogate: Badia cites Google's PageRank algorithm, which ranks search engine output according to the extent to which a page is linked to by other pages, thereby arriving at a notion of the ranked relevance of the pages in the output. This syntactic/algorithmic nature of computers is at least part of the explanation of algorithmic bias: the computer can only work with the data it is given and if the input is biased, the output will be biased. One of the most notable examples of this was the COMPAS algorithm used by courts in the USA to determine whether a defendant in a case should be released on bail. An examination of its results showed that black defendants were twice as likely as white defendants to be identified as high risks for re-offending. The basic problem is that blacks are much more likely to be arrested than whites, meaning that the core data are biased by ethnicity. Unless the big data files used by such systems are cleaned of the inherent bias, the results of their use will continue to be biased.
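The idea of ranking by links as a surrogate for meaning can be made concrete with a minimal sketch. It follows the textbook power-iteration formulation of link-based ranking, not Google's production system or any code from Badia's book. The function name `pagerank`, the four-page `toy_web` graph, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by incoming links alone; page content is never examined."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its damped score equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: pages a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Page c comes out on top simply because most links point to it, whatever it happens to say. The same mechanism shows why such a system inherits whatever bias is present in its input: change the links and the ranking changes with them.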

As to fake news, we get into the difficult area of the nature of "truth"—and when a US President's aide can argue that lies are simply ‘alternative facts’ and, apparently, be believed by a significant proportion of the electorate of the USA, we can see how difficult it may be for a mere machine to distinguish between fact and fantasy. This is why Facebook now employs thousands of human workers to fact-check the posts to its pages.

Given the author's aims, it is a little surprising that he gives no attention to the treatment of information in information science. From Otlet and La Fontaine onwards (and probably earlier), scholars have wrestled with the concept, but the author makes no mention of them, nor of Shera and Egan's 'social epistemology', or of Patrick Wilson's 'public knowledge', or Michael Buckland's more recent analysis of the four aspects of 'information'. All of these authors have something relevant to say about 'information as content' and their incorporation into Badia's discussion would have added richness to the account. That said, it must be noted that the author has covered an enormous range of relevant literatures and has done an excellent job of explaining and integrating theories from a variety of disciplines. The book is not an ‘easy read’, but it will certainly be of value to information researchers, in whatever discipline they work.

Professor T.D. Wilson
June 2020

How to cite this review

Wilson, T.D. (2020). Review of: Badia, Antonio. The information manifold. Why computers can't solve algorithmic bias and fake news. Cambridge, MA: MIT Press, 2019. Information Research, 25(2), review no. R694. http://www.informationr.net/ir/reviews/revs694.html

Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.