vol. 27 no. 1, March, 2022

Book Reviews


Walsh, Susan. Between the spreadsheets: classifying and fixing dirty data. London: Facet Publishing, 2021. xxii, 158 p. ISBN 978-1-78330-503-2. £36.99.

I got this book because of its cheeky title, obviously devised by someone with a good sense of humour and language. As it turned out, the book deals with something that is dear to many librarians - organised and accurate data. Although the data the author talks about pertains to business, or more precisely to procurement and spend, the general principles and the methodology of working with it could apply to any other data. I view the proposed methodology of Consistent, Organised, Accurate and Trustworthy (COAT) data from a librarian's perspective, i.e. as someone who understands the necessity of cleaning up the mess and the importance of classification and organisation in this process. Taking consistent decisions and following them faithfully helps to get accurate and trustworthy data. Oh, dear, I wish my students could follow these simple rules when they do their references!

Actually, nothing is as easy and straightforward as it may seem, otherwise we would not need consultants and experts, such as the author of the book, or spend ages on correcting literature lists to meet the simplest of standards. Moreover, knowledgeable human beings constitute an essential force for managing big and small data, but they need to understand the consequences of human errors for their businesses or the work of any organisation. The author argues this point very well. People working with data may not be comfortable with the system or software, but if they understand the importance of accurate data and follow the rules of entering and checking it regularly, the work takes less time and saves resources in more than one way.

In fact, this book is a collection of good recipes on how to keep your data healthy and relevant to the business. A lot of attention is paid to the normalisation of terminology and of such elements as the names of supplier companies and codes, and to the use of taxonomies and classification, with careful explanations of the differences.

There is a most practical chapter on cleansing the data, introducing efficient ways to do it in Excel. The author has good reasons to provide the examples and instructions for working with Excel sheets - they are available to everyone, suitable for the purpose, and most organisations have staff familiar with this tool. Relatively simple methods of entering the data and monitoring it, as well as monitoring its accuracy, are explained and illustrated by tables and screenshots. It is a very practical book for a rather narrow area of working with data.

My interest is far from procurement and any kind of business data, but I have rarely found such a brilliant argument about the importance of COAT - the overall approach to the management of data. I would gladly spell it out again: consistent, organised, accurate and trustworthy data. Those who use all kinds of big and small data for research will know how tedious and long the process of cleansing the data is, and of putting it into a shape that allows meaningful results to be obtained from analysing it. Here, the horror stories brought about by inaccurate data become very tangible: flawed business decisions, erroneous accounting, or fines for violating GDPR rules.

I see broad parallels between the principles and approaches introduced in this book and the processes involved in managing metadata for bibliographic databases and digitised document collections, research and workflow data, not to speak of financial or any other data.

I would also draw the attention of potential buyers of this book to the fact that the impression I got from the title was quite correct. It is amazing how well this book reads, despite its dry and very technical subject. The author approaches all her topics with palpable humour and presents them in a lively and attractive style. She is not shy of presenting herself as a person, an expert and a most helpful educator and instructor. Thus the book of technical recipes is turned into an entertaining journey into the maze of data management with a knowledgeable and friendly guide.

I do not expect that this book will find a very large audience among library and information professionals in general, but many small and big business managers should find their way to it. I would suggest that it is a relevant acquisition for business information departments or their equivalents in public libraries, and that it equally belongs on the desks of people dealing with all kinds of business data.

Elena Maceviciute

Swedish School of Library and Information Science
February, 2022


How to cite this review

Maceviciute, E. (2022). Review of: Walsh, Susan. Between the spreadsheets: classifying and fixing dirty data. London: Facet Publishing, 2021. Information Research, 27(1), review no. Rxxx [Retrieved from http://www.informationr.net/ir/reviews/revsxxx.html]


Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.