BOOK AND SOFTWARE REVIEWS
Cronin, Blaise & Sugimoto, Cassidy R. (Eds.). Scholarly metrics under the microscope. Medford, NJ: Information Today, 2015. xii, 963 p. ISBN 978-1-57387-499-1. $149.50
At almost 1,000 pages, this is the largest volume to be delivered to my door for review since I launched Information Research twenty years ago, and quite a daunting task for a reviewer.
The aim of the Editors' work is certainly laudable:
we do feel that by assembling a representative cross-section of the literature critiquing evaluative bibliometrics we may be able to raise awareness of the approach's limitations and also encourage greater procedural caution among relevant constituencies.
I suspect that, unfortunately, one of those constituencies, that of university administrators, is not likely to abandon bibliometric evaluation for the purposes of promotion and research evaluation. Sadly, the ease with which numbers can be manipulated seems to lead to the assumption that the numbers must mean something; after all, the process is 'scientific', is it not? Surely it must be, since numbers are involved. I suspect that the level of statistical sophistication required really to understand the numbers is lacking. This is the subject of an entertaining paper by Roger Brumback, Editor of the Journal of Child Neurology, in Part 4 of the volume, which I heartily recommend.
However, I leap ahead of myself: to return to this volume. It consists of fifty-five papers, grouped into six sections, plus an introduction and an epilogue, and separate introductions by the editors for each of the sections. I recommend reading all the editorial material to begin with: not only do the editors introduce the included papers, but they also review the associated literature in a very helpful and occasionally amusing fashion. My only complaint is their infelicitous use of 'behaviours', an uncountable noun in English for centuries before some idiot psychologist decided that the plural was needed - to divide 'behaviour' into 'behaviours' is clearly nonsensical. That moan aside, however, if you read only these introductions, you would end up well informed.
For the purposes of a review, it is not necessary to read all fifty-five papers, although the student of bibliometrics certainly ought to; that would extend the reviewing period well beyond the next publication date, and we try to review books in the next available issue. My approach, therefore, has been to read one or two papers in each section (which, in itself, amounts to reading a typical book). My choice was at times random, at times guided by the editors' introductions, and at times because the paper was completely new to me. However, it all began with Eugene Garfield's paper, Citation indexes for science, originally published in Science in 1955, which appears immediately after the editors' introduction. This ought to be essential reading (and probably is) for any student new to citation indexes. It is evident that Garfield saw the citation index not as a tool for the evaluation of people, but as a more effective means of discovering information of interest to the scientific researcher than existing abstracting services. His key argument was that it is impossible to index scientific papers to the degree necessary to meet every possible future contingency, and he gave examples based on papers that were cited outside the journal, and even the field, in which they were published - papers which, without a citation index, the scientist would have to rely upon serendipity to discover. He also saw citation indexes as a tool for libraries to select the most 'useful' journals in any field, although, interestingly, he used the term 'impact factor' in relation to the impact made by a scientist rather than by a journal. Perhaps he had an intimation of the potential misuse of citation metrics in his final words: 'It will help in many ways, but one should not expect it to solve all our problems'.
Moving on to the part devoted to Concepts and theories, I have to admit that my choice of Beyond the Holy Grail, by Paul Wouters, was not a particularly enlightening one. Wouters has two arguments in the paper: one, that the cycle of scholarly communication, based on peer review, is influenced by the citation cycle, through its impact on how scientists and policy makers perceive the nature of their fields of concern; two, that co-word analysis and citations are parallel 'indicators' of science and technology. The problem with the first proposition is that, as Wouters acknowledges, there is little empirical research to support it, although anecdotally we may be aware of it; and the problem with the second is that it is not clear what the 'indicators' are indicating, other than that co-word analysis may have something to do with meaning. In the end, and my fault, I'm sure, I emerged rather more confused about the whole subject than I was before.
It was, of course, impossible to resist Blaise Cronin's Toward[s] a rhopography of scholarly communication, if only to find out what 'rhopography' means. This is a fine example of Blaise's style, well written, thoroughly grounded in the literature, and with just that hint of humour that lightens the load. It turns out that 'rhopography' is derived from the Greek, rhopos, meaning 'petty wares', and is used in art history to describe the representation of everyday things. Here, it is used to identify those who don't figure in the authorship of a paper, but who are mentioned in the acknowledgements and who probably play an even bigger part in the research than some of the cited authors. The 'rhopos' of a paper would not, I think, figure as an evaluative metric, and it is not the intention to suggest it should. Rather, it could be a contribution to the sociology of scientific communication, uncovering the complexities of research indirectly, rather than through ethnographic research in the laboratory.
There are eleven papers in Part 2, Validity issues, which is, to my mind, the most important section in the book, since the validity of some uses of citation indexing is extremely questionable. Choice of the first paper was easy: just twelve years after Garfield's paper, Kenneth O. May wrote of Abuses of citation indexing, in what I think was probably a letter, since it is very short. The 'abuses' with which May was concerned were those that could be perpetrated by authors, rather than the present abuses by policy wonks and university administrators:
Authors will choose their citations so as to make the citation indexes serve their purposes... The idea that journals and referees will prevent such abuses is no more realistic than the notion that they do so now.
I don't know what work has been done since to uncover such abuses and perhaps it goes on in a minor way, but I rather doubt that it has developed to an extent that would enable an individual author to manipulate the system so that he or she benefitted. Self-citation, of course, is pretty rampant, sometimes appropriately but often, I feel, aimed at gaining another 'citation', but the notion that a cabal might collaborate to promote its members seems a little far-fetched.
Another short contribution, No citation analyses, please, we're British, by Alun Anderson, reports on academic reaction to the proposition, mooted by the Higher Education Funding Councils in the UK, that quantitative indicators should be employed in the 'Research Assessment Exercise'. The proposition was met with what can only be described as total opposition and, ultimately, the idea was rejected. Evaluation by metrics has crept in by another route, however: business schools in the UK (and perhaps other academic specialisms) now present their academic staff with a list of the journals in which they must publish if they are to be entered in the school's submission to the assessment process. So staff, having rejected metrics pretty well unanimously, now find themselves internally evaluated by the impact factors of the journals in which their papers are published. It seems that university administrators are less open to persuasion than the Funding Councils.
I could have taken any of the papers in this part of the book to review the limits of assessment by citation, but I chose another short piece, this time a blog entry by Stephen Curry, a Professor at Imperial College London, entitled Sick of impact factors, in which he describes impact factors (or, rather, their use for purposes well beyond what Garfield had in mind) as 'a cancer that can no longer be ignored'. Noting the research by Per Seglen, which revealed that, 'typically only 15% of the papers in a journal account for half the total citations', he advocates stigmatising impact factors as cigarettes have been. Almost his final word is, 'if you use impact factors you are statistically illiterate': harsh words, and written before the death of Professor Stefan Grimm, allegedly the result of what might be called insensitive management by metrics.
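Seglen's point about skewness is easy to see with a toy example. The figures below are invented purely for illustration (they are not Seglen's data, nor drawn from the book), but they show why a journal-level average such as the impact factor says very little about a typical paper in that journal:

```python
from statistics import mean, median

# Invented citation counts for the twelve papers of a hypothetical journal year,
# skewed in the way Seglen described: a handful of papers supply most citations.
citations = [60, 45, 12, 8, 5, 4, 3, 2, 1, 1, 0, 0]

print(f"mean (what an impact factor averages): {mean(citations):.1f}")
print(f"median (what a typical paper receives): {median(citations):.1f}")

# How few of the most-cited papers account for half of all the citations?
half_total = sum(citations) / 2
running = 0
for k, c in enumerate(sorted(citations, reverse=True), start=1):
    running += c
    if running >= half_total:
        break
print(f"{k} of {len(citations)} papers account for half the citations")
```

With numbers like these, the mean is pulled far above the median by a couple of heavily cited papers, and just two of the twelve account for half the citations - which is precisely why judging an individual article, let alone an individual author, by the average of the journal in which it appears is so misleading.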
Moving on to Part 3, and conscious of the length of this review, beyond simply noting Eugene Garfield's 1985 response to critics of the scope and coverage of what was then the Science Citation Index as worth reading, I chose just one paper, or, rather, a chapter from a report: Novel forms of impact measurement—an empirical assessment, by Wouters and Costas. The team examined a total of fifteen services, including commonly known ones such as Google Scholar, Mendeley and Zotero, and some that I had never heard of, such as Readermeter and ScienceCard, and concluded,
in their current state, due to their limitations and restrictions in their use... it can be concluded that they seem to be more useful for self-assessment than for systematic impact measurements at several levels of aggregation.
The authors might also have noted that most of these services, if not all, make no attempt to present data according to any bibliographic standard. Mendeley and Zotero, for example, depend upon what an individual user records, and others, like CiteULike, follow no bibliographic standard at all.
Part 4 deals with indicators such as the h index and its variants, and the old and new Crown indicators. That any of these indicators should be considered capable of measuring the 'quality' of a person's research is really rather laughable - would Einstein have worried about his h index? In what we might call the companion volume to this collection (Cronin and Sugimoto, 2014), Yves Gingras sets out criteria for evaluating indicators and demonstrates that most fail to reach the desirable level. One of the main problems with indicators is that the data from which they are derived are considerably skewed (note the point earlier regarding the proportion of papers accounting for half the total citations) and, as a result, indicators based on averages are pretty well useless. In this section, Gingras teams up with Larivière to critique the 'new Crown' indicator, their key point being that the original Crown indicator used the wrong way of averaging measures and that the 'new' Crown indicator is simply a reversion to the correct way of doing things and, as such, is not new at all. When acceptance of these indicators depends on trusting that their authors know what they are doing, such episodes do not encourage one to accept any kind of indicator!
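The point about averaging is easier to grasp with a toy example. The numbers below are invented for illustration (they are not taken from Gingras and Larivière's paper), but they contrast the ratio of sums on which the original Crown indicator rested with the mean of per-paper ratios used by its replacement, for the same small, skewed set of papers:

```python
# Invented numbers: four papers by one author, each compared with the
# average citation rate ('expected' citations) of its own field.
citations = [50, 3, 2, 1]   # citations actually received
baselines = [10, 10, 2, 2]  # field averages used as the baseline

# Old Crown indicator (CPP/FCSm): a single ratio of sums
old_crown = sum(citations) / sum(baselines)

# New Crown indicator (MNCS): the mean of the per-paper ratios
new_crown = sum(c / b for c, b in zip(citations, baselines)) / len(citations)

print(f"ratio of sums (old Crown):  {old_crown:.2f}")   # 2.33
print(f"mean of ratios (new Crown): {new_crown:.2f}")   # 1.70
```

Neither figure measures 'quality', of course, but the gap between them shows how much hangs on an apparently technical choice of average when the underlying distribution is as skewed as citation counts invariably are.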
In Part 5, we come to the relationship between research indicators and science policy, including research evaluation. Here, another early paper (1987), by Jean King, drew my attention. King was looking for indicators of research output that would be cheap and effective to use in research evaluation (at the time she worked for the UK Agricultural and Food Research Council). Her requirements were that the method of collecting data should cost not more than about 1% of the research expenditure being evaluated and be 'routinely applicable and capable of focusing on any clearly-defined field'. After surveying the full array of potential candidates, she came to no very positive conclusion: there were difficulties with all possible measures, and the manual labour involved at the time made the processes of data collection and analysis expensive. It seems that, while much may now be automated, the doubts about the validity of measures remain.
A short article from the Economist in 2013 provides a bit of comic relief—unintentional, I should add: it recounts the arrest of criminals who were providing fake papers for medical journals, which they sold to academics. The reason for their success was that in China, the award of research grants is based on the number of papers published, without regard, it seems, for quality. Academics, desperate for research funds to promote their scientific careers, succumb to the temptation to make use of these services. I do wonder, however, how different it is from the services in the West that write up research, mainly, I understand, for researchers in the pharmaceutical industry, who then put their names to the product: I didn't realise that such services existed until I met someone who worked for such a company. The China case is a prime example of how attempts will be made to 'game' whatever system is evolved for evaluating research.
For more fun from Part 6 you can read Marian Apostol's Science against science and Blaise Cronin's not altogether serious response, New age numerology: a gloss on Apostol, and, if you have ignored my recommendation to begin by reading all of the editors' introductions, you should certainly read this one, because it sets the scene very well. The title of this part is Systemic effects, and the papers concern the impact of the metrics phenomenon; the editors note:
The use of metrics, whether to monitor, compare or reward scholarly performance, is not a value-neutral activity. Metrics are shaped by, and in turn shape, public policy decisions; they focus the institutional mind, influence the allocation of resources, promote stratification and competition within science, encourage short-termism, and, ultimately, affect the ethos of the academy.
One might add that, as a result of these influences, notions of collegiality disappear from the institution, since in the race for reward (of any kind), it is every man or woman for him/herself.
I could go on for much longer writing about this excellent compilation, which, together with the companion volume, constitutes a major achievement in reviewing the concept of scholarly metrics and provides sound grounds for action to ameliorate the worst excesses of their use. In this regard, the last word should be with the editors, who argue that there should be 'a collective determination that the assessment of a scholar's work should be based on direct engagement with that work rather than on quantitative indicators of the work's impact'.
To which I can only say, 'Amen'.
Reference
Cronin, B. & Sugimoto, C.R. (Eds.). (2014). Beyond bibliometrics: harnessing multidimensional indicators of scholarly impact. Cambridge, MA: MIT Press.