Information Research, Vol. 5 No. 2, January 2000

The lowest canonical denominator: electronic literary texts, and the role of the information professional

Claire Warwick
Department of Information Studies
University of Sheffield
Sheffield, U.K.

This paper argues that the English literary canon has reasserted itself in electronic form. It traces the history of print canons and contends that analogous forces are shaping an electronic canon. This issue should concern not only literary critics, but also information professionals. Humanities scholars need diverse resources, rare texts and multiple editions of works. Yet canons threaten diversity of resources, and it is difficult for works to re-establish their place once excluded. If collection managers aim to provide a wide range of high quality resources for future users then an electronic canon is undesirable. If we are to avoid such problems then questions of electronic collections policy must be addressed. For example, do funding councils bear a responsibility to ensure that less canonical texts are available? Who makes the decisions about what is important, and on what basis? How should electronic collections policies be formulated? Should the choice of editions to be digitised matter? Is a bad edition better than nothing at all? Should collections policy for electronic resources be organized on a national level, or left to individual institutions? These are areas in which an information professional can and should be able to make an important contribution.


Some literary critics would have us believe that the idea of a literary canon of great works is so beleaguered that it is in danger of being eclipsed altogether by the study of trendy theories or abstruse texts (Kermode, 1993, Bloom 1994). While this seems unlikely, even in the world of print, the canon is certainly alive and well and being replicated in electronic form. In this paper I shall examine the problem of the canon as it refers to electronic text and why it should be a concern for the information profession, both now and in the future. No research has yet been done into the amount and variety of electronic text available in English literature, whether available freely over the Internet, produced by academic text archives or through commercial publishers. It is also a commonplace that the amount of material on the Internet grows constantly and is entirely uncatalogued. Therefore, my methodology has to be based on empirical evidence, rather than authoritative quantitative data. Observations will be made based on a case study of humanities computing at Oxford University. However, as I have continued to investigate the production of electronic texts in English literature these observations have not yet been contradicted.

My interest in the literary canon in electronic form began when I worked for the faculty of English at Oxford, and was responsible for the maintenance of the Web site. I was asked to help to produce some on-line learning materials for a course on the history of the English Language. Certain authors had been specified for study and I set out to find links to resources about them and to electronic texts of their works which were available on the Web, but of good enough quality to be used by university students and their teachers. I found that, perhaps not surprisingly, there were literally hundreds of sites of interest about writers such as Shakespeare and Jane Austen, and multiple editions of their work. The quality was variable but at least it was there. I was however less successful when I came to look for texts of writers such as John Clare.

Another part of my job involved teaching students the basics of electronic text analysis. Some of them saw its possibilities and were enthusiastic about wanting to try it out on authors whose works interested them. However, it was not long before problems again surfaced. They could find multiple editions of Milton's Paradise Lost, but few, if any, of his prose works, which are, in linguistic terms, just as interesting. Coleridge's poetry could be found, but again, very little prose. Defoe's Robinson Crusoe was easy to locate, but not The General History of the Pirates, on which a graduate student was working for his PhD. Also, the students wondered about the editions of these texts. Were they the same as those they had been asked to use by their tutors? If not, how could they tell which editions they were, and whether they were reliable, complete, and proofread? Some of the more fortunate students were able to find the texts they were looking for in archives such as those at Oxford, the University of Virginia and the University of Michigan. This at least meant that they were able to find information about the editions and reliability of the texts. However, there was still no certainty that the text they needed was there, especially if it was by a non-canonical author or a lesser known work by a well-known writer.

This led me to carry out some further research. As I attempted to find electronic literary texts, it did, indeed, seem that the more canonical a writer was perceived to be, the easier it was to find an electronic text of his (usually his) work, and that the better known the individual work by that writer, the more likely it was that an electronic text of it could be found. This applied whether texts were freely available on the Web, in a text archive or in collections such as those published commercially. When I looked for one of the writers on whom I did my own research work, Richard Crashaw, I found a few poems. When I looked for the other, Joseph Beaumont, I found nothing at all. Neither writer is among the best known of seventeenth-century poets, but a knowledge of Crashaw's work would be expected of most scholars of the period, even if Beaumont is not widely studied.

The electronic canon

The history of canon formation

What then do I mean by the idea that the canon is reasserting itself electronically? The Canon of English literary works is an exclusive gathering of 'great' writers whose work has stood, or is felt likely to stand, the 'test of time' and whose value and merit will not only be briefly enjoyed by its contemporary readers but will be worthy to be read by succeeding generations. Opinions amongst scholars differ about exactly when the canon began to form. Some critics suggest that as early as the sixteenth century writers were being recommended as especially valuable (Ross, 1997, Terry, 1997). Shakespeare is first praised as 'not of an age but for all time' by his contemporary Ben Jonson, but it was not until the eighteenth century that the formation of what we think of today as the canon began. Until this time the most valuable works of literature had been seen as those in classical languages, but critics such as Samuel Johnson began to propose a canon of great English writers (Kernan, 1987: 158-167). Their classical forebears had been recommended as guides to learning good rhetoric and written style. However, eighteenth-century critics began to recommend writers for their intrinsic aesthetic value and above all as the arbiters of good taste, both as part of the educational system and for the general reading public (Court, 1992, Terry, 1997: 89). Weinbrot (1997) also argues that the growth of the canon was related to economic factors. Publishing began to be privately financed, and made profitable by the income of the growing literate classes. This meant that writers were no longer dependent on aristocratic patrons for their income and could publish what their publishers agreed would sell. Coupled with the first limitation of copyright on books to twenty-one years, this meant that for the first time texts which critics had decreed to be canonical were available to be bought relatively cheaply (Ross, 1992).
Apart from the question of the low price, this may already suggest parallels with the situation in commercial electronic publishing in the 1990s. The appearance of large amounts of electronic text has often depended on the ability of publishers such as Chadwyck-Healey to acquire cheap, out of copyright print editions for conversion to electronic form. Thus, economic factors, the views of eminent and influential critics and the needs of the educational system drove the formation of the original western canon of great works.

However, can it be said that just because it is, at present, easier to find electronic editions of writers who are most well known in print, that a new canon is being formed, or an old one reinforced? Many of the scholars who work in the area might disagree with this assertion, since from the period of Johnson onwards it has been asserted that canons do not simply happen, they are prescribed and deliberately chosen and, more importantly, some works must also be deliberately excluded (Hunter, 1997:96). It might be argued, therefore, that there is not as yet a canon in electronic terms since nobody seems to have produced a definitive list of those texts which should be included in it or excluded from it. We might also argue that texts published electronically by academics or which private enthusiasts choose to put on the Web are entirely at the producers' discretion and are not consciously following a prescribed canon.

Publishers and the electronic canon

In fact, decisions about which texts should be included in or excluded from electronic products are already being taken. As I have argued above, canon formation in the eighteenth century was partially driven by commercial pressures from booksellers, and I believe that the same is happening with the publication of electronic text today. Both Goulding (1984) and Bonnell (1997) argue that the selections made by the producers of popular poetry anthologies affected the way the canon was formed in both nineteenth century America and eighteenth century England. Goulding's fascinating article immediately suggests parallels with the process of making editorial decisions about what to include in CD-ROMs of English literature produced by Chadwyck-Healey. As an employee of the company I recall that a proposed CD on Romanticism was never produced because the editorial board could not come to a decision about what would be included, or about the balance between canonical and lesser known works: academics often wanted the latter and publishers the more profitable former.

Academics and the electronic canon

As in the eighteenth century, academics are already making decisions about what should appear in electronic form. This is not surprising since what is considered canonical is often driven, especially in North America, by what academics decide that their students ought to read (Lauter, 1991: Chapter 6, Guillory, 1993: Chapter 1). For example, the University of Toronto Representative Poetry On-line project is, in effect, an electronic anthology of literary texts on the Web. It is clear that decisions have had to be taken about inclusion in and exclusion from the anthology, and not surprisingly, since the texts are intended as a teaching aid, most are what might be described as broadly canonical.

The impact of an electronic canon

The impact on users

If an electronic canon is indeed beginning to form, why should it matter? Furthermore, since very little work, with the exception of Heinzkill (1990) and Cyzyk (1993), has been done on the role of the information professional in this area, why should it matter to us? First, it matters because of the impact it must have on potential users of electronic text. To return to my experience in Oxford, I have already said that the students began to realise that they could not find all the texts they needed in electronic form. The result of this was that students became much less enthusiastic about the use of electronic resources. When faced with a situation in which electronic texts either could not be found, or not in the right editions, or without the necessary metadata, it seemed that students lost confidence in them, and most crucially in the whole business of using electronic text analysis techniques in their work. Since these were mostly undergraduates who had neither the time nor the training to create their own electronic editions of the works they needed, it was not surprising that they returned to using print sources, and slower, manual techniques of analysis.

This illustrates two points. First, it suggests that there is a need in the user community for electronic texts in good, reliable editions of less popular works, whether from canonical or non-canonical authors, and this need is not yet being met. Secondly, it illustrates a more complex point about user perception of resources. If a student looked for an e-text and found either no edition at all, or a bad or incomplete one, whether in the most unreliable sphere of the Internet, or from the more reliable providers such as text archives or commercial CDs, they then seemed to wonder whether anything of value could be found at all and thus lost faith in the electronic medium as a whole. This may at first seem perverse. We are used to finding a wide choice of books in libraries and academic book shops, but we do not necessarily question the quality of the library if we find it does not stock the book we want, or has the wrong edition. But that is because we have a basic trust that academic libraries have collections policies and are likely to stock useful material. At present it seems that very little work is being done in English literature that is based on the analysis of electronic text (Shreeves, 1992, Corns, 1991). The reasons for this are complex, and I discuss them in detail elsewhere (Warwick, 2000). However, it appears that one of them is a lack of basic trust in the raw materials. This might explain why simple awareness raising amongst scholars, much of it by librarians, does not seem to have produced a consequent rise in usage, at least in so far as we can judge by the production of academic research. Users can be made aware of resources but cannot be made to use them if they are not appropriate for their needs.

The need for diversity

It has become abundantly clear that the quality of resources that are produced must be reliable (Shreeves, 1992). However, the provision of diversity is important if the use of electronic text in English literature is to become more than the preserve of a few committed enthusiasts. Humanities research and, in particular, that in various types of literature requires diversity of resources (Sweetland, 1992). Not only do literature scholars need a wide variety of texts by a large number of authors, but they also often need a variety of editions, or at least to be able to select the one that they regard as the best. Electronic projects such as the Canterbury Tales Project can be ideal ways to provide information about multiple editions and variants. However, too many literary texts found on the Internet are bereft of any information about editions at all, and even commercially produced texts may not be ideal, if they choose to use only out of copyright print editions as the base text, as Chadwyck-Healey does.

The question of diversity should cause us to worry about the reassertion of the canon, since a preponderance of canonical texts in electronic form is likely to have practical as well as theoretical repercussions. There has been a great deal of debate in the last two or three decades about what the canon should consist of. There is not the space here to go into the details of this, as the arguments could, and have, filled book-length studies. However, most critics agree that canons are by nature restrictive, and while it may be useful for those who construct educational syllabi to produce restrictive lists of works to be read, it is at odds with the job of those who seek to preserve texts and make them available for further use. Fowler (1982) distinguishes between the accessible canon, that of literature which can be found relatively easily by a reader, and the selective canon, of great works. However, as Kernan (1987) rightly argues, selectivity tends to affect accessibility, since the most highly valued texts tend to be the ones most often reproduced. We have seen that canons have been connected not simply with ideas but also with commercial production. From the eighteenth century onwards the relationship between the production of texts and canonicity has become circular. A measure of the canonical status of a book is in its circulation: if its ideas are widely circulated and admired those who want to be thought well-educated want to read it. Therefore, it must be produced in multiple copies, and, thus, is likely to be a commercial success, and more copies produced (Kernan, 1987: 158-163, Ross 1992). However, if a book is not part of the canon then it is much less likely to be produced, and so copies of it will be much less easy to find.

Falling out of the canon

As an example, I did my doctoral research on Richard Crashaw, a poet who is little known, but still, I believe, interesting. In the seventeenth century he was popular and later was thought of as an influence on Milton and Pope, but his popularity had waned by the eighteenth century and he was not considered sufficiently important to be included in the canons suggested by Dr Johnson and others. Therefore, few editions of his work were produced in the eighteenth century and nineteenth century and he was largely forgotten. I would not claim that he is the equal of writers like Milton, but I find him interesting and I am glad that in the early twentieth century someone decided to produce a modern print edition of his work, since I doubt I would have had the persistence, as a student, to ferret him out in a rare books room. At present the only electronic edition of his works is in Chadwyck-Healey's English Poetry database. It is a copy of a very late and unreliable edition and is not the one that any serious researcher or student of Crashaw's work would trust, since parts of the text are unreliable at best. It was probably chosen because it was out of copyright, but if presented to a literary researcher would probably serve to confirm all their worst fears about the poor quality of electronic text. I hope that there is a good electronic edition of his works for future readers to find since a future student may not want to go to the trouble of looking for editions of his work in an archaic form like print. If there is no electronic text they might assume that his works are of no interest or importance. Thus his position as marginal would be reinforced as a result of electronic de-facto canonicity.

This problem of canonicity seems to be becoming exaggerated in the case of electronic text. Perhaps because those who produce electronic text are not entirely sure of the value of the new medium in scholarly terms, it seems even more likely that they will produce a text which has been enshrined in the print canon. Even in the very earliest days of the use of computers in literary study Fogel (1962) stressed that it was important to produce as wide a variety of texts, and editions of them, as possible for use by scholars internationally. This has clearly not yet happened. Authors whose works are less popular or have not been seen as canonical may exist in print but editions may be more difficult to find or less numerous. It is these authors who, at least at present, are not being produced in electronic form. This means that canonicity is being reinforced doubly, in print and in the electronic forms of delivery. Thus the breadth and diversity of the subject is seriously undermined.

Debates about the canon have argued for the inclusion of different works, which are less well known, often those by women or by non-white authors (Froula, 1984, Bergonzi, 1990: Chapter 5). Canonical status also tends not to be accorded to contemporary writers because of the importance of the 'test of time' arguments. Because of the use of non-copyright texts it is already difficult to find texts of contemporary writers in electronic form. This poses problems for anyone who works in this very popular area of study. We are thus faced with a potential situation in which texts by women, modern writers and those of non-white origin may become difficult to find in electronic form. This could mean a return to a much narrower canon in electronic text, and a threat to the diversity which is essential for much humanities research. It is surely not only literary theorists like Culler (1988) who see this as a retrograde step.

Future use and current planning

Perhaps, when applying for funding to produce expensive electronic editions, academics may feel that their application is more likely to succeed if the subject of the project is an already acclaimed canonical writer, whose work can thus be preserved for posterity, rather than a modern writer. The modern writer might even be better known at present, but the assumption is often made that modern fiction is by nature ephemeral, its popularity transitory, and therefore not worth preserving in an expensive form like electronic text. The danger of this type of assumption can be shown, once again, by referring to the history of print publication. In the seventeenth century it was assumed that plays were low-status ephemeral works, hardly worth publication. Ben Jonson challenged this assumption and had his complete works published and this proved a success. Had he not done this it is likely that no bookseller would have thought it worth collecting the complete works of Shakespeare. Thus, the epitome of the canonical work, the First Folio, might not have been printed at all.

It is clear, then, that the popularity and cultural status of writers change and nobody is now in a position to know what will be studied or read in the future. While it is possible for a once popular work to fall out of a canon, it is very much harder for a neglected work to force its way in. One of the arguments that is often advanced in favour of books and against electronic text is that the portability of paper makes it possible, for example, to read a book in the bath. Yet in the relatively early days of printing and book production writers would have been horrified at this suggestion. Not only would they have thought it irresponsible to run the risk of damaging a valuable book, but they were also convinced that bathing at all was seriously bad for the health, and furthermore they would not have been able to envisage a technology which would allow anyone to have regular baths inside a house. The point is that we have no real idea what uses the future may want to put electronic text to, and thus we need to ensure that it is produced in all possible diversity rather than restricted by notions of the canon as we perceive it at present.

This may of course be a transitory problem: since electronic publishing of literary texts is a new medium, it might be seen as inevitable that popular, canonical texts will be produced in the early stages. It may be that electronic projects like Brown Women Writers, which concentrates on the production of non-canonical and little known writers, will multiply, as well as projects run by teams of scholars who will produce their own scholarly editions of writers in whom they themselves are interested, such as the excellent Rossetti Archive. Therefore, the problem may be solved gradually. On the other hand, it may be that large anthology-like collections like those produced by Chadwyck-Healey will become dominant. Bibliographers, of course, have no influence over the contents of these collections (Sweetland, 1997: 793). This, again, is likely to prove problematic for the user who needs multiple editions or rare texts. It is analogous to an acquisition manager trusting the publisher to send them a few hundred books in a particular area, which have been chosen on their behalf; a situation which few academic libraries would accept happily. As Kopp (1997) argues, in virtual, as in print, collections intellectual decisions need to be taken about what should be included, rather than simply accepting what a publisher chooses to produce. The worst-case scenario is that commercial pressures, which have already caused several commercial publishers to wonder if there is much profit to be made in publishing electronic texts in the humanities, will lead to a conservative commissioning strategy of only producing popular texts. This means that it will be easy for users to find electronic text of popular authors but it will be just as difficult as it is now to find rarer texts. At present it is impossible to tell what will happen, if things are left entirely to chance.

The role of the information profession

This is surely an area in which the information profession needs to be proactive. In the early days of electronic publication it was clear that there was some anxiety about the role of acquisitions and collections management (Nissley, 1993, Gaunt, 1990). However, far from being redundant, bibliographers and collection managers have an even more important role to play in the electronic world, at least as regards literary texts (Kemp, 1997). As yet there seems to be very little idea of collections policy as applied to the creation of electronic literary text. It may be that the way we can ensure access to the greatest diversity of text in literary studies is through the publicly-funded work of scholars which is then deposited in a text archive. This would not have to be driven by the constraints of a commercial publisher to produce profitable work. It is heartening to see that, in the U.K., the Arts and Humanities Research Board (AHRB) has encouraged academic projects to bid for money to create electronic resources for humanities scholars. Resources created by such projects will be archived by the appropriate service provider of the Arts and Humanities Data Service (AHDS), which, in the case of literary texts, is the Oxford Text Archive (OTA). However, although the OTA has just developed a collections policy, its terms of reference are very general. It also does not have the resources to create its own texts. It is therefore left to scholars to decide what resources they want to create once funding councils are convinced that projects are academically valuable. In general, literary scholars do not have experience of planning resources for others to use. Most of those who deposit texts with the OTA are thinking about the primary use that the electronic text they create can have for them, and not necessarily for the rest of the scholarly community.

At the same time, collection managers in libraries and humanities computing support staff find that they have relatively few electronic literary texts of a good standard to offer to their users. The information profession possesses the skills that seem to be lacking here, those of collection planning and management, and it is important that some kind of collaboration should happen between the two communities, since information professionals, and especially bibliographers, are used to thinking of the needs of a wider community. Kohl (1997: 54) neatly sums up the task for the information profession: 'An access strategy which depends on materials being "out there" requires someone to make sure that "out there" is actually out there. This is what bibliographers must do.' He argues that collection development specialists will need to collaborate, to develop a more outward looking and proactive focus, and to work with a variety of outside agencies to ensure that resources are developed and that access to them is negotiated where this involves commercial publishers.

For example, I argue above that the rarer the text, with a few notable exceptions, the less likely it seems that it will appear in electronic form. From the point of view of anyone interested in broadening access to texts in whatever form this seems ridiculous. Surely the reverse should be true. Electronic publication is not just the means of providing fast access and searching for reference databases; it also makes it easier for users to gain access to rare and in some cases fragile texts. One electronic copy of the Beowulf manuscript can be made accessible via the Internet to as many users as necessary. But, of course, we come back to the same problem. What could be more canonical than Beowulf? In many ways, if we are to fight creeping canonicity it is more important that editions of lesser known texts should be published electronically than those of canonical authors whose works can readily be found in various formats. A collections manager will immediately see the point of this argument, but a commercial publisher, interested in making a profit, may take longer to convince.

Questions of collections policy

This will not of course be easy. Many questions remain about how collections policies might be formulated, and how individual institutions might collaborate with central bodies like the Joint Information Systems Committee (JISC) and the AHDS. If texts remain physically archived by a central body then ownership has clearly been sacrificed in the cause of wider access. Would this be acceptable to libraries? Although the OTA has begun to develop a collections policy, it is still a small body. At present it cannot create its own texts and does not have the financial resources to commission them from others. Given the increasing amount of electronic text being created, it also does not possess the resources to archive everything it is offered and thus at present has to take difficult decisions about what to reject (Popham et al, 1999). This will either mean that a text is not created, or that something which has been created may not be maintained. In the absence of legal deposit for electronic text, this will inevitably mean that the diversity of texts available to scholars will be threatened. Ideally this problem could be solved by expansion of the resources for archiving electronic text, but who should pay for this? Should it be funded by central government or by individual libraries funding their own archiving services, or should libraries contribute to a central service like the OTA? Will a collections policy be a success if it has to be developed gradually, and initially on a small scale? Will it only be used to inform funding decisions when proposals are submitted or might we look forward to a time when work is commissioned from academic teams or even private publishers? These are all matters of policy which must surely be discussed and on which the information profession is particularly well qualified to comment.

Further problems are raised by the creation of selection criteria for electronic texts. Where an archive has to decide between accepting another edition of a canonical work on the grounds that scholars need diverse editions, or an edition of a lesser known work which it does not possess, which one would be chosen? Ideally both would be accepted, but with increasing amounts of text being created this seems unlikely. Should the choice of which editions to digitise matter? Is a bad edition better than nothing at all? Text archives also have to take into account the technical standards of texts, and the OTA produces a hierarchical scale of formats that it accepts, from plain text marked up in TEI Lite at best, to PDF at worst. If the technical requirements are pitched too high then potential editors may be deterred, but if too low then the quality of the work may not be high enough to make it usable. At present, discussions with some users indicate that many of them would welcome any text in any format now rather than having to wait for a better edition later. But this would obviously be at odds with the need to produce high quality editions for future use. The problem remains, however, that the lack of any sort of edition now may seriously limit the work of scholars who use electronic editions and may therefore limit the demand for them in future. Clearly these decisions have to be made, but they are obviously not easy ones. It seems clear, however, that information professionals already have the collection management skills and experience necessary to make a vital contribution at all stages of the process.

Further research

In order for a contribution to be made, however, further research is needed on several different topics. A preliminary study has already been done on user needs in the humanities (Greenstein and Porter, 1997) and work is in its early stages at Sheffield to investigate the way in which literature researchers in both English and Modern Languages make use of digital resources. However, I believe that further research needs to be carried out into the area of de facto canon formation, and whether this will be affected by the electronic collections policies being developed by bodies such as the OTA.

It is desirable to investigate the availability of English literary texts more thoroughly, and produce some quantitative analysis of literary texts available and the proportion of them which are popular works by canonical writers. However, this may prove difficult since the creation of texts is on-going, and, especially in the case of those available on the Internet, it may be difficult to locate them all. It is also important to find out what those working in the field of English literature think about the availability of electronic texts. What are the real needs of those who do research and teach English literature, and are these the same as those of their students? User-needs surely become much more important in the world of electronic collections than ever before (Friend, 1993). Those who work in humanities computing support and in libraries have an empirical idea of what their users report in terms of shortcomings but organized investigation is necessary.

There is no modern day Dr Johnson making pronouncements about what ought to appear in an electronic version of the canon, and in some ways this makes the job of information professionals more challenging. Even so, the canon is not about to disappear, nor is the consequent threat to diversity of provision of English literary texts. As information professionals we need to know more about what is happening now in order to ensure that electronic texts in as wide a variety as possible are provided for present and future users.


How to cite this paper:

Warwick, Claire (2000)  "The lowest canonical denominator: Electronic literary texts, and the role of the information professional"  Information Research, 5(2) Available at:

© the author, 2000. Last updated: 5th January 2000