published quarterly by the university of borås, sweden

vol. 27 no. Special issue, October, 2022



Proceedings of the 11th International Conference on Conceptions of Library and Information Science, Oslo Metropolitan University, May 29 - June 1, 2022


Interrogating paradata


Olle Sköld, Lisa Börjesson, and Isto Huvila


Introduction. The concept of paradata has received increasing attention in recent information scholarship. The literature, however, shows considerable disagreement about what paradata is and how it can be purposive. The aim of this paper is to facilitate the use of the resources and analytical impetuses offered by the concept of paradata by providing an explanation of what paradata is in operational terms.
Method. The examination of paradata is based on a scoping review of 53 scholarly texts that varyingly engage with the concept.
Analysis. The reviewed texts were analysed using a concept-analysis framework focusing principally on discerning the main descriptive features of paradata and prevalent instances of use and usefulness.
Results. The paper shows what the key elements, relationships, use-cases and uncertainties of paradata are.
Conclusions. Findings support a media-archaeological view of born-digital artifacts, and bibliographical archaeology is shown to provide a programmatic approach in identifying significant archaeological characteristics among artifacts that have yet to be exhaustively studied.

DOI: https://doi.org/10.47989/colis2206


Introduction

A key driver of information research is the interest in understanding and describing information phenomena. Recent information-research scholarship (Dahlström and Hansson, 2019; Huvila, et al., 2021a; Huvila, et al., 2021b; Huvila and Sköld, 2021) shows an increasing interest in paradata, a term that signifies information about the means (procedures, tools, activities) by which a certain body of information came into being (Sköld, et al., forthcoming).

There are several indications that paradata might prove to be a useful addition to the core conceptual cadre of information research. One such indication is the slightly differing emphasis, meaning and disciplinary connotations of paradata compared to closely related and impactful notions like metadata and provenance (Börjesson, et al., 2020). Another sign pointing to the conceptual and analytical offerings of paradata is how the notion is put to use in non-information research settings (see e.g., Berg and Goorman, 1999; Baker and Yarmey, 2009; Wallis, et al., 2013). The potential usefulness of paradata in information research notwithstanding, the literature shows considerable disagreement about what paradata is and how it can be purposive (Gant and Reilly, 2018; Huggett, 2020; also cf. Bentkowska-Kafel, et al., 2012; Huvila, 2012). Notable uncertainties exist regarding the principal characteristics of paradata and its supposed applications, users and makers. A key first step in facilitating the use of the resources and analytical impetuses offered by the current state-of-the-art of paradata inquiry in information research and associated scholarly and professional fields is to examine and attempt to clarify the concept by drawing on the many instances of use and deliberation of paradata. As of now, this has not yet been accomplished.

The purpose of this paper is to provide an explanation of what paradata is in operational terms by conducting a concept analysis of paradata as it emerges in multi-field research use. Through this examination the paper also seeks to facilitate further exploration of the range of possible applications, practical uses and theoretical offerings of paradata in information research. The empirical basis of the paper is a scoping review (Munn, et al., 2018; Pham, et al., 2014) of 53 scholarly texts (journal articles, articles from conference proceedings, book chapters), drawn from two domains, archaeology and cultural-heritage studies, where, in contrast to information research, paradata has already been discussed for some time. Moreover, in these specific domains, the earlier discussion on the concept resonates closely with how paradata could be meaningfully framed in information research. In this way, a survey of literature in archaeology and cultural-heritage studies provides a good starting point for exploring the potential relevance of paradata in information research. The reviewed texts were analysed using a concept-analysis framework (Baldwin and Rose, 2009; Nuopponen, 2010b). The relevance of inquiring into the notion of paradata is theoretically underpinned by Moore's (2004) rendering of concepts in research practice as the perpetuators of productive ambiguity between global (abstractions, theory) and local (specific manifestations, connections, experiences) realizations, all impacted by certain pre-theoretical commitments. As such, concepts play a key role in the meaning-making efforts of the scholarly narrative and are non-monolithic; they can have varying meanings even within a tightly delimited cluster of scientific outlets (Abbott, 1988; Maines, 1993) but are nevertheless knowable and definable phenomena (Rodgers and Knafl, 1993; Nuopponen, 2010a) if investigated in situ.

Descriptions of information processes

Process information is described and termed differently between and across disciplines and research interests. This picture is also evident in information research, where conceptual heterogeneity is a well-documented state of affairs owing to historical, epistemological and organizational factors (Nolin and Åström, 2010) such as the often close relationship between information research and archival studies (Figuerola, et al., 2017; Tuomaala, et al., 2014). Paradata aside, provenance is a common concept with connections to the aforementioned archival sphere that denotes process information. These terms share a main characteristic with metadata, a notion well established in information research and elsewhere, and with other information-about-information terms insofar as they express the features of and connections between complex informational phenomena and more abstracted descriptions thereof (see e.g., Pomerantz, 2015). The information-process concepts are distinguished by describing the creation and curatorial histories of information phenomena (Börjesson, et al., 2020) that in many cases, but not necessarily, emanate from scholarly inquiries or exist within the auspices of a GLAM institution or other heritage contexts (Fear and Donaldson, 2012; Huvila, et al., 2021a; Kreuter and Casas-Cordero, 2010; Sweeney, 2008).

Huggett (2012) employs the term provenance to refer to information that can mitigate the custodial and re-use challenges posed by the contextual boundedness of research data and segments it into agent- (the creators and manipulators of data), object- (data and data-manipulation origins) and process-centred (the elements and actions of data generation) classes of provenance. Provenance is a concept that additionally has a strong and long-standing position in archives and records management (Bearman and Lytle, 1985), and that more recently has seen use in data-creation path modelling (Curcin, 2017; De Oliveira, et al., 2015). Provenance in the archival sense does not have an undisputed definition or origin (Ridener, 2009), but is used to describe the individual or organizational actor as part of whose activities the record in question was created. Besides creation, provenance signifies the procedures of record keeping and so ties into the notion of original order, an idea influential in archival theory and practice which holds that much record-keeping value stems from maintaining the structure in which records were put by their creators and curators (MacNeil, 2008). Provenance is useful for appraisal and the identification of archival value (Tschan, 2002), and for arranging, acquiring and retrieving records (Sweeney, 2008).

The concept of paradata emerged in survey research as 'auxiliary data describing the [survey] process' (Couper, 2000, p. 393). Survey paradata can be digital audit trails, interview lengths and survey-management information, and encompasses both automatically captured trace data from survey-conduct and response activities and other forms of manually generated data (Kreuter and Casas-Cordero, 2010; West, 2011). The usefulness of paradata in survey research is attributed to the possibilities it affords for evaluating and developing computer-assisted survey approaches and for interpreting present and historic survey results (Couper and Kreuter, 2013; Nicolaas, 2011). In this, the concept of paradata echoes one of the main offerings of provenance: descriptions of information processes partially solve the elementary problem of information provision in scholarly and GLAM-connected reuse scenarios by bridging to some extent the epistemic and sociotechnical horizons of the data creators and manipulators with those of the data reusers (Borgman, 2012; Kim and Yoon, 2017). Paradata has also seen use in archaeology and cultural-heritage studies, especially in relation to matters of research-data reuse and heritage-data management. It should be noted that there exist, in addition to those that have been discussed here, other concepts like provenience (Buchanan, 2016), pipelines (Mudge, 2012) and workflows (Davidson and Freire, 2008) that explain information about information processes with slightly varying emphases, connotations and disciplinary affiliations (see also Huvila, et al., 2021a; Reilly, et al., 2021).

Study approach

This paper is based on an analysis of 53 scholarly texts from the fields of archaeology and cultural heritage research. Scoping-review methodology inspired the paper's data collection procedures and the writings on concept analysis supplemented the qualitative analysis of the resulting corpus.

Collecting the corpus

The scoping review is a fairly recent approach to synthesizing evidence in the literature that was developed from systematic-review methods (Munn, et al., 2018). The scoping review maintains the importance of transparent and structured literature-search paths and corpus selection (DiCenso, et al., 2010), but diverges from the systematic review by being adapted to first-step explorations (Munn, et al., 2018) of uncharted and heterogeneous topics (Pham, et al., 2014) where the literature has wide-ranging characteristics and qualities (Peters, et al., 2015). A more specific use of the scoping review is to clarify how concepts are used in research praxis (Munn, et al., 2018), which makes it suitable for the purposes of this study.

Data collection proceeded by adapting the scoping-review procedures suggested by Arksey, et al. (2002; as described in Tricco, et al., 2016 and Pham, et al., 2014).

Literature searches were conducted to find studies that could be evaluated for inclusion in the corpus using Web of Science, Scopus, Google Scholar and the OPAC of the Uppsala University Library, search engines that together offer good coverage of the archaeology and cultural-heritage studies literature. The searches were conducted in November-December 2019, with a complementary search round taking place in December 2021. Eligibility criteria for inclusion in the corpus included language requirements (English), academic requirements (peer review), a non-trivial engagement with the concept of paradata and the removal of duplicates. The study selection process resulted in a corpus of 53 English-language scholarly texts including book chapters (6), papers from conference proceedings (13) and journal papers (34) published between 2008 and 2021. The texts included in the corpus can be found in the reference list of this paper, highlighted with an asterisk.

Analysing the corpus


Table 1: Analytical scheme for the conceptual analysis of the concept of paradata and its use in the studied corpus.
Main focus | Subfocus | Subfocus description
The descriptive features of paradata | Prevalent descriptions | What paradata is described to be
The descriptive features of paradata | Key instances | The empirical referents of paradata
The descriptive features of paradata | Creators, users, means and methods | The supposed creators and users of paradata; the means and methods of paradata creation
The use(s) of paradata | Purposes | The purposes of paradata and paradata use
The use(s) of paradata | Paradata-solvable problems | Problems foregrounded where paradata is a solution
The use(s) of paradata | Use-cases | Examples of paradata-use activities

The analysis of the literature corpus was informed by concept analysis and proceeded from an analytical scheme (see Table 1). The analytical scheme was developed on the basis of common points of focus in the literatures of concept analysis (Nuopponen, 2010b; Nuopponen, 2010a; Nuopponen, 2011) and previous research on paradata and data descriptions (Faniel, et al., 2013; Huggett, 2012). The analytical focuses in Table 1 are expanded upon in Table 2, which summarizes the concept of paradata as it emerged in the studied corpus in terms of the elements of paradata, paradata use-praxis, key relationships of paradata and the principal uncertainties of paradata. The coding of the literature corpus drew on grounded theory (Charmaz, 1983) and the method of constant comparison (Corbin and Strauss, 1990).

Limitations

Several limitations should be noted. The limited number of scholarly texts analysed and the disciplinary delimitation to archaeology and cultural-heritage studies make the results of this study unable to support any wider-ranging conclusions. The selection of search engines and search parameters affected the study's results, as did the focus on English-language literature and the study's analytical approach (Table 1), including its discursive rather than descriptive interests. Limitations notwithstanding, the usefulness of the study lies in its taking the first steps towards understanding the concept of paradata as it emerges in scholarly operations and what this means for useful appropriation of the notion in information research. In this way the present paper fulfils one of the main roles of the scoping review: to offer results that can be used to inform the design and direction of subsequent research.

Paradata in research-use

The descriptive features of paradata

Prevalent descriptions of paradata

The concept of paradata is described and exemplified across the studied corpus in ways that show important commonalities. The majority of authors assert, or at minimum firmly indicate, that paradata is an information phenomenon that, whether its creation and use are discussed in past, present, or future terms, explains occurrences that have taken place in the past. The past occurrences that make up the informational centre-point of paradata, what paradata is about, are differently termed. The most common renderings in the corpus are that paradata describes what is termed processes (Bentkowska-Kafel and Denard, 2012b; Denard, 2012; Demetrescu, 2015; Havemann, 2012; Havemann, et al., 2009; Gasparetto and Baratin, 2021; Huvila, 2021; ten Harkel and Fisher, 2021) or otherwise coined sequential and goal-oriented series of physical and mental actions and applications of technological, methodological, or other means (Carboni, et al., 2016; De Reu, et al., 2013; Reilly, et al., 2016; Saygi, et al., 2018; von Schwerin, et al., 2016). Given the nature of the corpus, these processes and event chains are always tied to research ventures and are, also, thoroughly productive: paradata here arises as an information phenomenon that describes processes that bring into existence some scholarly product, e.g., scientific publications (Cook, 2019), 3D heritage models (Giles, et al., 2012), or museum artefacts (Havemann, 2012).

The analysis shows that similarly there is much agreement among the authors about what foci paradata must encompass in order to adequately describe productive scholarly processes.

Carboni, et al. (2016, p. 57) express a prevailing position by writing that ideal paradata can convey 'reliable knowledge of the essential parameters that have gone into the creation and delivery of [a scholarly product]'. Key essential parameters of productive scholarly processes in the studied corpus are the cornerstones of intellectual work (the collection, processing and analysis of data; interpretative, creative and evaluative activities; Beacham, 2011; Bentkowska-Kafel, 2013; De Reu, et al., 2013; Demetrescu, 2015; Denard, 2012; Tamborrino and Wendrich, 2017; von Schwerin, et al., 2016) and the network of resources drawn upon in this work. Examples of such resources are domain and project-related knowledge, specifications of technical means employed, datasets, the finished scholarly product itself and other important information sources (Arnold, 2008; Beacham, 2011; Carboni, et al., 2016; De Reu, et al., 2013; Denard, 2012; Tamborrino and Wendrich, 2017).

Prevalent instances of paradata

Beyond what paradata is about, the corpus also shows different roads taken to describe in detail what the information phenomenon of paradata is. Aside from the implicit assumption that paradata is indeed data, paradata is varyingly characterized as documentation (Bentkowska-Kafel and Denard, 2012b; Denard, 2012), records (Westwood, 2016) and information (Beacham, 2011; Tamborrino and Wendrich, 2017). While it is difficult to identify what the significance of such terminological choices is in relation to paradata, it can be inferred that they can be tied to deliberations of whether useful paradata can be structured or unstructured, accidentally or deliberately recorded, embedded or stand-alone. The studied corpus holds no consensus here, and paradata is described as existing along all of these axes (Beacham, 2011; Huvila, et al., 2021a; Lo Turco, et al., 2019; Reilly, et al., 2021; Westwood, 2016). This ambiguity is further illustrated by the plurality of prominent instances of paradata, the empirical referents of the concept, found in the studied texts. The empirical referents are of many types and characteristics but have certain similarities: they are multimodal (texts, images, maps, computer models; Cook, 2019; Carnall, 2016; Tamborrino and Wendrich, 2017; see Havemann, 2012 for many examples) and can be both rigorously structured (Abate, et al., 2017) and free-form (Arnold, 2008) but are always intimately tied to the type(s) of data processed in pursuit of the research objective and the domain of inquiry in question. The strong affiliation between paradata and the research framework from which it emerges is also described as a value-adding relationship. All authors in the corpus agree that paradata increases the usefulness of the scholarly product whose inception and creation it describes, and sustains that usefulness over time and across multiple (re)use scenarios. The scholarly product is likewise a non-interchangeable and important resource for making sense of the paradata set, and it is repeatedly suggested that they should be stored, disseminated and interlinked in a manner that maximizes their reciprocity (Beacham, 2011; Giles, et al., 2012; Reilly, et al., 2016).

Paradata creators, users and the means and methods of paradata creation

There is little variation across the studied corpus regarding who the creators and users of paradata are described to be. The supposed creators of paradata are human actors connected to the scholarly product whose inception the paradata is describing, either in researcher (Carboni, et al., 2016; Iadanza, et al., 2019; McEwen Arnold and Lafreniere, 2018) or other professional capacities (Bogucka and Jahnke, 2018; Llamas, et al., 2017; Lo Turco, et al., 2019). Paradata users are discussed in a similar fashion: '[e]xperts and non-experts' (Di Giulio, et al., 2017, p. 251) including researchers (De Reu, et al., 2013; Giles, et al., 2012; Lercari, et al., 2018), professionals and other stakeholders in associated domains (Abate, et al., 2017; Carnall, 2016; Lo Turco, et al., 2019) and members of the public (Champion, 2016; Havemann, et al., 2009; Schofield, et al., 2018) are the main groups of paradata users mentioned.

The analysis however shows a greater diversity regarding the means and methods employed in paradata creation. The most abstract instances of paradata creation are described as a manual documentation of work processes that is followed by linking the paradata to the scholarly product it pertains to (Abate, et al., 2017; Abdelmonem, et al., 2017; Havemann, 2012; Iadanza, et al., 2019). The authors reflect on a series of documentary means to achieve such purposes. In group working environments, Lercari, et al. (2018) propose the keeping of a communal diary; Bentkowska-Kafel and Denard (2012) encourage writing lab notebooks. Carboni, et al. (2016) suggest the use of pipelines, workflow diagrams and metadata forms to create paradata in a project with multiple actors. Giles, et al. (2012) put forward a holistic paradata-creation approach that captures the intellectual processes of the research team by using means and methods adapted to the stages of research work and the various research materials in focus therein. The important step of linking paradata and scholarly products is in the corpus often enacted via markup technologies that allow specific paradata segments to be tied to relevant instances of the scholarly product (Havemann, et al., 2009; McEwen Arnold and Lafreniere, 2018; Tamborrino and Wendrich, 2017). Additional means and methods of paradata creation discussed are the tracking and recording functionalities afforded by the computer tools used to gather, curate, analyse and communicate research data and results (Bentkowska-Kafel and Denard, 2012a; Bentkowska-Kafel and Denard, 2012b). It is also pointed out that paradata is automatically created during the course of computer-assisted processing of research data (Westwood, 2016). 
The resulting traces and artefacts are frequently accessible using present-day means, but can become more or differently visible by way of technological development which, in turn, can create more opportunities for re-framings and re-interpretation (Westwood, 2016).

The use(s) of paradata

The analysis shows three interconnected main modes of paradata use(s) that encompass with notable congruence the authors' expressions of the purposes for turning to paradata, paradata-solvable problems and paradata use-cases (see Table 1). All modes of paradata use are rooted in the umbrella notion of paradata as something that can be used to attain transparency (e.g., Bentkowska-Kafel and Denard, 2012b; Champion, 2016; Champion and Rahaman, 2019; Reilly, et al., 2016; Saygi, et al., 2018; Schreibman and Papadopoulos, 2019). Transparency as it emerges in the studied corpus signifies the conditions that allow consumers of scholarly products to become cognizant of their production processes, that is, means used, methods employed, assumptions and prioritizations made and, for the producers of scholarly products, sufficient resources and know-how to convey the cornerstones of their intellectual work and the network of resources employed in it using documentation and description (cf. above).

Paradata use can further research robustness

The first mode of paradata use centres on research robustness. Within this mode it is argued that the transparency afforded by paradata use offers opportunities for data producers to make more visible the methodological rigour and reliability of research studies (Abate, et al., 2017; Abdelmonem, et al., 2017; Beacham, 2011) including the settings of relevant hardware and software tools (Borrero and Stroth, 2020; Bozorgi and Lischer-Katz, 2020; Štular, et al., 2021). Beyond describing the key components of scholarly production and the resources drawn upon in that, paradata is also used to communicate underpinning hypotheses, interpretations and their degrees of uncertainty (Denard, 2012; Giles, et al., 2012; Parisi, et al., 2019; Prizeman, et al., 2020; Reilly, et al., 2016; von Schwerin, et al., 2016). Demetrescu (2015), Arnold (2008) and others argue that the use of paradata is not only descriptive but also to some degree prescriptive; the steps involved in documenting paradata in a research project can work as a device for articulating and reflecting on the assumptions, disciplinary idiosyncrasies, special-case interests and factual information that research design and carry-through are based on. Such use of paradata is implied to be a countermeasure to the 'intellectual opaqueness' (Reilly, et al., 2016, p. 39) of scholarly products by showcasing them as the result of work practices that are both deductive and inductive, rule-bound and contextually adapted (Abdelmonem, et al., 2017; Beacham, 2011; Schofield, et al., 2018; von Schwerin, et al., 2016).

Paradata use can afford data evaluation and assessment

The second mode of paradata use is closely connected to research robustness and focuses on data evaluation and assessment. Here it is implied that paradata is an affordance for users to query and appraise research data they are interested in (Carboni, et al., 2016; Champion and Rahaman, 2019; Fisher, et al., 2021; Rahaman, 2018) via data reliability (Demetrescu, 2015; Kansa, et al., 2020) and authenticity checks (Havemann, 2012). Other examples are comparisons of research designs and data analyses across multiple studies (Giles, et al., 2012) and reviews of research findings (Denard, 2012). This use of paradata in the corpus is also tied to research accountability. Paradata is described as the path towards data users being able to interrogate the bases that producers of scholarly texts, models, or other products use to found claims and considerations (Carnall, 2016; Denard, 2012; Richards-Rissetto and Landau, 2019). Another practical outcome often mentioned by the authors is how the use of paradata for the purposes of evaluation and assessment can improve the re-use potential and sustainability of scholarly products (Champion and Rahaman, 2019; De Reu, et al., 2013; Lercari, et al., 2018). Referencing 3D documentation of cultural-heritage information, Carboni, et al. (2016, p. 57) propose that sufficient paradata could make such scholarly products 'on par with traditional bibliographic resources' in terms of their reusability. Westwood (2016) adds to this line of reasoning and highlights that paradata-afforded reuse potential comprises both human and computer-driven re-readings.

Paradata use can facilitate cross-boundary communication

The third mode of paradata use revealed by the analysis has to do with paradata as a vehicle for communication. This rendering of paradata use emanates from the idea that paradata can facilitate reciprocal and productive relations between the users and producers of scholarly products (Bogucka and Jahnke, 2018; Carnall, 2016; Cook, 2019). The communicative challenge that paradata can help overcome grows out of the specialized nature of research and its potentially wider use and reuse audiences (Arnold, 2008). Demetrescu (2015) suggests that (standardized or structured) paradata can help transcend the diverging motivations, interests and disciplinary specificities that distinguish and separate user groups. Two educational benefits of using paradata to this end are discussed in the corpus. First, paradata makes the difficulties, solutions and related deliberations underpinning every scholarly product more visible and informative, also to practitioners from other domains (Carnall, 2016; Cook, 2019). Second, research results and scholarly products with competently formed paradata more naturalistically appear as being manifestations of a particular point in the overarching structure of scholarly knowledge pursuits rather than something that springs from a final state of knowing (Bentkowska-Kafel and Denard, 2012b). In addition, paradata-driven and better-quality opportunities for communication across groups and communities of users and producers of scholarly products are seen to open up avenues of multivocal interpretative efforts (Lercari, et al., 2018; Tamborrino and Wendrich, 2017) and to be a factor supporting engagement from new actors (Cook, 2019).

Discussion

Making sense of paradata

The analysis offers several insights about paradata that can be used to outline a paradata proto-definition. Table 2 shows a summary of the main findings, building and expanding upon the analytical focuses presented in Table 1.


Table 2: The components and uncertainties of the concept of paradata as they emerged in the studied corpus.
Paradata main part | Paradata subpart | Subpart description
Elements of paradata | Aboutness | Paradata is information about productive processes
Elements of paradata | Subject matter | Paradata is information about the network of actions, means, methods, deliberations, rationales and intellectual horizons that constitute productive processes
Elements of paradata | Modality | Paradata is analogue, digital, multi-format and multi-temporal
Elements of paradata | Organization | Paradata is structured and unstructured; embedded and stand-alone
Elements of paradata | Context | Paradata is discussed in the contexts of research (data), GLAM institutions and heritage issues
Elements of paradata | Creators | Paradata is created by human (researchers, professionals) and non-human (software, recording devices) actors
Elements of paradata | Conceptual family tree | Paradata is connected to the terms information, documentation, records, descriptions, metadata, provenance, workflows, pipelines, contextual information
Elements of paradata | Empirical referents | Paradata is found in texts, drawings, annotations, tracked-changes, audio and video recordings, code books, research diaries
Paradata-use praxis | Attaining transparency | Paradata use can further research robustness
Paradata-use praxis | Attaining transparency | Paradata use can afford data evaluation and assessment
Paradata-use praxis | Attaining transparency | Paradata use can facilitate cross-boundary communication
Key relationships of paradata | With data | Paradata and the data it describes are reciprocally value-adding e.g., regarding interpretability
Key relationships of paradata | With the productive process | Paradata is only purposive if it sufficiently describes the productive process it is connected to
Key relationships of paradata | With the productive process | Paradata is a shadow-play rendering of the productive process (or parts thereof) it is connected to, shaped by the rationales and means of documentation
Principal uncertainties of paradata | | The definition of paradata; the benefits of paradata cf. other process-description concepts; what is paradata and what is metadata; how to determine the required types and amounts of paradata; considerations of paradata users; paradata use beyond research, GLAM and heritage contexts; what is paradata-facilitated transparency

As a whole, paradata is information about processes that operate a network of means, mental and physical actions and intellectual regimen in the creation or shaping of a(n information) product. Paradata was most often discussed teleologically in the corpus: documented with the dual intent to make the productive process interpretable across time and different contexts of (re)use, and to make visible the connections between the productive process and its end result. Paradata can also refer to activity traces or descriptions that are discreet and made without particular ends in mind. In terms of modalities, paradata can be of any informative matter and format and be created by human or non-human actors. The analysis also shows that the most frequently occurring settings of paradata discussion are research data and affiliated matters of curation, dissemination, description and heritage issues. This is a result of the study's sampling procedures; the concept of paradata has potential significance for a broad array of settings and productive efforts with different degrees of formalization. However, seeing as information research exhibits an increasing interest in data (Bates, 2018; Papenmeier, et al., 2021; Yan, et al., 2020) and heritage (Booth, et al., 2021; Golub, et al., 2021; Koya and Chowdhury, 2020), this rather indicates the usefulness of research in archaeology and cultural-heritage studies also for further inquiries into and utilizations of paradata as an empirical referent or analytical device.

Paradata, metadata, provenance (data)

A key step when considering what paradata is and how it can be useful is to delve into how paradata affiliates with core concepts, and by association also research subfields, in information research. The findings confirm the close connection between paradata, metadata and provenance but do not offer any clear evidence regarding the connections between the concepts; they are often mentioned pairwise, metadata and paradata being the most frequently occurring pair, and without distinction (e.g., Iadanza, et al., 2019; Giovannini, 2020; Jeffrey, et al., 2020; Münster, et al., 2015; Münster, 2019; Rashid and Antlej, 2020; Tryfonos, et al., 2021). It is not clear if this signifies that the differences between the concepts in the minds of the authors are so obvious that they do not require explanation, or if this mode of reference is used to sweepingly refer to information that can enrich the scholarly product in the focus of the studies.

This study suggests that the notable divergences that mark the intersection of paradata, provenance and metadata principally stem from the concepts having distinct cognates, that is, ties of varying strength and elasticity to patterns of established use and thought. Many divergences and differences can of course exist in definitional considerations, but in in-situ settings the concepts are partly empirically indistinguishable. Although paradata and provenance are, for example, more closely associated with descriptions of productive processes than metadata, there are few practical or theoretical constraints that would hinder using the term metadata to discuss information pertaining to the origins of an information resource (see provenance metadata, Huggett, 2012; provenance of paradata, Huggett, 2020). It is similarly easy to find cases where the notion of paradata with great plausibility can be employed to designate what otherwise could be termed metadata or provenance (paradata metadata, Drachsler, et al., 2012). Metadata, then, is a powerful and often high-level term not seldom used in a context-agnostic manner to signify resource descriptors (Pomerantz, 2015). The cognates of paradata and provenance together make up a sphere of reference that is possibly narrower than that of metadata, but still large and without very clear distinctions of what sets the two concepts apart. Provenance has a history of use in the settings of archives and records-management (Bearman and Lytle, 1985), computer science (Curcin, 2017) and many other disciplines (Reilly, et al., 2021) while paradata is principally associated with survey research (Couper and Kreuter, 2013) and cultural-heritage studies (Bentkowska-Kafel, et al., 2012); both terms are employed in archaeological research (Gant and Reilly, 2018). Further, the cognates of paradata and provenance can be strikingly different when considering certain use-scenarios.
Provenance is commonly applied in discussions of the origins and custody and curatorial chains of digital, digitized and physical objects and collections (Dahlström and Hansson, 2019; De Oliveira, et al., 2015). In historical archives especially, provenance data often consists of fairly abstract narratives of custody and origin written by archivists. A contrast can be found in the findings of this paper where paradata, perhaps owing to its ties to survey research, not seldom is comparatively more intimately associated with the specific processes, as in the goal-oriented progression of discrete events and activities that constitute productive ventures in e.g., science and heritage work. An example of such paradata in survey evaluation would be computer-captured traces of user interaction such as time-stamps, logs of revisions and choices made, and user information. However, plenty of opposite examples can be found in the literature and in the results of the present study that render provenance data as recordings of discrete actions in computerized environments (Drachsler, et al., 2012; Fear and Donaldson, 2012) and paradata as free-form descriptions of key segments of productive processes or, perhaps most commonly, as a combination of the two, as in the prevalent topic of heritage and research data re-use.
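
The distinction drawn above between trace-like and free-form paradata can be illustrated with a minimal sketch. It is not drawn from the reviewed literature: the class names, field names and the `kind` labels are hypothetical and chosen only for illustration, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TraceEvent:
    """One computer-captured trace of user interaction (hypothetical schema)."""
    timestamp: datetime
    user: str
    action: str   # illustrative labels such as "revise" or "choose"
    detail: str


@dataclass
class ParadataRecord:
    """Paradata about one productive process: traces, free-form notes, or both."""
    process: str
    traces: list[TraceEvent] = field(default_factory=list)
    narrative: str = ""  # free-form description of key process segments

    def kind(self) -> str:
        """Classify the record by which forms of paradata it contains."""
        if self.traces and self.narrative:
            return "combined"
        if self.traces:
            return "trace"
        if self.narrative:
            return "narrative"
        return "empty"


# Invented example: a survey session documented first by a trace only,
# then also by a free-form note, which turns it into a combined record.
record = ParadataRecord(process="survey questionnaire session")
record.traces.append(
    TraceEvent(
        timestamp=datetime(2022, 5, 29, 10, 0, tzinfo=timezone.utc),
        user="respondent-01",
        action="revise",
        detail="changed answer to question 3",
    )
)
print(record.kind())  # trace
record.narrative = "Question 3 was reworded after pilot testing."
print(record.kind())  # combined
```

In this sketch the same record type can hold what might, depending on context, equally be termed provenance data or metadata, which mirrors the point that the concepts are partly indistinguishable in in-situ settings.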

Paradata in information research

This paper suggests three principal and complementary reasons why it might make analytical sense to continue exploring the concept of paradata in information research. The first reason is practical: although paradata from a holistic viewpoint appears to be difficult to empirically and theoretically distinguish from provenance, it should be noted that provenance in the archival setting in the majority of cases is used in a narrower sense that is informed and defined by archival theory and archives and record-keeping praxis. Since archival studies and the archives and record-keeping profession are a part of the information-research sphere (Bates, 2007), it makes sense to utilize the concept of paradata as a vehicle for discussing and exploring descriptions of productive processes and related issues beyond the scope of archival provenance. There is also a topical reason why paradata can prove useful in information research: there is presently a growth of research that explores or otherwise shows an interest in information processes and process perspectives (Thomer, et al., 2018; Tognoli and Guimarães, 2020; Trace, 2011); the notion of paradata might be a useful addition to this current of work because of its emphasis on process documentation and its breadth in terms of what resources and aspects to include in process capture and provision (see Table 2).

The third reason is theoretical, multipart and rests on the implications of the present study. Table 2 and the literature analysed and referenced in this study show that paradata is a concept with a fairly invariable core meaning (paradata is information about productive processes) complemented by a peripheral flexibility (paradata process information can draw on a wide array of resources and descriptions, from trace data to explications of intellectual horizons). These characteristics point to the potential of paradata to be a boundary object in information research, i.e., a useful notion in several subdisciplines and research avenues that also can facilitate cross-boundary communication. Connected to this, paradata additionally offers theoretical contributions by highlighting certain aspects of process information. A main cognate of paradata is the multidisciplinary vein of research attempting to understand and solve the many problems that hinder purposeful re-use of scholarly data, a large research-related information challenge in the present day (Borgman, 2012; Faniel and Yakel, 2017). Paradata that allows the re-users to better grasp the data creators' horizon and means of work is considered to be one of the most promising solutions (Faniel, et al., 2016; Yoon, 2017). To delve into paradata as a means to facilitate transparency between data producers and data re-users raises many foundational questions pertaining to the nature of this transparency (is it, e.g., a membrane, a representation, or a prism that refracts and makes certain components of the original research process differently visible depending on their particular characteristics?) and how it functions. Paradata similarly highlights that there is an ethics of conversation to explore in the mode of communication between the producers and re-users of data, where process information is a key communicative element.
These ethics would be grounded in the present infrastructural, political and policymaking investments made into facilitating and promoting research data sharing in publicly-funded academic research on an international scale. Important points to consider here would be what values are built into the practices of creating and using paradata and to what extent and how these values support the intent to reduce the intellectual and sociotechnical distance (Baker and Yarmey, 2009) of data production and re-use, if the dialogue parties can be seen to engage reciprocally in a shared cooperative endeavour or if the communicative structure appears in some other way.

Conclusions

Paradata is a concept that recently has started to make appearances in information research (see e.g., Börjesson, et al., 2020; Dahlström and Hansson, 2019; Huvila, et al., 2021a). While paradata carries promise to be useful, the concept and the scope of its existing application have remained unexplored. The present paper addressed this knowledge gap by analysing the concept of paradata and how it was employed in a corpus of scholarly texts from archaeology and cultural-heritage studies, two domains where the already extended discussion of paradata is strongly linked to how the concept can be framed in information research and associated professional settings. As a scoping review, one of the objectives of this study is to be a stepping stone for further inquiries researching or using the concept of paradata. Many potentially fruitful inquiries have been suggested; see especially the paradata uncertainties outlined above and in Table 2. Continued conceptual development and insight will likely occur as paradata is applied in new ways and in areas of work outside the research-data, GLAM and heritage settings. That being said, forward-looking applications of paradata will also continue to be informed by subsequent considerations of paradata in practical and theoretical terms. Such works could choose another approach than that of the present study, where the analysis was driven by a synthetic reading of paradata literature. Options include investigations of the ruptures and liminalities of paradata in professional and scholarly use, seeking to identify when and where paradata becomes difficult to discern from other types of data, to the effect of becoming less useful and difficult to operationalize.

Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement No 818210 as a part of the project CApturing Paradata for documenTing data creation and Use for the REsearch of the future (CAPTURE).

About the authors

Dr. Olle Sköld is a senior lecturer at the Department of ALM and the director of Uppsala University's Master's Programme in Digital Humanities. His research is characterised by a broad interest in the ALM field, research data creation and use and digital humanities. He can be contacted at olle.skold@abm.uu.se.
Dr. Lisa Börjesson works as a researcher at the Department of ALM at Uppsala University in Sweden. Her research focuses on research information including research information management systems, data descriptions, data publishing and use. She can be contacted at lisa.borjesson@abm.uu.se.
Isto Huvila is professor in Information Studies at the Department of ALM, Uppsala University in Sweden. Huvila chaired the recently closed COST Action ARKWORK and is directing the ERC funded research project CAPTURE. His primary areas of research include information and knowledge management, information work, knowledge organisation, documentation and social and participatory information practices. He can be contacted at isto.huvila@abm.uu.se.

References


How to cite this paper

Sköld, O., Börjesson, L., & Huvila, I. (2022). Interrogating paradata. In Proceedings of CoLIS, the 11th International Conference on Conceptions of Library and Information Science, Oslo, Norway, May 29 - June 1, 2022. Information Research, 27(Special issue), paper colis2206. Retrieved from http://InformationR.net/ir/27-SpIssue/CoLIS2022/colis2206.html https://doi.org/10.47989/colis2206
