Metadata functional requirements for genomic data practice and curation

Authors

  • Hong Huang, University of South Florida
  • Jian Qin

DOI:

https://doi.org/10.47989/ir292363

Keywords:

Metadata Goals, Metadata Schema, Genome Curation, Data Practice, Metadata Infrastructure

Abstract

Introduction. The rapid accumulation of genomic data and their widespread reuse in clinical and scientific practice demand more effective description and organization of genomic information, which poses new challenges for developing metadata schemas. Based on a previously developed taxonomy of metadata requirements, this paper reports the results of a survey that addresses these new challenges.

Method. The survey gathered empirical data from 156 genomics scientists to identify context-sensitive metadata functional models for genome curation. The study further compared the metadata elements of four well-known genomic metadata schemes against the functional requirements for genome metadata.

Analysis. The survey data were analyzed with the statistical package Stata to produce descriptive statistics, factor analysis, Fisher’s exact test, and related reports.

Results. Analysis of the empirical results revealed that genomics scientists recognize specific sets of criteria for metadata needs in the genome-curation context. Twenty-one metadata requirements were reduced to five factor constructs. The ranking of these constructs in decreasing order is: portability, reusability, manipulability, sufficiency, interoperability, extensibility, and modularity. The Fisher's exact test results revealed that the genomic community required rich contextual and technical metadata elements to facilitate data exchange and experimental operations in genome curation.

Conclusion. The findings indicated that genomics scientists developed metadata to meet the needs of genome-curation activities related to data wrangling, integration across platforms and databases, and data reuse. A flat-file architectural layout needs extra administrative metadata to support data sharing and documentation. The resulting metadata requirement model can serve as a valuable resource for genome scientists, data curators and administrators in designing metadata schemas and developing data-curation policies.

Published

2024-06-18

How to Cite

Huang, H., & Qin, J. (2024). Metadata functional requirements for genomic data practice and curation. Information Research an International Electronic Journal, 29(2), 3–29. https://doi.org/10.47989/ir292363

Section

Peer-reviewed papers