Metadata functional requirements for genomic data practice and curation
DOI:
https://doi.org/10.47989/ir292363
Keywords:
Metadata Goals, Metadata Schema, Genome Curation, Data Practice, Metadata Infrastructure
Abstract
Introduction. The rapid accumulation of genomic data and their widespread reuse in clinical and scientific practice demand more effective description and organization of genomic information, which poses new challenges in developing metadata schemas. Based on a previously developed taxonomy of metadata requirements, this paper reports the results of a survey that addresses these new challenges.
Method. The survey gathered empirical data from 156 genomics scientists to identify context-sensitive metadata functional models for genome curation. The study further investigated the metadata elements from four well-known genomic metadata schemes against the functional requirements for genome metadata.
Analysis. The survey data were analysed with the statistical package Stata to produce descriptive statistics, factor analysis, Fisher’s exact test, and related reports.
Results. Analysis of the empirical results revealed that genomics scientists recognize specific sets of criteria for metadata needs in the genome-curation context. Twenty-one metadata requirements were reduced to five factor constructs. The ranking of these constructs in decreasing order is: portability, reusability, manipulability, sufficiency, interoperability, extensibility, and modularity. The Fisher’s exact test results revealed that the genomic community required rich contextual and technical metadata elements to facilitate data exchange and experimental operations in genome curation.
Conclusion. The findings indicated that genomics scientists developed metadata to meet needs in genome curation activities related to data wrangling, integration across platforms and databases, and data reuse. A flat-file architectural layout needs extra administrative metadata to support data sharing and documentation. The resulting metadata requirement model can serve as a valuable resource for genome scientists, data curators and administrators in designing metadata schemas and developing data-curation policies.
License
Copyright (c) 2024 Hong Huang, Jian Qin
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/3.0/