Book review: Semantic modeling for data: Avoiding pitfalls and breaking dilemmas

Alexopoulos, Panos. Semantic modeling for data: Avoiding pitfalls and breaking dilemmas. O'Reilly Media, Inc., 2020. xviii, 306 p. ISBN: 9781492054276. $52.94.

This book is written by Panos Alexopoulos, who currently works as Head of Ontology at Textkernel BV in Amsterdam, the Netherlands, where he leads a team of data professionals developing a large cross-lingual knowledge graph in the human resources and recruitment domain. He holds a PhD in knowledge engineering and management from the National Technical University of Athens. He works to bridge the gap between academia and industry through research papers, scholarly presentations, and webinars, and he shares his professional experience and deep knowledge in the book under review.

This book presents an overview of fundamental and pragmatic concepts and developments in the area of semantic data. The text is easy to read and informative. The chapters are usefully divided into thematically coherent sections, and each chapter has a brief introduction and summary that highlight key points and help the reader to understand the content more effectively. The book pays a lot of attention to practical examples, which help readers to absorb the theories presented and the more complicated aspects of the book. The book is divided into three parts and fifteen chapters.

Part I presents the fundamental concepts, terminology, and main activities related to semantic data modeling. This first part consists of five chapters.

Chapter 1 introduces semantic data modelling as an outcome of collaboration between data science and artificial intelligence. Differences between semantic models and machine learning models are explained, and the need for and relevance of developing and using semantic data models are presented. Moreover, the author provides examples of bad modelling, which can be problematic and lead to dilemmas and pitfalls.

Chapter 2 deals with the most common semantic modeling elements, such as entities, relations, classes and individuals, and attributes, focusing on languages like RDF and OWL and schemas like SKOS. In addition, it describes terms, complex axioms, constraints, and rules, including various types of reasoning (abduction and deduction). Moreover, the chapter presents common and standardized elements like lexicalization and instantiation in detail and with examples. Various kinds of relations, including definitions, subsumption relations, mapping, semantic relatedness, and interlinking relations, are discussed with a focus on meaning in data modelling frameworks. The chapter highlights differences and similarities across a variety of semantic communities in order to underline their representation potential and requirements.

Semantic and linguistic phenomena are examined in chapter 3. Phenomena such as vagueness, ambiguity, uncertainty, and semantic change are discussed in light of the development, accuracy, quality, usefulness, and application of a semantic model, with a focus on examples and their varieties. The characteristics and impact of these phenomena, and their roles in human language and thinking, are discussed in order to identify suitable strategies and mechanisms for tackling reasoning problems, low semantic accuracy, pitfalls, and dilemmas.

Chapter 4 concentrates on key dimensions and methods to evaluate the quality of semantic data models along with different approaches to measuring this quality. Generally, this quality is defined by semantic accuracy, completeness, consistency, conciseness, timeliness, relevancy, understandability, trustworthiness, availability, versatility, and performance. The reasons and causes of some bad models are explained to deepen the understanding of models’ quality problems and challenges.

Chapter 5 considers the lessons, challenges, steps, and process of developing a semantic model. It describes how to define a suitable strategy, principles, requirements, and goals for model development step by step, using key questions to prevent misunderstanding. Moreover, it deals with the semantic usefulness, reuse, and evolution of the model. Further, semantic modelling patterns are characterized, and the mining tasks of semantic models and their effectiveness are explained. In addition, mining methods and techniques such as hand-built patterns and rules, supervised machine learning methods, semi-supervised methods, distant supervision, and unsupervised methods are scrutinized in detail.

In Part II, methods and techniques for avoiding common pitfalls in semantic data models are introduced.

Inaccurate descriptions produced by humans in developing a semantic model are explained in chapter 6. Misleading names, inaccurate definitions, omissions, ignoring vagueness, and not documenting biases and assumptions may cause mistakes. Each of these problems is discussed and exemplified professionally.

Semantic mistakes arising from the use of semantic modelling languages and frameworks are covered in chapter 7. Common mistakes, including bad identity (bad synonymy, bad mapping and interlinking), bad subclasses (rigid classes, instantiation and parts as subclasses), and bad axioms and rules (vague relations and hierarchical relations as transitive), are explained. Moreover, some situations, practices, and erroneous inferences are exemplified to show modelers how to avoid these failures and flaws in semantic modelling. Some suggestions and guidelines are offered as well.

Chapter 8 highlights several pitfalls that arise, under some circumstances, from bad specification and deficient knowledge acquisition. Some practices and examples are covered to clarify how to get the specification right. The context, features, and characteristics (core entity types, competency questions) of the semantic model specification process are presented. Moreover, feasibility assessment and the choice of wrong knowledge acquisition sources (data and people), tools, and methods are scrutinized. Finally, a story of specification and knowledge acquisition is told, depicted, and explained.

Bad quality management is discussed in chapter 9. It presents an overview of some practices and quality-related dimensions, metrics, and ways of interpreting them. The chapter discusses the pitfall of not treating quality as a set of trade-offs. Moreover, bad semantic quality metrics, including those with misleading interpretations, little comparative value, vague assertions, arbitrary value thresholds, and those that are really quality signals, are exemplified and explained.

Chapter 10 looks at mistakes and issues in the use of semantic models in applications, such as bad entity resolution and bad semantic relatedness calculation. Some tricky scenarios, like various types of ambiguity, metrics of semantic model richness, and evidential adequacy, are presented. Some stories about these topics are told to highlight how to improve disambiguation capability and optimize the model for various contexts, actions, domains, and goals.

Chapter 11 reviews two main strategy-related pitfalls, namely bad strategy and bad organization, which challenge the successful design and development of semantic model initiatives. It also clarifies what a semantic model strategy is essentially about. Additionally, it discusses how to craft an effective strategy while avoiding myths and half-truths, underestimated complexity and costs, and misunderstood context. Moreover, the key and complementary skills and the right attitude of a data semantics team are described in detail.

Part III focuses on some frequent dilemmas of different types related to meaning and languages and their influence on designing or developing a semantic model.

Representation dilemmas are highlighted in chapter 12. Fuzzification is suggested as a solution for representing vague elements in a semantic model. Common dilemmas, such as class or individual, subclass or not, and attribute or relation, are exemplified, depicted, and explained with reference to key questions that help to reduce undesired effects. Moreover, fuzzification options, fuzzy membership functions, truth degrees, and fuzzy model quality are presented with detailed examples. Storytelling and cases are used to describe the representation and application of fuzzy models.

Expressiveness and content dilemmas are examined in chapter 13. It concentrates on frequent dilemmas about what to include in and leave out of a semantic model, in order to clarify how to achieve the right balance of expressivity efficiently and effectively. To that end, lexicalization of entities and the representation of multiple truths are reviewed. Additionally, the chapter deals with the granularity, generality, and specificity of entities, negative or positive assertions, truth contextualization, and the interlinking approach. The chapter proposes guidelines, techniques, and major questions related to the choice of a rich semantic model.

Chapter 14 is devoted to the dilemmas of two key tasks in the life cycle of a semantic model, namely evolution and governance. It describes how to craft a suitable evolution strategy to overcome strategy-related challenges and issues. To that end, the chapter scrutinizes how to remove statements, how to release a new version based on certain criteria, how to plan a stakeholders’ feedback mechanism, and how to define and measure the model's semantic drift. Moreover, the necessity of a model governance system, including principles and processes, to ensure the effectiveness of the strategy is discussed.

Chapter 15 looks ahead to the future. The author reflects on how to be optimistic but not naïve, how to avoid tunnel vision, and how to avoid distracting debates when planning and tackling a semantic model initiative or project. The key lessons of the book are summarized in this chapter. In the end, a combination of induction from machine learning and deduction from semantic models is reviewed as a way to build well-designed and bias-aware semantic models that narrow and bridge the semantic gap and increase reuse and consistency in the future.

The rest of the book includes a detailed index and comprehensive resources for deeper study. There is also a glossary of terms to help readers understand the rich content of the book.

One of the advantages of the book is the author's insightful critical view of the fundamental problems and basic challenges in applying theories of semantic description, various semantic relationships, and semantic schemas when designing or developing a semantic model. The wide-ranging text is very relevant and well presented. The book focuses on semantic meaning and modelling from both a critical and a practical perspective, and it ranges from basic topics to applied discussions.

It is a useful source for anyone interested in trends and issues in designing or developing a semantic model, especially for information scientists, knowledge engineers, knowledge organizers, and information architects. The book is useful for those seeking an informative, practical resource in the field of knowledge graphs and semantic data mining. It is also highly recommended to researchers, semantic developers, ontologists, taxonomists, and semantic data modelers. The exhaustive content will be fruitful and informative for data engineers and data scientists who are interested in combining semantic models, application profiles, and machine learning models to tackle biases and the semantic gap. Moreover, it is particularly valuable for students who are new to the topic or who are seeking a better understanding of practical concepts in the field of linked data and semantic modelling through industrial context and applications. It is highly recommended as a tutorial for related academic courses, to bridge the gap between academic discussions and industrial applications and to help readers understand how and why some academic papers and textbooks are misapplied or do not fit in practice.

In a nutshell, the book helps professionals interested in semantics to strengthen their semantic thinking practically, pragmatically, critically, and correctly. Additionally, it supports the enhancement of organizational and strategic aspects of knowledge-based and ontology-based projects.

Elaheh Hosseini

Department of Information Science, Alzahra University, Tehran, Iran
December, 2020

How to cite this review

Hosseini, E. (2020). Review of: Alexopoulos, Panos. Semantic modeling for data: Avoiding pitfalls and breaking dilemmas. O'Reilly Media, Inc., 2020. Information Research, 25(4), review no. R705 [Retrieved from http://www.informationr.net/ir/reviews/revs705.html]

Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.