In the ever-expanding world of digital libraries and cultural heritage collections, bibliographic metadata standards provide a structured approach to managing resources. The capture of additional contextual information further supports the identification, selection, and use of the resources described. As the problem spaces and areas of study of Humanities scholars are increasingly diversified (Henry and Smith, 2010) and supported by digital material and methods, existing approaches and systems are falling short of the needs of users (Fenlon et al., 2014; Varvel and Thomer, 2011). In short, research agendas and investigations have begun to evolve beyond searches based on traditional metadata parameters (author, date, publication place, genre).
Semantic Web technologies have been identified as a potential solution (Bair and Carlson, 2008). They augment keyword-based, full-text approaches to discovery with methods that rely on named entity identification, relationships between entities, and the potential to leverage interlinked data from a variety of repositories and corpora. A number of different, well-defined ontologies (structural frameworks capturing domain information) created by various bodies within the wider context of the Digital Humanities (DH) have emerged (Isaac, 2013; Stead, 2006). A critical evaluation and comparison of these different structures has, however, been lacking.
In this paper, we provide a summary of four bibliographical metadata ontologies, and expand on an earlier comparative analysis of them. What follows is a more in-depth discussion of complementary differences and parallels in terms of expressiveness, rather than domain, focus, or perspective. The strengths and weaknesses of these vocabularies are of interest and importance to anyone working with any type of Humanities dataset or research output, whether it be interacting with metadata that describes the resource, features that have been extracted from it, or the resource itself.
To date, digital libraries have relied heavily on traditional library bibliographic standards, such as MARC 1. As new research questions have arisen in DH, the limitations of earlier standards have become more pronounced (see Ramesh et al., 2015; Sfakakis and Kapidakis, 2009; Park, 2006; Cantara, 2006; Shreeves et al., 2005), and a number of ontologies designed to map the entities and relationships inherent in bibliographical metadata have emerged.
Rather than aiming to provide a comprehensive analysis of all known examples, we extended a preliminary evaluation of a small number of bibliographic ontologies. Earlier research 2 bridging the large general corpus of the HathiTrust Digital Library 3 with the specialist Early English Books Online - Text Creation Partnership 4 assessed the different needs of three distinct case study examples and analysed the suitability of existing ontologies to adequately capture associated information, including publication facts and object biographies (Nurmikko-Fuller et al., 2015a). This preliminary analysis examined four ontologies: MODS RDF 5 / MADS RDF 6, BIBFRAME (Miller et al., 2012), Schema.org 7, and FRBRoo (Bekiari et al., 2013). BiBo 8 was originally considered, but excluded from the final comparison as it operates on a finer level of granularity.
In this paper, we elaborate on that initial analysis, and provide access to the entirety of the comparative table (Nurmikko-Fuller et al., 2015b) 9, of which only a representative sample has previously been made available. We summarise the main characteristics of each structure in order to provide context for the detailed discussion outlining the parallels and differences between the models.
Based on available documentation and extant examples, we conducted an extensive review of the expressiveness of each ontology. Comparing each property and class against possible alignments in the other three led to the identification of parallels and differences between these models. One revelation was the differing extent to which documentation had been left incomplete, highlighting the lack of workflow standardisation in ontology development even within a shared domain. At times, the absence of extensive documentation made it difficult to assert parallels between the models with confidence.
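The pairwise comparison described above can be pictured as grouping terms under a shared concept and recording which ontologies can express it. The following is a minimal sketch only, not the procedure used to build the published dataset: the ontology names `A`, `B`, and `C`, every term name, and the concept labels are hypothetical placeholders rather than real vocabulary terms, and the assignment of a term to a concept stands in for a reviewer's manual judgement.

```python
from collections import defaultdict

# Hypothetical input: each ontology maps its own term to a concept label
# assigned by a human reviewer. None of these are real vocabulary terms.
ONTOLOGIES = {
    "A": {"a:title": "title", "a:creator": "creator", "a:issued": "date"},
    "B": {"b:name": "title", "b:author": "creator"},
    "C": {"c:label": "title", "c:creationEvent": "creator",
          "c:shelfMark": "location"},
}


def build_alignment_table(ontologies):
    """Return one row per concept, with one column (a list of terms) per ontology."""
    by_concept = defaultdict(dict)
    for onto, terms in ontologies.items():
        for term, concept in terms.items():
            by_concept[concept].setdefault(onto, []).append(term)

    rows = []
    for concept in sorted(by_concept):
        row = {"concept": concept}
        for onto in sorted(ontologies):
            # An empty list means this ontology has no aligned term.
            row[onto] = by_concept[concept].get(onto, [])
        rows.append(row)
    return rows


def unaligned(rows, ontologies):
    """List concepts that at least one ontology cannot express."""
    return [r["concept"] for r in rows if any(not r[o] for o in ontologies)]


rows = build_alignment_table(ONTOLOGIES)
gaps = unaligned(rows, ONTOLOGIES)
```

In this toy input, `gaps` holds the concepts with an empty cell in at least one column, which is the kind of row where documentation gaps make an alignment hardest to assert.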
The comparative analysis led to the insertion of all classes and properties of each ontology into one cohesive table, aligned wherever the same data could be represented regardless of how the mapping was achieved, resulting in a table of exactly 500 rows. Five types of alignment were identified:
The bibliographic metadata ontologies discussed here differ in their approach and expressiveness. Of the four, MODS RDF/MADS RDF was found to be most descriptive, with FRBRoo an event-based model, and BIBFRAME bridging the two by virtue of containing characteristics typical of both. Schema.org stands out as an ontology that promulgates a model at the crossroads between the other three; however, its focus on instrumenting marketplace transactions also detracts from much of its descriptive power and leaves it orthogonal to the purposes of the others. It has some generic properties that are useful in each of the other ontologies, but also possesses properties and classes that end users do not require outside of point-of-service systems.
From the perspective of a DH user of these ontologies, each is (to some degree) a victim of its provenance and the motivations of its designers. The benefits and failings of each are different, and they all incorporate a number of idiosyncrasies:
Our review examines the structure and scope of four ontologies designed for the representation of bibliographic metadata as applied to cataloguing digital source material in the Humanities. From this analysis direct equivalences, parallels, and complementary differences have emerged: there are many similarities in aim, scope, and expressiveness, but none of the considered ontologies completely satisfies scholarly needs on its own. Moving between them is feasible, but not achievable without some lossiness, as illustrated by the examples for granular alignment (see Methodology). For the comprehensive mapping of all the aspects of a given dataset, these models need to be supplemented with less bibliographically focused ontologies. Our analysis has highlighted the need to formalize the mappings, best practices, and transformations, as these are key to the correct (re)use of ontologies across projects and domains.
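The lossiness of moving between models can be illustrated with a small crosswalk. This is a sketch under stated assumptions rather than a mapping taken from any of the four ontologies: the `src:` and `tgt:` field names, the sample record, and the `crosswalk` helper are all hypothetical. It shows two ways in which information is lost: a source field with no target equivalent, and two distinct source fields collapsing into one coarser target field.

```python
def crosswalk(record, mapping):
    """Translate a record's fields via `mapping`.

    Returns (translated, lost): `translated` groups values under the
    target field; `lost` keeps source fields that have no target.
    """
    translated, lost = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            lost[field] = value  # no equivalent in the target model
        else:
            translated.setdefault(target, []).append(value)
    return translated, lost


# Hypothetical source record and mapping (not real vocabulary terms).
RECORD = {
    "src:title": "An Example Title",
    "src:printer": "A. Printer",
    "src:bookseller": "B. Seller",
    "src:provenanceNote": "ex libris",
}
MAPPING = {
    "src:title": "tgt:name",
    "src:printer": "tgt:publisher",     # two finer-grained roles ...
    "src:bookseller": "tgt:publisher",  # ... collapse into one field
}

translated, lost = crosswalk(RECORD, MAPPING)
```

After the call, `translated["tgt:publisher"]` holds both names with no record of which was the printer and which the bookseller, so a reverse mapping cannot restore the original distinction, while the provenance note survives only in `lost`.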
We have provided DH researchers with a window into the digital corpora design process. Because domain scholars need to interact with finer-grained research objects, we will examine standards such as BiBo and Web Annotation, among others, in the next round of research.
The authors gratefully acknowledge their colleagues Pip Willcox, Bodleian Libraries, University of Oxford; and Colleen Fallaw and Megan Senseney, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, for their invaluable contributions to the creation of the Bibliographic Ontologies Comparative Features Dataset, available at https://www.ideals.illinois.edu/handle/2142/88356.