DH 2016 Abstracts

EDM in Use: Collecting Metadata for a Regional Cultural Heritage Portal

The Europeana Data Model (EDM) aims to provide an abstract and formal specification for the delivery of data to Europeana. The model is meant to replace the older Europeana Semantic Elements (ESE) definition. Though Europeana still accepts data provided in ESE, the newer model means to provide a richer set of description possibilities and fine granulation in the distinction of the provided digital heritage object (provided CHO) and one or more digital representations thereof. Most notably, EDM is designed to provide Linked Open Data (LOD) of the described resources (cf. Berners-Lee, 2006). Thus, data providers are encouraged to switch to EDM and to take advantage of the extended potential.

The project “Repository of Styrian Cultural Heritage” will build a digital archive of cultural heritage objects, i.e. a common web platform where all partners (3 universities, 2 museums, and the local government) will expose metadata on their Styrian specific collections and holdings. As the nature of digitized objects ranges from text-centered materials like manuscripts or correspondence to images like historical photographs to museum objects of various contexts, an abstract data model for representation and retrieval was needed.

In a first step, the metadata core categories obligatory for all partners and content types were defined: They roughly correspond to Dublin Core (DC) and DCTerms and will form the basis of the web portal: institution, title/description, person/creator, time, place, object type, media type.

In a second step, this abstract model had to be converted into a formal specification of how to provide the metadata for a harvesting mechanism. The process of choice was OAI-PMH, either in form of an OAI-PMH interface or as an OAI 2.0 compliant XML-file. Harvesting of metadata on diverse cultural heritage objects naturally calls for the application of EDM expressed in XML. Thus, the abstract metadata core categories are mapped to EDM properties and formalized as XML elements and attributes. EDM is a powerful yet flexible tool, therefore a standardized application profile had to be developed for the project.

An important consideration is the integration of controlled vocabularies and norm data. EDM offers the possibility to do so, yet the incorporation of geographical coordinates for instance calls for the modelling of the place name in the edm:Place element and not in dc:coverage or dcterms:spatial.

At the end, the EDM XML data is integrated into the OAI-PMH stream. The goal is to arrive at a recommendation on data structure for the contributing institutions, a process which is currently underway.

Using EDM offers obvious advantages in the context of a harvesting portal: as it was specifically designed to capture the field of data aggregation from various sources, it offers a good spectrum of possibilities to address the difference between the original (i.e. analog) cultural heritage object (edm:providedCHO) and available web resources (edm:WebResource), but also sets these two aspects in correlation (ore:Aggregation).

The conception of a least common denominator template is achieved relatively easy though this is surely not a trivial task, taking into consideration the flexibility and integration potential of the model. Another important aspect needs constant supervision during the project: maintaining metadata quality and how to fill the template accordingly. What type of annotation of person forenames and surnames should be used? What set of keywords? What date format? What controlled vocabularies are vital? How to deal with uncertainties and fuzziness inevitably occurring in such datasets? These questions need to be addressed in cooperation with the scholars working on the collections, clarifying that these annotations will determine the accurateness of search and the quality of data retrieval and representation on the web portal. With regard to the (re-)use as Linked Open Data, special attention is given to the annotation with suitable norm data.

EDM offers a good framework for use in harvesting contexts, especially where various content types are present. Nevertheless, the experience in the project has shown that offering a technical concept is only half of the story: To achieve formal data interoperability and to simultaneously populate this model with comparable content categories is the great challenge in this context. Only with such a homogenous data basis the resulting web portal will be able to offer features like timelines, map-oriented visualisations and other discovery mechanisms. The poster will introduce the project’s so far approaches, solutions and lessons learned from this point of view.

Bibliography

Berners-Lee, T. (2006). Linked Data. https://www.w3.org/DesignIssues/LinkedData (accessed 29 February 2016)
DC and DCTerms. http://dublincore.org/documents/dcmi-terms/ (accessed 29 February 2016)
Definition of the Europeana Data Model v5.2.6 (2014). http://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_Documentation//EDM%20Definition%20v5.2.6_01032015.pdf (accessed 29 February 2016)
EDM Documentation. http://pro.europeana.eu/share-your-data/data-guidelines/edm-documentation (2016-02-29)
Europeana. http://www.europeana.eu/portal/ (accessed 29 February 2016)
Europeana Semantic Elements Specification and Guidelines (2013). http://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/ESE_Documentation//Europeana%20Semantic%20Elements%20Specification%20and%20Guidelines%2014%20July%202013.pdf (accessed 29 February 2016)
OAI 2.0. http://www.openarchives.org/OAI/openarchivesprotocol.html (accessed 29 February 2016)
OAI-PMH. http://www.openarchives.org/pmh/ (accessed 29 February 2016)
Repository of Styrian Cultural Heritage. http://wissenschaftserbe.uni-graz.at/en/ (accessed 29 February 2016)