The Baedeker Corpus, a digital collection of early German travel guides currently being developed in the travel!digital project 1, brings together a valuable but rarely investigated part of cultural heritage in an up-to-date and sustainable digital form. A combination of source-oriented approaches and Semantic Web formats (Meroño-Peñuela et al., 2014) allows for enhanced access to both the TEI-based digital edition and the rich domain-specific knowledge it contains. As complex inter-texts (Wierlacher, 1997; Koshar, 2000) and significant discourse-historical artefacts (Maingueneau, 2014), travel guides represent codified and authorized versions of local culture and history (Pritchard et al., 2005). Reflecting, (re)producing and (re)constructing dominant discourses, the genre plays a central role in shaping the tourist experience and directing the tourist gaze (Thurlow et al., 2007; Urry, 1990). By giving insight into various readings of history, tradition and culture, the new language resource is meant to foster cross-disciplinary research in cultural representation and identity-constructing discourses.
The Baedeker Corpus comprises all first editions of German travel guides on non-European countries published by the Baedeker publishing house before World War I. It contains more than 1.5 million running words and covers various regions, offering a balanced picture of different cultural areas. 2 Along with the basic layers of linguistic annotation, controlled vocabularies and content contextualization through Linked Open Data resources will assist in exploring cultural narratives from around the turn of the 20th century.
Focusing on people and monuments, two essential components of the guidebook genre and of cultural discourse itself, the extensive lexical inventory is represented by means of the Simple Knowledge Organization System (SKOS). Besides personal names, the guidebooks refer to people in a variety of forms. References to classes are also frequently used in making generalizations about groups and individuals (cf. Schmidt-Brücken, 2015). Each of these generic expressions is represented by a bdk:Descriptor, which is defined as a subclass of skos:Concept. Individual terms are encoded as skosxl:prefLabel(s) and skosxl:altLabel(s). Variants and translations are designated at term level by the properties hasVariant/isVariantOf and hasTranslation/isTranslationOf. Modelled in this way, the vocabulary identified in the guidebooks forms the basis of the concept scheme. Seven categories, indicated by skos:topConceptOf, structure the bdk:ConceptSchemeGroups:
(1) collective terms: population, tribes, natives; (2) ethnic/national communities: Englishmen, Wedda; (3) geographical concepts: Europeans, Orientals; (4) names of languages and scripts, language affiliation: Arabic, Cyrillic; (5) professions, including political, religious and economic roles, and styles of living: merchants, government officials, priests, peasants, nomads; (6) religious communities: brotherhood, pilgrims, Buddhists, Sikhs; and (7) social classes: castes, workers. Concepts and labels include both nouns and adjectives; associations among them are indicated by the property skos:related. Figure 1 lists definitions of skos:topConcept(s) and shows selected examples of concepts and labels.
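The data model described above can be illustrated with a minimal sketch in plain Python that builds the triples for a single descriptor and prints them in an N-Triples-like form. Several details are assumptions made only for illustration and are not taken from the project: the bdk: namespace URI, the identifiers `merchants` and `professions`, the sample labels, the use of skos:broader to attach a descriptor to its category, and the choice to encode a translation as a preferred label in its own language.

```python
# Sketch of the SKOS/SKOS-XL modelling described in the text.
# ASSUMPTION: the real bdk: namespace URI is not given; example.org is a placeholder.
BDK = "http://example.org/bdk#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Stated in the text: bdk:Descriptor is a subclass of skos:Concept.
SCHEMA_TRIPLES = [(BDK + "Descriptor", RDFS + "subClassOf", SKOS + "Concept")]


def describe(concept_id, category_id, pref, alts=(), translations=()):
    """Return (subject, predicate, object) triples for one descriptor.

    IRIs are plain strings; literals are (text, language) tuples.
    """
    concept = BDK + concept_id
    triples = [
        (concept, RDF + "type", BDK + "Descriptor"),
        (concept, SKOS + "inScheme", BDK + "ConceptSchemeGroups"),
        # ASSUMPTION: descriptors hang below their category via skos:broader.
        (concept, SKOS + "broader", BDK + category_id),
    ]
    counter = 0

    def add_label(prop, text, lang):
        # SKOS-XL labels are resources of their own with a literal form.
        nonlocal counter
        label = f"{concept}/label{counter}"
        counter += 1
        triples.append((concept, SKOSXL + prop, label))
        triples.append((label, RDF + "type", SKOSXL + "Label"))
        triples.append((label, SKOSXL + "literalForm", (text, lang)))
        return label

    pref_label = add_label("prefLabel", *pref)
    for text, lang in alts:
        add_label("altLabel", text, lang)
    for text, lang in translations:
        # ASSUMPTION: a translation is the preferred label in its own language.
        translated = add_label("prefLabel", text, lang)
        # Term-level links between labels, as named in the text.
        triples.append((pref_label, BDK + "hasTranslation", translated))
        triples.append((translated, BDK + "isTranslationOf", pref_label))
    return triples


def to_ntriples(triples):
    """Serialize triples line by line; literals carry a language tag."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):
            text, lang = o
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"@{lang}'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Illustrative values only; these labels are not quoted from the corpus.
triples = describe(
    "merchants",
    "professions",
    pref=("Kaufleute", "de"),
    alts=[("Händler", "de")],
    translations=[("merchants", "en")],
)
print(to_ntriples(SCHEMA_TRIPLES + triples))
```

Keeping each label as a resource of its own, rather than a plain literal, is what makes term-level relations such as hasVariant and hasTranslation expressible at all.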
The picture is similarly varied for monuments and notable sights. Since assessments and classifications of cultural heritage objects are integral parts of cultural representation, they are included in a separate concept scheme. The bdk:ConceptSchemeMonuments is organized by skos:topConcept(s) that indicate the topical spheres to which the objects belong, ranging from architecture and artworks to accommodations, landscapes, and breathtaking views.
By the end of the project, a web application based on the corpus_shell framework 3 will make the digital texts available along with their facsimiles, exposing them via the FCS/SRU protocol 4 that is part of the CLARIN infrastructure. This online edition will offer querying capabilities within both the text and the linguistic annotation layers. It will provide indexes of word classes, lemmas and the semantic entities defined by the SKOS vocabularies presented in this abstract. By transforming names of people/s and monuments in the texts into links to the LOD cloud 5, the vocabularies will connect occurrences in the Baedeker Corpus to other online resources, providing enhanced access to the corpus and additional information via the guidebooks’ main actors. The presented data model aims at supporting fine-grained examinations of semantic components that have a lasting influence on cultural perceptions of “Other” and “Self”. Semantic technologies are expected to reveal much about a discourse that goes far beyond travel literature.
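As a rough illustration of the FCS/SRU access mentioned above, the following sketch builds an SRU searchRetrieve request URL using only the Python standard library. The endpoint URL and the search term are hypothetical placeholders, and the protocol version is an assumption; the actual service address and supported parameters would have to be taken from the project's own documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(endpoint, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL for a CQL query.

    ASSUMPTION: the endpoint speaks SRU version 1.2.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes the query so quotes and spaces are safe.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and search term, for illustration only.
url = sru_search_url("https://example.org/fcs", '"Kaufleute"')
print(url)

# Decoding the URL again shows the parameters a server would receive.
print(parse_qs(urlsplit(url).query))
```

The request is only constructed here, not sent, so the sketch runs without network access.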
1 The project “travel!digital. Exploring People and Monuments in Baedeker Guidebooks (1875–1914)” is funded by the platform Digital Humanities Austria.
2 Palestine and Syria (1875), Lower (1877) and Upper Egypt (1891), North America and Mexico (1893), Asia Minor (1905), the Mediterranean Coastline of Africa (1909), India and Ceylon (1914).
3 Cf. https://clarin.oeaw.ac.at/corpus_shell [accessed 2015-12-22].
4 Cf. https://www.clarin.eu/content/federated-content-search-clarin-fcs [accessed 2015-12-22].
5 LOD datasets: VIAF Virtual International Authority File, GESIS Thesaurus for the Social Sciences, AAT Art & Architecture Thesaurus, UNESCO Thesaurus, DBpedia.