The implementation of geo-historical gazetteers increasingly depends upon the development of Natural Language Processing (NLP) and Corpus Linguistics as well as geographical analysis in disciplines such as History, Archaeology and Literary Studies. The application of these methods usually relies on the appropriate modelling of databases for performing the semantic enrichment of documents including geoparsing tasks. At the same time, even when performing a manual enrichment and referencing of place mentions in texts or in library or museum catalogues (e.g. when applying the CIDOC CRM model and its spatio-temporal extension), an adequate source of external information is crucial.
Today, geo-historical data are more and more often published following the Linked Data (LD) principles: i.e. using URIs and data format standards (RDF) and linking to other data sets to enable information discovery. Moreover, an implicit driving principle of LD, widespread in the Semantic Web community, is the reuse of vocabularies and ontologies already defined by others to avoid duplication. Keeping track of provenance is crucial. Pleiades is one of the best examples these days, but other generalistic sources such as DBpedia, Wikidata or GeoNames also provide interesting - albeit partial – geo-historical information and have proved to be useful in Digital Humanities (DH) projects. Linking texts to external sources using URIs enables the retrieval of additional information about the referenced places. Once this has been achieved, the information in the sources can be easily used to produce different views and aggregated analysis of corpora: i.e. visualizations (Jessop, 2008); this in turn is meant to help scholars to capture place perceptions and to analyse spatio-temporal phenomena described in corpora.
The choice of geo-historical datasets which are used as gazetteers depends on the domain of the texts under consideration. Pleiades is specifically suited to places in Mediterranean Ancient History texts. However, tasks such as referencing places from historical periods other than Antiquity, or identifying geographically vague or imaginary places in literary texts, if ever possible, might need the development of a different methodological approach, which would include the construction of conceptual mapping models and the creation of a completely different kind of gazetteer. In any case, the choice will have an important influence on the results of such visualizations as well as on the pertinence of the interpretation. Existing gazetteers vary widely in how they abstract the world. Important aspects – such as scale, the representation of time (and change over time), complex geometries, uncertainty and vagueness as to location and/or date, multiple points-of-view, representation of hierarchies of political-administrative units, their boundaries and their change over time, alternative names (rejected and standard forms, vernacular and multilingual), representation of fantastic places – are modelled in different ways, or are missing altogether. This limits their applicability in the Humanities. Moreover, interlinking between corresponding entities in different gazetteers is often lacking, although progress has been made in this regard, through community initiatives or by using GeoNames, or Wikidata as backbones (Simon et al., 2015). Finally, the ontologies used to link toponyms in texts to spatial references need to be further developed, especially when it comes to deal with fuzziness and uncertainty in mentions (Reuschel and Hurni, 2011).
Clearly, new models should conform to LD principles, and they should privilege the reuse of existing and consolidated ontologies, vocabularies and datasets whenever possible. Long term preservation and maintenance are also crucial problems in this sense because texts enriched with references to sources that have become obsolete or unavailable may have results that are unusable for the task for which they were tagged (Janowicz et al., 2012). In this sense, specialisation of efforts on the one hand (for pooling efforts) and coordination on the other, are crucial for such projects. Finally, geo-historical projects should also promote harmonization of their data with standards and practices of the broader DH community, and of the current research trends, in particular for what concerns the interoperability of resources within the framework of larger research infrastructures such as CLARIN or DARIAH.
In the proposed full-day workshop, we will focus on geo-historical gazetteers, and we will discuss their limits in supporting the needs of the Spatial Humanities (SH) community. The proposed workshop will be composed of nine presentations (abstracts are listed below), each of which concerns the production of geo-historical gazetteers as LD as well as the annotation, the recognition and the geoparsing of place names referenced in texts, library and museum catalogs, digitized maps, etc.
Christopher Donaldson (University of Birmingham), Extracting and visualizing the geographies in historical travel writing: This presentation will introduce a procedure for the automated extraction and resolution of geographical information from a corpus of historical writings about the English Lake District. The research on which the presentation is based is using the spatial analysis of geo-historical LD sets to achieve a more comprehensive and refined understanding of how the landscape of the Lake District was perceived, represented, and experienced in the past.
Karl Grossner (Stanford University), Joining Place and Period in Historical Gazetteers: Places referred to in historical documents and gazetteers have temporal as well as spatial extents. Likewise, historical periods have spatial extents. However existing data models and format standards and the mapping and timeline software that use them do not reflect this. I will discuss recent work on Topotime, an extension to the GeoJSON format adding temporal expressions, and allowing for some types of uncertainty encountered in historical data.
Katherine Hart Weimer (Rice University): A wealth of geographic information is included in library catalogs, with existing structures for name disambiguation, cross-referencing and inclusion of geographic coordinates. Recently, efforts are underway in libraries to convert this data into LD allowing for cross platform applications. The presentation describes an experiment in this sense.
Maurizio Lana (Università del Piemonte Orientale): Annotation of place mentions in Latin Literature. The annotation pipeline uses parsing+NER but later mentions are manually checked and referenced to external gazetteers such as Pleiades. The novelty of the project is the GeoLat GO! ontology that allows for a more complex annotation.
Bruno Martins (University of Lisbon): NLP and IR methods for handling geospatial information in textual documents. In my talk, I will present a brief survey of techniques for handling geospatial information within textual documents, including work at our team in the University of Lisbon, and other methods proposed within the Computational Linguistics and IR communities. I will discuss methods to address the problems of (i) document geocoding, (ii) toponym resolution, and (iii) selecting geographically relevant key-phrases. Applications within the broad field of DH, and SH in particular, will also be outlined.
Patricia Murrieta-Flores (University of Chester): So far, research in the SH has been mainly concerned with geographically precise information or what could be considered as ‘real’ places in historical and literary sources. Nevertheless, non-locational places play an important role in narratives of all sorts of sources from the fantastic, to geographically vague travel accounts. This is an important limitation in the analysis of place in the DH. Using Medieval Romances as an example, this presentation will discuss the challenges posed by literary narratives of place in terms not only of disambiguation, but also reference to fantastic and non-locational places.
Michael Page (Emory University): Atlanta Explorer: Historical Geocoding & the City: Atlanta Explorer focuses on building datasets and geospatial tools to explore the history of the city. Completed is a geodatabase and geocoder for circa 1930 and the pilot 3D virtual environment. The next phase includes producing geocoders for the remaining years (1868–1930) and therefore strategies and methods for developing historical geocoding datasets and tools for place discovery will be discussed. Our goal is to also share the underlying data with the community CityGML as how we would likely share and archive the model.
Rainer Simon (Austrian Institute of Technology), Pelagios project: an international community initiative concerned with the development of Linked Open Data methods, tools and services to better interconnect geo-historical datasets. In its most recent project phase (“Pelagios 3 - Early Geospatial Documents”), Pelagios has developed Recogito, a semi-automatic geo-annotation tool; Peripleo, a geotemporal search engine. Furthermore, Pelagios has annotated more than 300 historical sources from different cartographic traditions, collecting more than 120,000 place references in literary texts and early maps.
Humphrey Southall (University of Portsmouth): Engaging the wider public with historical gazetteers. Gazetteers are a powerful tool for humanities researchers, but are also of great fascination and utility for the general public. That interest enables academic projects to achieve wider “impact”, enables popular web sites to be sustained by advertising income, and enables expansion through crowd-sourcing. This presentation covers three experiences: the established Vision of Britain site; PastPlace, our new LD gazetteer which uses Wikidata as a spine to which we are adding historical toponyms; and GB1900, a crowd-sourced gazetteer building project developed in collaboration with National Libraries in Great Britain.
The proposed workshop targets an audience of scholars, data designers, and software developers, and will also comprise a speed presenting session for participants, topic-based breakout discussions between experts and attendees, a panel to highlight research priorities and summarize the main contributions of the workshop and research directions.