Data about wars is typically heterogeneous, distributed in the data silos of the fighting parties, multilingual, and often controversial depending on the political point of view. It is therefore hard for the historians to get a global picture of what has actually happened, to whom, where, when, and how. We argue that Semantic Web and Linked Data technologies are a very promising approach for modeling, harmonizing, and aggregating data about war history. Our goal is to make it possible, for both historians and laymen, to study history in a contextualized way where linked datasets enrich each other. The paper presents the in-use WarSampo 1 system, where massive collections of heterogeneous data about the (Finnish) history of the Second World War are harmonized using an event-based approach, and provided as a Linked Open Data service for applications to use. As a use case, a semantic portal WarSampo providing six different perspectives to the war based on events is presented.
There are several projects publishing data about the World War I on the web, such as Europeana Collections 1914–1918 2, 1914–1918 Online 3, WW1 Discovery 4, Out of the Trenches 5, CENDARI 6, Muninn 7, and WW1LOD (Mäkelä et al., 2015). War history makes a promising use case for Linked Data because war data is heterogeneous, distributed in different countries and organizations, and written in different languages (Hyvönen, 2012). Using metadata, also different opinions and conflicting information about the war can be represented.
Many websites also publish information about the World War II, the largest global tragedy in human history, such as the World War II Database 8 to name just one. However, such portal data is typically meant for human consumption, and there are only few works that deal with machine readable data about the WW2 for applications to use (Collins et al., 2005; de Boer et al., 2013).
Our work contributes to the related research above by initiating and fostering large scale LOD publication of WW2 data on the web, based on event-based data modeling. The idea is to publish Linked Open data, aggregated from distributed data silos, for Digital Humanities applications to use. In our case study, the data is related to the Finnish Winter War 1939–1940 against the Soviet attack, the Continuation War 1941–1944, where the occupied areas of the Winter War were temporarily regained by Finns, and the Lapland War 1944–1945, where the Finns pushed the German troops away from Lapland.
We first present and discuss the data modeling approach developed for the WarSampo LOD service. After this an application of the data, the WarSampo semantic portal, is presented where events are linked to related resources, such as photos, persons, and historical places. In conclusion, lessons learned are discussed and directions for further research pointed out.
Table 1. Central datasets published and linked in WarSampo.
The project deals initially with the datasets presented in Table 1. The casualties data (1) includes data about the deaths in action during the wars. War diaries (2) are digitized authentic documentations of the army unit actions in the frontiers. Photos and films (3) were taken during the war by the troops of the Defense Forces. The Kansa taisteli magazine (4) was published in 1957–1986; its articles contain mostly memories of the men that fought on the fronts. Karelian places (5) and maps (6) cover the war zone area in pre-war Finland that was finally annexed by the Soviet Union. Organization cards (7), written after the war, document events of military units during the war. War time events (8) extracted from various publications include, e.g., battles and political incidents. The data, over 5 million triples in total, has been transformed into RDF from database dumps, spreadsheets (CSV), and by applying OCR to documents. Named Entity Recognition (NER) techniques were used to link texts to data, e.g., to identify and disambiguate persons and places in the magazine articles and captions of the photos. In addition, new datasets are planned to be included in the system, such as the Finnish Broadcasting Company YLE’s audio and film material recorded during the war or related to it (“Living Archive”), and a database of prisoners of war.
CIDOC CRM 9 is used as the harmonizing basis for modeling data, with events providing the semantic glue for data linking (Doerr, 2003). Our data model for WW1, presented in (Mäkelä et al., 2015), is used as the metadata model to start with. The model and data is documented at the data service 10.
The data is annotated using a set of domain ontologies, including: 1) an ontology of the troops and their hierarchies, 2) persons with their ranks and roles, 3) place ontology of historical places, 4) event ontology of battles, politics, and other war time incidents, 5) an ontology of time periods, and 6) a subject matter ontology. Ontologies on objects such as weapons, aircraft, and vessels remain topics of possible future work.
All WarSampo datasets are available as Linked Open Data (LOD) at the “7-star” Linked Data Finland service 11 (Hyvönen et al., 2014), based on Fuseki 12 establishing the SPARQL endpoint, and with a Varnish Cache 13 frontend for dereferencing URIs.
The idea of the WarSampo portal is to provide a variety of different perspectives to war history. There are six perspectives available: Events, Persons, Army Units, Places, Kansa taisteli Magazine Articles, and Casualties (Hyvönen et al., 2016). The idea is that the perspectives enrich each other based on data linking.
Figure 1 illustrates the WarSampo Events perspective application to the WarSampo data. Events are displayed on a map, (a) in Fig. 1, and on a timeline (b) that shows here some events of the Winter War. When the user clicks on an event, it is highlighted (c), and the historical place, time span, type, and description for the selected event are displayed (d). Photographs related to the event (e) are also shown. The photographs are linked to events based on location and time. Furthermore, information about casualties during the time span visible on the timeline is shown alongside the event description (f), and the map (a) features a heatmap layer for a visualization of these deaths.
Figure 1. Event perspective featuring a timeline and a map.
The events can also be found and visualized through other perspectives. For example, in the Army Units perspective, the events in which a unit participated can be viewed on maps and in time, providing a kind of graphical activity summary of the unit. In the casualties perspective, military units of the dead soldiers are known, making it possible to sort out and visualize the personal war history of the casualties on historical maps that come from a yet another dataset in WarSampo.
The WarSampo semantic portal 14 was published Nov 27, 2015 at and has had tens of thousands of users. It is implemented solely on the SPARQL endpoint of the WarSampo LOD data service.
Our first experiments, as illustrated in Section 3, suggest that heterogeneous datasets of war history really can be interlinked with each other through events in ways that provide useful insights for the historians. We have also learned that even in the rural northern parts of Europe, massive amounts of WW2 data can be found. We have initially dealt with tens of thousands of people killed in action. However, there is also data available about hundreds of thousands of soldiers who survived the war. In addition to historians, WarSampo data is very interesting to the laymen, too: every soldier’s history is of interest at least to, e.g., his/her relatives. Managing the data, and providing it for different user groups, suggests serious challenges when dealing with, e.g., the war in the central parts of Europe, where the amount of data is orders of magnitude larger than in Finland, multilingual, and distributed in different countries. For example, solving entity resolution problems regarding historical place names and person names can be hard. However, it seems that Linked Data is a promising way to tackle these challenges.
Our work 15 is funded by the Ministry of Education and Culture and Finnish Cultural Foundation. Wikidata Finland project financed the alignment of the historical maps in WarSampo.
“Sampo'' is a magical artifact in Finnish mythology that brought good fortune to its owner.
The application is in use at http://sotasampo.fi.
See the project homepage http://seco.cs.aalto.fi/projects/sotasampo/en/.