DH 2016 Abstracts

MEDEA (Modeling Semantically Enriched Digital Editions of Accounts)

Most human communities have produced accounts of various sorts, and scholars have long used accounts as primary sources for economic and social history. MEDEA is a collaborative project that involves participants from three continents and includes projects using primary sources that span several centuries across multiple geographical regions. We recommend that digital editions of accounts—business, governmental, personal—use XML/TEI, the widely accepted stable archival format for digital scholarly editions, as a first step. XML/TEI-based editions of accounts should be further encoded using RDFs/OWL to take advantage of the affordances of the Semantic Web and Linked Open Data. Producing semantically enhanced digital scholarly editions of accounts on this general model will create opportunities to ask new questions about the social and economic lives of people in the past. The session features participants from MEDEA’s first year; we expect the project to continue for several more years, depending on resources.

1. Accounts: abstract models of human interactions

Account books share with printed and manuscript texts in prose, poetic, or dramatic forms a problem in data modeling because they are already abstract representations of human interactions. In the case of accounts, although these representations have taken different forms in different times and places, they have an apparent uniformity in that they often consist of lists of people, goods, and services accompanied by numerical values that represent amounts of currency, credit, or in-kind benefits exchanged. Often but not always, these values are totaled or balanced in some way (Tomasek and Bauman, 2013). The development of printed ledgers and spreadsheets illustrates the apparent uniformity of the structure of account books. Contemporary spreadsheet software fails to recognize the conceptual abstractions embedded in the physical format, but these unmarked abstractions are carried into the representations of spreadsheet information in digital form.

Accounts contain both numerical and semantic information (Thaller, 2012; Vogeler, 2014). Various types of accounts lend insight into business dealings, institutional, municipal, and state activities as well as personal uses of goods, property, and currency. The numerical information might include prices, wages, payments of customs duties or taxes as well as quantities of time worked and goods exchanged. In its simplest form, such information can be input as comma separated values (csv), and spreadsheet software can be used to perform calculations on the data. Through the emphasis on calculation, spreadsheets privilege the numerical information found in accounts.

But financial records include information of historical interest in addition to data expressed in numerical form. Account books often include the names of people with whom an individual or organization exchanged goods and services. Goods, services, and currencies have historically significant values in addition to the numerical data expressed as prices, wages, and quantities of time or goods, and such values have associations that are of historical interest. For example, different grains were staples for bread making in different regions (Allen, 2001).

Indeed, the semantic information contained in accounts can provide a rich picture of a community at the local level as well as that community’s relationships to others across space and time. In combination with other kinds of community records well known to social and economic historians—tax lists, church memberships, probate inventories, manuscript census returns—the semantic information embedded in accounts can reveal insights that the pure numerical representation might obscure. And such information might open up new avenues of inquiry or facilitate new insights into the operations of past networks of human relationships.

2. Historical accounts on the Semantic Web

If the semantic information is marked in standardized machine-readable ways, digital editions of accounts can be combined with Semantic Web technologies to produce much richer comparative information about the human relationships they represent than has been possible with earlier iterations of social and economic history. In fact, accounts are an example of a problem space in which insights of

Digital Humanities can improve longstanding practices in social scientific uses of computers to produce data about past human practices and relationships.

Since the Semantic Web offers opportunities to collect and compare data from multiple digital projects, the MEDEA project looks to the potential of developing broad standards for producing semantically enriched digital editions of accounts. Because the Guidelines of the Text Encoding Initiative (TEI) offer a method for creating stable humanities-oriented data from textual sources, MEDEA explores models for building on them to test ways to publish data on commodities, wages, and prices susceptible to comparative analysis.

While the TEI Guidelines provide a standard for markup of manuscript and print sources, some of the elements and attributes most useful for accounts fail to model machine-readable values adequate to the goal of comparability of accounts originally created across broad ranges of time and space. Thus the MEDEA project looks also to CIDOC-CRM and RDF/OWL as sites to begin consideration of the kinds of taxonomies and ontologies that will produce standard machine-readable values to express some of the semantic values found in accounts--especially information about commodities and currencies--that are relevant to humanities scholars. This includes conversion of local measures (Vogeler, 2014).

At present, the MEDEA leadership team imagines semantically enriched digital editions as networks of references between several digital representations of original archival account books. Such editions will allow scholars with varied interests to use the data from the accounts in different ways according to their fields and scholarly interests. We consider RDF a good solution to publish the data extracted from accounts in machine-readable format as it facilitates explicit references to other (possible) representations of the accounts. If spreadsheets are employed, their use should be restricted to data input and calculations.

3. Presentations

This multi-speaker session reports on results from the first stage of a joint project of historians at the University of Regensburg, the Centre for Information Modeling - Austrian Centre for Digital Humanities at the University of Graz, and Wheaton College in Massachusetts. Short presentations from participants will highlight historical and technical features central to MEDEA.

Kathryn Tomasek will act as chair for the session and offer some brief observations about the advantages of international and cross-institutional collaborations for developing standards for semantically enhanced digital editions of accounts. In North America, the greatest support for producing digital editions of accounts tends to come from a small number of well-established documentary projects and from librarians and archivists who seek to encourage use of their sources. Resistance continues in the form of familiar objections emphasizing the labor-intensive nature of TEI and transcription generally. Both using transcription and markup in undergraduate classrooms and exploring opportunities for carefully curated crowdsourcing offer possible solutions. Graduate advisors in Asia and Europe seem to be more open to encouraging their doctoral students to explore digital edition of accounts than do those in the United States. Developing an international community of practice increases the likelihood that existing projects will seek ways to optimize their own data for the Semantic Web.

Kathrin Pindl and Anna Paulina Orlowska offer examples of ongoing interest in accounts as primary sources for historical study at the graduate level. Influenced by the work of British economic historian Richard C. Allen, Pindl hopes to develop better serial data through digital edition of accounts (Allen, 2001). Pindl will discuss her MEDEA working groups’ experiences with beginning a digital scholarly edition of granary accounts from the St. Katharinenspitalarchiv in Regensburg. Orlowska will describe challenges to producing a digital edition of the accounts of Hanseatic merchant Johan Pyre, who was active in Danzig from 1421 to 1455. Since Pyre’s bookkeeping methods developed through eight clear stages during those twenty-four years and contained unreliable temporal data, they demand a careful search for proper algorithms for their translation into digital media.

Oyvind Eide, Cliff Anderson, and Georg Vogeler will speak to some of the technical questions involved with data modeling for the Semantic Web. Eide will describe advantages of event-based modeling like that used in the CIDOC-CRM. Anderson will present observations based on his comparison of XML/TEI and the contemporary business tool XBRL-GL using information extracted from the Dutch periodicals De Heraut (The Herald) and De Standaard (The Standard), which were edited by the scholar, pastor, and prime minister Abraham Kuyper in the late nineteenth century.

Vogeler will close the presentations with a brief discussion of the value of Semantic Web technologies for historians interested in the “content” layer of accounts. RDFs/OWL allows modeling and encoding of the basic economic facts recorded in historical accounts and economic records. SKOS allows describing taxonomies of commodities, services, and monetary values recorded. They can be aligned into a common vocabulary. SPARQL allows aggregate querying of resources on these common facts together with individual data recorded. Digital editions of registers and accounts that not only publish text but try to express their interpretation of the text in a “content” layer and that publish this interpretation online with the help of semantic web technologies do what scholarly edition is meant to do: publish the critical analysis of the document by a competent scholar.

4. Funding

The MEDEA project is supported by an award from the Bilateral Digital Humanities program of the National Endowment for the Humanities and the German Research Foundation [grant numbers HG-229349-15]. Any views, findings, conclusions, or recommendations expressed in this session do not necessarily reflect those of the National Endowment for the Humanities or the Deutsche Forschungsgemeinschaft.

Bibliography

Allen, R. (2001). The Great Divergence in European Wages and Prices from the Middle Ages to the First World War, Explorations in Economic History, 38: 411–47. doi: 10.1006/exeh.2001.0775.
Crofts, N., Doerr, M., Gill, T., Stead, S. and Stiff, M. (Eds.). (2011). Definition of the CIDOC Conceptual Reference Model. ISO: 21127:2014.
RDF Working Group. Resource Description Framework (RDF). http://www.w3.org/RDF/.
Sanderson, R. and Albritton, B. Shared Canvas Data Model. http://iiif.io/model/sharedcanvas/1.0/index.html.
Sarnowsky, J. (lead researcher), Die mittelalterlichen Schuldund Rechnungsbücher des Deutschen Ordens um 1400. http://www.schuredo.unihamburg.de/content/.
TEI Consortium. Guidelines for Electronic Text Encoding and Interchange. [Last updated on 15th October 2015]. http://www.teic.org/P5/.
Thaller, M. (2012). What is a text within the Digital Humanities, or some of them, at least? Digital Humanities 2012: Conference Abstracts. Hamburg: Hamburg University Press, July 2012, pp. 143-45.
Tomasek, K. and Bauman, S. (2013). Encoding Financial Records for Historical Research, Journal of the Text Encoding Initiative, 6. doi: 10.4000/jtei.895.
Vogeler, G. (2014). Modelling digital edition of medieval and early modern accounting documents, Digital Humanities 2014: Conference Abstracts. Lausanne: EPFL and UNIL, July 2012, pp. 398-400.
W3C. OWL Web Ontology Language. http://www.w3.org/standards/techs/owl#w3c_all.
W3C. OpenAnnotation Data Model. http://www.openannotation.org/spec/core.
XBRL Global Ledger. https://www.xbrl.org/thestandard/what/globalledger/.