DH 2016 Abstracts

A lesson in applied minimalism: adopting the TEI processing model

The Guidelines of the Text Encoding Initiative Consortium (TEI) have been used across numerous disciplines, producing huge numbers of TEI collections. These digital texts are most often transformed for display as websites and camera-ready copies. The TEI Simple project (Rahtz et al., 2015) was the first to propose a more prescriptive approach, providing baseline rules for processing TEI into various publication formats while offering the possibility of building customized processing models within the same infrastructure. For the first time in the history of the TEI there exists a sound recommendation for a default processing scheme, which significantly lowers the barriers for entry-level TEI users and enables better integration with editing and publication tools. The TEI Simple project was a Mellon-funded collaboration between the TEI Consortium, Northwestern University, the University of Nebraska at Lincoln, and the University of Oxford.

The new TEI method for documenting processing models (on track for acceptance by early 2016) gives editors and TEI customisers a way to record processing intentions at a high level, in a machine-processable but implementation-agnostic manner. Nevertheless, the processing model is a new proposal and needs to be extensively tested before it can be declared a success. As the TEI Technical Council works to integrate the TEI processing model extensions created by the TEI Simple project, we endeavour to employ it on two real-world projects, both of which have been running for a significant number of years and have already produced vast collections of material: the historical documents of the US Department of State, Office of the Historian ( http://history.state.gov/ ) and the corpus of Ioannes Dantiscus’ correspondence ( http://dantiscus.al.uw.edu.pl/ ). The Office of the Historian publishes a large collection of archival documents of state, especially those pertaining to foreign relations. The Dantiscus project spans over ten thousand original sources from the early sixteenth century, including correspondence, poetry, and diplomatic documents. This makes it a good test case for implementing the TEI Processing Model, because it goes far beyond the scope of the original TEI Simple sample collections. Because the material was previously published with custom-built XQuery/XSLT packages, we are in a position to compare the processing-model approach with the previous one, not only in terms of the quality and consistency of the final presentation but also in more quantitative ways, such as the size of the necessary code base, development time, and ease of long-term maintenance.

The first challenge is, obviously, rephrasing the transformations previously formulated in XQuery/XSLT using the ODD meta-language extensions proposed by the TEI Simple project. Preliminary results are very encouraging even though, as expected, it became necessary to extend the behaviours library to accommodate some specific needs. From the developer’s perspective it is immediately clear that using the TEI processing model brings substantial savings in development time and results in much leaner and cleaner code to maintain. For the Office of the Historian project, figures suggest a reduction in code size of at least two-thirds. These numbers are even more impressive given that the resulting ODD file is not only smaller but also much less dense, consisting mostly of formulaic <model> expressions that make it easier to read, understand and maintain, even for less skilled developers.
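To give a sense of what such formulaic <model> expressions look like, the fragment below is a minimal sketch of processing rules in an ODD customisation. The element and attribute names (elementSpec, model, predicate, behaviour, outputRendition) follow the TEI processing model specification; the particular rules, the value of the rend attribute and the schemaSpec identifier are illustrative assumptions and are not taken from either project.

```xml
<!-- Illustrative sketch only: these rules are assumed for the example,
     not copied from the Office of the Historian or Dantiscus ODD files. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="sketch" start="TEI">
  <elementSpec ident="hi" mode="change">
    <!-- Applies only when the XPath predicate holds for the current element -->
    <model predicate="@rend='italic'" behaviour="inline">
      <outputRendition>font-style: italic;</outputRendition>
    </model>
    <!-- Fallback: any other <hi> is rendered inline without extra styling -->
    <model behaviour="inline"/>
  </elementSpec>
  <elementSpec ident="head" mode="change">
    <!-- Delegates to the "heading" behaviour of whichever implementation runs the ODD -->
    <model behaviour="heading"/>
  </elementSpec>
</schemaSpec>
```

Each rule names what should happen (a behaviour, optionally guarded by an XPath predicate and styled with CSS) and leaves how it happens to the implementation, which is why such a file tends to be shorter and easier to scan than equivalent hand-written XQuery or XSLT templates.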

A secondary but still interesting question is whether, thanks to the additional layer of abstraction that the processing model provides, editors can become more independent from developers in tweaking the processing rules. This depends heavily on the personal predilections of the editor. However, in cases where editors are already deeply involved in encoding decisions at the level of XML markup and have some fluency in XPath and/or CSS, our results show that it is perfectly reasonable to expect them to tailor the existing high-level processing models to fit their specific needs in a majority of cases. We will also investigate the effect of incorporating the Processing Model into the eXist-db native database and application framework environment (Meier and Turska, 2016) in terms of easing the learning curve, for non-technical users in particular.

At the time of writing this paper proposal, the processing model is not yet a mature technology, in the sense that it still lacks a critical mass of practitioners as well as formal acceptance by the TEI Technical Council (although it will have been integrated into the TEI infrastructure by the time of DH2016). This presentation aims to present the challenges and open questions as well as the already demonstrated advantages of applying this technology. It will draw on the evidence from early adopters available by the time of DH2016. It will report not only on quantitative measures of improvement in technical implementations, but also on the variation in methodologies employed by the test projects and others.

Bibliography

Meier, W. and Turska, M. (2016). TEI Processing Model Toolbox Documentation, http://showcases.exist-db.org/exist/apps/tei-simple/doc/documentation.xml?odd=documentation.odd (accessed 5 March 2015)
Rahtz, S., Mueller, M., Pytlik-Zillig, B., Turska, M. and Cummings, J. (2015). TEI Simple Processing Model Specification, http://htmlpreview.github.io/?https://github.com/TEIC/TEI-Simple/blob/master/tei-pm.html (accessed 5 March 2015)