DH 2016 Abstracts

Sustainable publishing - Standardization possibilities for Digital Scholarly Edition technology

After decades of building digital resources for humanities research, such as Digital Scholarly Editions (DSE), and making them available to researchers and the broader public, we have reached the point where many of these resources can be connected to one another and are increasingly accepted by the scholarly community. However, we also face the challenge of maintaining the many Digital Scholarly Editions that were built on a wide range of technologies. This is especially complex because Digital Scholarly Editions are “living” objects. On the one hand, their content can be extended and refined continuously, so they are never finished. On the other hand, their technological basis must be kept accessible, secure and running. These two processes can be summarized under the term “data curation”.

If we assume that a Digital Scholarly Edition consists not only of the marked-up texts, mostly XML documents, but also of a functionality layer on top of them (all the interactive parts, the visualizations and the different views on the texts, indexes or other research material such as images or audio documents), it is obvious that data curation can become an almost unboundedly complex task. This functionality layer adds enormous value to the texts. A Digital Scholarly Edition can be seen as a tool for analyzing the XML documents, and thus as a part of the research process. It must be preserved so that research results, which often cannot be achieved without the functionality layer, remain reproducible.

A Digital Humanities resource usually passes through a typical life cycle and is built by a team whose members bring the different competences needed for each task:

Analysis of the sources to be edited (humanities scholars)
Requirements engineering (the whole project team)
Design of the data or document model, choosing which standards to use (scholars; database, markup and metadata specialists)
Choosing, adopting, and/or developing software tools for transcription, editing and publishing (software developers, scholars)
Installing and maintaining development servers and web servers (system administrators)
Conceptual design and implementation of the web publication of the Digital Scholarly Edition (web designers, web developers, scholars)
Preparation for long-term access and archiving (documentation and metadata specialists)
Service support and maintenance after the project has finished (data curators)

At each step of this life cycle, decisions are made that affect the subsequent steps. The first two steps of the list form the foundation on which the whole Digital Scholarly Edition is built, from the data model through the choice of software tools to publication and data curation.

Digital Scholarly Editions have been described sufficiently from a methodological point of view with regard to document and text modeling (Pierazzo 2015, Sahle 2013). An analytical description from the technological point of view is still a desideratum. Comprehensive data curation requires a technological publishing concept that uses standardized components. Such a concept can consist of standards for formal project documentation, a description of the technologies used, the interfaces and APIs provided, a design paradigm for typical user interaction tasks, and more. Standards on the data and metadata layers are broadly accepted and in use, for example the Guidelines of the Text Encoding Initiative (TEI, http://www.tei-c.org), but they are still missing for the functionality layer.

A high-quality critical Digital Scholarly Edition can only be built and maintained sustainably if it follows technological standards, and these still have to be developed. The paper will present a first small step towards a proposal for a minimal technological standard for Digital Scholarly Editions. It draws on experience gained over the last ten years of working on XML-based Digital Scholarly Editions built with certain tools, such as eXistdb (http://exist-db.org). Hence the proposed solution cannot be valid for all the different kinds of Digital Humanities scholarly resources.

A possible next step towards such a formal description could be to package the XML documents together with the source code of the functionality layer in a standardized, self-descriptive format. One option for this task is the EXPath Packaging System (http://expath.org/modules/pkg/), which works well for XML-based Digital Scholarly Editions and is widely used by editions published with eXistdb. The main purpose of such a packaging system is not connectivity or interoperability but maintenance and data curation. The packaging system can be extended gradually into a technological publishing format that incorporates the aforementioned aspects, such as a project description format.
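
To make the packaging idea more concrete, the following is a minimal sketch in Python of how an edition's XML documents and functionality-layer code might be bundled into a ZIP archive with an expath-pkg.xml descriptor at its root. The package name, abbreviation, version, file paths and the internal directory layout are hypothetical and chosen only for illustration. A real package should follow the layout required by the EXPath specification and by the target platform.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the EXPath package descriptor.
PKG_NS = "http://expath.org/ns/pkg"


def build_descriptor(name, abbrev, version, title):
    """Return the bytes of a minimal expath-pkg.xml descriptor."""
    # Serialize the descriptor namespace as the default namespace.
    ET.register_namespace("", PKG_NS)
    package = ET.Element(
        f"{{{PKG_NS}}}package",
        {"name": name, "abbrev": abbrev, "version": version, "spec": "1.0"},
    )
    ET.SubElement(package, f"{{{PKG_NS}}}title").text = title
    return ET.tostring(package, encoding="utf-8", xml_declaration=True)


def build_package(descriptor, files):
    """Bundle the descriptor and the given files into an in-memory ZIP.

    `files` maps archive paths to file contents. The paths used by a
    caller are an assumption of this sketch, not part of the spec.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("expath-pkg.xml", descriptor)
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


# Hypothetical edition: one TEI document plus one XQuery module.
example_descriptor = build_descriptor(
    name="http://example.org/edition/letters",
    abbrev="letters",
    version="0.1.0",
    title="Example Letters Edition",
)
example_package = build_package(
    example_descriptor,
    {
        "data/letter-001.xml": "<TEI xmlns='http://www.tei-c.org/ns/1.0'/>",
        "modules/view.xql": "xquery version '3.0'; ()",
    },
)
```

The resulting bytes could be written to a file with the conventional .xar extension. Keeping data and code in one versioned archive is what makes the package useful for maintenance: a curator can inspect the descriptor without having to run the edition.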

A possible formal project description format for the documentation could contain the following information:

The name of the project and all involved institutions and persons.
The status of the project: planned, work in progress, published, or finished.
The applied technologies and standards.
The licenses used for research data, source code and other components such as fonts, audio or video documents.
Information about where to find the source code, if the source code is available under an open source license.
Information about the provided APIs and other interfaces for retrieving the research data and metadata in various formats (XML, JSON etc.) or for obtaining structured information about persons or places for further processing in other contexts. In the case of a correspondence edition, for example, metadata about the letters should be provided in the Correspondence Metadata Interchange Format (CMIF) so that it can be reused by http://correspsearch.bbaw.de.
Contextual details about the data producers, how the data were collected etc. (see Faniel 2015).
Canonical citation rules and instructions for persistent referencing of current parts and older versions of the research data.
A standardized change log, which can be evaluated by other services.
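
As one illustration, the items listed above could be modeled as a small record type and serialized for machine processing. The sketch below uses Python dataclasses. All field names, the ChangeLogEntry type and the example values are assumptions made for this sketch and are not part of any existing standard. Only the four status values are taken from the list.

```python
import json
from dataclasses import dataclass, field, asdict

# Status values as named in the list above.
ALLOWED_STATUS = ("planned", "work in progress", "published", "finished")


@dataclass
class ChangeLogEntry:
    date: str      # ISO 8601 date, e.g. "2016-03-06"
    summary: str   # short human-readable description of the change


@dataclass
class ProjectDescription:
    name: str
    status: str
    institutions: list = field(default_factory=list)
    persons: list = field(default_factory=list)
    technologies: list = field(default_factory=list)
    licenses: dict = field(default_factory=dict)   # component -> license
    source_code_url: str = ""
    apis: list = field(default_factory=list)
    data_context: str = ""
    citation_rules: str = ""
    changelog: list = field(default_factory=list)  # ChangeLogEntry items

    def __post_init__(self):
        # Reject any status outside the controlled vocabulary.
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown project status: {self.status!r}")

    def to_json(self):
        """Serialize the description, including nested entries, as JSON."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# Hypothetical example project.
description = ProjectDescription(
    name="Example Letters Edition",
    status="work in progress",
    institutions=["Example Academy"],
    technologies=["TEI", "XQuery"],
    licenses={"research data": "CC BY 4.0", "source code": "MIT"},
    changelog=[ChangeLogEntry("2016-03-06", "Added 20 letters")],
)
print(description.to_json())
```

A controlled vocabulary for the status and a structured change log are the two parts that other services could evaluate most directly. The free-text fields would still need human readers.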

Of course, this list is only a first suggestion and does not cover all the information that can be given about a project. The project description must be accessible under a standardized URL (e.g. http://home.of.project/api/projectdescription) and can be serialized in different formats, such as XML or JSON, for further processing. This would allow a Digital Scholarly Edition to be registered with a central directory that automatically collects the information and updates of all Digital Scholarly Editions following the same publishing model. Such a central directory does not exist yet. Existing directories collect information manually and describe projects externally, so changes and updates are harder to track.
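
How such a central directory might collect updates automatically can be sketched as follows. The example assumes that each edition serves its project description as JSON under the path api/projectdescription and that the description contains a "changelog" list whose entries have "date" and "summary" keys. The Registry class, the key names and the fetch callable are hypothetical, since no such directory or format exists yet.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Path taken from the example URL in the text.
DESCRIPTION_PATH = "api/projectdescription"


def description_url(base_url):
    """Build the standardized description URL for an edition's base URL."""
    return urljoin(base_url.rstrip("/") + "/", DESCRIPTION_PATH)


def fetch_over_http(url):
    """Fetch a document over HTTP and return it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def new_changes(previous, current):
    """Return change log entries of `current` that `previous` lacks."""
    seen = {(e["date"], e["summary"]) for e in previous.get("changelog", [])}
    return [
        e for e in current.get("changelog", [])
        if (e["date"], e["summary"]) not in seen
    ]


class Registry:
    """Hypothetical central directory that harvests project descriptions."""

    def __init__(self, fetch=fetch_over_http):
        self.fetch = fetch   # callable: URL -> JSON text
        self.records = {}    # base URL -> last harvested description

    def harvest(self, base_url):
        """Fetch one description, store it and return its new changes."""
        current = json.loads(self.fetch(description_url(base_url)))
        changes = new_changes(self.records.get(base_url, {}), current)
        self.records[base_url] = current
        return changes
```

Calling harvest periodically for each registered edition would yield only the change log entries added since the previous run. Because the description is published by the project itself, the directory does not need to describe projects externally.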

The success of such a publishing model depends on its practical usability and on a critical mass of Digital Humanities scholars and projects publishing their Digital Scholarly Editions with it. It is difficult to find a standardized, generic approach in the world of Digital Scholarly Editions, as every project encounters a different set of problems and a different set of uses. It is therefore important that developers do not make too many assumptions about the nature of a project, and that they advance the development of a technological publishing standard in continuous exchange with the scholarly community and in very small steps that take into account the diversity across the Humanities.

Bibliography

Faniel, I. (2015). Data Management and Curation in 21st Century Archives. http://hangingtogether.org/?p=5375 (accessed 6 March 2016).
Pierazzo, E. (2015). Digital Scholarly Editing: Theories, Models and Methods. Ashgate.
Sahle, P. (2013). Digitale Editionsformen: Zum Umgang mit der Überlieferung unter den Bedingungen des Medienwandels. 3 vols. Norderstedt: Books on Demand.