After decades of building digital resources for humanities research, such as Digital Scholarly Editions (DSE), and making them available to researchers and the broader public, we are at the point where many of these resources can be connected to one another and are more and more accepted by the scholarly community. However, we also experience the challenge to maintain all the various Digital Scholarly Editions which were built on a diverse base of different technologies. This is especially complex as Digital Scholarly Editions are “living” objects. On the one hand that means that the content can be extended and refined continuously. Hence they are never finished. On the other hand the technological basis must be kept accessible, secure and running. Those two processes can be summarized under the term “data curation”.
If we assume that a Digital Scholarly Edition not only consists of the marked up texts, mostly XML documents, but also of another layer on top of the XML documents, the functionality layer – all the interactive parts, the visualizations and the different views on the texts, indexes or other research material, such as images or audio documents – it is obvious that data curation can become an unlimited complex task. This functionality layer provides an enormous additional benefit to the texts. A Digital Scholarly Edition can be seen as a tool which is used to analyze the XML documents, thus as part in the research process which must be preserved to reproduce research results which often cannot be achieved without the functionality layer.
A Digital Humanities resource usually undergoes a typical life cycle and is built by a team of team members with a variety of competences that are needed for each task:
At each step of this life cycle decisions are made which have impact on the subsequent steps. The first two steps of the list constitute the foundation on which the whole Digital Scholarly Edition is built on, from the data model over the choice of software tools until the publication as well as data curation.
Digital Scholarly Editions are sufficiently described from a methodically point of view regarding the document and text modeling (Pierazzo 2015, Sahle 2013). An analytical description from the technological point of view still is a desideratum. To make a comprehensive data curation possible a technological publishing concept which uses standardized components is needed. Such a concept can consist of standards for a formal project documentation, a description of the used technologies, the provided interfaces and APIs, a design paradigm for typical user interaction tasks, and many more. Standards on the data- and metadata-layer are broadly accepted and in use – one example are the Guidelines of the Text Encoding Initiative (TEI – http://www.tei-c.org) – but they are still missing for the functionality layer.
A high standard critical Digital Scholarly Edition can only be built in a sustainable way and be maintained when it follows technological standards which still have to be developed. The paper will present a first tiny step of a proposal for a minimal standard from the technological point of view of a Digital Scholarly Edition. It focuses on experiences made during the last ten years working on XML-based Digital Scholarly Editions built with certain tools, such as eXistdb ( http://exist-db.org). Hence the proposed solution cannot be valid for all the different kinds of Digital Humanities scholarly resources.
A possible next step towards such a formal description could be to package those XML-documents together with the source code of the functionality layer in a standardized self-descriptive format. An option for this task could be the EXPath Packaging System ( http://expath.org/modules/pkg/), which works well for XML-based Digital Scholarly Editions and is widely used by Digital Scholarly Editions which are published using eXistdb. The main purpose of such a packaging system is not connectivity or interoperability rather than maintenance and data curation. The packaging system can be extended gradually to a technological publishing format which incorporates the aforementioned aspects such as a project description format.
A possible formal project description format for the documentation will consist of the following information:
Of course this list can be just a first suggestion and does not provide all the information that can be given about a project. The project description must be accessible under a standardized URL (e.g. http://home.of.project/api/projectdescription) and can be serialized in different formats, such as XML or JSON, for further processing. That would allow a Digital Scholarly Edition to be registered at a central directory where all information and updates of various Digital Scholarly Editions which follow the same publishing model are collected automatically. Such a central directory does not exist yet. Currently existing directories collect information manually and describe projects externally, so changes and updates are harder to track.
The success of such a publishing model depends on pragmatic usage possibilities and a critical mass of Digital Humanities scholars and projects who publish their Digital Scholarly Edition using this publishing model. It is difficult to find a standardized, generic approach in the world of Digital Scholarly Editions as every project encounters a different set of problems and a different set of uses. Thus it is important as developers to not make too many assumptions about the nature of a project and further the development of a technological publishing standard in continuous exchange with the scholarly community and in very small steps which take into account the diversity across the Humanities.