In an unprecedented and very successful crowdsourcing project, the most important resource for abbreviations in Latin and Italian has been digitised and is now freely available. The e-learning project Ad fontes collected and systematised all 14’357 abbreviations contained in the most renowned collection, the Dizionario di abbreviature by Adriano Cappelli (Cappelli, 1928). The digitised and systematised abbreviations offer new ways to access handwritten texts. Besides enhanced search methods (wildcards, uncertainty, abbreviation placement, etc.), it will soon be possible to add new abbreviations and to rectify entries that Cappelli handled incorrectly.
The poster presents the crowdsourced digitisation of the printed lexicon and considers specific problems and lessons learned while dealing with the crowd. It gives insights into new possibilities for research on abbreviations and discusses possible ways of dealing with them. At the same time, it raises questions concerning requirements and nice-to-haves for an application that hopes to become an ever-growing resource for abbreviations found in historical sources.
Following Cappelli’s model, each abbreviation is presented as a "facsimile" of a handwritten abbreviation, followed by a transliteration of the letters present, the placement of the abbreviation symbol(s) in a grid, the category of context where applicable (e.g. legal or medical), and the period the manuscript stems from.
All this information can be identified by users without any knowledge of palaeography or Latin, which allowed non-experts to take part in the crowdsourcing process. In addition to Cappelli’s system, we had the participants place the abbreviation symbols within a 3x3 grid. This introduces a new search parameter that refines the search according to the position of the abbreviation marks. Unlike the data taken from Cappelli (where only a very limited number of mistakes and typos were entered), the placement of abbreviation marks was highly problematic and had to be corrected in most cases by expert validation. In a future phase we would not include this step in the regular crowdsourcing process, but have it done separately.
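The position-based search parameter can be illustrated with a minimal sketch. The record layout, the cell naming (row and column from 0 to 2, with (0, 0) as the top-left cell), the function name and the sample entries are all assumptions made for illustration; they are not taken from the Ad fontes data model or from the digitised data.

```python
from dataclasses import dataclass


# Assumed cell naming: (row, column), each 0..2, with (0, 0) the top-left cell.
@dataclass(frozen=True)
class Entry:
    letters: str      # transliteration of the letters present
    expansion: str    # resolved word
    marks: frozenset  # grid cells that contain an abbreviation mark


def with_marks_at(entries, cells):
    """Keep entries that have an abbreviation mark in every requested cell."""
    wanted = frozenset(cells)
    for row, col in wanted:
        if not (0 <= row <= 2 and 0 <= col <= 2):
            raise ValueError(f"cell outside the 3x3 grid: {(row, col)}")
    return [e for e in entries if wanted <= e.marks]


# Hypothetical sample entries, not taken from the digitised data.
SAMPLE = [
    Entry("dns", "dominus", frozenset({(0, 1)})),
    Entry("gra", "gratia", frozenset({(0, 1), (0, 2)})),
    Entry("p", "per", frozenset({(2, 0)})),
]
```

Under these assumptions, `with_marks_at(SAMPLE, [(0, 1)])` keeps the first two sample entries, and each additional cell narrows the result further.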
Within 23 days, all of the abbreviations contained in Cappelli were digitised by our crowdsourcing participants. Mobilising such a crowd was made possible by a highly connected community of academics and archivists (reached via mailing lists such as Digital Medievalist as well as social media). Apart from the option of remote online participation, we organised a crowdsourcing event at the University of Zurich, which was supplemented by smaller events at universities in Oxford and Berlin.
The information gained allows conclusions, based on very provisional data, concerning the use of abbreviations in Latin manuscripts. Of the 9’000 most common words in Latin (according to the SLU corpus, Pavur, 2009), 1094 could potentially be abbreviated (12.2%, or roughly 13% if we exclude words of three letters or fewer, which are usually not abbreviated). This count takes only the basic forms of words into account (no inflected or conjugated forms). A comparison with the Vulgate shows that of the 38’138 words occurring in the Bible, 1083 exist in abbreviated form. The data thus confirm what could be expected (Traube, 1907): the abbreviations were not conceived especially or solely for use in handwritten Bibles but for a variety of texts. A digital Cappelli is thus able to show in a quantitative way what specialists already suspected.
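The proportions behind these figures can be checked with a few lines of arithmetic. The sketch below reads both counts as simple shares of the respective word lists, which is an assumption; the helper name is hypothetical. The figure of roughly 13% cannot be reproduced here, because the number of words with three letters or fewer is not stated.

```python
def share_percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Abbreviable words among the 9'000 most common Latin words (SLU corpus).
slu_share = share_percent(1094, 9000)

# Words with an abbreviated form among the 38'138 words of the Vulgate.
vulgate_share = share_percent(1083, 38138)

print(slu_share, vulgate_share)  # prints: 12.2 2.8
```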
The abbreviations are offered through the platform Ad fontes (www.adfontes.uzh.ch, Kränzle and Ritter, 2004) as well as the web app App fontes (t.uzh.ch/adf). A batch download of all data, including the images, is possible.
The digitisation and systematisation of the abbreviations according to Cappelli opens up possibilities not available in the printed version. Users do not need to know letters that are unidentifiable or that are not part of the Roman alphabet; instead, they can use wildcards to obtain satisfactory search results. In general, uncertainties will not prevent a successful search; they will only increase the number of results.
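How a wildcard search tolerates unreadable letters can be sketched as follows. This is an illustration only, not the search implemented on the platform: the wildcard syntax ('?' for one unknown letter, '*' for any run of letters), the function name and the sample records are assumptions, and the matching relies on Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

# Hypothetical sample records: (transliteration of the visible letters, expansion).
# They are common Latin abbreviations chosen for illustration, not an extract
# from the digitised data.
SAMPLE_ENTRIES = [
    ("dns", "dominus"),
    ("sps", "spiritus"),
    ("eps", "episcopus"),
    ("gra", "gratia"),
    ("oia", "omnia"),
]


def search(pattern, entries=SAMPLE_ENTRIES):
    """Return entries whose transliteration matches the wildcard pattern.

    '?' stands for exactly one unreadable letter, '*' for any run of letters.
    """
    pattern = pattern.lower()
    return [entry for entry in entries if fnmatchcase(entry[0], pattern)]


print(search("dns"))   # fully legible: one result
print(search("?ps"))   # first letter unreadable: two results
print(len(search("???")))  # nothing legible but the length: all five
```

The three calls show the behaviour described above: each additional uncertain letter leaves the search valid and merely widens the result set.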
Currently, a feedback loop (for emendations to Cappelli) and the possibility of adding further abbreviations are being developed. The latter in particular will once more draw on the power of a specialised crowd, while assembling a resource for everyone who works with manuscripts.
By having a heterogeneous group of people crowdsource the digitisation of Cappelli, we showed that there is both an interest in and a need for dealing with abbreviations (Pluta, 2016). With its new resource, Ad fontes will facilitate the handling of abbreviations in handwritten documents in the 21st century.