Described as 'a great monument to European literature' (David and David, 1964, p. 180), Jacob and Wilhelm Grimm’s masterpiece Kinder- und Hausmärchen (hereafter KHM) has captured adult and child imagination for 200 years. International cinema, literature and folklore have borrowed and adapted the Brothers’ fairy tales in multifarious ways, inspiring themes and characters in numerous cultures and languages. While the Brothers are commonly and erroneously considered the fathers of the genre, the fairy tales were not original to them. In fact, Jacob and Wilhelm collected and adapted their stories from earlier works, some of them dating back as far as the seventh century BC, and made numerous changes to their own collection (David and David, 1964, p. 183), producing seven distinct editions between 1812 and 1857. In these four decades of writing and rewriting, the fairy tales changed in number, style and content in accordance with historical, social and literary influences. And yet, how did forty years of revisions not confuse the tradition and transmission amongst followers and readers? Indeed, some fairy tales were changed almost beyond recognition. What makes them so timeless and memorable? What is it that immortalises these tales? An answer to these questions can be found in the motifs the Brothers borrowed from earlier traditions and disseminated by way of their famous collection.
Motifs, defined by Prince's Dictionary of Narratology (2003) as '[...] minimal thematic unit[s]', pervade the Grimm collection and are stable elements interlacing the seven editions of the KHM. The figure of Puss in Boots (Der gestiefelte Kater) and the concept of the breadcrumb trail (originating in the Hänsel und Gretel tale) are both examples of motifs, and they recur not only throughout the Grimm editions, but also over time and space. The occurrence and repetition of motifs within the Grimm collection is a form of intratextuality, a term used to describe the internal relations within a single text or an author's body of work and, in our case, within the KHM editions. But a motif may also appear in the works of other authors across traditions and languages, thus creating intertextual relations, that is, relations between the KHM and other texts.
The breadcrumb trail generated by these motifs in literary history, and internationally spread through the Brothers Grimm, has been extensively studied by folklorists, historians and literary critics. Akin to memes, motifs are a form of information transfer and reuse, which opens up numerous opportunities for computational research in cultural evolution and transmission. Interestingly, however, the study of motifs has not yet fully explored the affordances of digital methods. Many authoritative volumes and ontologies have been published in print, such as the well-known Enzyklopädie des Märchens, or the Estonian Folktales and the Catalogue of Portuguese Folktales, but only a few digital projects or digital editions of these print sources exist. One such digital initiative is the Aarne-Thompson Motif-Index 1, a crucial contribution to the field, often used as a reference system for the production of folktale catalogues.
The situation, however, is different for fairy tales, inasmuch as digital copies of many folktale collections are freely available from Google Books or the Internet Archive, 2 or from online collections, such as the Nederlandse VolksverhalenBank initiative 3 or the Satorbase project 4, fostering intertextual research never before possible. 5 Indeed, we can now leverage hyperlinks and APIs in order to automatically retrieve specific and previously inaccessible information across the web, and to connect existing resources for comparative studies. Moreover, no effort has yet addressed the cross-cultural relations of fairy tales, which gives rise to opportunities for interdisciplinary, multilingual and big data research.
The new project 6 described in this paper is one such opportunity, whereby an international and interdisciplinary team of computer scientists and humanists is semi-automatically crawling digitised texts and the web to produce a multilingual motif index that uses the Grimm collection as its base reference. 7 More specifically, we combine knowledge acquired from existing print and digital resources with the deployment of the Google Search 8 and Google Books APIs 9 in order to automatically retrieve as many motifs as possible from across the web, in as many languages as possible, and hence to explore the intratextual and intertextual relations that characterise the motifs' hosting texts. The end goal is twofold: on the one hand, we provide a comprehensive reference resource for scholars in the field and interested citizens alike, and concurrently revise the Aarne-Thompson Index; on the other, by testing state-of-the-art text reuse and retrieval algorithms on a sample of these diverse and large datasets, we are able to refine our methods in order to accommodate further web-scale queries and thus sharpen our understanding of why and how motifs changed.
The case studies we are working with to address our research questions are three Brothers Grimm tales: Snow White, Puss in Boots and The Fisherman and his Wife. These were chosen on the basis of their differing degree of popularity in order to better understand how transmission affects popularisation.
Our research starts with digital and clean copies of the Grimm texts, downloaded and catalogued from TextGrid 10 and Wikisource 11. Next, our international team of researchers and student assistants collects digitally available translations and/or editions of the three tales in multiple languages 12 and manually enters them into a database, where information about the web source, the tales, the language, the work and the author is stored. 13 Once this manually-compiled dataset is complete, we deploy the TRACER suite of text reuse algorithms (Büchler, 2013) to trace additional motifs in other digital libraries or corpora. At the same time, we use the Google Search and the Google Books APIs to search for motifs at a much larger scale, effectively crawling the web.
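The web-scale search step can be illustrated with a minimal sketch. It only builds query URLs for the public volumes endpoint of the Google Books API and sends no requests. The motif phrases, the function names and the choice of exact-phrase queries are assumptions made for illustration, not the project's actual configuration.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Books API volumes search.
BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"

# Hypothetical motif phrases per language (ISO 639-1 code -> phrase);
# illustrative only, not taken from the project's data.
MOTIF_PHRASES = {
    "de": "gestiefelte Kater",
    "en": "Puss in Boots",
}


def build_motif_query(phrase: str, lang: str, max_results: int = 20) -> str:
    """Return a volumes-search URL for one quoted phrase in one language."""
    params = {
        "q": f'"{phrase}"',         # quotes ask for a phrase match
        "langRestrict": lang,       # restrict results to one language
        "maxResults": max_results,  # the API allows at most 40 per request
    }
    return f"{BOOKS_ENDPOINT}?{urlencode(params)}"


def build_all_queries(phrases: dict[str, str]) -> list[str]:
    """Build one query URL per language, ordered by language code."""
    return [build_motif_query(p, lang) for lang, p in sorted(phrases.items())]


if __name__ == "__main__":
    for url in build_all_queries(MOTIF_PHRASES):
        print(url)
```

Keeping URL construction separate from fetching would make it straightforward to add request throttling, an API key and result paging later without touching the query logic.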
Like the KHM, we believe this project appeals to a wide and diverse audience not only because of its subject matter, but also because of its international and interdisciplinary character. Our international group operates at the intersection of Computer Science and the Humanities in the arena that is Digital Humanities. This project is unique insofar as each and every member of the team can contribute a piece of his or her own culture, adding a personal and familiar touch to this joint endeavour. By exploring these different cultures, we aim to establish fruitful collaborations and, in so doing, broaden the boundaries of the Digital Humanities.
Furthermore, we believe that this project fully engages humanists in the digital process of tracing texts through space and time. Following the motif trail back in time allows humanists to investigate lines of transmission of folktales and to potentially uncover additional trails through which other documents or stories travelled. At the same time, it enables the computer scientists in the team to identify any shortcomings in our algorithms and to better understand which features to extract automatically when tracing this type of information in a digital ecosystem.
The Aarne-Thompson Motif-Index can be accessed at: http://www.ruthenia.ru/folklore/thompson/ (accessed 18 October 2015).
For example, the 1550-1553 Venetian collection Le piacevoli notti by Giovanni Francesco Straparola, at: https://goo.gl/fAA0J6 (accessed 21 October 2015).
Available at: http://www.verhalenbank.nl (accessed 1 January 2016).
Available at: http://satorbase.org/ (accessed 1 January 2016).
The increasing availability of digital and digitised assets allows us to access information more easily and to potentially uncover previously unknown or uncharted territory.
Starting in October 2015 and running until December 2018.
The team does not include folklorists but consults them. We start with the Grimm collection as we already have clean data to work from.
Available at: https://developers.google.com/custom-search/ (accessed 26 October 2015).
Available at: https://developers.google.com/books/?hl=en (accessed 26 October 2015).
Available at: https://textgridrep.org/browse/-/browse/nxvg_0 (accessed 26 October 2015).
Available at: https://de.wikisource.org/wiki/Kinder-_und_Haus-M%C3%A4rchen_Band_1_%281819%29 (accessed 26 October 2015).
eTRAP is currently a team of twelve people of seven nationalities speaking eleven different languages.
An example may be of use in clarifying the point. Grimm's Snow White corresponds to Pushkin's Сказка о Мертвой Царевне и о Семи Богатырях (The Tale of the Dead Princess and the Seven Knights in English). The two tales differ in many points, including the title of the tale. In Pushkin the princess is protected by seven knights (семь богатырей), whereas in the Grimm tale her protectors are seven dwarves. Despite the differences, the motifs of the beautiful princess and of her seven protectors link the two stories. To hyperlink and map these versions and their differences, we use a combination of Thompson identifiers for tales, VIAF identifiers for authors and works, and customised identifiers where existing ones do not apply. This semi-automatic approach allows us to populate our database with both content and metadata, and establish relations between the different versions.
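The identifier scheme described in this note could be modelled roughly as follows. This is a sketch under stated assumptions: the record fields, the identifier values (`TALE-0001`, `LOCAL-0042`) and the `example.org` URLs are hypothetical placeholders, and no real Thompson or VIAF identifiers are reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TaleVersion:
    """One edition or translation of a tale, with its linking identifiers."""
    title: str
    language: str                      # ISO 639-1 code
    author: str
    source_url: str
    tale_id: Optional[str] = None      # shared tale identifier, if one applies
    author_viaf: Optional[str] = None  # VIAF identifier, if known
    custom_id: Optional[str] = None    # project-specific fallback

    def key(self) -> str:
        """Prefer the shared tale identifier; fall back to the custom one."""
        if self.tale_id:
            return self.tale_id
        if self.custom_id:
            return self.custom_id
        raise ValueError(f"no identifier for {self.title!r}")


def group_by_tale(versions: List[TaleVersion]) -> Dict[str, List[TaleVersion]]:
    """Group versions that share an identifier, i.e. that map to one tale."""
    groups: Dict[str, List[TaleVersion]] = {}
    for version in versions:
        groups.setdefault(version.key(), []).append(version)
    return groups


# Placeholder records: identifiers and URLs are invented for illustration.
VERSIONS = [
    TaleVersion("Snow White", "de", "Grimm",
                "https://example.org/grimm", tale_id="TALE-0001"),
    TaleVersion("Сказка о Мертвой Царевне и о Семи Богатырях", "ru", "Pushkin",
                "https://example.org/pushkin", tale_id="TALE-0001"),
]

if __name__ == "__main__":
    for tale, members in group_by_tale(VERSIONS).items():
        print(tale, [m.author for m in members])
```

Grouping on a single key keeps the relation between versions explicit even when their titles and characters differ, as with the seven knights and the seven dwarves.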