DH 2016 Abstracts

Contextualizing Receptions of World Literature by Mining Multilingual Wikipedias

Reading in translation is an impoverishment, not so much because of the fluctuating quality of a translation or the loss of a perceived 'original work' but because of the elision of sociolinguistic context and the difficulty of conveying that lost context to readers. That world literature is almost always taught in translation at universities in America and elsewhere (a situation driven by the low capacity for foreign language instruction at the university level and the broad linguistic reach of world literature courses) compounds this problem. Of the estimated 20.2m American undergraduates in 2015, an estimated 1.6m were enrolled in foreign language courses, and, historically, only 17% of those reach the higher levels of proficiency necessary to read literature (Goldberg et al., 2015). The problem of American monolingualism has even affected the publishing industry; as the noted translation theorist Lawrence Venuti argued (1998), there are more translations of English texts into other languages than ever before, while fewer foreign texts are translated into English. This trend means that American students are even less likely to feel comfortable with literature in translation than students from non-English-speaking countries.

However, conversations about literary summarization, a core practice of critical reading, take place globally and are preserved in the background of pages on every national-language Wikipedia. As each language community summarizes its translations of works of world literature in the form of Wikipedia articles, it generates for every crowdsourced entry a discussion page and a history of that page; together, these reveal the literary work's history of reception for a reading community in a given language. By striving to synthesize an authoritative, peer-reviewed summary of a text native to or translated into that community's language, each group highlights its concerns, its thought processes, and the challenges posed by a given work. Our goal is to make these differences and conversations more visible to readers through techniques from natural language processing for automatic translation, topic modeling, and the visualization of topic models; in so doing, we aim to develop a method by which the digital humanities can address a core problem in comparative and world literature.
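The discussion page and page history described above can be requested from each language edition through the MediaWiki Action API. What follows is a minimal sketch rather than the project's own code: the helper name `talk_page_url` and the page titles are illustrative assumptions, and it assumes that the canonical `Talk:` prefix resolves on every edition. It only builds the request URLs; it does not fetch anything.

```python
from urllib.parse import urlencode


def talk_page_url(lang, title, with_history=False, limit=50):
    """Build a MediaWiki Action API URL for a talk page (hypothetical helper).

    with_history=False requests the current wikitext of the talk page;
    with_history=True requests metadata for its most recent revisions.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        # Assumption: the canonical "Talk:" namespace name is accepted everywhere.
        "titles": f"Talk:{title}",
    }
    if with_history:
        params["rvprop"] = "ids|timestamp|user|comment"
        params["rvlimit"] = str(limit)
    else:
        params["rvprop"] = "content"
        params["rvslots"] = "main"
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


# Assumed article titles for the Odyssey in the five editions studied.
ODYSSEY_TITLES = {
    "en": "Odyssey",
    "de": "Odyssee",
    "es": "Odisea",
    "fr": "Odyssée",
    "it": "Odissea",
}

for lang, title in ODYSSEY_TITLES.items():
    print(talk_page_url(lang, title))
```

Keeping URL construction separate from fetching makes it straightforward to add rate limiting or caching before any large-scale mining.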

Machine-translated Wikipedia discussions can help reveal to monolingual audiences the degree to which cultural pragmatics influence the reception of key (and popular) works of world literature. Although flawed, automatic translation has been found in some languages to be comparable with human translation, at least with regard to cohesion and formality (Li et al., 2014). Presenting parallel national-language conversations – such as those about translating the title of Camus' L'Étranger / The Stranger / Der Fremde / The Foreigner – alongside a visualization of the change over time of Goethe's Faust I summarizations in English, German, and Spanish would help demonstrate the practical reception of a work in a language community. While work has been done on multilingual topic models (Ni et al., 2009), our research assumes that there will be both alignment and misalignment of topics across the various languages of a work; as such, our project resists the urge to normalize those topics into one category on the basis of an imperfect vector model of semantic similarity. Furthermore, experiments on iterative summarizations, such as those of elements of The Tale of Genji, demonstrate how even basic tasks in literary scholarship contain cultural dimensions and can thus reveal strong cultural patterns and biases held by different populations (Kashima, 2000). Along with the works mentioned above, this research explores the Wikipedia conversations around J.D. Salinger's The Catcher in the Rye / Der Fänger im Roggen / El guardián entre el centeno o El cazador oculto / L'Attrape-cœurs / Il giovane Holden and Homer/Omero's The Odyssey / Die Odyssee / Odisea / L'Odyssée / Odissea across English, German, Spanish, French, and Italian Wikipedia pages.
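One simple way to prepare a change-over-time view of a summary is to measure how much each revision differs from the one before it. The sketch below is an assumption about how this might be done, not the visualization method used in the project: it uses Python's standard `difflib` on word sequences, and the function name `revision_drift` and the sample revisions are invented placeholders.

```python
from difflib import SequenceMatcher


def revision_drift(revisions):
    """Return 1 - similarity for each consecutive pair of revisions.

    `revisions` is a list of revision texts ordered oldest first.
    A value of 0.0 means no change; 1.0 means no words in common.
    """
    drift = []
    for older, newer in zip(revisions, revisions[1:]):
        matcher = SequenceMatcher(None, older.split(), newer.split(), autojunk=False)
        drift.append(round(1.0 - matcher.ratio(), 3))
    return drift


# Placeholder texts standing in for successive plot summaries.
sample = [
    "Faust makes a pact with Mephistopheles",
    "Faust makes a wager with Mephistopheles",
    "The scholar Faust makes a wager with the devil",
]
print(revision_drift(sample))
```

Plotting these values against revision timestamps for each language edition would give parallel curves that can be compared across communities.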

In an increasingly digitized cultural landscape, the pedagogical use of Wikipedia and similar platforms has gained a great deal of traction in certain fields. While it is often derided as a dubious source of information, Wikipedia has proven to be a successful arena for instructing students in the distillation of information, writing in the public sphere, and collaborative writing (Purdy, 2009; Vetter, 2014; Sweeney, 2012). As a multilingual space, Wikipedia has offered many scholars the opportunity to examine the ways in which the cross-cultural sharing of information takes place (Nothman et al., 2013; Filatova, 2012). In some instances, Wikipedia specifically has been used as a place for the comparison of knowledge and representation across cultural and geographic divides (Callahan and Herring, 2011). These scholars provide a framework for examining the cross-linguistic aspects of Wikipedia in order to highlight cultural differences and deconstruct colonial power structures that privilege the English language (Ensslin, 2011). In the available scholarship, the use of Wikipedia in teaching literature has been largely ignored; only a few studies exist, and those mainly address literary studies' reluctance to incorporate wikis into pedagogy (Bayliss, 2013). However, quite a few studies suggest that Wikipedia can occupy a distinct operational space in the university that supplements established pedagogical spaces and practices (Gorard and Selwyn, 2015; Knight and Pyke, 2012).

Our computational framework follows these possibilities to supplement two established methods of teaching translation in a world literature classroom. The first method consists of placing a work of literature in translation – a passage from a novel, for example – in relation to the same work in its original language. The hope is that students can analyze the differences between the original and the work in translation and, thereby, understand the way the literary text undergoes a 'new life' in translation. This method not only assumes a sophisticated knowledge of foreign languages among the majority of students but also brackets the question of how these works are read in the original language by native-language readers. The second method juxtaposes several translations of the same work into English. This has the advantage of not assuming knowledge of a foreign language. For example, students might read various translations of the 19th-century French poet Baudelaire in English, starting from English translations from the late 19th century and continuing through translations that have been published in the last twenty years. The advantage of this method is clear: students can easily see, in their own language, the way translation can change the meaning of a poem as they read it from one translation to another. While this method successfully bypasses the problem of foreign language competency, it amplifies the second problem: the student is even more divorced from the source-language context since all emphasis is placed on the way English speakers respond to a work of literature.

New methods, therefore, are needed to expose foreign texts and foreign contexts. Using computational methods to access the themes and arguments surrounding specific texts in other languages reduces the need for high-level competency in a foreign language in order to understand the debates about world literature in a non-American – and especially non-English – context. Automatic translation and topic modeling facilitate encounters with Wikipedia 'talk' pages in other languages. The output of these automated and annotated procedures preserves some of the estrangement of working across languages: because the translation is rough, it clearly communicates its nature as translation and therefore serves its purpose without fully replacing the source text. With such experiments, our project broadly addresses two questions: how different language populations create summaries that are culturally distinct, and how these differences can be folded back into meaningful encounters for readers of works in translation. The overall method for this project is to identify a subset of well-documented and significant works, mine the relevant Wikipedia entries and conversations, and develop preliminary code to identify the linguistic features of each entry (e.g. topics, use of modals, noun density, phraseological structures, and complexity measures) and the nodal points of the crowd's conversation that yielded the page. As an example, consider the discrepancies between the Italian and English discussion pages of The Odyssey / Odissea as represented by ten 5- to 10-term topics per page.

Table 1. Named Topic Models from English and Italian Discussion Pages of Omero/Homer's The Odyssey / Odissea
Terms, Italian Discussion Page	Topic Name
odissea originali palla contributi registrarmi	Contributions and Registration
wikipedia forum migliorarla posto figlio arrivassero tenter tentare importanza	Need for More Expert Contributors
ulisse niente aggiunto dante manchi riferimento elenco procedo cielo parte pietose condizioni	References to Dante
poseidone divina generale voglia	Unclear
commedia incontra quesiti accecato ostacola competenti voce	Other Web Sources
ripristino aggiungerei siti porre partenza traducendo manna materia versione	Older Versions are Better than Newer Versions
dir notizie rete evidente polifemo	Unclear
inglese qualcuno pensa piccola inferno appositi pagina discutere film serve deciso	Using English Wikipedia to Backfill Italian
esattamente pecca opere fin tema passo potere utenti	Reflections on Contributors
pare so troia attendo basandomi mesi trova provvisorio	Reflections on What to Write

Terms, English Discussion Page	Topic Name
odyssey homer work titles guideline searching works section promotional directed	Editing Guidelines and Work Title
february page common dab crazynas people epic redirect iliad dictionary	Genre and Other Works
talk odysseus december journey extant account term proposal giu september	The Journey
word article wp called style subject similar adventures locations toronto	Locations in the Story and Writing Style
utc play topic primary part poem link note odisseus july	Poetry
title article refer word davidiad edited cynwolfe don titles voyage	Voyages
edit western preceding map request www apology talk zcc suggest	Mapping the Journey
medea akhilleus euripides noun don current literature click crazynas english	Characters
utc april added source staged meant argument written fall cite	Sourcing and Staging
talk comment unsigned university musical oldest review tedickey tomb point	Ceremony
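
Term lists like those in Table 1 are the typical output of a topic model such as latent Dirichlet allocation (LDA). The abstract does not state which implementation or settings were used, so the following is only an illustrative sketch of LDA by collapsed Gibbs sampling in plain Python. The function name `lda_topics`, the hyperparameters, and the toy documents (assembled by hand from terms in Table 1, not taken from real talk pages) are all assumptions. The topic names in Table 1 were assigned by reading the terms; nothing in this sketch produces them.

```python
import random
from collections import Counter


def lda_topics(docs, n_topics=2, n_terms=5, n_iter=200,
               alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return top terms per topic.

    `docs` is a list of token lists (already lowercased and stopword-filtered).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return [
        [w for w, c in topic_word[k].most_common(n_terms) if c > 0]
        for k in range(n_topics)
    ]


# Toy documents built from Table 1 terms, for illustration only.
toy_docs = [
    "ulisse dante riferimento inferno dante".split(),
    "dante ulisse inferno commedia".split(),
    "wikipedia utenti contributi registrarmi".split(),
    "utenti wikipedia contributi forum".split(),
]
for terms in lda_topics(toy_docs, n_topics=2, n_terms=4):
    print(" ".join(terms))
```

Fitting a separate model to each language's discussion page, rather than one shared multilingual model, is consistent with the decision described earlier not to normalize topics across languages.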

What comes across in this comparison are the somewhat different concerns of the two reading communities. The Italian discussion reflects concerns with the expertise of the editing community, a social reflection common to Wikipedias, and with connections between Homer and Dante, a figure more central to Italian literary identity. The English page reflects a concern with the Odyssean journey and its possible real-world correspondences. More commonality was found in another example comparing the discussion pages of J.D. Salinger's The Catcher in the Rye. These alignments and misalignments of readers' concerns can destabilize the primacy of the concerns held by any one reading community, and they speak to one of the core benefits of reading world literature.

Bibliography

Bayliss, G. (2013). Exploring the Cautionary Attitude toward Wikipedia in Higher Education: Implications for Higher Education Institutions. New Review of Academic Librarianship, 19(1): 36-57.
Callahan, E. and Herring, S. (2011). Cultural Bias in Wikipedia Content on Famous Persons. Journal of the American Society for Information Science and Technology, 62(10): 1899-1915.
Ensslin, A. (2011). 'What an un-wiki way of doing things': Wikipedia's Multilingual Policy and Metalinguistic Practice. Journal of Language & Politics, 10(4): 535-61.
Filatova, E. (2012). Information Overlap in Multilingual Wikipedia and Summarization. International Journal of Cooperative Information Systems, 21(4): 383-403.
Goldberg, D., Looney, D. and Lusin, N. (2015). Enrollments in Languages Other than English in United States Institutions of Higher Education, Fall 2013. Modern Language Association. First published online February 2015. https://apps.mla.org/pdf/2013_enrollment_survey.pdf.
Gorard, S. and Selwyn, N. (2015). Students' Use of Wikipedia as an Academic Resource: Patterns of Use and Perceptions of Usefulness. The Internet and Higher Education, 28: 28-34.
Kashima, Y. (2000). Maintaining Cultural Stereotypes in the Serial Reproduction of Narratives. Personality and Social Psychology Bulletin, 26(5): 594-604.
Knight, C. and Pyke, S. (2012). Wikipedia and the University, a case study. Teaching in Higher Education, 17(6): 649-59.
Li, H., Graesser, A. C. and Cai, Z. (2014). Comparison of Google translation with human translation. Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference. Pensacola Beach, FL, May 2014, pp. 190-5.
Ni, X., Sun, J., Hu, J. and Chen, Z. (2009). Mining multilingual topics from wikipedia. Proceedings of the 18th international conference on World Wide Web. Madrid, Spain, April 2009.
Nothman, J., Ringland, N., Radford, W., Murphy, T. and Curran, J. (2013). Learning Multilingual Named Entity Recognition from Wikipedia. Artificial Intelligence, 194: 151-75.
Purdy, J. (2009). When the Tenets of Composition Go Public: A Study of Writing in Wikipedia. College Composition and Communication, 61(2): W351-W373.
Sweeney, M. (2012). The Wikipedia Project: Changing Students from Consumers to Producers. Teaching English in the Two-Year College, 39(3): 256-67.
Venuti, L. (1998). Scandals of Translation: Towards an Ethics of Difference. London: Routledge.
Vetter, M. (2014). Archive 2.0: What Composition Students and Academic Libraries Can Gain from Digital Collaborative Pedagogies. Composition Studies, 42(1): 35-53.