The methodical discussions on distant reading (Moretti, 2013) or macroanalysis (Jockers, 2013) have contributed to the awareness of quantitative methods in literary studies, a discipline which many assumed to be the last (and impregnable) stand against the onslaught of quantitative approaches in the Humanities. But actually this late fame is built upon many years of research. Over the last decades, research has been done on style, on aspects of the fictional world like plot and character, on aspects of text groups like genre or historical discourses etc. Most of the research in these fields follows by and large an empirical paradigm. Thus the situation in literary studies is structurally similar to the situation in social science disciplines like psychology or sociology in the past. All of them were using mainly hermeneutical approaches (including some anti-hermeneutical approaches using the same basic methods) until at some point new researchers introduced empirical methods into the field. Interestingly enough, in each case the adoption of empirical methods and the restructuring of the discipline followed its own logic and produced a unique history. In psychology, non empirical methods have been marginalized, and in sociology there is mix of quantitative and qualitative research. This very rough outline does not do justice to the many negotiations and disruptions in the fields, but it is enough to substantiate a point: the adoption of empirical methods in literary studies will probably also be a rather unique story. At the moment, we are right in the middle of it. It seems that empirical methods applied to one or very few texts haven’t yet yielded results interesting to larger parts of the fields – with the notable exception of stylometry – while their application to many texts opens up new research possibilities interesting to many. As far as we can see, literary studies is thus in a similar position as some other disciplines in the humanities, they also have their own history of adoption of empirical methods (like linguistics) or are now right in the middle of it.
In this situation, this panel wants to reflect on one aspect of this process: what happens to central concepts of a discipline in these negotiations and disruptions. We want to discuss questions like these:
Text as a theoretical concept became obvious for the librarians in ancient Alexandria, who realized that various copies of the same literary work tend to differ, in terms of several textual variants introduced in various papyri (Turner, 2014). This led to the idea of the archetype (an ideal representation of the “original”) that shines through imperfect copies. Since then, philology became an art of textual archeology, where the editor played the role of a demiurge and the author’s advocate. In the era of electronic resources, this concept has changed substantially, not only in digital scholarly editions, but also in quantitative text analysis. Traditionally understood as a vehicle of (hypothesized) authorial intention, the text became a sequence of characters bearing some information. A careful philological attitude towards author’s words was replaced by: (1) distant reading, which in fact means analysis involving no reading at all, (2) focusing on the most frequent words rather than on elaborated rare vocabulary, (3) distorting the original word order by applying the “bag-of-words” model of analysis, (4) high tolerance to textual errors, e.g. imperfect OCR. Such a deep redefinition, however, allows for assessing the history of literature on an unprecedented scale (Moretti, 2013; Jockers, 2013).
Main literary genres such as poetry, play, and novel are divided into different subgenres (mostly called “genres” in daily use). The advances in digital corpora and stylometric tools have broadened the genres that are being studied. Science fiction researchers Nichols et al. in 2014 severely criticized genre theory, stating that it has made ‘little tractable progress answering the questions “what is a genre?” and “How is one genre distinguished from others?” in the previous decades’. Nichols et al. discussed the development of a methodology to test genre-related hypotheses in a verifiable and falsifiable way, focusing on science fiction and fantasy. They made use of readers’ responses and text analysis to find out how subgenres can actually be distinguished. Elsewhere, Jannidis and Lauer (2014) presented a quantitative analysis in which genre played a role. Using Burrows’s Delta on (amongst others) the work of Goethe, they showed that this highly successful method for authorship attribution also works well in distinguishing genres, confirming research by others in this area. This should lead to renewed attention for what a genre is and how genres can be distinguished, thus fulfilling what Willard McCarty wrote in 2010: “Computing machines and scholarly intelligence change each other, recursively”.
The concept of “character” has been intensively researched in the last 20 years. Building on (or fighting against) ideas from the 1960s and 1970s there have been proposals to conceive “character” more as sign or more as a mental model. In digital literary studies there have been attempts to apply these results directly (Zöllner-Weber, 2008), preserving the complexity of the models but basically presupposing a manual analysis of the text. Social network analysis on the other hand provides tools to analyse many and complex social networks and has been applied successfully to plays and novels offering an insight into character constellations (Elson et al., 2010; Trilcke and Fischer, 2015). In this context the question how to identify references to characters in a text, which seemed trivial to traditional literary studies, has been researched in depth emphasizing the difference to named entities in non-fictional texts like news (Elson, 2012: chap 2.4). A recent attempt to address the issue of character types (Bamman et al., 2014) also shows the widening gap between the different modes of modeling literary concepts which will make it difficult to introduce these results into the mainstream discussion on characters.
Already some of the earliest ventures into the stylometry of translated texts focused on the balance between the translatorial and authorial voices. For a while, it seemed that Venuti’s concept of translator invisibility (1995) found vindication in stylometric analysis: in machine-learning classification experiments, translated texts tended to group according to the original author rather than the translator – surprisingly so, since the results were based on translator-generated most frequent words that have little one-on-one correspondence to those in the original (Rybicki, 2012). The application of new visualization methods such as network analysis has allowed to bring out the signal of the translator. The main challenge now is to identify the mechanism of the interaction between the two. Possible directions in this respect include expansion of linguistic features used in analyses, comparison of topic models obtained for originals and their translations, application of text reuse detection tools to translations of the same text and, at the same time, further attempts to bridge the gap between empirical evidence and linguistic theory.
When topic modeling (Blei, 2003) was transferred from information retrieval in expository prose to literary studies, it brought with it the promise of a sophisticated access to the themes in large collections of literary texts. However, “theme” and “topic” are far from identical (Schmidt, 2012; Rhody, 2012): rather, the semantic relation between words grouped in a topic may concern, among others, personnel, setting, narrative motives, or metatextuality as well as (depending on preprocessing) character names, specific registers, or foreign-language terms. Usually, only a minority of topics represent abstract themes. This can partly be explained by the fact that literary texts, unlike expository prose, frequently enact rather than explicitly discuss their most important themes. It also shows that, unlike literary scholars, topic modeling does not distinguish background and foreground information. Finally, the fact that topic modeling is entirely unsupervised and data-driven challenges established hermeneutic strategies. Nevertheless, it seems that established concepts from literary studies such as motive, setting, personnel and metatextuality may help better appreciate the complexity of topic models derived from literary texts. While literary texts challenge any simple understanding of topics, literary studies appears well-equipped to deal with their complexity.