DH 2016 Abstracts

Corpus Analyses of Multimodal Narrative: The Example of Graphic Novels

1. Introduction

This paper presents initial empirical analyses and visualizations of a large corpus of graphic novels – an increasingly popular form of book-length comics aimed at adults – that is currently being assembled and digitized. We introduce an XML vocabulary and a visual editor that we have developed for the annotation of our corpus, and reflect on the challenges presented by a cultural form that is characterized by the complex interaction of text and images. Analyzing the specific narrativity of this and other multimodal cultural forms (including illustrated books and magazines, theater, film, television and computer games), we argue, calls for a combination of quantitative and qualitative methods drawn from such diverse fields as narratology, digital art history, and cognitive science. In contrast to corpus analyses of literary texts, which have made great strides in recent years, comparable work on visual narrative remains in its infancy and at the periphery of the digital humanities. While this can be traced, in part, to copyright issues, such scholarship also faces a number of crucial technical and methodological hurdles – from image description, classification, and object recognition to the operationalization of narratological concepts. Given the dominance of visual storytelling in modern and contemporary culture, overcoming these hurdles will represent an important contribution to the further development of DH research.

The introduction presents our corpus and the wider research questions of our interdisciplinary group. This is followed by a brief overview of the “Graphic Narrative Markup Language” (GNML), which builds on TEI, and of the visual editor developed for the annotation of graphic novels, which is also applicable to other multimodal forms. A version of this editor will be available as open-access software by the time of the conference. Part two introduces a number of analyses and visualizations that combine the study of the text and images that make up graphic novels. The final part moves to the quantitative and qualitative analysis of graphic novels with the help of eye-tracking. This approach allows us to study the construction of storyworlds by empirical readers, and thus opens up an aspect of narrative that remains severely underrepresented in DH, and the humanities at large.

2. 1. GNML-Editor: Tools for (Semi-)Automatic Annotation

Whereas the automatic analysis of text corpora has become feasible in many instances, such automation currently remains a pipe dream for multimodal narratives. In the case of comics and similarly hand-drawn, or otherwise non-perspectival, images, object identification depends on lengthy training efforts and still yields a relatively high error rate. Similarly, standard OCR programs fail to recognize the (quasi-)handwriting that dominates comics. As a consequence, our corpus study presently depends on manual and semi-automatic annotation. For this purpose, we have developed the XML language GNML, which builds on TEI and previous efforts by John Walsh (2012), to describe all textual and visual properties of graphic novels. To minimize errors during the annotation process, our visual GNML editor supports annotators with integrated spell checking and auto-completion mechanisms. Automatic panel recognition is complemented by a function that recognizes the borders of individual captions, speech bubbles, and characters to accelerate annotation. Further automations, such as built-in OCR for narrative text that conforms to standard fonts, are currently under development. As the conceptual basis of the editor (visual objects with graphic and textual characteristics) is not limited to comics but can be applied to other text-image combinations in visual culture (from illustrated manuscripts to film and TV), the editor will be generalized for the annotation of such formats.
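
To make the notion of "visual objects with graphic and textual characteristics" more concrete, the following minimal Python sketch parses a small annotation fragment into flat records. The element and attribute names (`page`, `panel`, `character`, `balloon`, `caption`, `bbox`, `who`) are hypothetical placeholders chosen for illustration; they are not the actual GNML vocabulary, which is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation fragment; element and attribute names are
# illustrative assumptions, not the actual GNML schema.
SAMPLE = """
<page n="1">
  <panel id="p1" bbox="0,0,400,300">
    <character id="c1" name="Anna" bbox="40,60,120,280"/>
    <balloon who="c1" bbox="130,20,300,90">Where are we going?</balloon>
  </panel>
  <panel id="p2" bbox="400,0,800,300">
    <caption bbox="410,10,790,50">Later that night.</caption>
  </panel>
</page>
"""


def parse_bbox(value):
    """Turn 'x1,y1,x2,y2' into a tuple of four integers."""
    x1, y1, x2, y2 = (int(v) for v in value.split(","))
    return x1, y1, x2, y2


def visual_objects(xml_text):
    """Return one record per annotated object: panel id, kind, box, text."""
    root = ET.fromstring(xml_text)
    records = []
    for panel in root.iter("panel"):
        for obj in panel:
            records.append({
                "panel": panel.get("id"),
                "kind": obj.tag,
                "bbox": parse_bbox(obj.get("bbox")),
                "text": (obj.text or "").strip(),
            })
    return records


objects = visual_objects(SAMPLE)
```

Each record pairs a graphic property (the bounding box) with a textual one (the transcribed content, empty for purely visual objects), which is the kind of structure that could in principle carry over to other text-image formats.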

3. 2. Quantitative Analyses and Visualizations of Graphic Novels

Part two presents approaches that combine image and text analysis for a number of structural features of graphic novels. Methods developed for digital literary studies, such as topic modeling, are of limited value for the analysis of visual culture given the dominance of images. In contrast, studies of large-scale image sets have so far shown little interest in narrative analysis. To complicate matters further, most narratological concepts are drawn from the study of literary texts, and it remains questionable to what extent they can be successfully applied to visual narrative.

In a first analysis of the corpus, which is still in the process of being digitized and annotated, we look at the historical development of the visual and textual elements of about 150 book covers of graphic novels. This includes a grammatical and semantic analysis of their titles with the help of a statistical language parser, as well as an analysis of the stylistic and visual attributes of their design and cover images. In a second step, we move to more detailed studies of a first sub-corpus that consists of the ten most-cited titles within our larger set of graphic novels. Such a small subset does not allow for genre comparisons or for studying historical developments within the form. However, we can consider the narrative features of representative texts within our corpus. In order to do so, we compare a network analysis of characters with their visual prominence and respective share of text, and complement this with a stylistic analysis of that text.
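
The comparison of character networks with visual prominence and share of text can be sketched as follows. This is an illustrative sketch rather than the project's actual pipeline: the per-panel records are invented toy data, and the definition of a network edge (two characters drawn in the same panel) as well as the function name `character_profile` are assumptions.

```python
from collections import Counter
from itertools import combinations

# Toy per-panel annotations (invented for illustration): which characters
# are drawn in a panel, and how many words each one speaks there.
PANELS = [
    {"shown": {"Anna", "Ben"}, "words": {"Anna": 12}},
    {"shown": {"Anna", "Cleo"}, "words": {"Cleo": 8}},
    {"shown": {"Anna"}, "words": {}},
    {"shown": {"Ben", "Cleo"}, "words": {"Ben": 20}},
]


def character_profile(panels):
    """Per character: network degree, visual share, and share of text."""
    edges = set()
    appearances = Counter()
    words = Counter()
    for panel in panels:
        appearances.update(panel["shown"])
        words.update(panel["words"])
        # Assumption: two characters are linked if they share a panel.
        for a, b in combinations(sorted(panel["shown"]), 2):
            edges.add((a, b))
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    total_words = sum(words.values()) or 1
    return {
        name: {
            "degree": degree[name],
            "visual_share": appearances[name] / len(panels),
            "text_share": words[name] / total_words,
        }
        for name in appearances
    }


profile = character_profile(PANELS)
```

In this toy data all three characters have the same degree, yet Anna appears in the most panels while Ben has the largest share of text, which is the kind of divergence between network position, visual prominence, and textual prominence that such a comparison is meant to surface.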

4. 3. Eye-Tracking Analysis of Multimodal Narrative

The experimental observation of eye movements has proven a reliable measure of the human processing of text and images, and allows us to form hypotheses about the construction of storyworlds by empirical readers. The final part of the paper aims to show the value of this method by considering excerpts from a first, exploratory corpus of canonical graphic novels. In contrast to theoretical scholarship on comics, which has emphasized the primacy of images (Groensteen, 2007), our experiments demonstrate that readers focus most of their attention on the text. Not only is it usually read first, but many images are either not fixated at all or processed purely in peripheral vision. Whether images are viewed depends, among other variables, on their informational content: if either visual aspects or the storyline continue from one panel to the next, a panel is much more likely to be skipped by the reader than if it is clearly distinguished from its immediate predecessor. We also look at the interaction between visual and textual levels: do reading habits differ if text and images refer to distinct storylines? Finally, we report on experiments that focus on comic reading expertise, for which we propose a new empirical measure. In sharp contrast to the reading of text alone, where experience and reading speed are positively correlated, experienced comics readers focus on the visual aspects of the panels for an extended amount of time. This time appears to be well invested, since they understand the story better, as shown by an empirical content test. Taken together, these results suggest that text and image work together to transmit the narrativity of graphic novels, and that a specific type of expertise is required to understand multimodal narratives. This maps well onto the hypothesis that comics and other forms of sequential art use a particular kind of visual language (McCloud, 1993), which has been analyzed in psycholinguistic terms by Cohn (2013).
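
As a rough illustration of how attention to text and images might be quantified from eye-tracking data, the sketch below sums fixation durations per area of interest and lists panels whose image received no fixation. The fixation log is invented toy data, and the record layout (panel id, area, duration in milliseconds) and the function name `dwell_summary` are assumptions rather than the format or measures used in the experiments reported here.

```python
from collections import defaultdict

# Toy fixation log (invented): panel id, area of interest, duration in ms.
FIXATIONS = [
    ("p1", "text", 220), ("p1", "text", 180), ("p1", "image", 150),
    ("p2", "text", 300),
    ("p3", "text", 250), ("p3", "image", 90),
]
PANEL_IDS = ["p1", "p2", "p3", "p4"]


def dwell_summary(fixations, panel_ids):
    """Sum fixation time per area; list panels whose image was never fixated."""
    dwell = defaultdict(int)
    image_fixated = set()
    for panel, area, duration in fixations:
        dwell[area] += duration
        if area == "image":
            image_fixated.add(panel)
    total = sum(dwell.values()) or 1
    return {
        "text_share": dwell["text"] / total,
        "image_share": dwell["image"] / total,
        "images_not_fixated": [p for p in panel_ids if p not in image_fixated],
    }


summary = dwell_summary(FIXATIONS, PANEL_IDS)
```

A missing fixation does not mean that an image went unperceived, since it may still have been processed in peripheral vision; a summary of this kind therefore only bounds direct visual attention from below.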

5. 4. Summary and Conclusions

We present a new DH project that aims to collect and analyze a corpus of graphic literature, enriched by human annotations as well as by a corpus of eye-movement recordings that measure the momentary distribution of readers’ attention. Initial example analyses on both global and local levels demonstrate the potential of this approach. A toolchain for the description, annotation, and analysis of these data is being developed, and is of potential use for a wider field of studies in the cultural analytics of image-related and multimodal material. In the longer term, the corpus will be further enhanced by automated description, using features developed in the field of computer vision (Farabet et al., 2013; Krizhevsky et al., 2012; Rigaud et al., 2015; Serre et al., 2007).

Bibliography

Cohn, N. (2013). The Visual Language of Comics: Introduction to the Structure and Cognition of Sequential Images. London: Bloomsbury.
Farabet, C., Couprie, C., Najman, L. and LeCun, Y. (2013). Learning Hierarchical Features for Scene Labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35: 1915–29.
Groensteen, T. (2007). The System of Comics. Jackson, MS: University Press of Mississippi.
Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25: 1097–1105.
Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60: 91–110.
McCloud, S. (1993). Understanding Comics: The Invisible Art. New York, NY: Harper Collins.
Rigaud, C., Guérin, C., Karatzas, D., Burie, J.-C. and Ogier, J.-M. (2015). Knowledge-driven understanding of images in comic books. International Journal on Document Analysis and Recognition, 18: 199–221.
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. and Poggio, T. (2007). Robust Object Recognition with Cortex-like Mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29: 411–26.
Walsh, J. (2012). Comic Book Markup Language: An Introduction and Rationale. Digital Humanities Quarterly, 6(1).