As virtual reality (VR) becomes an accessible technology, researchers have begun to experiment with its potential for data visualisation. In this paper I use two VR visualisations of linguistic data as a springboard for discussing the affordances of different immersive technologies for research.1
The first visualisation is a geographically anchored walkthrough of data held in the PARADISEC linguistic archive (Thieberger and Barwick, 2012). A user moves across a representation of a geographic region and sees markers on the landscape representing metadata about the relevant PARADISEC materials for each location: the number of speakers, and the amount and diversity of material held. Audio and text appear when the user gazes at a marker. Looking up, the user can also see the historical relationships between the languages, displayed as a network. Such a visualisation could be adapted to represent the holdings of other kinds of archives, so the discussion in this paper has implications for the digital humanities more generally.
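Gaze-triggered display of the kind just described can be thought of as an angular test: a marker counts as "gazed at" when the direction from the viewer's head to the marker lies within a small cone around the view direction. The Python sketch below is a minimal illustration under stated assumptions, not a description of how the PARADISEC visualisation is actually built. The `Marker` class, the `gazed_marker` function and the 5° threshold are all hypothetical; headset frameworks usually perform the equivalent test by casting a ray from the camera.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Marker:
    # Hypothetical marker: a scene position plus a label standing in for the
    # archive metadata, audio and text attached to it.
    name: str
    position: Vec3


def _normalise(v: Vec3) -> Optional[Vec3]:
    """Return v scaled to unit length, or None for a zero-length vector."""
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0:
        return None
    return (v[0] / length, v[1] / length, v[2] / length)


def gazed_marker(head: Vec3, view_dir: Vec3, markers: Sequence[Marker],
                 max_angle_deg: float = 5.0) -> Optional[Marker]:
    """Return the marker closest to the centre of gaze, if any lies within
    max_angle_deg of the view direction (the threshold is an assumption)."""
    view = _normalise(view_dir)
    if view is None:
        return None
    best: Optional[Marker] = None
    best_angle = max_angle_deg
    for m in markers:
        d = _normalise((m.position[0] - head[0],
                        m.position[1] - head[1],
                        m.position[2] - head[2]))
        if d is None:
            continue  # marker at the viewer's own position has no direction
        # Clamp the dot product to guard acos against rounding error.
        cos_a = max(-1.0, min(1.0, view[0] * d[0] + view[1] * d[1] + view[2] * d[2]))
        angle = math.degrees(math.acos(cos_a))
        if angle <= best_angle:  # keep the smallest angle seen so far
            best, best_angle = m, angle
    return best
```

For example, a viewer at the origin looking along (0, 0, -1) selects a marker at (0, 0, -10) but not one at (10, 0, -10), which lies 45° off-axis. In practice a short dwell timer is often added so that a passing glance does not trigger audio playback.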
The second visualisation is a more abstract three-dimensional cloud of coloured points laid out on three axes. The user starts in the midst of this cloud and can move through it in any direction. The colours, sizes and shapes of the points, as well as their positions on the x, y, and z axes, can be mapped to variables from any kind of data, so this visualisation too has implications beyond linguistic research. For my purposes, these variables represent linguistic features drawn from the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013). In one use case, different measures of linguistic complexity are graphed on the x, y, and z axes (e.g. WALS features 1A+2A, 22A and 49A). Colouring the points by the geographical location of the languages, and setting their shape by genetic affiliation, makes it easy to explore clusters of types of complexity.
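The mapping from language records to points can be sketched as a small encoding step. The Python below is a hypothetical illustration, not the code behind the visualisation. It assumes that each WALS feature value has already been reduced to an ordinal number, that a composite axis such as 1A+2A is the sum of its features' codes, and that each axis is min-max scaled to [0, 1] so the three measures are comparable. The `Language`, `Point` and `encode` names, the palette, the shape table and the sample records are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Language:
    # Hypothetical record: feature values are assumed to be ordinal codes.
    name: str
    macroarea: str
    family: str
    features: Dict[str, float]


@dataclass
class Point:
    name: str
    x: float
    y: float
    z: float
    colour: str
    shape: str


def encode(languages: Sequence[Language], axes: Sequence[Sequence[str]],
           palette: Dict[str, str], shapes: Dict[str, str],
           default_colour: str = "#888888",
           default_shape: str = "sphere") -> List[Point]:
    """Map languages to 3D points. Each axis is a list of WALS feature IDs
    whose codes are summed (an assumption); each axis is then scaled to [0, 1]."""
    # WALS coverage is sparse: drop languages missing any needed feature.
    complete = [lang for lang in languages
                if all(f in lang.features for axis in axes for f in axis)]
    if not complete:
        return []
    raw = [[sum(lang.features[f] for f in axis) for axis in axes]
           for lang in complete]
    scaled = []
    for i in range(len(axes)):
        col = [row[i] for row in raw]
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant axis carries no information, so centre it at 0.5.
        scaled.append([(v - lo) / span if span else 0.5 for v in col])
    return [Point(lang.name, scaled[0][j], scaled[1][j], scaled[2][j],
                  palette.get(lang.macroarea, default_colour),
                  shapes.get(lang.family, default_shape))
            for j, lang in enumerate(complete)]


# Invented sample data, for illustration only.
sample = [
    Language("Language A", "Papunesia", "Family X",
             {"1A": 2, "2A": 3, "22A": 4, "49A": 1}),
    Language("Language B", "Eurasia", "Family Y",
             {"1A": 4, "2A": 1, "22A": 2, "49A": 5}),
    Language("Language C", "Africa", "Family Z", {"1A": 3}),  # incomplete
]
points = encode(sample, [["1A", "2A"], ["22A"], ["49A"]],
                palette={"Papunesia": "#1f77b4", "Eurasia": "#d62728"},
                shapes={"Family X": "cone"})
```

Scaling each axis separately is one design choice among several. It keeps a feature with many value categories from visually dominating the cloud, but it also hides absolute differences in range between axes.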
Both of these visualisations can be viewed in a pseudo-3D environment using WebGL in a desktop browser. For a more immersive experience, however, they can also be viewed inside a virtual reality headset such as the Google Cardboard or Oculus Rift. Alternatively, the visualisations can be displayed on the inside of a fulldome such as those used in planetariums. Through a recent successful Australian Research Council Linkage Infrastructure, Equipment and Facilities (LIEF) grant, Western Sydney University and a number of partners have been able to construct an ultra-high-resolution experimental fulldome (‘DomeLab’ http://www.niea.unsw.edu.au/research/projects/domelab).
Display in a VR headset is the most straightforward form of immersion. The viewer is embodied amid the data, and simple everyday movements (e.g. looking around) translate directly into the virtual space. Display in a browser is at the other extreme of (non)immersion. However, many users know from modern computer games how three-dimensional space is projected onto a flat screen, so a browser display can still convey a sense of movement, rotation, and interactivity.
A hybrid of these two experiences, perhaps less familiar, is the dome. The user lies on the floor inside the dome, and the visualisation is displayed on the concave walls of the dome above. Because the user is surrounded by the screen, the experience is far more immersive than the display of a visualisation in a computer browser. Metaphors of movement and space have to be adapted for such a display, however, as most of the display surface is above the user.
This paper will discuss the reasons why we might want to explore linguistic data visually, and what we gain or lose by doing so in the various environments described above. Lev Manovich notes that information visualisation is the representation of datasets in such a way as to reveal structure (Manovich, 2010). Linguists are obsessed with structure. Since at least Pāṇini (4th century BC, see Vasu, 1891), linguists have conceived of individual languages as highly structured; since the Neogrammarians, if not before, ‘language’ as an abstraction that changes and exists outside its speakers has also been treated as a highly structured object of study; sociolinguists are interested in the structure of speech communities and speaker networks; and psycholinguists and neurolinguists in the linguistic structures of the mind and the brain. Linguistics is therefore a field primed to look for structure in its data, and it is constantly seeking new ways to do so.
Manovich also describes data visualisation as a matter of reduction and spatialisation, often involving a remapping of the non-visual to the visual. For linguists, reducing linguistic data to features and remapping it to the visual (audio to text, for example) are business as usual. What I would like to discuss in this paper is the spatialisation of the data, and the embodiment of the researcher(s) within that data space.
A very active branch of linguistics involved in data visualisation is linguistic typology. Typology concerns itself with how languages are distributed within the possibility space of all conceivable languages, or “what’s where and why” (Bickel, 2007). Because linguistic data has an inherent geographical dimension, the map is the predominant visualisation in typology, and I have followed this lead in my PARADISEC visualisation. However, Manovich also points out that both designers and their audiences tend to treat spatial dimensions as primary (Manovich, 2010:8). It therefore makes sense to experiment with mapping the spatial dimensions to other, perhaps more significant, features of the linguistic data. This is the impetus behind my data graph visualisation: by translating the WALS data into a more abstract 3D graph, different relationships, clusters and structures become apparent. In the full paper I demonstrate examples of this.
Data visualisations can have a variety of purposes (see e.g. Dransch, 2000; Purchase et al., 2008). They can be a way to explore data to generate ideas or observations that then inspire future research. They can be an attempt at scientific modelling, instantiating more fully formed pre-existing ideas. Or they can be a way of communicating information to others. An early study of user experience with 2D, 3D and immersive data visualisation (Modjeska, 2000) found that users generally find 2D representations of data more efficient, but that enjoyment and motivation increase with the degree of immersiveness. This suggests that VR visualisations may be better suited to the playful exploration of data described above than to serious scientific modelling. They may also suit the communication of ideas when it is not essential that the audience grasp complex details.
These are therefore the motivations behind the two linguistic data visualisations I describe. With the PARADISEC map visualisation it is not essential that a user come away with perfect recall of what exactly the archive holds, but rather with a sense of its richness, and increased motivation for using the archive in the future. The data graph visualisation is aimed at researchers who want to explore language data creatively in order to generate new ideas.
Finally, I discuss how the affordances of the VR headset versus the dome depend on one’s beliefs about the locus of knowledge production. In the ‘lone wolf’ model, an individual researcher generates ideas in a relative vacuum, or at least the important connections are within the data and between the researcher and the data, not with other people. For this model, a VR headset is ideal: in the world of the headset, the researcher is alone with the data, and the mundane is quite literally blacked out. The dome, on the other hand, is designed to be experienced communally. The floor is covered with large pillows, and people lie head-to-head or side-by-side. Curtains mark the boundary, but this threshold is permeable: people come and go. It is natural to discuss the visualisations overhead with those who lie beside you in the half-dark. In this way, research conversations become a kind of academic pillow-talk, imbued with its playfulness, intimacy, and gentleness. In the VR headset, you become part of the data; in the dome, you become part of a community.
My development of the linguistic visualisations discussed here was funded by a transdisciplinary grant from the Centre of Excellence for the Dynamics of Language (COEDL).