The goal of the study is to show links between lexical and social diachronic change. The study is conducted in the culturomics framework (Michel et al., 2011). In contrast to the Big Data approach, the study promotes the idea of medium data, i.e. an amount of data that allows both quantitative and qualitative analysis (Bonch-Osmolovskaya, 2015). The main characteristics of the medium data are:
The research is based on data from the Russian National Corpus (ruscorpora.ru) (see Plungian, Sitchinava, 2003). The study traces changes in context frequencies for the lexeme road in the period from 1800 to 2000 and correlates the observations with social and economic progress as well as with changes in the conceptual space of the language.
Russia is a big country, so transportation has traditionally been a critical problem. The choice of the word road for a culturomics study is based on the expectation that the concept was central to the economy, society and culture of Russia in the 19th–20th centuries. Road appears to be a productive sign in terms of the semiotics of art (Tchepanskaya, 2003), which is why I expected to collect numerous relevant contexts in both fiction and nonfiction. At the same time, road in Russian has several meanings, and the nature of its polysemy has been discussed extensively in previous work (Arutiunova, 1999). We can distinguish three basic meanings, which are contrasted by the position of the Observer (Paducheva, 2006), i.e. the one who perceives the road. The first meaning is road as a physical object, a line on the ground that the Observer sees while standing on it. It can be characterized by the quality of its surface or by the surrounding landscape (e.g. dirty road). The second meaning is road as a vector, a line on a map that connects two points (e.g. central road). In this case the Observer operates with the abstract idea of the road's topology. The third meaning is metonymic and stands for the travel event that the Observer experiences while moving along the road (e.g. tedious road). Finally, owing to its semiotic richness, road is frequently used in a metaphorical sense (e.g. life path = “road of life”). At the same time, the first three meanings represent the most important parameters that determine the mobility of the population: the quality of roads, the connectedness between localities, and the duration and quality of the journey. Therefore, it is not sufficient to track the overall frequency of road in the corpus; it is important to distinguish how the frequencies of its different meanings have changed.
Different meanings of road can be captured by attributive constructions, as adjectives usually refer to only one sense. The corpus was divided into 7 time periods from 1800 to 2000. To make the sub-corpora comparable, the 19th century was divided into two periods of 50 years and the 20th century into five periods of 20 years. Contexts containing the construction adjective plus road were extracted from every sub-corpus. Noisy entries were removed, and the data were lemmatized and normalized as ipm (instances per million words). As a result, I obtained a database of 15,000 constructions containing more than 1,500 unique adjectives.
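The period split and the ipm (instances per million) normalization described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the exact period boundaries, the function names, the transliterated lemmas and the sub-corpus sizes are all assumptions made for the example.

```python
from collections import Counter

# Assumed boundaries: two 50-year periods for the 19th century and
# five 20-year periods for the 20th century (exact cut-offs are a guess).
PERIODS = [(1800, 1849), (1850, 1899), (1900, 1919), (1920, 1939),
           (1940, 1959), (1960, 1979), (1980, 1999)]


def period_of(year):
    """Return the (start, end) period that contains `year`, or None."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None


def ipm(count, corpus_size):
    """Normalize a raw count to instances per million tokens."""
    return count * 1_000_000 / corpus_size


def adjective_ipm(hits, subcorpus_sizes):
    """Count (period, adjective lemma) pairs and normalize them as ipm.

    hits: iterable of (year, adjective_lemma) pairs, one per extracted context.
    subcorpus_sizes: dict mapping a period to its token count.
    """
    counts = Counter()
    for year, lemma in hits:
        period = period_of(year)
        if period is not None:
            counts[(period, lemma)] += 1
    return {key: ipm(n, subcorpus_sizes[key[0]]) for key, n in counts.items()}


# Hypothetical toy input, not corpus data.
toy_hits = [(1820, "bolshoj"), (1830, "bolshoj"), (1865, "zheleznyj")]
toy_sizes = {(1800, 1849): 2_000_000, (1850, 1899): 4_000_000}
toy_result = adjective_ipm(toy_hits, toy_sizes)
```

Normalizing by sub-corpus size is what makes periods of unequal length and unequal text volume comparable with one another.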
In the next step, all the adjectives were categorized into 20 semantic domains. The domains correlate with the four basic meanings of road defined above but render more specific characteristics of different road parameters. The most frequent construction, “zheleznaya doroga” (literally “iron road”, meaning railroad), was placed in a separate category.
Then I applied hierarchical clustering to the data of the 20 categories (see Figure 1).
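The clustering step can be illustrated with a small agglomerative procedure. The abstract does not state which distance measure or linkage criterion was used, so Euclidean distance and average linkage are assumptions here, and the category names and values are invented toy data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long frequency series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, series):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(series[i], series[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(series, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    series: dict mapping a category name to its per-period ipm values.
    """
    clusters = [[name] for name in series]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], series)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented toy series (three periods each), not corpus data.
toy_series = {
    "journey": [1.0, 2.0, 3.0],
    "metaphor": [1.1, 2.1, 3.2],
    "direction": [9.0, 6.0, 3.0],
    "centrality": [9.5, 6.2, 2.8],
}
toy_clusters = agglomerate(toy_series, 2)
```

Categories whose frequency curves move together over time end up in the same cluster, which is the property that the cluster graphs rely on.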
The data of the categories in each cluster were summed and then plotted on a graph (see Figure 2).
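Assuming that the aggregation is a per-period sum of the ipm values of all categories in a cluster, it could look like this. The function name and the numbers are illustrative only.

```python
def sum_cluster(series, members):
    """Add up the per-period values of all categories in one cluster.

    series: dict mapping a category name to its per-period ipm values.
    members: names of the categories that belong to the cluster.
    """
    return [sum(values) for values in zip(*(series[m] for m in members))]


# Invented values for two categories over three periods.
toy = {"direction": [9.0, 6.0, 3.0], "centrality": [1.0, 2.0, 0.5]}
toy_curve = sum_cluster(toy, ["direction", "centrality"])
```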
The data allow many research scenarios that compare different domains, for example:
In this abstract, I will focus on the most prominent changes in the cluster graphs. As Figure 2 shows, the railroad (RR) cluster and the direction and centrality (D and C) cluster are the most distinctive in their behavior. At the beginning of the 19th century, the existence of big central roads from one town to another completely determined the mobility opportunities of the Russian population. We see that more than 50% of all occurrences of road are associated with D and C attributes (Warsaw road; big road as a specific term for a central road). In 1851, the railroad between Moscow and St. Petersburg was opened, and this fact correlates nicely with the crossing of the RR and D and C graphs in the 1850s. The intensive growth of the RR cluster in the second half of the 19th century reflects not only the growth of railroad communications in Russia but also the great conceptual influence of the railroad as an innovation, which can also be traced in numerous literary works of this period, such as Tolstoy's Anna Karenina or Dostoevsky's The Idiot. An intriguing fact about the RR cluster is its consistent fall in the 20th century, which may of course correlate with the development of automobile transportation. The sharp fall of the RR cluster in the 1960s corresponds to the growth of civil aviation; see Figure 3, which demonstrates the opposite trend for air transportation starting from the 1960s.
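The two measurements used in this paragraph, the share of a cluster among all occurrences of road and the period in which two cluster curves cross, can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and the numbers are invented to show the mechanics, not taken from the corpus.

```python
def shares(cluster_series):
    """Convert per-period cluster values into shares of the period total.

    cluster_series: dict mapping a cluster name to its per-period values.
    """
    n_periods = len(next(iter(cluster_series.values())))
    totals = [sum(s[k] for s in cluster_series.values())
              for k in range(n_periods)]
    return {name: [v / t if t else 0.0 for v, t in zip(s, totals)]
            for name, s in cluster_series.items()}


def first_crossing(a, b):
    """Index of the first period where `a` rises above `b`.

    A crossing is a period k in which a[k] > b[k] while a was at or
    below b in period k - 1. Returns None if the curves never cross.
    """
    for k in range(1, len(a)):
        if a[k - 1] <= b[k - 1] and a[k] > b[k]:
            return k
    return None


# Invented toy curves over three periods.
toy_rr = [1.0, 5.0, 8.0]
toy_dc = [6.0, 4.0, 2.0]
toy_cross = first_crossing(toy_rr, toy_dc)
toy_shares = shares({"rr": [1.0, 5.0], "dc": [3.0, 5.0]})
```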
The most important generalization that arises from the observations above is that in the 20th century the topological (vector) meaning of road consistently fades, while reference to a road as a physical object, on the contrary, increases in frequency. In other words, while economic and industrial growth results in a diversity of means of mobility, road as a concept in lexical space has changed the balance of its meanings, reducing the idea of connection. At the same time, the metonymic usage of road as a journey increased in the 20th century, as did the metaphorical usage; the two categories are so similar in their data values that they form one common cluster. In the 1960s, the idea of connection is transferred from the direct usages of the D and C cluster to the figurative usages of the Journey and Metaphor cluster. This means that we can document the moment when the idea of connection is separated from physical movement along the road. The tedious road is now sitting in an airport for many hours waiting for your flight.