One of the most discussed issues in Biblical studies over the past 15 years is the history of Biblical Hebrew. Standard works on this issue assume that one can distinguish between Archaic Biblical Hebrew, Early Biblical Hebrew and Late Biblical Hebrew (Saenz-Badillos, 2004; Hurvitz, 2013). This position has been challenged in a number of recent publications, whose authors argue that the variation found in Biblical Hebrew is better explained by different styles of Biblical Hebrew that were in use throughout the biblical period (Young, Rezetko and Ehrensvärd, 2008), which covers roughly the whole first millennium BCE. A complicating factor in research on the biblical texts is that we know relatively little about their transmission in the centuries after their composition.
In general, the manuscript used for research on Biblical Hebrew is the Codex Leningradensis, which was created in 1008/1009 CE. Older manuscripts of the Hebrew Bible exist; by far the most important are those found in the Qumran Caves, which can be dated to around the beginning of the Common Era, but many of these manuscripts survive only in a fragmentary state.
Many studies on the diachrony of Biblical Hebrew are concerned with Hebrew vocabulary. These have resulted in long lists of early lexical items that are thought to have been gradually replaced by late alternatives. These late alternatives can often be identified as loans from languages like Aramaic and Persian (Young, Rezetko and Ehrensvärd, 2008).
One problem with using vocabulary as a gauge of linguistic change is that vocabulary could easily have been manipulated during the process of transmission. Scribes could change words, for instance to consciously archaize the language of a text.
To address this problem, we study syntax instead of vocabulary. Forming sentences takes place on a less conscious level than choosing words, and syntax is therefore a better basis for studying continuity and change in the history of Biblical Hebrew. In our study we investigate the use of prepositions accompanying a whole range of verbs. In the literature on linguistic variation in Biblical Hebrew, the use of prepositions and other function words in various contexts has been studied before, but always in a very restricted way: sometimes only a few biblical texts were studied, or the data were extracted from one manuscript exclusively (Hornkohl, 2014:218-38).
In our research we focus on verbs of motion and on stationary verbs. The former category includes verbs like בוא (bōʔ, to come), עלה (ʕālā, to go up) and יצא (yāṣā, to go out); the latter includes verbs like ישב (yāšav, to sit) and עמד (ʕāmaḏ, to stand). These verbs have in common that in most cases they take a locative complement, which is often introduced by a preposition (Oosting, Dyk and Glanz, to be published). It is known that various prepositions can be used with a given verb, and this variation can be found in parallel texts in the Codex Leningradensis, within specific biblical books and between different manuscripts (Kutscher, 1974).
The use of function words like prepositions is well established in authorship attribution (Argamon and Levitan, 2005; Garcia and Martin, 2007; Segarra, Eisen and Ribeiro, 2013), but in the case of ancient religious texts, identifying the author of a text is a controversial issue. Not only could texts have been adapted during the transmission of the complete text, but the composition of a text may itself have had a long history. Therefore we do not try to find the supposed author of a text. Instead, based on the study of prepositions accompanying verbs of motion, we want to find out what the main factor is behind the variation in the use of these prepositions. Is it related to the diachronic development of the Hebrew language, to the way the texts were transmitted, to both, or to still other factors? We investigate these issues by comparing the thousands of instances of prepositions accompanying verbs of motion and stationary verbs in:
1. different books in the Codex Leningradensis (Genesis, Exodus, etc.)
2. different manuscripts (e.g. compare Isaiah in the Codex Leningradensis with the Great Isaiah Scroll)
3. parallel texts.
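The comparison outlined in the list above can be pictured as tabulating, for each source and verb, how often each preposition introduces the complement. The following is a minimal sketch under stated assumptions, not the procedure actually used in this research: the record layout, the function names `tabulate` and `relative_frequencies`, the labels `witness_A` and `witness_B`, and all counts are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (source, verb, preposition).
# The sources and the counts are invented; they are not taken from any manuscript.
records = [
    ("witness_A", "bōʔ", "ʔel"),
    ("witness_A", "bōʔ", "ʔel"),
    ("witness_A", "bōʔ", "lə"),
    ("witness_B", "bōʔ", "ʔel"),
    ("witness_B", "bōʔ", "ʕal"),
    ("witness_B", "bōʔ", "ʕal"),
]


def tabulate(records):
    """Count prepositions per (source, verb) pair."""
    counts = defaultdict(Counter)
    for source, verb, preposition in records:
        counts[(source, verb)][preposition] += 1
    return counts


def relative_frequencies(counter):
    """Turn raw counts into proportions that sum to 1."""
    total = sum(counter.values())
    return {item: n / total for item, n in counter.items()}


counts = tabulate(records)
for (source, verb), counter in sorted(counts.items()):
    print(source, verb, relative_frequencies(counter))
```

The same tabulation can be keyed by book, by manuscript or by member of a parallel pair, so that one routine would serve all three comparisons.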
This kind of research can only be conducted properly if the textual data is in place, not only for the research itself, but also for those who want to reproduce it later on. Therefore we base our research on the Amsterdam Hebrew Text Database (Van Peursen et al., 2015). This database is Open Access and can be downloaded from DANS, a national research archive in the Netherlands. Without downloading, the material in the database can be accessed through the website SHEBANQ (https://shebanq.ancient-data.org; Roorda, 2015b). Here the text of the Codex Leningradensis can be browsed, and while doing so, the user has access to a wealth of annotations that represent linguistic information and additional observations. In particular, there is an extensive set of cross-reference annotations between virtually all parallel passages (Roorda, 2015a). With the help of clustering techniques and entropy calculations we are investigating the challenge of linguistic variation in Biblical Hebrew and the transmission of the biblical texts. Results of others (Hornkohl, 2014:218-38; Rezetko and Young, 2014:380-83) and of ourselves show that this approach leads to significant progress in our understanding of these issues.
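As an illustration of what an entropy calculation on preposition use might look like, the sketch below computes the Shannon entropy of the prepositions observed with one verb in one source. It is a minimal example under stated assumptions and does not reproduce our actual computation: the function name `preposition_entropy`, the labels `book_A` and `book_B`, and the toy observations are hypothetical.

```python
from collections import Counter
from math import log2


def preposition_entropy(prepositions):
    """Shannon entropy (in bits) of a list of observed prepositions.

    0 bits means one preposition is used throughout; higher values
    mean the choice of preposition is more varied.
    """
    counts = Counter(prepositions)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


# Hypothetical toy observations for a single verb; not real corpus data.
toy_observations = {
    "book_A": ["ʔel", "ʔel", "ʔel", "ʔel"],  # no variation
    "book_B": ["ʔel", "ʔel", "lə", "lə"],    # two prepositions, evenly split
}

entropies = {
    book: preposition_entropy(observed)
    for book, observed in toy_observations.items()
}
print(entropies)
```

In such a setup, per-verb entropies for each book or manuscript could serve as features for a clustering step, although which features and which clustering technique are appropriate remains an open design choice.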
The relevance of this research for digital humanities in general is that it explores challenges such as working with ancient languages of which we have only fragmentary evidence and (religious) texts with a long history of composition and transmission. While there is a lot of literature on these issues in traditional studies, it is clear that digital methods of research have not been pursued to their full potential yet.