DH 2016 Abstracts

Tackling Terms In Furetière’s 'Dictionnaire Universel'

First published posthumously in 1690, Furetière’s Dictionnaire Universel had aroused controversy well before its publication. It was nevertheless quickly followed by an enlarged and corrected edition by the Protestant scholar Henri Basnage de Beauval. Our subject is this extended edition of 1701, with its broad coverage of terminological language.

The Furetière project seeks to make the entire 1701 dictionary available as an open-access digital resource in an XML-TEI compliant format. Given the size of the task, the current total lack of funding and the poor prospects of obtaining any in the short term, the first stage will be to map and describe terminological coverage for a small number of themes, namely architectural, legal and maritime terminology. This paper demonstrates the mapping procedure, carried out with the Computer-Assisted Qualitative Data Analysis Software (CAQDAS) Atlas Ti, and the building of a model for the TEI encoding.

1. Historical background

Furetière was both a member of the French Academy and a participant in its dictionary-building team, hence the uproar when he published what was seen as a rival dictionary. This explains why a Catholic priest ended up being published by a Protestant publisher in the Netherlands, Arnaud Leers. The dictionary was a success but still needed much revision, which was carried out by the French émigré scholar Henri Basnage de Beauval. Although this is the most complete edition, only one attempt has ever been made to digitise it (Wionet and Tutin, 2001). Our aim is to encode the Basnage dictionary with cross-references to the 1690 edition and to Corneille’s dictionary (1694), so as to see how the terminological entries evolved and to what extent complementary information can be found in Corneille (Williams, Forthcoming).

2. Using CAQDAS

One important task for the Digital Humanities community is to bring existing tools to, or adapt them for, disciplines that are often less digitally aware. In using a CAQDAS tool to explore the three dictionaries, we therefore aim not only to exploit a very powerful tool for analysing and mapping the data before digitisation, but also to bring this technology into the sphere of literary analysis.

CAQDAS packages were originally created to meet the needs of sociologists carrying out rigorous qualitative analyses of data from multiple sources. Literary specialists also carry out qualitative analyses, but tend to work with highlighters on printouts of PDFs. By using a CAQDAS to work through a document as large and complex as Furetière’s dictionary, we aim to show how an electronic highlighter that supports coding and network analysis can be used in humanities research, and particularly in our own.

There are a number of commercial CAQDAS packages on the market, but only one open-source tool. Although open-source tools are important to the community, we have adopted Atlas Ti because it is a very powerful tool that is evolving rapidly, with new functions being added.

The 1701 Basnage edition runs to over 4000 pages of very dense text. Text quality is poor, which precludes the use of OCR. Although the text is structured, it dates from a period of great experimentation in dictionary compilation, which leads to a complex macro- and microstructure. The only way forward is to read the text. Using Atlas Ti, we skim the pages to locate and code entries designated as terms, to identify the different introductory formulae, to spot potential search formulae for unmarked terms, and to list the domains and crafts to which the terms belong. The coding system allows us to carry out a bottom-up conceptualisation of the dictionary, while the quotations (the coded text segments) allow us to create a headword list that links directly to the entries in the PDF file. Knowing the terminological domains covered by the dictionary is interesting in itself, but it also provides further keys for reaching unmarked terms. Networks allow us to reorganise domains and crafts into groups, such as legal terminology (represented by several domain names) and maritime terminology, which is often closely linked to the language of fortification, architecture and law. Atlas Ti can export data in a machine-readable format, so that the headword list can be transformed into an organised lexicon for use in the XML encoding process.
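The transformation of the exported headword list into an organised lexicon can be sketched as follows. This is a minimal Python illustration, not the project’s pipeline: it assumes a hypothetical CSV export with the columns headword, code and page (Atlas Ti’s actual export formats are not reproduced here), and the DOMAIN_GROUPS table, which stands in for the groupings obtained through network analysis, is likewise hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from domain codes (labels such as "Terme de Marine")
# to broader thematic groups; in the project these groupings come from the
# Atlas Ti network analysis, not from a hard-coded table.
DOMAIN_GROUPS = {
    "Terme de Marine": "maritime",
    "Terme de Palais": "legal",
    "Terme de Pratique": "legal",
    "Terme d'Architecture": "architecture",
}


def build_lexicon(csv_text):
    """Group coded headwords by thematic group, then by domain code.

    Assumes a CSV export with the columns headword, code and page; this
    layout is hypothetical. The page column is ignored here but could be
    kept to link back to the PDF. Codes missing from DOMAIN_GROUPS are
    filed under "unassigned" so that new domains are not silently lost.
    """
    lexicon = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["code"].strip()
        group = DOMAIN_GROUPS.get(code, "unassigned")
        # Headwords are normalised to capitals, as in the printed dictionary.
        lexicon[group][code].add(row["headword"].strip().upper())
    # Convert to plain dicts of sorted lists for stable, serialisable output.
    return {
        group: {code: sorted(words) for code, words in codes.items()}
        for group, codes in lexicon.items()
    }
```

Filing unknown codes under "unassigned" rather than discarding them reflects the bottom-up approach: a domain label that has not yet been placed in a group remains visible for the next round of network analysis.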

3. XML TEI

Using Atlas Ti as an electronic highlighter, with all its coding and management functions, does permit a full qualitative analysis of the data. However, it does not make it possible to share the data itself. Digitising the entire dictionary is a mammoth task that is only feasible through crowdsourcing over a long period. Marking up only thematically designated terminological domains allows us to create and test a model for the data while making machine-readable data available rapidly. The dictionary is unique in accompanying its definitions with lengthy examples and citations, thus providing both technical and phraseological information that can only be retrieved through in-depth mark-up. Whereas Wionet and Tutin (2001) marked up a single letter, we plan to follow the terminology through the entire dictionary.

The first stage consists of marking up terms from a small number of highly productive thematic fields isolated through the CAQDAS analysis. The first field to be explored in depth is maritime activity, as designated by ‘terme de marine’ (maritime term). This choice opens a collaboration with French historians, with the aim of comparing an élite vision of naval terminology with the situation in a major naval port, Lorient, which was founded in 1666 by the Compagnie des Indes and became a major naval base from 1703.

Our aim is to illustrate the decisions taken and how they affect output, both through visualisation and through analysis with tools such as the linguistic analysis platform TXM and the database system BaseX. Once the work is more advanced, the data will be put online with a query interface. For the moment, we are sharing code on GitHub under the name Basnage.

Mark-up is being carried out in Oxygen, following the TEI Guidelines, with XQuery used to ensure consistency. Entries can be extremely complex, as will be illustrated with the verb Abatre. As a result, the TEI Guidelines do not always adapt easily to what is found in the text. However, we are endeavouring not to customise the Guidelines, so as to retain full compatibility with other dictionaries and allow easier linking with source texts.
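The project’s consistency checks are written in XQuery within Oxygen. As a rough stand-in, the following Python sketch uses the standard library’s xml.etree.ElementTree to apply two rules to TEI mark-up. Both the rules and the check_entries function are illustrative assumptions, not the project’s actual constraints.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def check_entries(xml_text):
    """Return a list of consistency problems found in TEI dictionary mark-up.

    The two rules are hypothetical examples:
    1. every <entry> has exactly one <form>/<orth> headword;
    2. every <usg type="dom"> (domain label) contains some text.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # Rule 1: one headword per entry, including entries inside <superEntry>.
    for i, entry in enumerate(root.iter(f"{{{TEI}}}entry"), start=1):
        orths = entry.findall("tei:form/tei:orth", NS)
        if len(orths) != 1:
            problems.append(f"entry {i}: expected 1 orth, found {len(orths)}")
    # Rule 2: no empty domain labels anywhere in the document.
    for usg in root.iter(f"{{{TEI}}}usg"):
        if usg.get("type") == "dom" and not (usg.text or "").strip():
            problems.append("empty domain label in <usg type='dom'>")
    return problems
```

Returning a list of messages rather than stopping at the first error mirrors how such checks are typically used during mark-up: the whole file is reported on at once, and the encoder works through the list.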

The dictionary tends to group words orthographically, so that there is a main headword in large capitals, which also carries the grammatical information, followed by a series of subentries that generally have the headword in small capitals. These subentries may include the specialised terminological usage. To complicate matters, polysemy is illustrated within an entry with short comments and examples. To handle this, we use <superEntry> to group the whole, <entry> for what might be considered subentries, and <sense> for polysemy within a given entry. The main entry for ABATRE has three main senses illustrated by a series of synonyms, each sense being accompanied by numerous examples and occasionally by a citation with a bibliographic reference. The examples frequently contain collocations that activate a particular synonym, and it would be useful to mark these up and make them visible.

Similarly, citations generally give only a link to a person, often via an abbreviated name, as in MEN for Ménage or ABL for M. d’Ablancourt. These abbreviations are generally listed at the beginning of the dictionary, but not always, and they are sometimes used inconsistently. Even when a text is not named, it is sometimes possible to link to a source document. Given that Basnage was a prolific letter writer at the centre of a network of European scholars, and the author of the Histoire des Ouvrages des Savants, an important task will be to cross-link to this valuable source of information.
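The superEntry, entry and sense layering described above can be sketched with Python’s xml.etree.ElementTree. The tei and build_super_entry helpers are hypothetical, the example strings are placeholders rather than Furetière’s text, and recording an abbreviated author as <choice><abbr>/<expan> inside <bibl><author> is one possible TEI encoding, assumed here rather than taken from the project. Only the MEN and ABL expansions come from the text; unknown abbreviations are left unexpanded.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace

# Citation abbreviations mentioned in the text; a full table would be built
# from the list at the front of the dictionary (hypothetical here).
AUTHORS = {"MEN": "Ménage", "ABL": "M. d'Ablancourt"}


def tei(tag, parent=None, text=None, **attrs):
    """Create a TEI element, attached to parent if one is given."""
    name = f"{{{TEI}}}{tag}"
    if parent is None:
        el = ET.Element(name, attrs)
    else:
        el = ET.SubElement(parent, name, attrs)
    el.text = text
    return el


def build_super_entry(headword, pos, senses, subentries):
    """Build a <superEntry> skeleton.

    senses: list of (examples, citation) for the main entry, where citation
        is a (quote, author_abbrev) pair or None.
    subentries: list of (domain_label, definition) for terminological uses.
    """
    sup = tei("superEntry")
    main = tei("entry", sup)
    tei("orth", tei("form", main, type="lemma"), headword.upper())
    tei("pos", tei("gramGrp", main), pos)
    for n, (examples, citation) in enumerate(senses, start=1):
        sense = tei("sense", main, n=str(n))
        for example in examples:
            tei("quote", tei("cit", sense, type="example"), example)
        if citation:
            quote_text, abbrev = citation
            cit = tei("cit", sense, type="example")
            tei("quote", cit, quote_text)
            author = tei("author", tei("bibl", cit))
            if abbrev in AUTHORS:
                # Keep the abbreviation as printed alongside its expansion.
                choice = tei("choice", author)
                tei("abbr", choice, abbrev)
                tei("expan", choice, AUTHORS[abbrev])
            else:
                author.text = abbrev
    for domain, definition in subentries:
        # Each terminological subentry repeats the headword and carries
        # its domain label as <usg type="dom"> inside the sense.
        sub = tei("entry", sup)
        tei("orth", tei("form", sub, type="lemma"), headword.upper())
        sub_sense = tei("sense", sub)
        tei("usg", sub_sense, domain, type="dom")
        tei("def", sub_sense, definition)
    return sup
```

Calling ET.tostring(build_super_entry(...), encoding="unicode") serialises the skeleton with TEI as the default namespace. Keeping the printed abbreviation next to its expansion means that inconsistent abbreviations remain visible in the mark-up rather than being silently normalised.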

Collocation mark-up means that we can link the dictionary analysis to words in wider contexts, as found in Frantext or, when a text has already been digitised, by using a linguistic analysis tool such as TXM. Mark-up also aims to make an onomasiological analysis possible using BaseX, rather than simply presenting the data in a linear semasiological format.
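The shift from a semasiological to an onomasiological view can be approximated by inverting the mark-up: instead of reading from headword to senses, we index from domain label to headwords. The sketch below does this in Python over TEI, assuming that domain labels are encoded as <usg type="dom">. The project itself uses BaseX and XQuery, and treating domain labels as concepts is a simplification of a full onomasiological analysis.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def onomasiological_index(xml_text):
    """Invert headword -> senses into domain -> headwords.

    Domain labels are assumed to be marked up as <usg type="dom">
    somewhere inside each <entry>; entries without a label are skipped.
    """
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for entry in root.iter(f"{{{TEI}}}entry"):
        orth = entry.find("tei:form/tei:orth", NS)
        if orth is None or not orth.text:
            continue
        for usg in entry.iterfind(".//tei:usg[@type='dom']", NS):
            if usg.text and usg.text.strip():
                index[usg.text.strip()].add(orth.text.strip())
    # Sorted lists give a stable, alphabetical view of each domain.
    return {domain: sorted(words) for domain, words in sorted(index.items())}
```

Because the index is derived from the mark-up rather than stored separately, it stays consistent with the encoded entries as new domains are added.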

4. Conclusion

This is a mammoth mark-up task, which we believe is made easier by combining tools so as to permit ongoing analysis while gradually digitising the whole dictionary into TEI-compliant XML. This strategy means that other scholars can use the data without having to wait for the entire digitisation process to be completed. In so doing, we seek to explore the data while collaborating in the dissemination and improvement of digital tools, and to contribute to the art of digital mark-up of early dictionaries.

Bibliography

Williams, G. (Forthcoming). Le temps des termes: les termes et la phraséologie dans les dictionnaires du 17e siècle. In De Giovanni, C. (Ed.), Fraseologia E Paremiologia: Passato, Presente E Futuro. FrancoAngeli: Milan.
Wionet, C., Tutin, A. (2001). Pour informatiser le Dictionnaire universel de Basnage (1702) et de Trévoux (1704): Approche théorique et pratique. Honoré Champion: Paris.