This experiment takes place within the “Haine du Théâtre”1 Project, which aims at analysing theatre debates in Europe by using scientific approaches and critical editions of polemical texts. The reflections of the team were primarily focused on the discovery of the circumstances and arguments used in theatre all across Europe, not limited to France, but including England, Spain, Italy, and the Germanic area. The timeframe encompassed the last decades of the 16th century up to the beginning of the 19th century. The purpose of the Project is to explore the grey areas of the controversies in order to outline a global overview of the situations which led to these polemics, discovering where and how they began, their chronological discrepancies in the different countries, and the links between them and their contemporary resurgences.
The total collection of the Haine du Théâtre Project related to France made up of 300 texts in the PDF format. The XML/TEI critical edition of 27 texts has been achieved manually (examples of the main titles of this TEI corpus are D'Aubignac (1666), Conti (1666), Pierre Nicole (1667), Voisin (1671), Vincent (1647), cfr. <http://obvil.paris-sorbonne.fr/corpus/haine-theatre/>) and this small corpus is used for the semantic analysis we present here.
This collection is a homogenous combination of texts in which the different authors express their approval or their condemnation of theatre. Using the ontology we created, our interest was to discover to what extent the personal judgment of each author was celebratory or derogatory about theatre and, also, which arguments were mostly involved in their critiques.
In this direction, we had two main goals:
An ontology is a good way to automatically analyse many texts together along two parameters: documentary and linguistic. On the one hand, by improving a state of knowledge about 17th century French ; on the other hand, by refining the vocabulary linked to theatre controversy.
The ontology organizes the knowledge of our domain (polemical texts about theatre) as structured points of view (condemnation, defence, etc.) as well as 44 structured classes (concepts) related to critical controversies.
These classes report the axiological points of view (the judgment of the authors), their objects (such as jongleur, actrice, etc.), and the thematics of the polemics ( religion, emotions, etc.).
To detect the salient terms pertaining to each concept, we realized that in most texts of the modern period, authors' judgements revolve around quoted authorities. The context and the deep knowledge of the corpus together with expertise in 17th century French constitute the cornerstones of our approach. It is very important to fully understand the contextual meaning (in the 17th century) of the selected salient terms.
The idea we came up was to look at the relative importance of the various semantic fields in each chapter of the collection, then extract a priori the content of a chosen text. We began to use this structured lexicon of outstanding terms related to theatre polemics for the corpus annotation and its analysis.
To our knowledge no other comparable ontology about theatre polemics exists; therefore, we will present the method we conceived keeping in mind that those classes are not exhaustive and that new questions will require the creation of new classes.
Despite the high specificity of the domain, this ontology model and the automatic annotator are deployable in other contexts, such as literary criticism and theatre critique in different languages. For this purpose we will translate the model into English (the set of the attributes useful for the annotator).
We created an annotator in order to markup all the forms of the outstanding terms, recorded as lemmas in the ontology (“horrible”, “horribles”, etc.), except when:
The annotation tool enriches automatically the TEI files by setting down some <term> tags to mark all the forms, but also their lemma and the related semantic field.
Following this method we find a high density of salient terms (on average 9,5% of words in each the chapters) in our collection of texts. The combination of these linguistic signs and related semantic fields emerges as a semantic descriptor of the corpus content (e.g. http://obvil-dev.paris-sorbonne.fr/corpus/haine-theatre/vincent_traite-des-theatres_1647/vincent_traite-des-theatres_1647_6).
Without preconceived notions and previous knowledge of the corpus, it was possible for us to answer numerous research questions. By examining the intensity of condemnation in the corpus, we discovered that the most derogatory chapters belong to Nicole and Conti.
|chapter_id||condemnation_terms||defense_terms||chapter_length (words)||condemnation intensity (‰)|
Examples of the condemnation intensity.
By virtue of the possibility of making cross queries across multiple classes, we were able to obtain a list of the arguments developed by the authors – which can be confirmed by reading the chapters concerned. For instance, the most derogatory chapters about theatre were written by Nicole and Conti.
|chapter_id||“Théâtre” and “Morale Négative” thematics relative scores2 (‰)|
The most derogative chapters.
We can also compare the chapters more concerned with, for instance, condemnation and the thematic of “Women”. We discovered that the chapters more concerned with misogyny belonged to Voisin, Conti and Nicole. The discriminant terms for woman are, in these derogatory chapters: “femme, fille, bouffonne, maîtresse, comédienne”. Those are just examples of the queries we can construct with the annotation tool.
At this point, the results of the computational analysis reveal some critical conclusions about the textual tradition analysed. Continuing this kind of analysis for every concept present in the ontology, we found that some authors are more concerned with all the elements of theatrical debates, whereas others deal only with some of them.
In particular, we found a high concentration of annotated terms in Voisin, Conti, Nicole and Aubignac, whose chapters usually occupy the first 20 results for most thematics. On the contrary, authors like Vincent, Guillot-Gorju, Le Marcant and Gaule score high only when relating to economic issues.
|chapter_id||“Economy” thematic relative scores3 (‰)|
The concentration of the economic thematic. These results show that the economy is at the core of the descriptions by another group of authors: Vincent, Guillot-Gorju, Le Marcant.
For example, some very specific arguments are associated in the controversy, like women, passion and the economy. Moreover, results have shown that some authors of the corpus focus on only few topics of the polemics, whereas others, like Vincent, concentrate all the topics of the theatre polemic in his 11th chapter.
The HdT ontology organizes the knowledge about the theatre polemics and their vocabulary. In particular, it presents a hierarchical lexicon of salient terms and it reflects the points of view, objects and thematics of the polemic.
This ontology enables the exploration the corpus for research questions: the relative frequency of the salient terms ( lemma) and of their related class ( catégorie d’indexation) is a semantic descriptor of each chapter. Those descriptors can be exploited for the semantic exploration of the corpus, capitalizing on the ontology structure (terms / classes). The researchers can understand the intensity of the concepts they choose to analyse and thus can focus on the most significant chapters of the corpus.