XML Version
Schöch, C., Schlör, D., Popp, S., Brunner, A., Henny, U., Calvo Tello, J. (2016). Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels. In Digital Humanities 2016: Conference Abstracts. Jagiellonian University & Pedagogical University, Kraków, pp. 346-353.
Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels

Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels

1. Introduction

In fictional prose narrative such as novels and short stories, various forms of speech, thought, and writing representation are ubiquitous and have been studied in great detail in linguistics and literary studies. However, beyond quotation marks, what are linguistic markers of direct speech? And just how ubiquitous is direct speech really? Is there systematic variation in the amount of direct speech over time or across genres? Especially for the field of French literary history, where typography is not a reliable guide, we really don't know.

This is regrettable, because being able to quickly and automatically detect direct speech in large collections of literary narrative texts is highly desireable for many areas in literary studies. In the history of literary genres, this allows to observe distributions and evolutions of a fundamental, formal aspect of the novel on a large scale. In narratology, differentiating narrator from character speech is a precondition for more detailed analyses of narrator speech, e.g. with regard to text type (descriptive, narrative, argumentative text). And in authorship attribution, it hereby becomes possible to discard character speech from a set of novels and perform authorship attribution on the narrator speech only, something which may improve attribution.

Against this background, the work presented here addresses both the question of how to identify direct speech in French prose fiction and that of how prevalent direct speech is in different subgenres of the nineteenth-century French novel.

2. Aims and hypotheses

Our first aim has been to use machine learning to automatically identify direct character speech in a small collection of French-language fictional prose. This is less trivial than it seems to be since in the French typographical tradition, direct speech is usually not marked with opening and closing quotation marks (figure 1). Rather, a long hyphen usually indicates the beginning of direct speech, whereas the end is left unmarked. In figure 1, the first highlighted direct speech continues after the insertion revealing who has just spoken (“lui dit-il, tout bas,”; he quietly said to him). In the second example, the direct speech ends after the speaker has been indicated (“dit une voix à la portière”; said a voice at the door). Our hypothesis is that there are enough linguistic markers of direct speech to make it possible to identify it automatically and reliably (for an overview of such markers, see Durrer, 1994).

Figure 1: Detail from Paul Féval, La Louve, 1857, p. 126 (Source: http://gallica.bnf.fr/ark:/12148/bpt6k6366934b)
Figure 1: Detail from Paul Féval, La Louve, 1857, p. 126 (Source: http://gallica.bnf.fr/ark:/12148/bpt6k6366934b)

Our second aim has been to use the best-performing algorithm to identify direct speech in a larger collection of French nineteenth-century novels and to study its distribution. Here, we hope to detect significant differences in the proportion of direct speech found in different novelistic subgenres. Research about other literary traditions supports this hypothesis (e.g. Allison et al., 2011).

3. State of the Art

Speech, thought and writing representation are common topics in narratology and stylistics (Genette, 2008, Leech/Short, 2007). Semino and Short’s 2004 quantitative study finds that direct representation is clearly the most frequent type in their English fiction sub-corpus. Brunner 2015 confirms this trend for her corpus of German short narratives. Here, the percentage of sentences containing direct speech is about 35% and varies widely over different texts (2%-72%).

Frequently, speech representation recognition is an auxiliary step to other tasks, e.g. knowledge extraction or speaker recognition (Krestel at al., 2008, Elson and McKeown, 2010, Iosif and Mishra, 2014, Sarmento/Nunes, 2009). Weiser and Watrin 2012 used a rule-based approach to extract unmarked quotations in French newspaper texts with success rates of 0.745-0.789. Brunner 2015 focuses on speech, thought and writing representation in German short literary narratives. Using machine learning with random forests, she reports an F1 score of 0.87 for direct speech in a sentence-based cross-validation.

4. Data

Figure 2: Distribution of novels per subgenre and decade
Figure 2: Distribution of novels per subgenre and decade

Our text collection contains 127 French novels published between 1840 and 1889. Three generic subsets can be distinguished, each of which is represented by approximately 40 texts: general novelistic fiction (so-called ‘littérature blanche’) is contrasted with specific subgenres, crime fiction (‘policier’) and fantastic novels (see figure 2). The narrative perspective is largely heterodiegetic.

5. Methods

5.1. Manual Annotation

To obtain a gold standard, 40 chapters from 20 different novels were randomly chosen from the collection and annotated manually. 5734 sentences were marked as either containing direct speech or not containing any direct speech; the former also include mixed sentences.

5.2. Preprocessing

To prepare feature generation, preprocessing was performed, the pipeline consisting of the Stanford CoreNLP-Tokenizer and Sentence-Splitter, as well as the TreeTagger for POS-Tagging and Lemmatization.

5.3. Feature generation

We modeled 81 features which we believed to be useful cues for the classification task (see the annex for a ranked list). Features are generated on a sentence-based level and can be divided into different categories:

  • Character-based: e.g. long hyphen marks, exclamation marks, question marks.
  • Lexical: e.g. deictic expressions, interjections.
  • Semantic: categories of verbs, from WordNet and the French equivalent WOLF: e.g. verbs of motion or perception.
  • Morphological: e.g. part-of-speech, verb-tense, lemma.
  • Syntactic features: e.g. number of commas, sentence length.

5.4. Classification

For the binary classification task (sentences containing vs. not containing direct speech), we used an annotation and classification framework developed by Markus Krug (Würzburg) wrapping LibSVM Support-Vector-Machine (Chang and Lin, 2011), Maximum Entropy (Nigam et al., 1999) and Naïve Bayes (John and Langley, 1995) and implemented in MALLET (McCallum 2002). Random Forest (Breiman, 2001) and JRip (Cohen, 1995) were applied using Weka. All experiments were validated using 10-fold cross-validation unless otherwise stated.

5.5. Error analysis

The machine learning algorithms’ incorrect assignments on the gold standard (false positives and false negatives) were manually analyzed in order to detect the errors' underlying causes.

5.6. Automatic tagging of unseen texts

Using the best-performing model, all sentences in the text collection were tagged for containing direct speech or not. The distribution of ratios of direct speech / non-direct speech was calculated for the three subgenres and five decades covered by the collection. Performance on these unseen texts was checked manually on a random sample. (We sampled 2300 sentences, i.e. 100 random sentences each from a sample of 23 novels stratified by ratio of direct speech.)

6. Results and Discussion

6.1. Recognition of direct speech

Table 1 depicts the performance for different conditions.

Table 1: Performance (10-fold cross-validation on the gold standard)
Table 1: Performance (10-fold cross-validation on the gold standard)

Our baseline is using the speech sign (i.e. the long hyphen) as the only feature, which yields an F1 score of 0.734. Random-Forest performs best, with an F1 score of 0.939, which we consider to be an impressive result. Even when excluding the speech sign from the features, we still reach an F1 score of 0.924, much better than the hyphen alone.

After inspecting the models, it becomes clear that only very few features carry strong cues for direct speech, namely (and unsurprisingly) the initial long hyphen. Most other features, taken separately, carry weak signals in either direction, but become relevant in combination.

Error analysis reveals that incorrect assignments (false positives and negatives) are frequently due to imperfect sentence segmentation. Several features which have been previously used to define and recognize direct speech (question / exclamation marks, interjections, verbal tenses) also cause incorrect assignments, especially in the context of homodiegetic narration, where the narrator is somewhat involved in the plot so that his narrator speech is similar to direct speech. Finally, letters are sometimes mistaken for direct speech, which makes sense given that in most of them, one person addresses one or several other people.

6.2. Distribution of direct speech in the corpus

We applied the best-performing algorithm (Random Forest) to the entire text collection. Evaluation shows a certain drop in performance, with a weighted average success rate of 0.844, indicating less-than-perfect generalization. We noted a welcome absence of any strong bias for either direct or non-direct speech. Our results suggest that the average proportion of direct to non-direct speech across the collection is 61% sentences with direct speech (and 39% without direct speech).

Figure 3: Ratio of direct to non-direct speech in 127 novels
Figure 3: Ratio of direct to non-direct speech in 127 novels

While variance is considerable (see figure 3), the proportion of direct speech in French nineteenth-century novels is overall much higher than expected (and higher, for example, than the 35% reported by Brunner 2015 for German novellas).

Figure 4 shows that both fantastic novels and crime fiction have a significantly higher median for proportion of direct-speech than ‘littérature blanche’, but do not differ significantly from each other (for significance tests, we used the non-parametric Kruskal-Wallis test at a significance level of 1%).

Figure 4: Distribution (left) and significance (right) of direct to non-direct speech ratios across three subgenres
Figure 4: Distribution (left) and significance (right) of direct to non-direct speech ratios across three subgenres

Figure 5 shows that only the ratios for the 1850s and the 1880s have a significantly differing level. However, because the decades do not have perfectly balanced subgenre proportions, this is probably due to a subgenre imbalance rather than an effect of the time period.

Figure 5: Distribution (left) and significance (right) of direct to non-direct speech ratios across five decades
Figure 5: Distribution (left) and significance (right) of direct to non-direct speech ratios across five decades

7. Conclusions and Future Work

Using a wide range of linguistic markers allows the reliable identification of direct speech, even in the absence of clear typographic markers. Performance is excellent to good (F1-score of 0.94 on the gold standard, weighted average success rate of 0.844 on unseen texts). Using our method reveals that nineteenth-century French novels contain a large proportion of sentences with direct speech (61% on average). Also, there are previously unseen differences in direct speech proportion for subgenre, but not for time period.

For future work, we plan to use several strategies to improve performance. One is to add more sequential information to our set of features. Examples include the position, inside a sentence, of certain lexical or typographical features as well as linguistic cues preceding and following direct speech. Also, we plan to expand our corpus to make it more balanced in terms of genres and decades. This will allow us to discover genre-related patterns of interest to literary historians in a more reliable manner and assess their significance with more confidence.

8. Supplementary material

Supplementary material can be found at: https://github.com/cligs/projects/tree/master/2016/dh.

9. Annex A: Features used

List of features used, sorted by descending rank by a one-rule classifier.

average merit average rank attribute

74.028 +- 0.168 1 +- 0 79 SPEECHSIGN

71.743 +- 0.16 2 +- 0 57 VER:impf

65.847 +- 0.234 3 +- 0 54 VER:pres

63.893 +- 0.155 4 +- 0 55 VER:simp

63.248 +- 0.136 5 +- 0 6 PUNCMARKDOT

59.48 +- 0.12 6 +- 0 29 MATCHINGPPER_SON

58.835 +- 0.094 7.7 +- 0.64 30 MATCHINGPPER_SES

58.695 +- 0.208 8.1 +- 0.94 24 MATCHINGPPER_IL

58.713 +- 0.104 8.4 +- 0.92 35 VERB_MOTION

58.364 +- 0.083 10.6 +- 0.49 28 MATCHINGPPER_SA

58.344 +- 0.417 10.8 +- 1.78 7 SENTENCELENGTH

58.172 +- 0.078 11.7 +- 0.46 61 VER:subi

57.492 +- 0.091 14 +- 1.41 25 MATCHINGPPER_ELLE

57.422 +- 0.103 14.5 +- 1.36 44 VERB_PERCEPTION

57.387 +- 0.248 14.9 +- 1.51 50 INNERSUBCLAUSE

57.356 +- 0.4 15.8 +- 2.09 48 UNKNOWNLEMMA

57.213 +- 0.07 16.5 +- 1.02 31 MATCHINGPPER_LEUR

57.143 +- 0.162 17.3 +- 1.1 60 VER:ppre

56.672 +- 0.042 20.2 +- 0.98 36 VERB_BODY

56.672 +- 0.115 21 +- 1.84 52 VER:cond

56.62 +- 0.136 21.7 +- 2.1 40 VERB_EMOTION

56.567 +- 0.072 22.3 +- 1.19 26 MATCHINGPPER_ILS

56.497 +- 0.033 23.9 +- 1.3 41 VERB_COGNITION

56.428 +- 0.044 25 +- 1 46 VERB_CONSUMPTION

56.201 +- 0.005 34.5 +- 4.06 20 MATCHINGPPER_VOTRE

56.339 +- 0.176 35.4 +-18.69 32 COMMAS

56.201 +- 0.005 35.8 +- 4.19 21 MATCHINGPPER_VOS

56.201 +- 0.005 35.8 +- 6.4 22 MATCHINGPPER_TOI

56.201 +- 0.005 36.3 +- 4.2 17 MATCHINGPPER_TES

56.201 +- 0.005 37.6 +- 7.35 5 PUNCMARKCOLON

56.195 +- 0.018 37.7 +-13.33 18 MATCHINGPPER_NOTRE

56.201 +- 0.005 38.2 +- 3.16 23 MATCHINGPPER_MOI

56.424 +- 0.296 38.4 +-25.85 47 VERB_COMMUNICATION

56.201 +- 0.005 38.6 +- 6.45 4 PUNCMARKEXCL

56.201 +- 0.005 38.7 +- 3.44 16 MATCHINGPPER_TON

56.201 +- 0.005 39.4 +- 4.82 15 MATCHINGPPER_TA

56.201 +- 0.005 39.6 +- 6.45 3 PUNCMARKQUSTION

56.201 +- 0.005 40.2 +- 8.81 8 MATCHINGPPER_JE

56.201 +- 0.005 41.8 +-10.17 9 MATCHINGPPER_TU

56.201 +- 0.005 43.5 +- 9.19 10 MATCHINGPPER_NOUS

56.201 +- 0.005 43.5 +- 2.84 13 MATCHINGPPER_MON

56.201 +- 0.005 44.6 +- 4.43 12 MATCHINGPPER_MA

56.201 +- 0.005 44.7 +- 6.47 11 MATCHINGPPER_VOUS

56.261 +- 0.436 45.6 +-27.28 1 AmmountOfPPER

56.201 +- 0.005 45.8 +- 9.65 75 INTERJECTION_FI

56.201 +- 0.005 48 +-14.72 76 INTERJECTION_HEP

56.201 +- 0.005 50.2 +- 9.34 73 INTERJECTION_EH

56.201 +- 0.005 50.2 +- 6.27 74 INTERJECTION_EUH

56.201 +- 0.005 51.3 +- 3.66 81 INTERJECTION_MADAME

56.203 +- 0.08 51.3 +-23.56 37 VERB_COMPETITION

56.201 +- 0.005 52.1 +-15.75 58 VER:infi

56.201 +- 0.005 52.3 +-16.54 56 VER:futu

56.201 +- 0.005 53 +- 8.91 78 INTERJECTION_OUSTE

56.201 +- 0.005 54.7 +-17.43 34 VERB_CONTACT

56.162 +- 0.116 55.6 +-18.7 33 VERB_WEATHER

56.201 +- 0.005 56.9 +- 6.55 64 INTERJECTION_OH

56.201 +- 0.005 57.3 +- 4.5 63 INTERJECTION_AH

56.135 +- 0.087 58 +-24.31 19 MATCHINGPPER_NOS

56.193 +- 0.015 58.7 +-13.46 77 INTERJECTION_OUF

56.201 +- 0.005 59.3 +- 9.42 67 INTERJECTION_HÉLAS

56.143 +- 0.07 59.6 +-16.69 14 MATCHINGPPER_MES

56.201 +- 0.005 59.9 +- 4.5 42 VERB_STATIVE

56.201 +- 0.005 60.2 +- 2.64 62 VER:subp

56.201 +- 0.005 60.6 +- 8.39 71 INTERJECTION_CHUT

56.201 +- 0.005 62 +- 6.36 70 INTERJECTION_HEM

56.193 +- 0.015 62.5 +-10.87 66 INTERJECTION_HEIN

56.197 +- 0.011 62.6 +- 8 65 INTERJECTION_HÉ

56.201 +- 0.005 62.7 +- 5.87 51 DEIKTIKA

56.201 +- 0.005 63 +- 4.07 80 INTERJECTION_MONSIEUR

56.201 +- 0.005 63 +-11.79 53 VER:impe

56.005 +- 0.298 63.8 +-24.78 38 VERB_POSSESSION

56.201 +- 0.005 64 +- 5.67 39 VERB_SOCIAL

56.201 +- 0.005 64.3 +- 4.86 45 VERB_CHANGE

56.197 +- 0.013 64.7 +- 8.74 68 INTERJECTION_BAH

56.201 +- 0.005 64.7 +- 5.27 59 VER:pper

56.197 +- 0.015 65.2 +- 7.08 69 INTERJECTION_HOLÀ

56.139 +- 0.062 66.7 +-20.16 27 MATCHINGPPER_ELLES

56.183 +- 0.008 71.8 +- 5.23 72 INTERJECTION_BRAVO

56.005 +- 0.121 74.2 +- 9.41 43 VERB_CREATION

56.079 +- 0.038 77.4 +- 1.56 2 AmmountOfDET

55.99 +- 0.142 78.1 +- 3.73 49 POSNPP

10. Annex B: Text collection

author-nametitleyearsubgenrenarration
BalzacPierrette1840blancheheterodiegetic
BalzacTenebreuseAffaire1841policierheterodiegetic
BalzacAlbertSavarus1842blancheheterodiegetic
SueMysteresParis021842fantastiqueheterodiegetic
SueMorneDiable1842fantastiqueheterodiegetic
SueMysteresParis011842fantastiqueheterodiegetic
FevalPPLoupBlanc1843blancheheterodiegetic
DumasEppstein1843fantastiqueheterodiegetic
FevalPPMysteresLondres11843policierheterodiegetic
FevalPPFanfaronsRoi1843blancheheterodiegetic
FevalPPMysteresLondres31843policierheterodiegetic
SueMysteresParis041843fantastiqueheterodiegetic
SueMysteresParis051843fantastiqueheterodiegetic
SueJuifErrant1844fantastiqueheterodiegetic
SandPecheAntoine1845blancheheterodiegetic
SuePaulaMonti1845fantastiqueheterodiegetic
FevalPPQuittance2Galerie1846blancheheterodiegetic
SandLucreziaFloriani1846blanchehomodiegetic
BalzacCousineBette1846blancheheterodiegetic
GautierPartieCarrée1848fantastiqueheterodiegetic
SueMysteresPeuple021849fantastiqueheterodiegetic
DumasFantômes1849fantastiquehomodiegetic
DumasOlifus1849fantastiquehomodiegetic
DumasColliersVelours1850fantastiqueheterodiegetic
SueMysteresPeuple031850fantastiqueheterodiegetic
SueMysteresPeuple041850fantastiqueheterodiegetic
SueMysteresPeuple071851fantastiqueheterodiegetic
SueMysteresPeuple061851fantastiqueheterodiegetic
AurevillyEnsorcelée1852fantastiquehomodiegetic
PonsonBaronne1852fantastiqueheterodiegetic
FevalPPReineEpees1852blancheheterodiegetic
PonsonFemmeImmortelle1852fantastiqueheterodiegetic
SueMysteresPeuple091853fantastiqueheterodiegetic
SueMysteresPeuple081853fantastiqueheterodiegetic
SueMysteresPeuple111854fantastiqueheterodiegetic
SueMysteresPeuple101854fantastiqueheterodiegetic
SueMysteresPeuple121855fantastiqueheterodiegetic
FevalPPMadameGilBlas1856blanchehomodiegetic
GautierAvatar1856fantastiqueheterodiegetic
FevalPPLouve21856blancheheterodiegetic
SueMysteresPeuple131856fantastiqueheterodiegetic
GautierRomanMomie1857fantastiqueheterodiegetic
SueMysteresPeuple161857fantastiqueheterodiegetic
DumasMeneurLoups1857fantastiqueheterodiegetic
SueMysteresPeuple151857fantastiqueheterodiegetic
PonsonClubValets21858policierheterodiegetic
PonsonExploitsRocambole31859policierheterodiegetic
PonsonExploitsRocambole21859policierheterodiegetic
PonsonExploitsRocambole11859policierheterodiegetic
SandElleLui1859blancheheterodiegetic
PonsonChevaliers1860policierheterodiegetic
FévalTénèbre1860fantastiqueheterodiegetic
FevalPPChevalierTenebre1861fantastiquehomodiegetic
AimardRodeursFrontieres1861blancheheterodiegetic
HugoMiserables1Fantine1862blancheheterodiegetic
PonsonTestamentGrainDeSel1862policierheterodiegetic
AboutOreilleCassée1862fantastiqueheterodiegetic
VilliersIsis1862fantastiqueheterodiegetic
FevalPPHabitsNoirs11863policierheterodiegetic
AurevillyPrêtreMarié1864fantastiquehomodiegetic
FévalVampire1865fantastiquehomodiegetic
GaboriauLerouge1865policierheterodiegetic
FevalPPHabitsNoirs2Coeur1865policierheterodiegetic
PonsonBreda1866fantastiqueheterodiegetic
PonsonResurrectionRocambole21866policierheterodiegetic
VerneCapitaineHatteras1866blancheheterodiegetic
PonsonDernierMot31867policierheterodiegetic
PonsonDernierMot41867policierheterodiegetic
GaboriauEsclavesParis21867policierheterodiegetic
PonsonDernierMot21867policierheterodiegetic
PonsonMiseresLondres31868policierheterodiegetic
AimardOurson1868blancheheterodiegetic
PonsonMiseresLondres21868policierheterodiegetic
PonsonMiseresLondres41868policierheterodiegetic
FevalPPHabitsNoirs3Rue1868policierheterodiegetic
PonsonFéeAuteuil1868fantastiqueheterodiegetic
FlaubertEducation1869blancheheterodiegetic
FevalPPHabitsNoirs4Arme1869policierheterodiegetic
FevalPPHabitsNoirs5Maman1869policierheterodiegetic
GouraudEnfantsFerme1869blancheheterodiegetic
GaboriauMonsieurLecoq21869policierheterodiegetic
ZolaFortuneRougon1870blancheheterodiegetic
PonsonCordePendu11870policierheterodiegetic
PonsonCordePendu21870policierheterodiegetic
GaboriauVieInfernale21870policierheterodiegetic
GaboriauDegringolade11872policierheterodiegetic
GaboriauDegringolade31872policierheterodiegetic
GaboriauDegringolade21872policierheterodiegetic
GaboriauCordeCou21873policierheterodiegetic
ZolaVentreParis1873blancheheterodiegetic
GaboriauCordeCou11873policierheterodiegetic
GaboriauArgent11874policierheterodiegetic
GaboriauArgent21874policierheterodiegetic
FevalPPVilleVampire1875fantastiquehomodiegetic
ZolaAbbeMouret1875blancheheterodiegetic
VerneHectorServadac1877fantastiqueheterodiegetic
MalotCara1878blancheheterodiegetic
AimardAuriacAigleNoirDacotahs1878blancheheterodiegetic
StolzSecretLaurent1878blancheheterodiegetic
FevalPPHommeSansBras1881policierheterodiegetic
LotiRomanSpahi1881blancheheterodiegetic
BoisgobeyOmnibus1881policierheterodiegetic
GaboriauAmoursEmpoisonneuse1881policierheterodiegetic
StolzMesaventures1881blancheheterodiegetic
FevalPPHistoireRevenants1881fantastiqueheterodiegetic
GouraudChezGrandMere1882blancheheterodiegetic
AurevillyHistoireSans1882fantastiqueheterodiegetic
MaupassantUneVie1883blancheheterodiegetic
RachildeMVénus1884fantastiqueheterodiegetic
BoisgobeyVoilette1885policierheterodiegetic
ZolaGerminal1885blancheheterodiegetic
OhnetGrandeMarnière1885blancheheterodiegetic
ZolaOeuvre1886blancheheterodiegetic
VilliersEveFuture1886fantastiqueheterodiegetic
BoisgobeyRubisOngle1886policierheterodiegetic
MalotZyte1886blancheheterodiegetic
LotiPecheurIslande1886blancheheterodiegetic
MaryRogerLaHonte1886blancheheterodiegetic
MalotConscience1888blancheheterodiegetic
BoisgobeyOeilChat11888policierheterodiegetic
BoisgobeyChat21888policierheterodiegetic
GouraudQuandGrande1888blancheheterodiegetic
BoisgobeyMainFroide1889blancheheterodiegetic
BoisgobeyOpera21889policierheterodiegetic
BoisgobeyMainFroide1889policierheterodiegetic
BoisgobeyOpera11889policierheterodiegetic
BoisgobeyDoubleBlanc1889policierheterodiegetic

Bibliography
  1. Allison, S., Heuser, R., Jockers, M. L., Moretti, F. and Witmore, M. (2011). Quantitative Formalism: An Experiment (Stanford Literary Lab, Pamphlet 1). Stanford: Standford Literary Lab.
  2. Breiman, L. (2001). Random Forests. Machine Learning, 45(1): 5–32.
  3. Brunner, A. (2015). Automatische Erkennung von Redewiedergabe: Ein Beitrag Zur Quantitativen Narratologie. (Narratologia: Contibutions to Narrative Theory Band 47). Berlin; Boston: De Gruyter.
  4. Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3): 1-27.
  5. Cohen, W. W. (1995). Fast Effective Rule Induction. Twelfth International Conference on Machine Learning. Morgan Kaufmann, pp. 115–23.
  6. Durrer, S. (1994). Le dialogue romanesque. Style et structure. Geneva: Droz.
  7. Elson, D. K. and McKeown, K. R. (2010). Automatic Attribution of Quoted Speech in Literary Narrative. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10).
  8. Genette, G. (1980). Narrative Discourse. An Essay in Method. Oxford: Blackwell.
  9. Iosif, E. and Mishra, T. (2014). From Speaker Identification to Affective Analysis: A Multi-Step System from Analyzing Children Stories. Proceedings of the Third Workshop on Computational Linguistics for Literature, pp. 40–49.
  10. John, G. H. and Langley, P. (1995). Estimating Continuous Distributions in Bayesian Classifiers. Eleventh Conference on Uncertainty in Artificial Intelligence. San Mateo: Morgan Kaufmann, pp. 338–45.
  11. Krestel, R., Bergler, S. and Witte, R. (2008). Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles. ELRA, Proceedings of the Sixth International Language Resources and Evaluation Conference.
  12. Leech, G. N. and Short, M. (1981). Style in Fiction: A Linguistic Introduction to English Fictional Prose. London; New York: Longman.
  13. McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit.
  14. Sarmento, L. and Nunes, S. (2009). Automatic Extraction of Quotes and Topics from News Feeds. Proceedings of DSIE’09 - 4th Doctoral Symposium of Informatics Engineering.
  15. Weiser, S. and Watrin, P. (2012). Extraction of Unmarked Quotations in Newspapers. Proceedings of the Eight International Conference on Language Resources and Evaluation.