DH 2016 Abstracts

Modelling Taxonomies of Text Reuse in the Deipnosophists of Athenaeus of Naucratis: Declarative Digital Scholarship

This paper presents work on documenting text reuse of fragmentary authors and of extant works. By fragmentary we mean authors whose texts are lost and known only through quotations and references by other authors. Within ancient Greek literature, 60% of authors are preserved only in fragments, which shows the challenge of working with innumerable pieces of reuse scattered throughout our textual heritage (Berti et al., 2009). This work is necessarily prior to any specific research question. We cannot inquire into, e.g., the historical works of Istrus the Callimachean until we can comprehensively and precisely catalogue the surviving fragments of Istrus; nor can we ask “how did intellectuals in the 3rd century CE read epic poetry?” until we can comprehensively identify instances of Homeric text reuse and work with them in their context.

The term fragment is the result of print editorial practices, in which chunks of text preserving traces of lost authors and works are extracted from their contexts and reprinted in separate collections. Even if such an editorial workflow has produced invaluable results for reconstructing lost authors, the concept of a textual fragment is problematic: it includes different kinds of text reuse and implies a certain degree of originality, which is difficult to assess and represent because the original text from which the reuse derives is hidden by the cover text, i.e., by the intention of the quoting author and the characteristics of the preserving context (Most, 1997; Schepens, 2000; Berti, 2013).

Our data model defines taxonomies of text reuse for representing references to authors and works not as separate chunks of text but as contextualized annotations, expressing their nature as reuses of textual evidence. These annotations include not only the portion of text classifiable as a reuse, but also the biographical and bibliographical data preserved in the source text.

Text reuse of fragmentary authors presents the challenge of documenting text that cannot be aligned with any extant exemplar. Text reuse of extant works presents the additional challenge of aligning as precisely as possible (but no more precisely than is possible) two or more extant passages of text that may differ in small or large ways. Our data model uniquely identifies instances of text reuse and is built on the Canonical Text Services (CTS), a protocol for identifying and retrieving passages of text based on concise, machine-actionable canonical citation. It is founded on the assumption that a “text” can be modelled as “an ordered hierarchy of citation objects” (Smith and Weaver, 2009). CTS URNs can identify passages more coarsely or more finely; they can identify a range of passages at various levels of specificity; and, by the addition of an indexed substring, a CTS URN can identify a particular string within a passage of text (Blackwell and Smith, 2012). CTS is one component of a larger digital library architecture, developed for the Homer Multitext project and called CITE (Collections, Indices, Texts, and Extensions): http://www.homermultitext.org/hmt-doc/cite/.
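To make the structure of a CTS URN concrete, here is a minimal Python sketch that splits one into namespace, work hierarchy, passage, and indexed substring. It assumes the usual colon-separated layout (urn:cts:namespace:textgroup.work.version.exemplar:passage@substring[n]); the class and function names are hypothetical, the Iliad identifier tlg0012.tlg001.perseus-grc2 is used only as an illustration, and ranges whose endpoints carry substrings are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Levels of the work hierarchy, in the order they appear in a CTS URN.
WORK_LEVELS = ("textgroup", "work", "version", "exemplar")


@dataclass(frozen=True)
class CtsUrn:
    namespace: str                 # e.g. "greekLit"
    work: Tuple[str, ...]          # e.g. ("tlg0012", "tlg001", "perseus-grc2")
    passage: Optional[str] = None  # e.g. "1.1", or a range such as "1.1-1.10"
    subref: Optional[str] = None   # indexed substring, e.g. "Μῆνιν[1]"

    @property
    def text_group(self) -> str:
        return self.work[0]

    @property
    def work_level(self) -> str:
        # One component names a text group, two a work, three a version, etc.
        return WORK_LEVELS[len(self.work) - 1]

    @property
    def is_range(self) -> bool:
        return self.passage is not None and "-" in self.passage


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components (hypothetical helper)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    work = tuple(parts[3].split("."))
    if not 1 <= len(work) <= len(WORK_LEVELS):
        raise ValueError(f"bad work component in {urn!r}")
    # An empty or missing final component means the URN cites the whole work.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    subref = None
    if passage is not None and "@" in passage:
        if "-" in passage:
            raise ValueError("ranges with substrings are not handled in this sketch")
        passage, subref = passage.split("@", 1)
    return CtsUrn(parts[2], work, passage, subref)
```

For example, parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1@Μῆνιν[1]") yields the passage "1.1", the substring "Μῆνιν[1]", and a work level of "version", showing how the same syntax moves from coarse to fine identification.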

In order to produce citable analyses of text reuse in context, we have been working with the Deipnosophists of Athenaeus of Naucratis, an account of a banquet at which learned men quote authors and works of Greek literature on a wide range of topics related to dining and food. The Deipnosophists is significant because it is an exceptionally rich collection of many different kinds of text reuse of fragmentary authors and of extant works (Braund and Wilkins, 2000; Lenfant, 2007; Jacob, 2013).

Our data model specifies four subjects of analysis:

Authors: enumerate and identify authors reused by Athenaeus;
Works: enumerate and identify works reused by Athenaeus;
Mentions: catalogue every mention of authors and works in the text of Athenaeus, including his vocabulary for identifying them. For example, Athenaeus may mention that a work by Archestratus of Syracuse was known by four different names (i.e., Gastronomy, Life of Pleasure, Science of Dining, or Art of Cooking); this would generate five entries in the list of mentions: one mention of Archestratus, and four mentions of the same work.
Reuses: uniquely identify instances of text reuse in the text of Athenaeus.
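The Mentions analysis is where a single passage can yield several entries. A minimal Python sketch of the Archestratus example, assuming one flat record per mention; the field names and the identifiers "archestratus" and "archestratus.poem" are hypothetical and not taken from the project's data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    kind: str    # "author" or "work"
    target: str  # hypothetical identifier of the author or work referred to
    label: str   # the wording Athenaeus uses for it


def mentions_for_passage(author_id: str, author_label: str,
                         work_id: str, titles: List[str]) -> List[Mention]:
    """One mention of the author, plus one mention of the same work per title."""
    found = [Mention("author", author_id, author_label)]
    found += [Mention("work", work_id, title) for title in titles]
    return found


archestratus = mentions_for_passage(
    "archestratus", "Archestratus of Syracuse",
    "archestratus.poem",  # hypothetical id for the single work behind all four titles
    ["Gastronomy", "Life of Pleasure", "Science of Dining", "Art of Cooking"],
)
```

Keeping the four titles as distinct mentions that share one work identifier preserves Athenaeus's vocabulary while the Works list still counts the work only once.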

A fifth analysis will cover the twenty-two learned men who take part in the banquet described by Athenaeus and who are the characters who actually quote and reuse a vast number of authors and works.

We need seven records to produce citable analyses of the above-mentioned subjects:

Analysis Record URN. Every documented instance of text reuse (authors, works, mentions, reuses) has a CITE URN uniquely identifying this instance in a CITE collection.
Sequence Number. The collection of instances of text reuse is an ordered collection; each item has a sequence number, reflecting the item’s sequence in the text of Athenaeus. This value is programmatically generated by a CTS-aware script before publishing the collection.
Analyzed Text. A CTS URN defining, as precisely or imprecisely as necessary, the span of text in the Deipnosophists that is the subject of this analysis of text reuse. The scope of the Analyzed Text is determined by the nature of the text reuse. In the case of authors and works, this CTS URN identifies a passage in the Deipnosophists that serves to justify the inclusion in the respective list. When an author or a work is reused often, the passage should be a clear, unambiguous reference (e.g., “Homer says …”).
Reused Text. While the Analyzed Text identifies a coherent and contiguous span of text, as it appears in the edition being analyzed, the Reused Text is a string identifying only the text being reused. The Analyzed Text provides context and a basis for alignment, while the Reused Text gives us the flexibility to call out non-contiguous text, to normalize text, or even to promote morphological forms determined by indirect statement to those appropriate for direct speech, without doing violence to our source edition. The Reused Text record allows us to represent different interpretations of the same text reuse, especially in the case of non-verbatim quotations.
Alignment URN. This collection documents reuse of extant authors and works, for which we have extant editions with canonical citation. The Alignment URN is a CTS URN pointing to the quoted extant author (identified with a CtsGroupUrn) or to one specific edition of the reused work (identified with a CtsWorkUrn) that (a) justifies our claim of text reuse, and (b) is the basis for attaching a citation of a still extant work to this analysis.
Analytical Edition URN. The collected instances of text reuse of extant works in the Deipnosophists represent a new edition of these works, whose text-content is based on our analysis of our project’s edition of Athenaeus. The Analytical Edition URN is a CTS URN to an Athenaeus Edition of these works; the citation-value is based on that of the Alignment URN; the text-content of this edition is the Reused Text in Athenaeus. The Analytical Edition gives us an orthogonal view of the text reuse of extant authors in Athenaeus.
CITE Collection of Lost Works. For text reuse of lost authors and works, there is no citation scheme, nor any inherent order to the text. For these, we produce a collection of text reuse instances, which can be cited by CITE URNs.
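The following Python sketch shows one way the records above might fit together, and how the two derived values could be computed: Sequence Numbers from the document order of the edition, and an Analytical Edition URN from the Alignment URN. It is an illustration under stated assumptions, not the project's script: the field names, CITE URN values, and version identifier are hypothetical; the ordered list of passage citations is assumed to come from the edition itself (for example via a CTS GetValidReff request); and only the starting passage of the Analyzed Text is used for ordering.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ReuseRecord:
    record_urn: str                   # Analysis Record URN (a CITE URN)
    analyzed_text: str                # CTS URN of the span in the Deipnosophists
    reused_text: str                  # Reused Text: the string actually reused
    alignment_urn: Optional[str] = None           # None for lost authors and works
    analytical_edition_urn: Optional[str] = None
    sequence: Optional[int] = None    # assigned just before publication


def start_passage(cts_urn: str) -> str:
    """Citation value of the first passage, without range end or substring."""
    passage = cts_urn.split(":")[4]
    return passage.split("-")[0].split("@")[0]


def assign_sequence(records: List[ReuseRecord],
                    passage_order: List[str]) -> List[ReuseRecord]:
    """Number records by where their Analyzed Text starts in the edition."""
    position = {ref: i for i, ref in enumerate(passage_order)}
    ordered = sorted(records, key=lambda r: position[start_passage(r.analyzed_text)])
    # sorted() is stable, so records starting in the same passage keep input order.
    return [replace(r, sequence=n) for n, r in enumerate(ordered, start=1)]


def analytical_edition_urn(alignment_urn: str, version_id: str) -> Optional[str]:
    """Reuse the alignment's citation value under a new (Athenaeus) version id."""
    parts = alignment_urn.split(":")
    work = parts[3].split(".")
    if len(work) < 2:
        return None  # a group-level URN names no work to build an edition of
    passage = parts[4] if len(parts) > 4 else ""
    passage = re.sub(r"@[^-]*", "", passage)  # drop indexed substrings
    return ":".join(parts[:3] + [f"{work[0]}.{work[1]}.{version_id}", passage])
```

Records whose alignment_urn is None are the ones that would belong in the CITE Collection of Lost Works, since they have no citation scheme to inherit. Keeping records immutable and returning renumbered copies means the sequence can simply be regenerated whenever the edition or the set of analyses changes.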

Initial work on documenting text reuse has focused on references to Homer’s Iliad in the Deipnosophists (data available at http://digitalathenaeus.github.io/). The aim is to extend our data model to include the categorization of different kinds of text reuse and further concrete examples of references to fragmentary authors and extant works in the Deipnosophists of Athenaeus of Naucratis.

Bibliography

Berti, M. (2013). Collecting Quotations by Topic: Degrees of Preservation and Transtextual Relations among Genres. Ancient Society 43: 269–88.
Berti, M., Romanello, M., Babeu, A. and Crane, G. (2009). Collecting Fragmentary Authors in a Digital Library. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. New York: ACM, pp. 259–62. doi:10.1145/1555400.1555442.
Blackwell, C. W. and Smith, D. N. (2012). Four URLs, Limitless Apps: Separation of Concerns in the Homer Multitext Architecture. In Muellner, L. (ed.), Donum Natalicium Digitaliter Confectum Gregorio Nagy Septuagenario a Discipulis Collegis Familiaribus Oblatum. Washington, DC: The Center for Hellenic Studies. http://chs.harvard.edu/wa/pageR?tn=ArticleWrapper&bdc=12&mn=4846.
Braund, D. and Wilkins, J. (2000). Athenaeus and His World. Reading Greek Culture in the Roman Empire. Exeter: University of Exeter Press.
Jacob, C. (2013). The Web of Athenaeus. Center for Hellenic Studies: Harvard University Press.
Lenfant, D. (2007). Athénée et les fragments d’historiens. Actes du colloque de Strasbourg (16–18 juin 2005). Paris: De Boccard.
Most, G. W. (1997). Collecting Fragments. Fragmente sammeln. Göttingen: Vandenhoeck & Ruprecht.
Schepens, G. (2000). Probleme der Fragmentedition (Fragmente der griechischen Historiker). In Reitz, C. (ed.), Vom Text zum Buch. St. Katharinen, pp. 1–29.
Smith, D. N. and Weaver, G. (2009). Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture. Text Mining Services 129. http://katahdin.cs.dartmouth.edu/reports/TR2009-649.pdf