If, as Ta-Nehisi Coates has suggested, the most lasting traumas of United States history can be traced back to Americans' “need ... to think that they are white” (2015), what might digital literary studies be able to tell us about this self-inflicted delusion? Specifically, how might quantitative textual analysis help us reconstruct the process by which ethnicities, nationalities, religious groups, and other identity categories became dominated by the all-encompassing category of race? One might expect literature to present a complex field of racial discourse, one in which what sociologist Matthew Snipp calls “administrative definitions of race” (2010) coexist with self-identifications and ethnic stereotypes, colonialist fantasies and half-forgotten family trees. Literary scholars specializing in race and ethnicity have established a rich tradition of close readings that attempt to disentangle this discourse within particular texts, showing how individual writers navigate race in the United States. How would this understanding change, however, if we expanded our scope to hundreds of writers and two centuries of American history? What discontinuities or consistencies might we find in the language associated with different ethnicities? Would historical changes in civil rights, immigration, and territorial expansion be visible at the level of fictional discourse?
To approach these questions, we have assembled a corpus of 193 works of American fiction across a range of genres, published between 1789 and 1964. The starting year saw Washington’s election and the establishment of the Constitution, as well as the publication of William Hill Brown’s The Power of Sympathy, often considered the first American novel. The closing year, 1964, marked the beginning of President Lyndon Johnson’s systematic immigration policy reforms, culminating in the Immigration and Nationality Act of 1965, which finally abolished the restrictive quotas that had limited entry into the U.S. for non-white and non-Western European populations. Within this extended period, major changes in government policy toward various racial and ethnic groups (the Indian Removal Act, Emancipation, the Chinese Exclusion Act, the Immigration Act of 1917, the internment of Japanese Americans) provide cardinal points that guide our analysis and raise fundamental questions about the relationship between fiction and the world it represents. What kinds of sociopolitical shifts make a difference in literary characterization? Can literature change the direction or accelerate the pace of social change, as in Abraham Lincoln’s oft-quoted but probably apocryphal claim that Harriet Beecher Stowe’s Uncle Tom’s Cabin incited the Civil War?
To probe our assumptions about the language of identification in the novel, we combine methods that investigate the formal features of each novel as a whole with methods that extract racialized discourse as it attaches to particular characters. We suggest that the semantics of identity, whether racial, cultural, ethnic or national, operate at two distinct levels: 1) embedded within the discourse, where they act as a background to the particular worldview of the text; and 2) at the level of character, where the lexemes of identity become a self-aware system of description, whether deployed by the narrator, in the reactions of other characters, or internally as part of a character’s self-articulation. We argue that characters embody a set of racialized identifiers that operate against a background understanding of what identity means within the text. The result is a dialectic between intratextual characterization and intertextual stereotyping that has its origins in, but expands significantly upon, the model of marginalized characters articulated in Alex Woloch’s The One vs. the Many (2003). The goal of our project is to tease apart these two levels of discursive identity in order to construct a new history of the discourse of race in American fiction as it evolves against the backdrop of historical events and changing aesthetic principles in novel writing.
The first stage of this investigation examines the discourse of identity categories as they propagate throughout our corpus. We begin with a fixed list of identity terms that includes national origins (German, Italian), ethnic identifiers (Jew, Arab) and racialized categories (Negro, Indian), and identify the patterns of language that attach to these descriptors over time. In our first pass, we extract the collocates of each term and identify which, if any, are significantly distinctive of that term.¹ For example, the term “foreign” appears significantly more often than expected as a collocate of the names of European countries, but never in the vicinity of racial descriptors or ethnic identifiers. We then extend the process by extracting a second order of distinctive collocates from the terms identified in the first pass, to see which lead back to our initial set of identifiers and which introduce new discourses into the semantics of race. By visualizing our results as a dynamic network of interconnected language, we trace the connections between our primary identifiers, as well as how these relationships change over time. To extend the example above: in the nineteenth century, a language of “foreignness” is connected with European nationalities, while “America/n” is distinctively used as a descriptor of African Americans. By the mid-twentieth century, the terms describing African Americans shift away from this emphasis on “Americanness” and instead incorporate terms such as “descent” and “blood”, borrowed from the discourse of foreignness and immigration. Such analyses can help us identify, in precise historical detail, both the moment at which particular national or ethnic groups became American and the related but not identical moment at which they, in Noel Ignatiev’s phrase, “became white” (1995), that is, when the language surrounding those groups became unmarked.
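The two-step collocate procedure can be sketched in Python as follows. This is a minimal illustration, not the project's code. It assumes pre-tokenized, lowercased text and a symmetric window of `window` tokens (the window size is not stated above). It also substitutes a simple minimum-count filter (`min_count`, a hypothetical parameter) for the significance test that actually decides distinctiveness. The function names `collocates` and `collocate_network` are illustrative.

```python
from collections import Counter


def collocates(tokens, target, window=5):
    """Count tokens occurring within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target token itself
                    counts[tokens[j]] += 1
    return counts


def collocate_network(tokens, seeds, window=5, min_count=2):
    """Return (source, collocate) edges for seed terms and their second-order collocates.

    NOTE: `min_count` is a hypothetical stand-in for a significance test;
    a real pipeline would keep only statistically distinctive collocates.
    """
    edges = set()
    first_order = set()
    # First pass: collocates of each seed identity term.
    for seed in seeds:
        for word, n in collocates(tokens, seed, window).items():
            if n >= min_count:
                edges.add((seed, word))
                first_order.add(word)
    # Second pass: collocates of the first-order collocates. Edges that point
    # back to a seed show which terms lead back to the initial identifiers.
    for word in first_order - set(seeds):
        for second, n in collocates(tokens, word, window).items():
            if n >= min_count:
                edges.add((word, second))
    return edges
```

On the toy token list `"the foreign german came and the foreign german left".split()`, calling `collocate_network(tokens, ["german"], window=1, min_count=2)` yields an edge from "german" to "foreign", plus second-order edges from "foreign" to "the" and back to "german". The resulting edge set could be loaded into any graph library for visualization. Computing one such network per time slice would show how the connections shift over time.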
At the same time, finding consistencies in the language applied to different racial or ethnic groups at different historical moments would lend support to Theodore Allen’s claim that the concept of race names “a pattern of oppression (subordination, subjugation, exploitation) of one set of human beings by another,” where the “phenotypical” identity of those sets matters less than the structure of their relationship (2012). This work builds on previous work by members of the Literary Lab on the language of human identification in anthropology, presented at the 2015 Digital Humanities conference and forthcoming in Current Anthropology. The present project, however, substantially extends the methods of that earlier study and shifts the emphasis from scholarly writing to literary representations of identity.
The second phase of our project examines the construction of individual characters against this backdrop. This allows us to observe not only which characters actively resist the discourse of the period in which they were created, but also how the evolution of this terminology as applied to individuals differs from the construction of identity as a historically contingent sociocultural phenomenon. That is, does a character described as “black” inherit the descriptors of identity from the language of the period, or is the discourse more radically individuated on a character-by-character basis? To test this, we adopt an approach similar to the BookNLP pipeline developed by Bamman, Underwood and Smith (2014), using named entity recognition to extract characters, together with a set of scripts that perform coreference resolution. We then tag the corpus for part of speech and extract the distinctive adjectives that appear in dependent positions within 50 words of each mention of a character. This allows us to identify, with reasonable precision, a descriptive vocabulary for each character. Finally, we tag each character with the racial, ethnic or national identity given in the novel and compare the descriptive vocabularies of characters who embody similar identities across texts, using a set of semantic network diagrams that allow us to trace the continuities of identity across both genre and time.
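The adjective-extraction step can be approximated as follows. This is a hypothetical sketch that assumes the NER and coreference stages have already produced a list of token positions for each character. It also assumes tokens carry Penn Treebank part-of-speech tags, where adjectives are `JJ`, `JJR` and `JJS`. The illustrative `character_adjectives` function applies only the 50-token window. It does not check for syntactic dependency, which would require a dependency parser, and it does not yet test the counts for distinctiveness.

```python
from collections import Counter, defaultdict


def character_adjectives(tagged, mentions, window=50):
    """Count adjectives within `window` tokens of each mention of each character.

    tagged:   list of (token, pos) pairs; Penn Treebank tags are assumed.
    mentions: dict mapping a character name to the token indices of its
              mentions, assumed to come from upstream NER + coreference.
    Windows are scanned once per mention, so an adjective near two mentions
    of the same character is counted twice.
    """
    profiles = defaultdict(Counter)
    for character, positions in mentions.items():
        for i in positions:
            lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
            for j in range(lo, hi):
                token, pos = tagged[j]
                if pos.startswith("JJ"):  # JJ, JJR, JJS
                    profiles[character][token.lower()] += 1
    return profiles


# Toy example: one mention of "sailor" at token index 2.
tagged = [("The", "DT"), ("old", "JJ"), ("sailor", "NN"),
          ("was", "VBD"), ("tall", "JJ"), (".", ".")]
print(character_adjectives(tagged, {"sailor": [2]}, window=2)["sailor"])
```

Per-character adjective profiles like these could then be grouped by the identity label assigned to each character and compared across texts.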
By combining these methods, we are able to see, for the first time, not only how the distinctive language of identity changes over the history of the American novel, but also how the discourse of characterization functions both as a vehicle for standard tropes and stereotypes of identity and as a point of resistance to the dominant representational language of a given period. The approach also provides new insights into the process of characterization, especially the representation of immigrant and minority identities.
¹ Significance is determined using Fisher’s exact test, which compares the observed frequency of a term as a collocate against its expected frequency, with an alpha of 0.05.
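The significance test in this note can be made concrete with a short sketch. The 2x2 layout is an assumption, since the note does not spell it out:

- `a` counts occurrences of a candidate collocate inside the target term's windows.
- `b` counts all other tokens inside those windows.
- `c` counts the collocate elsewhere in the corpus.
- `d` counts all other tokens elsewhere.

The note also does not say whether the test is one- or two-sided. This sketch assumes the one-sided (over-representation) version, computed directly from the hypergeometric distribution with `math.comb`. SciPy's `scipy.stats.fisher_exact(table, alternative="greater")` should give the same p-value for the table `[[a, b], [c, d]]`.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) under the hypergeometric null.

    Assumed table layout:       collocate   other tokens
        inside target windows       a            b
        outside target windows      c            d
    """
    row1 = a + b        # tokens inside the target's windows
    col1 = a + c        # total occurrences of the collocate
    n = a + b + c + d   # all tokens considered
    denom = comb(n, row1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    # (math.comb returns 0 when k > n, so impossible tables contribute nothing).
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / denom


def is_distinctive(a, b, c, d, alpha=0.05):
    """Flag a collocate as distinctive when its p-value falls below alpha."""
    return fisher_exact_greater(a, b, c, d) < alpha
```

For a perfectly separated table with `a = d = 4` and `b = c = 0`, the p-value is 1/70 (about 0.014), so the collocate counts as distinctive. With `a = d = 3`, the p-value is exactly 1/20 = 0.05, which does not fall below the threshold.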