Social Networks and Archival Context (SNAC), initiated in 2010 as an R&D project, is now being transformed into an international cooperative. SNAC’s original research objective was to demonstrate that descriptions of people, embedded in the descriptions of historical records that document their lives, could be extracted and used to reveal the social networks within which their lives were lived and provide integrated access to geographically dispersed historical records. SNAC’s early success led to plans to establish a sustainable international cooperative to maintain and expand these descriptions of people. The long-term technological objective is a platform to support a continuously expanding, curated corpus of reliable biographical descriptions of people linked to, and providing contextual understanding of, the historical records that function as primary evidence for understanding their lives and work. The SNAC Cooperative will benefit librarians, archivists, and researchers. It will give traditional historical researchers integrated access to distributed historical records and to the social contexts within which the records were created and used. It will provide prosopographic researchers with methods for reconciling and establishing reliable social networks, and will enable archivists and librarians to share descriptive data while also making descriptions more effective.
Archival description source data encompasses both descriptions of historical records and authority data for the corporate bodies, persons, and families documented in historical records. OCLC WorldCat, sixteen archival consortia (representing hundreds of individual repositories), over thirty repositories, and two digital humanities research projects contributed their source data to SNAC. The holdings of over 4,000 repositories are represented.
During SNAC’s R&D phase (2010-2015), this source data was processed in three distinct steps.
The first step resulted in 6,719,064 Encoded Archival Context – Corporate Bodies, Persons, Families (EAC-CPF) records. EAC-CPF is an archival encoding standard hosted by the Society of American Archivists and developed in collaboration with the international archival community.
After performing identity resolution processing (matching and merging), we had:
The prototype history research tool (http://socialarchive.iath.virginia.edu/snac/search) allows researchers to find persons, organizations, and families; to read biographic information about them; to explore the social networks within which they existed; to locate historical records that document their lives, related resources, and external links associated with that name. Associated links are provided for ArchivesGrid and Digital Public Library of America, as well as “sameAs” links to Wikipedia, VIAF, WorldCat Identities, and others.
Researchers have welcomed SNAC for its research economies: SNAC’s History Research Tool provides integrated access to distributed primary (archival) and secondary (published) resources, eliminating or at least substantially reducing the need to track down resources in multiple archival catalogs. Locating these resources is a labor-intensive, time-consuming activity in the current research environment, and successfully discovering and assembling the data depends heavily on persistence and serendipity. Indeed, it is likely that some of the information found in the SNAC records might never be discovered using current methods. SNAC also makes explicit what has been, at best, implicit in archival description: the social-professional-intellectual networks within which the lives and work of the people documented in historical resources took place. It exposes the vast global social-document network that connects the past to the present. Ed Ayers, President of the University of Richmond and a Civil War historian, wrote that:
SNAC promises to change the way history is imagined and written! For all that the digital revolution has revolutionized, the heart of research lies within the primary record embedded in archives large and small. The pioneering work of SNAC will unlock that record, revealing connections and patterns invisible to us now.
Alan Liu, Professor of English, University of California, Santa Barbara and Director of Research Oriented Social Environment (RoSE), describes SNAC’s potential:
SNAC employs state-of-the-art computational techniques to do three things very well: 1) unlock information originally recorded for specific purposes in library and other archival finding aids to make them usable in new contexts; 2) connect widely-distributed information of this sort from around the world; and 3) marry the “library” or “archive” model of knowledge to a whole other model of social networks that both humanizes our understanding of the way knowledge emerges from communities of knowledge creators and seekers, and speaks powerfully to today’s “social network” generation.
SNAC is building a humanities resource that benefits humanities researchers, but ongoing development and refinement of identity reconciliation techniques are of further benefit to humanists engaged in prosopographical research. Names alone are weak identifiers: multiple people can have the same name and one person may have multiple names. A number of factors influence our ability to reliably identify people. Indeed, the larger the domain from which names are drawn, the higher the likelihood that a name is shared by several people.
Though each step in the processing described above presents intellectual and technical challenges, the most challenging is identity reconciliation. A fundamental human activity in the development of knowledge involves the identification of a unique “real world” entity (e.g., a person or book) and recording facts that, when taken together, uniquely distinguish that entity. Establishing the identity of a person, for example, involves examining available evidence, including the existing knowledge base, and recording facts associated with him or her (such as names, dates and places of birth and death, occupation, etc.). This is an ongoing, cumulative activity that both leverages existing established identities and establishes new identities. Identity reconciliation is the process by which an encountered identity is compared against established identities, and if not matched, is itself contributed to the established base of identities. The networked computing environment presents opportunities for using algorithm-based inference methods to compare newly encountered entities with established identities to determine the probability that a new entity represents the same person or thing as an established identity. This ongoing expansion of the base of reliable identities is an interplay of human research, knowledge recording, and computational methods.
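The compare-then-contribute loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SNAC's actual matching algorithm: the `Identity` record, the `match_score` heuristic, the weights, the 0.8 threshold, and the sample names and dates are all invented for the example, and a simple string-similarity ratio stands in for whatever inference methods a production system would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Identity:
    """Hypothetical, simplified identity record (not the EAC-CPF schema)."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def match_score(candidate: Identity, established: Identity) -> float:
    """Heuristic score in [0, 1]; higher means more likely the same entity."""
    name_sim = SequenceMatcher(
        None, candidate.name.lower(), established.name.lower()
    ).ratio()
    date_pairs = (
        (candidate.birth_year, established.birth_year),
        (candidate.death_year, established.death_year),
    )
    # Conflicting life dates are treated as decisive evidence against a match.
    if any(a is not None and b is not None and a != b for a, b in date_pairs):
        return 0.0
    # Names alone are weak identifiers, so a name-only match is capped
    # below the threshold; each agreeing date raises the cap (assumed weights).
    agreeing = sum(1 for a, b in date_pairs if a is not None and a == b)
    weight = {0: 0.6, 1: 0.85, 2: 1.0}[agreeing]
    return name_sim * weight


def reconcile(
    candidate: Identity, established: List[Identity], threshold: float = 0.8
) -> Tuple[int, bool]:
    """Return (index, matched). Unmatched candidates join the established base."""
    best_index, best_score = -1, 0.0
    for i, known in enumerate(established):
        score = match_score(candidate, known)
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= threshold:
        return best_index, True
    established.append(candidate)
    return len(established) - 1, False


# Invented sample data, for illustration only.
base = [Identity("Smith, John", 1800, 1870)]
same = reconcile(Identity("Smith, John", 1800, 1870), base)
other = reconcile(Identity("Smith, John", 1900, 1960), base)
name_only = reconcile(Identity("Smith, John"), base)
```

In this sketch the first candidate matches the established identity, the second shares the name but has conflicting dates and is added as a new identity, and the third carries a name with no dates and so falls below the threshold. In practice, borderline cases like the third are where human review would complement the computational step.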
It became clear early on that the biographical data extracted and assembled from archival resource description constituted a valuable independent resource that could (and should) be maintained and further developed cooperatively. Development of a cooperative began in 2011, and it recently entered its pilot phase with a group of fourteen inaugural institutional members that support the potential benefits of aggregated description and access demonstrated to date in SNAC, and, further, embrace the idea that the resources amassed should be cooperatively built and maintained in order to fully realize these benefits. The initial members represent research archives, libraries, museums (art and natural history), government archives, and institutional archives. The U.S. National Archives and Records Administration (NARA) serves as the secretariat for the Cooperative, while the Institute for Advanced Technology in the Humanities (IATH), University of Virginia, hosts the technological infrastructure. SNAC is led by IATH, working collaboratively with NARA, the California Digital Library, and the iSchool at UC Berkeley. The National Endowment for the Humanities (2010-2012), the Institute of Museum and Library Services (2011-2013), and the Andrew W. Mellon Foundation (2012-2017) have provided funding for SNAC.
Because family names, as traditionally formed, lack sufficient qualifying information and thus commonly result in false positives, no matching was done against family names. In the final production, two family names were rejected as malformed.