We propose a strategy for conducting digital humanities teaching and research that prioritizes publishing data above all other project activities. Drawing on our experience working with faculty, librarians, and graduate students on a critical edition in TEI of Charles Baudelaire’s Les Fleurs du Mal, we demonstrate how adopting a data-first strategy fosters research, collaboration, pedagogy, and scholarly communication in the digital humanities.
The Corpus Baudelaire Project began at Vanderbilt University in 2013, when a hybrid group of approximately ten scholars who had recently learned how to encode literary texts in the TEI aspired to do something practical with their new skills. The group developed a connection to Vanderbilt University Library’s W. T. Bandy Center for Baudelaire and Modern French Studies, which exhaustively collects Baudelaire’s works, including Les Fleurs du Mal. The work itself was published in four editions: 1857; 1861 (containing 35 additional poems and the Tableaux parisiens section, and lacking six poems censored by the Second Empire); 1866 (including Les Épaves, or The Scraps, and the six poems missing from the 1861 edition); and the posthumous 1868 edition. Participants in the Corpus Baudelaire Project are encoding all the editions using the critical edition apparatus in the TEI.
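To make the encoding task concrete, the following is a minimal sketch of how variant readings recorded in a TEI critical apparatus might be read back out of the data. It assumes the standard TEI `app`, `lem`, and `rdg` elements with `wit` attributes; the sample line, its readings, the witness identifiers (`fm1857`, `fm1861`), and the `readings` helper are all hypothetical placeholders and are not taken from the project's actual files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment: the text and witness ids are invented for illustration
# and do not reproduce any line of Les Fleurs du Mal or the project's encoding.
SAMPLE = f"""
<l xmlns="{TEI_NS}" n="1">Placeholder line with a
  <app>
    <lem wit="#fm1861">first reading</lem>
    <rdg wit="#fm1857">second reading</rdg>
  </app>
</l>
"""


def readings(xml_text):
    """Return (witness, text) pairs for each <lem>/<rdg> inside every <app>."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for app in root.iter(f"{{{TEI_NS}}}app"):
        for child in app:
            # Drop the namespace part of the tag, e.g. "{ns}rdg" becomes "rdg".
            tag = child.tag.split("}")[-1]
            if tag not in ("lem", "rdg"):
                continue
            # A wit attribute may point to several witnesses, separated by spaces.
            for wit in child.get("wit", "").split():
                pairs.append((wit.lstrip("#"), (child.text or "").strip()))
    return pairs


if __name__ == "__main__":
    for witness, text in readings(SAMPLE):
        print(witness, "->", text)
```

Because the apparatus lives in the data itself, a short script of this kind is enough to compare editions, without a database or a custom interface.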
We describe our data-first approach to the Corpus Baudelaire Project, which minimizes otherwise common tasks such as developing databases or coding interfaces, and argue for its advantage over alternative approaches in fostering collaboration, pedagogy, and new forms of publishing. We also suggest that our data-first approach may productively be generalized to any digital humanities project that develops significant quantities of data.
A data-first approach differs from other forms of digital humanities scholarship by minimizing startup costs and reducing complexity. Whereas many digital humanities projects aim above all to produce some form of online digital edition or interactive website, a data-first approach invests primarily in producing and sharing data with others. “It’s the data, stupid!” is our informal slogan.
A data-first approach to DH involves at least three steps: licensing, curating, and publishing datasets online. The latter two steps are likely to be iterative and emergent.
By prioritizing these three activities above other forms of digital humanities work, we lower the barriers for participants to join our project while offering them the opportunity to publish and begin receiving credit for their work almost immediately. Crucially, credit is allocated according to contributions, not by seniority or other hierarchical designations; the data bear witness directly to their creators.