DH 2016 Abstracts

Digital Resources and Research Data in the Digital Humanities: The Digital Knowledge Store at the Berlin-Brandenburg Academy of Sciences and Humanities

The Digital Knowledge Store was developed from March 2012 to April 2015 at the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) with funding from the Deutsche Forschungsgemeinschaft (DFG). In this first project phase, a search infrastructure was created that provides centralised access to all digital resources of the BBAW. The DFG has granted funding for a second phase, in which the Digital Knowledge Store will be expanded with a focus on deployment options and on integration for partner research institutions.

The BBAW hosts more than 170 research projects with over 1.2 million digital resources. For the first time, these resources and their metadata have been fully integrated into a central full-text index and made accessible through an innovative user interface. The resources hosted at the BBAW vary widely in content, format and language. Most are provided as digital editions and translations in formats such as XML, HTML and PDF, but there are also electronic catalogues, documentation, databases, digital full-text collections and dictionaries. The Digital Knowledge Store can be queried in several languages via the morphologically analyzed index; we use a number of language technologies (Bing, DONATUS) to enable this multilingual search. The search also covers automatically and manually created metadata, which enrich the resources, connect them semantically and provide additional information to the user. The metadata of all resources are also available through a machine-readable web service (OAI-PMH) and in this way become part of the Linked Open Data Cloud.
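The principle behind querying a morphologically analyzed index can be sketched briefly: documents and queries are both reduced to lemmas before lookup, so an inflected query form also finds other forms of the same word. This is a minimal sketch, not the Knowledge Store's implementation. The small `LEMMAS` table is a hypothetical stand-in for a real morphological analyser (the abstract names DONATUS among the language technologies used), and a plain Python inverted index stands in for Lucene.

```python
import re
from collections import defaultdict

# Hypothetical lemma table standing in for a real morphological analyser;
# it maps lower-cased inflected surface forms to their lemma.
LEMMAS = {
    "häuser": "haus",
    "hauses": "haus",
    "briefe": "brief",
    "briefen": "brief",
    "letters": "letter",
}


def lemmatize(token: str) -> str:
    """Return the lemma of a token, falling back to its lower-cased form."""
    token = token.lower()
    return LEMMAS.get(token, token)


def tokenize(text: str) -> list[str]:
    """Split text into word tokens made of letters only (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each lemma to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[lemmatize(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query lemma (AND semantics)."""
    lemmas = [lemmatize(t) for t in tokenize(query)]
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))  # copy, so the index is not mutated
    for lemma in lemmas[1:]:
        result &= index.get(lemma, set())
    return result


if __name__ == "__main__":
    docs = {
        "ed1": "Die Briefe des Hauses Humboldt",
        "ed2": "Ein Brief über das Haus",
        "ed3": "Letters from Berlin",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "Brief Haus")))  # ['ed1', 'ed2']
```

The essential design point is that the same normalisation runs at indexing time and at query time; Lucene expresses this through analyzers applied on both sides.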

The biggest challenge in building an interdisciplinary research data infrastructure like the Digital Knowledge Store was the heterogeneity of the digital resources created at the BBAW over the last 20 years. Hosted on different servers and in different databases, they vary widely in content and in the ways they can be accessed. The main task was therefore to access these data generically and bundle them in the central Apache Lucene index, using a metadata scheme adapted to the needs of the academy (based on OAI-ORE). Dedicated import modules were implemented for the various projects and resource collections to integrate their differing data structures. By integrating Semantic Web technologies (e.g. DBpedia) and text mining components, the system offers semantically related suggestions that extend the query term and invite users to discover and explore the academy's projects.
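The role of the import modules can be illustrated with an adapter-style sketch: each module knows one project's data layout and emits records in a single common shape that the central index can ingest. Everything here is hypothetical. The `Record` fields, the source-type names (`xml-edition`, `db-rows`) and the input layouts are illustrative assumptions, not the BBAW metadata scheme, which the abstract describes only as based on OAI-ORE.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Record:
    """Hypothetical common record shape produced by every import module."""
    identifier: str
    title: str
    project: str
    language: str = "und"  # ISO 639-2 code for "undetermined"
    full_text: str = ""
    subjects: list[str] = field(default_factory=list)


# Registry of import modules, keyed by a hypothetical source-type name.
IMPORTERS: dict[str, Callable[[str, object], Iterable[Record]]] = {}


def importer(source_type: str):
    """Decorator that registers an import module for one source type."""
    def register(func):
        IMPORTERS[source_type] = func
        return func
    return register


@importer("xml-edition")
def import_xml_edition(project: str, payload: str) -> Iterator[Record]:
    """Hypothetical XML edition: <edition><doc id=".." lang=".."><title/><body/></doc></edition>."""
    root = ET.fromstring(payload)
    for doc in root.iter("doc"):
        body = doc.find("body")
        # Concatenate all text inside <body>, then collapse runs of whitespace.
        text = " ".join("".join(body.itertext()).split()) if body is not None else ""
        yield Record(
            identifier=f"{project}:{doc.get('id')}",
            title=(doc.findtext("title") or "").strip(),
            project=project,
            language=doc.get("lang", "und"),
            full_text=text,
        )


@importer("db-rows")
def import_db_rows(project: str, rows: list[dict]) -> Iterator[Record]:
    """Hypothetical database export: one dict per row with project-specific column names."""
    for row in rows:
        yield Record(
            identifier=f"{project}:{row['pk']}",
            title=row.get("heading", ""),
            project=project,
            language=row.get("lang_code", "und"),
            full_text=row.get("text", ""),
            # Keywords arrive as one semicolon-separated string; drop empty entries.
            subjects=[s.strip() for s in row.get("keywords", "").split(";") if s.strip()],
        )


def run_import(project: str, source_type: str, payload: object) -> list[Record]:
    """Dispatch to the registered import module and collect its records."""
    try:
        module = IMPORTERS[source_type]
    except KeyError:
        raise ValueError(f"no import module for source type {source_type!r}") from None
    return list(module(project, payload))
```

With this split, supporting a new project means registering one more import function, while the indexing side only ever sees `Record` objects.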

The second project phase, running until 2017, will expand the capabilities of the Digital Knowledge Store, especially in terms of sustainability and availability. There is strong demand from academic institutions for sustainable, long-term research infrastructures that can meet the specific requirements of research data in the humanities, e.g. the integration of heterogeneous resources and content handling. One important goal of this phase is to broaden the target user group: the software components of the Digital Knowledge Store will be provided as an installation package, enabling partner institutions to run their own Knowledge Store adapted to their own digital resources. To coordinate further development and collaboration with future users, an open workshop will be held in Berlin in April 2016.

Another focus of the next project phase will be guidelines for the minimum structural and technical requirements that resources and metadata must meet to be integrated easily into the index. The guidelines will include objectives for the (technical) quality of the resources and their metadata. These best-practice recommendations could become a general reference in the digital humanities for building and maintaining resource collections and for dealing with the quality of resources and metadata beyond this specific use case. Partner institutions will successively refine and adapt them to their specific needs. In addition, workflows for the manual and automatic supply of metadata will be designed and specified. Further development goals are the automatic evaluation of user feedback and its integration into the query process, as well as visualization components.
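Once such guidelines exist, checking a resource's metadata against them can be automated. The sketch below assumes a flat dictionary per record and invents a small set of minimum requirements purely for illustration: required fields, an absolute http(s) identifier, a two- or three-letter lower-case language code and an accepted MIME type. The actual requirements are exactly what the second project phase set out to define, so none of these constants reflect them.

```python
import re
from urllib.parse import urlparse

# Hypothetical minimum requirements, for illustration only.
REQUIRED_FIELDS = ("identifier", "title", "creator", "language", "format")
ACCEPTED_FORMATS = {"application/xml", "text/html", "application/pdf"}
LANG_CODE = re.compile(r"^[a-z]{2,3}$")  # checks the shape of an ISO 639 code only


def _present(value: object) -> bool:
    """True if value is a non-blank string."""
    return isinstance(value, str) and bool(value.strip())


def check_minimum_requirements(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not _present(record.get(name)):
            problems.append(f"missing or empty field: {name}")

    identifier = record.get("identifier")
    if _present(identifier):
        parsed = urlparse(identifier)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("identifier is not an absolute http(s) URI")

    language = record.get("language")
    if _present(language) and not LANG_CODE.match(language):
        problems.append(f"language does not look like an ISO 639 code: {language}")

    fmt = record.get("format")
    if _present(fmt) and fmt not in ACCEPTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    return problems
```

Returning every problem rather than a single pass/fail flag lets a contributing institution see all deviations at once, which suits guidelines that partners are expected to refine over time.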
