This workshop introduces Hyphe, a curation-oriented web crawler. Developed with and for Social Sciences and Humanities scholars, this software aims to provide both a method and a tool for building a research corpus from web content (web pages and HTTP links). It wraps a web mining tool in a user interface and adds the curation features that Social Sciences and Humanities scholars need: defining aggregates of web pages, filtering content, and expanding the corpus.
We will focus on using the web crawler and will not take time to present web studies, digital sociology or digital methods in general. Participants should have a basic knowledge of the web and already consider it a legitimate field for scientific investigation (Ackland, 2013). Participants are encouraged to come with ideas about which websites would be interesting to study for their own research agenda (a list of entry points).
The web is a field of investigation for the social sciences, and platform-based studies have long proven their relevance. However, the generic web is rarely studied in and of itself, even though it contains crucial embodiments of social actors: personal blogs, institutional websites, hobby-specific media… We realized that some sociologists see existing web crawlers as “black boxes” unsuitable for research, even though they are willing to study the broader web. Hyphe is a crawler developed with and for social scientists. Its innovative “curation-oriented” approach is meant to address two of the main problems social scientists face when doing web mining: how to build a corpus and how to delineate an actor’s presence (Jacomy et al., 2016).
The workshop will first introduce Hyphe’s software and methodological principles through a guided case study. Participants will then be guided through their first use of Hyphe to build their own web corpus.
We will start the workshop by presenting Hyphe and its methodological principles through a case study: mapping the Digital Humanities communities through the many websites used to present and organise associations, conferences, research projects, research labs… We will build this corpus live during the first part to introduce participants to the main concepts and the practical steps involved in building a web corpus. The instructors will have prepared the corpus before the workshop so that it covers a series of the most common use cases and issues. We chose digital humanities as the subject for two reasons: first, these communities rely heavily on web communication; second, a subject familiar to participants should help engage them.
After this extensive presentation of Hyphe, participants will put it into practice themselves. Working individually or in pairs, they will be given access to their own corpus on an online instance of Hyphe and will be invited to map web communities related to their research subject, following Hyphe’s iterative curation process:
We will conclude this part with a discussion of methodologies to wrap up the workshop.