When an earthquake struck Nepal in 2015, the band One Direction sent a tweet encouraging their fans to donate to relief efforts. This one tweet was retweeted a few times, but was quickly lost in a flood of other tweets about One Direction’s tour. At the same time, an Indian Hindu extremist politician flooded his Twitter stream with rumors that Christian missionaries were coercing conversions from Nepalis in exchange for humanitarian aid. Additionally, an Indian religious group mixed substantial numbers of tweets about a movie they had released with tweets about their relief efforts in Nepal. These are just a few of the users who engaged with the disaster from a distance: they had different motives for tweeting about the disaster, and different levels of engagement with it. We call these users “onlookers”: they tweet about a disaster but are not directly affected by it.
This paper analyzes onlooker behavior. It argues that onlookers who will send only a few tweets about a disaster can be predicted by their interests, while onlookers who will tweet heavily about it have few, if any, shared interests. We show that onlookers who primarily tweet about entertainment and news topics are likely to mention the disaster, yet send few tweets about it. On the other hand, onlookers who tweet substantially about a disaster after it happens are difficult to identify before the disaster occurs because they do not share common interests aside from the disaster.
Natural disasters often attract significant attention on Twitter, both from those affected and from those at a distance. A substantial amount of research has explored how social media causes users to engage with political, social, and humanitarian problems; however, opinions on social media’s effectiveness (whether it causes users to donate money, stay informed, or participate in campaigns) are mixed. Some argue that displaying concern in social media is more about acquiring social capital than effecting change (Shulman, 2009; Gladwell, 2011; Morozov, 2012; Morozov, 2014), while a Pew Research Center survey finds that social media does create change (Raine et al., 2016). Additionally, many have argued that social media was important though not essential to protests in Egypt (Mazaid, 2011; Tufekci and Wilson, 2012) and other nations (Raine et al., 2016; Shirky, 2011). One analysis found that charities’ use of social media does not increase donations (Malcolm, 2016), while another finds that certain tweeting strategies do (Gasso Climent, 2015), although tweets may not raise awareness about the charity’s causes (Bravo and Hoffman-Goetz, 2015). Where all these studies concur is that social media enables a substantial amount of discourse about crises. The question we explore is how to predict how much attention Twitter users pay to crises: social media presents the opportunity for a user to send a single retweet about a disaster, as many One Direction fans did, or to sustain interest by following other users and sending many tweets about the event over a period of time.
Additionally, there is little question that social media is useful for those directly affected by disasters. A substantial amount of research finds that social media helps first responders (Regalado et al., 2015; Dugdale et al., 2012; Omilion-Hodges and McClain, 2016; Burns, 2015; Xiao et al., 2015; Kaewkitipong et al., 2016; Meng et al., 2015; Madianou, 2015; Palen, 2008). In fact, specialized algorithms have been developed for that purpose (Pohl et al., 2013a; Pohl et al., 2013b; Platt et al., 2011). Little work, however, examines users who tweet about disasters at a distance, despite the large numbers of such users. We examine these onlookers because they produce a large share of the tweets about humanitarian crises.
We use quantitative text analysis to identify tweets about the earthquake, to cluster onlookers based on shared interests, and to derive a correlation between onlookers’ interests and the number of tweets they sent about the earthquake. This section outlines our methods.
To attain a broad sample of onlookers, we gathered a dataset of over 5 million tweets sent by around 15,000 users in the three weeks following the Nepal earthquake. We harvested the data from the Twitter REST API by searching for any tweets that mentioned the word “Nepal” from April 24 to May 8. We randomly selected 15,000 users from this set and harvested all of their tweets sent between April 24, 2014 and May 5, 2015. We attempted to capture only English-speaking users to increase the likelihood that we would capture users not directly affected by the earthquake, but we still captured some users who tweeted in multiple languages. This left us with roughly 11,000 onlookers.
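The sampling step described above can be illustrated with a minimal sketch. It assumes the matching tweets have already been downloaded and stored as dictionaries with hypothetical "user" and "text" keys; it does not reproduce the Twitter REST API calls used in the study, and the function name and seed are illustrative only.

```python
import random


def sample_onlookers(tweets, keyword="nepal", sample_size=15000, seed=0):
    """Return a random sample of users who mentioned `keyword` at least once.

    `tweets` is an iterable of dicts with hypothetical "user" and "text" keys.
    """
    # Collect each distinct user who mentioned the keyword (case-insensitive).
    users = sorted({t["user"] for t in tweets if keyword in t["text"].lower()})
    # Sample without replacement; keep everyone if fewer users than requested.
    rng = random.Random(seed)
    return rng.sample(users, min(sample_size, len(users)))
```

Fixing the seed makes the sample reproducible, which matters when the sampled users' full timelines are harvested in a separate step.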
To determine how many tweets a user had sent about the earthquake, we trained a Naive Bayes classifier using MALLET (McCallum, 2002) on a set of 100 onlookers’ tweets (totaling about 30,000 tweets), each marked as either quake-related or not. We applied the trained classifier to the remainder of the dataset to count each user’s quake-related tweets. Spot checking showed this technique had acceptable accuracy.
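For readers unfamiliar with this kind of classifier, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, followed by the per-user counting step. It is an illustration only and is not the MALLET implementation used in the study; the tokenizer, the class and function names, and the "quake" label are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase words, hashtags, and mentions.
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def count_quake_tweets(model, tweets_by_user, quake_label="quake"):
    """Count, for each user, the tweets the model labels as quake-related."""
    return {
        user: sum(model.predict(t) == quake_label for t in tweets)
        for user, tweets in tweets_by_user.items()
    }
```

In this sketch the model would be fitted on the hand-labelled tweets and then passed, together with a mapping from each user to their tweets, to `count_quake_tweets`.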
To find shared interests, we used Latent Dirichlet Allocation (Blei et al., 2003), treating all of a user’s tweets as one document. We ran LDA with MALLET with various numbers of topics, and settled on 80. These topics represented a broad span of themes: greetings, news, entertainment, technology, plus four topics directly related to the earthquake. We then looked for connections between onlookers by building an edge list of shared topics, creating a weighted edge between two onlookers if over 25% of both onlookers’ tweets consisted of a shared topic. The edge weight was the product of their affinities to that topic. Using Gephi (Bastian et al., 2009), we then ran a weighted Louvain modularity algorithm (Blondel et al., 2008) over this onlooker graph to generate communities of users.
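The edge-list construction can be sketched as follows. The sketch assumes each user's topic proportions are available as a dictionary mapping a topic id to a share between 0 and 1, which is a hypothetical data layout rather than MALLET's output format. It also assumes that when two users share more than one topic above the threshold, the products are summed into a single edge weight; the text above does not specify how that case was handled.

```python
from collections import defaultdict
from itertools import combinations


def build_edge_list(topic_shares, threshold=0.25):
    """Build weighted edges between users who share a strong topic.

    `topic_shares` maps user -> {topic_id: share of that user's tweets}.
    Returns a dict mapping (user_a, user_b) -> edge weight.
    """
    # Index users by each topic on which they exceed the threshold.
    users_by_topic = defaultdict(list)
    for user, shares in topic_shares.items():
        for topic, share in shares.items():
            if share > threshold:
                users_by_topic[topic].append(user)

    edges = defaultdict(float)
    for topic, users in users_by_topic.items():
        for u, v in combinations(sorted(users), 2):
            # Weight is the product of the two users' affinities to the topic.
            # Assumption: weights are summed if a pair shares several topics.
            edges[(u, v)] += topic_shares[u][topic] * topic_shares[v][topic]
    return dict(edges)
```

Indexing users by topic first avoids comparing every pair of users directly, which matters with roughly 11,000 onlookers. The resulting edge list could then be exported to a graph tool for community detection.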
This experiment identified 21 communities of onlookers. We labelled each community using its strongest topics.
ID | Label | Average Number of Tweets | Users | Average Quake-Related Tweets |
0 | Foreign language (Spanish) | 419 | 882 | 9 |
1 | Japanese Music | 403 | 135 | 5 |
2 | Greetings | 326 | 199 | 5 |
3 | Portuguese/Fifth Harmony | 710 | 658 | 7 |
4 | News Media | 977 | 11 | 1 |
5 | News Media | 652 | 1312 | 22 |
6 | News/Politics | 600 | 1236 | 26 |
7 | Indonesia | 386 | 416 | 8 |
8 | Foreign language (unknown) | 495 | 312 | 5 |
9 | Unclassified | 372 | 589 | 25 |
10 | Dera Sacha Sauda | 1732 | 54 | 205 |
11 | News about Russia | 780 | 30 | 18 |
12 | Shopping | 1153 | 226 | 11 |
13 | Greetings | 476 | 1010 | 11 |
14 | Greetings | 439 | 1347 | 5 |
15 | Science and animals | 521 | 108 | 13 |
16 | One Direction | 388 | 1085 | 10 |
17 | Foreign language (Italian) | 553 | 47 | 14 |
18 | TV/Music | 598 | 722 | 5 |
19 | Music Videos | 649 | 14 | 30 |
20 | Nepal | 393 | 748 | 125 |
After pruning out the foreign language communities in the dataset and some that were difficult to classify (Communities 0, 7-9, and 17), we can further group these onlookers into three broad interest groups: Casual Onlookers (Communities 1-3, 12-14, 18, and 19), News Onlookers (4-6, and 13), and Engaged Onlookers (20). We divided these subgroups based on the topics that were strongest in each, and the subgroups also correlated with the number of quake-related tweets that each sent. They are described in the table below.
Category | Proportion | Quake-Related Tweets/Week | Topic Affinities |
Casual Onlookers | 46% | 3 | Entertainment, greetings |
News Onlookers | 25% | 6 | News and politics |
Engaged Onlookers | 12% | 10 | Nepal |
Casual Onlookers. Onlookers in these communities showed high activity but low engagement with the disaster, sending an average of three quake-related tweets a week. Their primary topics of discussion were entertainment, or greetings and positive sentiments. This is the largest group.
News Onlookers. These are the accounts of either professional news outlets or amateur pundits. Engagement was low in this group as well: users sent an average of six quake-related tweets per week. News outlets generally moved from one topic to another quickly, and pundits only sustained interest in the topics that appealed to them.
Engaged Onlookers. This group sent the most quake-related tweets of all users; the strongest LDA topics in this group were two “Nepal earthquake” topics. However, users in this community had few other topics in common with each other.
This breakdown suggests a model for predicting the number of tweets onlookers send about events. There will be roughly three categories of onlookers: Casual Onlookers, News Onlookers, and Engaged Onlookers. Casual Onlookers will make up roughly 50% of onlookers, and will send only a few tweets over the first three weeks. Membership in this group is predicted by an interest in entertainment topics. The number of News Onlookers will be half the number of Casual Onlookers, but they will be roughly twice as engaged. An onlooker’s affinity to this group will be predicted by a general interest in news. Finally, the Engaged Onlookers will send 10-20 times as many tweets as the Casual Onlookers, and will comprise slightly over 10% of onlookers. This group sends the most tweets about an event, but membership in this group cannot be predicted from their preexisting interests.
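As an illustration, the predictive part of this model can be written as a simple lookup. The weekly figures are taken from the summary table above; the interest keys and the function name are hypothetical, and the sketch assumes a user's dominant prior interest has already been determined, for example from their strongest LDA topic.

```python
# Weekly quake-related tweet rates come from the summary table;
# the interest keys are hypothetical labels chosen for this example.
PROFILES = {
    "entertainment": ("Casual Onlookers", 3),
    "greetings": ("Casual Onlookers", 3),
    "news": ("News Onlookers", 6),
    "politics": ("News Onlookers", 6),
}


def predict_engagement(dominant_interest):
    """Return (group, expected quake-related tweets per week), or None.

    None means the model makes no prediction: Engaged Onlookers cannot be
    identified from their preexisting interests, so any interest outside
    the lookup table is left unclassified.
    """
    return PROFILES.get(dominant_interest.lower())
```

Returning `None` for unlisted interests reflects the asymmetry in the model: shallow engagement is predictable from prior interests, while sustained engagement is not.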
We find that it is easy to predict shallow engagement with a disaster on Twitter, but difficult to anticipate sustained interest. Onlookers who tweet about entertainment are likely to pass on at least a few messages about donating money because entertainers are likely to post these messages, and fans are likely to retweet them. On the other hand, onlookers who tweet more about an event are likely to have preexisting interests that intersect with a particular aspect of the disaster, but the relevant interests are hard to predict because doing so would require knowing the nature of the disaster ahead of time. For example, to know the Hindu extremist would tweet about rumors of coerced conversions in Nepal, we would have to predict a crisis that would produce such rumors.
Additionally, we acknowledge that our model is derived from a single case study. As such, we treat it as provisional pending further experiments. We hope to confirm this model with future work.