Mining Hope:

Preserving and Exploring Twitter Data for Digital Visual Studies

By Aaron Beveridge and Nicholas M. Van Horn

In her 2017 Kairos article “Mapping Obama Hope,” Laurie Gries relies heavily on data visualization to deepen our understanding of Obama Hope’s highly circulatory and distributed rhetorical life. As a data-driven method, data visualization can be understood as the act of collecting data, processing it, and making sense of it visually through statistical methods. As a rhetorical product, a data visualization can be as simple as an embedded static image that compares categorical data (like a pie chart or a bar graph) or as elaborate as a dynamic digital interface that allows people to interact with the data and select how they are compared, the type of graph or chart that will be used, and the time/date range that will frame the visual description. From a design standpoint, a data visualization can be as basic as a graph or chart created in Microsoft Excel or Google Sheets using hand-coded data from an observational study, or as complex as calculating network-wide trends for an entire social network in real time to display lists of the hottest topics and #hashtags for various regions/locations/networks. As pervasive and ubiquitous digital objects, data visualizations are embedded in everything from websites and news blogs to mobile applications and virtual dashboards and, as evident in Kairos (see also Gallagher and DeVoss), online scholarly journals. Yet, given the vast diversity of types and possibilities for designing data visualizations, one thing remains true for all of them: data visualizations are only as effective as their underlying data and the methods/methodologies used to produce them.

This point may seem obvious or trite, but as this chapter argues, it is a mistake to assume that data visualizations, for all their pervasiveness, are free from the entangled problems of data-driven content that now dominate our professional and personal lives. Data has surely become “the new oil,” a popular metaphor demonstrating the powerful capabilities of data-driven methods, but the oil metaphor also forces us to consider the harmful pollutants these methods have introduced to digital networks and the negative consequences they have on academic research. In this chapter, we are particularly concerned with how such pollutants constrain the access digital visual studies scholars have to the data they need to generate macroscopic studies of visual artifacts such as Obama Hope. Because of their shifting status as a resource of value, both economically and culturally, data visualizations maintain an inverse relationship with the underlying data that informs their making. In other words, as data visualizations have become more widespread and easier to produce, their underlying data have become financially lucrative and, therefore, less accessible. This generates significant challenges for digital visual studies scholars who want to use digital tools, replicate data-driven methods, peer review data-driven work, and access digital data to study visual artifacts at scale.

In the first section of this chapter, we explore some of these challenges, both so that digital visual studies scholars can remain cognizant of them as they move forward in this exciting area of research and so that we can identify the specific problem they pose for relying on Twitter data to do digital visual studies, which our own research presented here aims to address. But to be clear, we do not see these challenges as an essentialist disqualification of data-driven methods for digital visual research. In fact, quite the opposite. Much like the classical formulation of rhetoric, the tools and methods emerging in an era of data-driven content function as both poison and remedy, depending on the methodology that frames their use. Just as data-driven tools and methods can be used to invade personal privacy, fuel targeted marketing, and perpetuate large-scale public manipulation, they can also provide powerful mechanisms for observing and understanding digital data. In terms of digital visual studies, they can be especially powerful tools for studying the circulation and rhetorical consequentiality of digital visual artifacts such as Obama Hope.

Consider, for example, one of the most important projects in the areas of data accessibility and the preservation of digital data—The Internet Archive, which works to preserve access to a broad diversity of digital sources with its WayBackMachine and other digital tools. As Blake Hallinan’s chapter in this collection makes transparent, the WayBackMachine provides a sustainable way to hyperlink webpages—ensuring that still-available webtexts do not suffer from link rot. For our research, we rely on a tool similar to The Internet Archive called DocNow, which “support[s] the ethical collection, use, and preservation of social network data.” One of DocNow’s most promising features is its catalog of Twitter datasets, which has enabled DocNow to grow their archive of Twitter data—they now have a wide diversity of social movement and political datasets. For example, they archived 39,622,026 tweet IDs related to public discussions of climate change from September of 2017 until May of 2019 (tags: climate change, environment, politics). They also have a #blacklivesmatter dataset, consisting of 17,292,130 tweets from January of 2016 until March of 2017 (tags: blacklivesmatter, activism). Taking advantage of this feature, the Obama Hope tweets dataset that we collected for this chapter is now available through DocNow, helping to ensure that this aspect of Obama Hope’s history is preserved and available for other researchers to use.

DocNow is just one example of how data-driven methods and tools can be productive vehicles for driving digital visual studies research. In subsequent sections of this chapter, we explore how other data-driven tools and methods can be useful for doing digital visual studies. More particularly, after describing what the humanities, broadly speaking, are calling macroscopic methodologies, we describe how deploying exploratory and descriptive methods (using MassMine and R) helps generate macroscopic visual histories of digital artifacts such as Obama Hope. As evidence, we present findings from research in which we collected and analyzed 24,945 tweets that document conversations about Obama Hope between 2008 and 2016.

We ultimately argue that data-driven methods readily extend the types of research we already do by providing new insights into artifacts like Obama Hope, and that they show promise for eventually enabling us to study the networks themselves, helping us better understand the wide range of effects resulting from the transition from user-generated content to data-driven content in digital networks. In the "Content/Scale" section of this chapter, we explain the transition to data-driven content and look at how macroscopic methods have been developing across the humanities to study massive collections of cultural artifacts. But first, the next section addresses the business models that build and sustain data-driven technologies, identifying how proprietary approaches to digital data inhibit the study of visual artifacts at scale, and how macroscopic methodologies confront issues of scale across the humanities.
