Wednesday, September 13, 2017

Making a Case for “Small Data” Humanities Projects

By Kenton Rambsy

The International Data Corporation (IDC) estimates that by the year 2020 we will have generated 44 zettabytes (or 44 trillion gigabytes) of information. Accordingly, over the last 10 years, people have grown fond of the term “Big Data” to describe the mass proliferation and collection of online information.

Even in the humanities, the term has gained some traction. In literary studies in particular, many projects use topic modeling and text mining software to sift through hundreds of books and observe broad characteristics of literature. In many respects, however, I think the concept of “big data” is misleading when applied to the study of literary texts.

When discussing my work in digital humanities, some have labeled my interest in collecting and analyzing various features related to African American literature as a “Big Data Project.” I suspect this comes from the idea of “distant reading.” Franco Moretti’s concept of “distant reading,” or the process of “understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data,” has been at the center of data conversations in the humanities. Where close reading relies on analysis of the apparent inner workings of a literary text, distant reading compiles data about many, many works.

I’ve relied on earlier work by Franco Moretti in considering the consequences of integrating quantitative data to make broad assessments about black literature. Unlike Moretti’s work, however, my research may be classified as a small data humanities project. In my graduate course, Lost in the City, for example, we sift through much smaller text samples. Computers are still needed to navigate 235,417 words in a body of works and to identify recurring and unifying features of a given text. Digital methods are also still needed to process quantitative information and connect it to themes in literature.

With regard to Edward P. Jones’s short fiction, however, we are more concerned with the relationships between various data points. This comes from the belief that all the data we collect pertaining to Jones’s short fiction can be networked to reveal the links and interconnectedness between his works. In particular, using a smaller sample helps us clarify the significance of Jones using D.C. as a recurring setting across his two collections. We pay attention to how often (and in what contexts) references to landmarks appear, what directions characters take, and which city quadrants serve as the settings for the stories.
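As a rough illustration of that kind of counting, here is a minimal Python sketch. It is not the method we use in the course, which relies on Voyant. The term list, the function names `count_terms` and `term_contexts`, and the two sample passages are all hypothetical; the passages are invented placeholders, not quotations from Jones’s stories.

```python
import re
from collections import Counter

# Hypothetical search terms; a real study would curate these from the texts.
QUADRANTS = ["Northwest", "Northeast", "Southwest", "Southeast"]


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def term_contexts(text, term, width=30):
    """Return a short snippet of surrounding text for each occurrence of term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    snippets = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets


# Invented placeholder passages, NOT quotations from Jones's stories.
stories = {
    "story_a": "She walked through Northwest toward the station. Northwest was quiet.",
    "story_b": "He left Southeast at dawn and crossed into Northeast.",
}

# One row of quadrant counts per story: the "how often" half of the question.
table = {title: count_terms(text, QUADRANTS) for title, text in stories.items()}

for title, counts in table.items():
    print(title, dict(counts))
```

Calling `term_contexts(stories["story_a"], "Northwest")` returns the surrounding snippets for each mention, which speaks to the “in what contexts” half of the question. With a sample this small, every hit can still be read and interpreted by hand.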

We do not use massive data processing tools or methods to explore Edward P. Jones’s short story collections. Instead, we rely primarily on the Voyant suite to highlight distinct areas of interest. These types of “small data” analyses clarify why geographic representations are so significant in literature, and especially in depictions of a formerly majority-black city. Our methods help us group more focused characteristics related to the city of Washington, D.C., and understand why place plays such a major role in the portrayal of black characters.

Lost in the City: A graduate-level literature course on Edward P. Jones
