New Forms of History: Critiquing Data and Its Representations
Frederick W. Gibbs
As the historical record becomes increasingly digitized, historians now have at their fingertips new research methodologies and modes of dissemination that have virtually no precedent. Many of these opportunities center on the increasing use of data and its representations in historical analysis, interpretation, and writing.
This is not merely cliometrics 2.0. Critical theorizing about encoding the historical record into a digital and manipulable form has transformed the process of quantification, pushing its complexity far beyond that of earlier efforts and debates. Furthermore, the notion of data has fundamentally changed. First, vast amounts of data from diverse sources can be assembled and (re)combined more easily than ever before. Second, myriad new digital tools have made representing data (through databases, interfaces, maps, and interactive visualizations) easily accessible to a wide range of historians. Third, not only is the existing historical record becoming digital, but the historical record we are creating right now is fundamentally, and at times exclusively, digital. In the future, we as historians will have ever more data, and ever more complex data, along with increasingly sophisticated ways of creating and consuming it.
New kinds and uses of data pose new challenges for the discipline, each of which requires its own theorizing and refinement through practice. How do we give proper scholarly credit for creating data that facilitates research? Where will data be published sustainably? How can data be reused most effectively? The critical question I would like to take up here, however, is how data, visualizations, and interfaces will facilitate our understanding of, and interactions with, the "raw" materials of research. Of course, historians have always excelled at representing the historical record, provided that it took a narrative form. In this sense, we might consider the written narrative as a particular kind of representation, if not visualization, of data. As with any other kind of abstraction, it has its own strengths and weaknesses. So why do we continue to privilege only one kind of representation in historical scholarship? As we increase the variety of representations, how do data, databases, visualizations, and interfaces that help others explore the historical record and our interpretations fit into our typical peer review processes? How do these new products become integrated into the scholarly ecosystem? In what follows, I advance some arguments as to why data, its visualizations, and the tools that help us interact with and understand it must not be treated as mere augmentations or enhancements to traditional forms of history. Rather, they should be subject to the same kinds of interrogation typically applied to narrative analysis and explanation.
Aspects of Novelty
On one hand, data visualization is hardly new in historical scholarship. Maps, charts, and graphs, among many other forms, have always been used to explain, clarify, and enhance long-form prose. On the other hand, two particularly noteworthy differences should encourage us to consider data, visualizations, and interfaces in new ways.
First, as the volume of digitized historical data grows, the visualizations that help make sense of data at large scales will play an increasingly significant role in our analyses and interpretations of the historical record. They have a new element of necessity. Whether considering data mining techniques (such as topic modeling and network analysis), tools for extracting, managing, and analyzing many thousands of images, or mapping vast quantities of data, visualizations are not merely complementary to more traditional narrative explanations but in fact enable research in the first place. More importantly, because of the sheer scale of data, visualizations are increasingly not designed, but computed.
Second, visualizations are becoming not only necessary for certain kinds of scholarship, but also easier to make even for simple illustrative purposes with free, off-the-shelf tools. With appropriately formatted data (not to trivialize this crucial and often laborious step), web services like Overview make visual topic modeling easy; tools like NodeXL and Gephi greatly facilitate complex network analysis; and mapping tools like QGIS, while not exactly intuitive, help users sidestep prohibitively expensive software that was required to make even rudimentary maps only a short time ago. Access to and facility with such tools mean that we now have—and will increasingly have—new perspectives on the historical record.
These aspects of novelty raise new epistemological questions. Although historians naturally recognize visualizations as symbolic representations rather than as "realistic" depictions of the historical record, sophisticated visualizations and interfaces inevitably contain accidental signifiers, making arguments that their authors do not necessarily intend. The danger here is that it becomes difficult to distinguish features of data visualizations and user interfaces that arise from deliberate design choices (and thus convey useful information) from those that are simply arbitrary artifacts of automation. Such danger only intensifies as datasets, tools, and visualizations become increasingly complex and routinely generated through automated and often only partially configurable algorithms and processes.
Many of the visualizations briefly mentioned thus far (topic models, network diagrams, GIS maps) may seem like the domain of those working in the digital humanities and can therefore seem irrelevant to those uninterested in taking up such methods. But I want to emphasize that data criticism, which encompasses critical analysis of both data and its representations, is not a digital history problem; it is a history problem. And it is one that we must take seriously if we want to remain effective evaluators of our colleagues' work. The following two sections briefly attempt to refute what I consider the two most common reasons we don't talk about data representation and interaction as much as we should.
Visualizing Popular Science. While the image is visually intriguing, its historical insight remains difficult to discern. But it certainly provides a basis for an interesting discussion that would scarcely be possible with text alone. Photo by Jer Thorp under a Creative Commons 2.0 license.
Historians are usually good critics of each other's work; it's one reason why we so highly value the peer review process. We routinely focus a critical eye on the use of evidence, methodological soundness, viability of interpretation, and strength of argumentation, to name a few evaluative categories. Developing facility for such critical inquiry is nothing less than a cornerstone of professional training. And while we are all well aware of the theoretical separation of form and content, we know that our published scholarship and our responses to it continually blur the line between them. For instance, it is almost impossible not to see a sloppily written (not merely stylistically uninteresting) paragraph as reflecting to some extent the quality of the research behind it. At best, it's simply hurried writing. However, we may just as well take ambiguous expression as indicative of ambiguous thinking and possibly shoddy research.
We do not, however, level the same critiques against visualizations and data interfaces. Why doesn't the quality of visualizations reflect the quality of scholarship the same way text does? Why don't we hold the aesthetics of data and its representations to the same standard as other forms of scholarly work? There's an easy excuse at hand: data and design criticism can seem to fall outside what historians do because it is not typically part of the discipline's foundational training regimen. But such an approach is a mistake. An increasing number of courses, workshops, and seminars in digital history and digital humanities now offer theoretical and practical instruction in organizing, processing, visualizing, and creating interfaces to data. Historians are already doing this work, though not always well, and these approaches and products become less marginal every day. Perhaps more importantly, historians have always been interpreters and communicators of the historical record, continually adopting new theoretical frameworks and methodological approaches to reinterpret the past in light of the present. To refuse to engage with historical data and its representations is to refuse to engage with the future of history.
Process over Product
Especially among those historians who support new methods and forms of publication, it can seem that criticizing the product (from computationally generated diagrams to websites that provide interactive visualizations or an exploratory interface) amounts to criticizing the new methods, or even the effort to create more accessible forms of historical scholarship. On this view, to question the design choices of a diagram or website, to interrogate the certainty that it portrays, or to dispute its interpretive payoff is, by extension, to undermine the research methodology as a whole. These concerns are well founded: these methods are new, interfaces to data are new, and maybe the process is indeed more important than the product. Fair enough. But there are two reasons why critical silence might be a bad idea.
One, the sunrise for methodology—a magical time when the historical sky takes on new interpretive hues (see rhetoric from any early digital history grant application)—has long passed into a demanding and unforgiving midday sun. Our visualizations and data interfaces must provide or at least suggest historical insight (not that they should necessarily be used as proof, but that's a separate epistemological conversation) or shed new light on old questions, rather than simply present a novel view of the historical record for novelty's sake. This is, after all, how we critique the value of all other historical work.
Two, there seems at times to be a fundamental incompatibility between digital historians' emphasis on the importance of process and the methodological opaqueness of most visualizations and databases. Any representation of data, no matter how well designed, is woefully incomplete. We need access to the data itself, along with its encoding assumptions, data correction strategies, algorithmic disclosure, and methodological transparency. Historians would naturally scoff at any kind of analysis without adequate citation of evidence. The handling of data and its representations must be subject to the same critique.
To be clear: rough, incomplete, or underdesigned representations of data can be immensely useful. Still, we must require methodological transparency about their creation so that we can engage more deeply with the many layers of meaning embedded in the visualization and with what its creator claims it represents. A stronger discourse around visualizations can help creators in the same way that textual review and critique does now. In many ways, visualizations and interfaces can provide a gateway into insight and analysis that data or text alone simply cannot. But design criticism needs to be a part of how we engage with digital scholarship at every level, from blog posts to printed articles to elaborate web projects.
Where people run in San Francisco. It may be tempting to associate this kind of map with modern social media data, but even a sparse historical record visualized spatially provokes new questions about where things happened and where they did not. And it's a locus for important critical questions about the data itself, and about what cannot be represented. Photo by Eric Fischer under a Creative Commons 2.0 license.
To be a bit more concrete, I'd like to pose a few sample questions that may be useful to keep in mind as we more thoroughly engage with provocative visualizations and interfaces. These are not meant to be exhaustive, but hopefully they can advance the conversation about using data in history and data visualization, as well as facilitate more explicit discussion about the methods and processes behind them.
- Why is it the way it is?
Even what appear to be simple representations of data are complex entities, full of design "choices," whether their creator deliberately considered them or not. When I say that we need more explicit critiques of visualizations and interfaces, I do not mean that we should simply nitpick about color choices or fonts (although they're important!). But design matters, even when the author is not interested in design per se. All visualizations are necessarily designed, even if by code and algorithmic computation. Are the relationships between the colors significant or arbitrary? Are the distances between entities meaningful or random? How are readers/users meant to engage with the diagram? How might database architecture implicitly favor certain kinds of analyses and interpretations? What does an interface prohibit as well as allow?
- What else might it be?
Interpreting representations of data presents new challenges for most historians, particularly because we are interested in not only data, but also the uncertainty, the ambiguity, and the context of the data. Such concerns become even more important amidst increasing technological and computational sophistication. Visualizations are not just about our data, but the notdata—the gray area around the necessarily reductionist and discrete entities that appear in diagrams and maps and databases. How well does the visualization (or a group of them) represent the interpretive work done to transform very specific local data (for example, information from a particular archival document) into a comprehensive diagram of how sources and their authors connect to each other? Does a particular visualization end up obscuring more than revealing? How can different visualizations be juxtaposed to create a more holistic representation?
- How might visualizations and interfaces prompt new questions regarding sources and interpretation?
Discussion about data, visualizations, and interfaces can provide an important locus for dialog about the nature of historical sources and methods in broad (even nondigital) terms. Especially at large scales, it forces us to more explicitly confront the ambiguities and insufficiencies of the historical record at our disposal. That quantification and visualization can yield a false sense of certainty and objectivity is a hollow critique. After all, doesn't the narrative form imbue a level of generalization and artificial causality (if not certainty) to our interpretations? Critiques of data, visualizations, and interfaces also provide an opportunity to talk about the methodological artifacts that arise when viewing the historical record at different scales, from the closest to the most distant readings. We must insist on such multiplicity in data representation because the inevitable distortions are informative, not aberrations to be minimized or explained away.
- How much do complex, data-driven visualizations need to be developed solely through computational means?
When representations of data are created largely through software (and especially at large scales), must they remain free of direct manipulation after an initial algorithmic rendering? Is it acceptable to alter a computed representation in order to highlight a particular feature? To what extent might that be considered subversive or misleading? To what extent is that simply better communication? Is the visualization more about the unadulterated output of the tool (even if unfortunately treated as a black box) or about communicating an interesting historical phenomenon?
Epilogue: Toward a New Data Fluency
Sophisticated visualizations and interfaces often represent significant data-wrangling and algorithmic success, and for that they deserve praise in their own right. By no means do I intend to minimize the often exhausting preparatory work, programming skill, and creativity required to produce even humble visualizations from modest (and especially originally nondigital) datasets. However, such representations (whether diagrams, maps, databases, or entire websites), while they might represent computational success, do not always equal analytical, interpretive, or communicative success—all prerequisites to sound historical scholarship.
Not all historians will be enthused about gathering, producing, or working with data, which is fine. But lack of interest alone is an insufficient reason to ignore it. The academic review process, however imperfect, usually succeeds not only because authors and reviewers share field-specific knowledge, but also because they share common methodological ground, even when allowing for variations across disciplinary specialties. But do we have any common ground for dealing with data and its representations?
Although such knowledge is clearly helpful in some cases, we do not necessarily need to understand the precise algorithmic processes behind data representations and interfaces in order to offer useful reviews and evaluation, any more than an art critic must be an accomplished artist. But we should be able to gain insight from any visualization, coupled with its creators' account of how they use it and why we should take it seriously rather than dismiss it as a computational artifact. Such critique of data and its representations should be an explicit part of disciplinary training even when the methods are not. That is, it belongs in core history courses, not just digital ones. And of course fluency with data, visualizations, and interfaces is useful not only for the future of history, but also in making it easier to apply historical thinking in careers outside the academy, where it is all too painfully obvious that data rules.
Frederick W. Gibbs is an assistant professor in the History Department at the University of New Mexico. His work addresses the theoretical and practical challenges at the intersection of digital humanities, history theory, information design, and data criticism.