The American Historian

From Index Cards to Text Files: Digital Workflows for Today’s Historian

Chris D. Cantwell

One of the first things I did when I started graduate school in 2003 was to contact a number of established historians whose work I admired. As a newcomer, I was eager to heed any advice these senior scholars might offer. One historian in particular was very generous with his time, talking with me in his office for over an hour about writing, teaching, and the academy. When the conversation turned to research, this established scholar pointed to his bookshelf and pronounced, “One day you too will have these things filling your shelves.” At first I thought he was talking about a row of published work. It soon became clear, however, that he was referencing the index-card cabinets that sat on his shelf. “Make sure you organize your cards early,” he advised, proudly thumbing through thirty years’ worth of 3” x 5” notecards. “I can’t see researching without them.”

So much of the advice this generous historian gave me turned out to be right. But his preferred method for organizing his research has undergone a revolution in the last decade. While scholars once primarily trafficked in material objects, historians now work with a great deal of digital material. Notecards, photocopies, and microfilms have been largely replaced by PDFs, JPEGs, and searchable databases. The change in the types of materials historians use has also been accompanied by an increase in scale, as a single scholar can now download in an afternoon what some scholars acquired in a lifetime.

Given this deluge of digital primary sources, it is incumbent upon today’s historians to establish a digital workflow that manages it all. While such a process on its face might seem as bewildering as the proliferation of primary sources themselves, there are thankfully a number of tools and tactics that can facilitate the organization of our increasingly digital research archives. Databases and asset-management systems can help store the digital content historians now acquire, while bibliographic software can connect that material with our writing and research. While the forms this work takes might be new, the principles that guide their organization are as old as the index card. In the end, every good digital workflow ultimately comes down to two activities historians have long engaged in: managing our files, and organizing our citations.

File Management

Just as there is no single perfect, catch-all research method, there is no single “right” digital workflow. The workflow you choose will depend on your proclivities as a researcher and the kinds of materials you use. But every digital workflow begins with managing the files we put on our computers. Before you download an article from JSTOR or upload a photograph from an archive, you should devise a file structure in advance that outlines how you will store not only the material you have now, but also the sources you will acquire later.

I, for example, prefer to organize my files by document type. On my hard drive is a master folder labeled “Research,” inside of which are two more folders named “Primary” and “Secondary Sources.” In my “Primary Sources” folder are a host of other folders relevant to the material I most often use such as “Annual Reports-Proceedings,” “Directories,” and “Archival Sources,” where I finally place my PDFs, photographs, and files. My “Archival” folder is actually parsed even further, containing a folder for every repository I visit, and within those folders are separate folders for each collection. I opt to organize my material this way and not by project in part because I prefer to have complete control over the location of my sources. And as I’ll discuss in a moment, this general organization scheme also works because there are citation programs that allow you to connect a single file to multiple projects, negating the need to organize your original files by research tasks.
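For readers comfortable with a little scripting, a folder structure of this kind can be set up in one pass rather than by hand. The following is a minimal Python sketch, not a description of any particular scholar's setup: the top-level folder names follow the description above, while "Example Repository" and "Example Collection" are hypothetical placeholders to be replaced with the archives and collections you actually use.

```python
from pathlib import Path

# Folder names follow the scheme described above. The repository and
# collection names are hypothetical placeholders.
LAYOUT = {
    "Primary Sources": {
        "Annual Reports-Proceedings": {},
        "Directories": {},
        "Archival Sources": {
            "Example Repository": {"Example Collection": {}},
        },
    },
    "Secondary Sources": {},
}


def build_tree(root: Path, layout: dict) -> list:
    """Create every folder in `layout` under `root` and return the folders."""
    folders = []
    for name, children in layout.items():
        folder = root / name
        # exist_ok=True makes the function safe to rerun as the layout grows.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
        folders.extend(build_tree(folder, children))
    return folders
```

Calling `build_tree(Path("Research"), LAYOUT)` would create the master folder and everything beneath it. Because existing folders are left alone, a new repository or collection can be added to `LAYOUT` later and the function run again.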

Of course, creating space on your computer for research files is not the same thing as populating your folders with content. This also requires some preliminary decision-making, especially when naming files. Here again, it’s best to keep a consistent pattern for easier recall. When downloading contemporary journal articles from JSTOR or full texts from the Internet Archive, for example, I’ll often name my files by “author_title_year.pdf” before I put them in the appropriate folders. This ensures I’ll be able to search for and find them more easily later. Many of today’s operating systems, such as OS X Yosemite for Mac and Windows 10 for PCs, also allow you to add metadata such as subject tags to your files, so you can associate key terms, people, or places with a file. This, again, facilitates easier searching and recall.
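A naming pattern is easiest to keep consistent when a small script applies it. The Python sketch below is one possible reading of the “author_title_year.pdf” pattern. The function names, the hyphens between words, and the six-word limit on titles are all assumptions made for illustration, and the sample author and title are invented.

```python
import re
import unicodedata


def slug(text: str) -> str:
    """Lowercase `text`, strip accents and punctuation, join words with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))


def source_filename(author: str, title: str, year: int,
                    ext: str = "pdf", max_title_words: int = 6) -> str:
    """Build a file name in the author_title_year.ext pattern."""
    # Long titles are cut to their first few words to keep names readable.
    short_title = "-".join(slug(title).split("-")[:max_title_words])
    return f"{slug(author)}_{short_title}_{year}.{ext}"


# Invented example values:
print(source_filename("Doe", "A Hypothetical Article Title", 1999))
# prints "doe_a-hypothetical-article-title_1999.pdf"
```

Underscores separate the three fields and hyphens separate words within a field, so a name can be split back into author, title, and year later.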

For archival material I have personally digitized, however, file management becomes a little more cumbersome. One can return from a single archival trip with thousands of individual photos. Putting those raw files on your computer not only takes up a lot of space, but also has the added disadvantage of generic file names like “IMG_1523.jpg” that turn a single three-page letter into three separate files. Thankfully, there are a number of tools that can rename and reformat your files in batches. For fans of Apple, Macs come preinstalled with a program called Automator that can turn entire folders of large JPEG files into a single, more moderately sized PDF file. PC users can download a free program called Belvedere that also reformats and renames large collections of individual files. In addition to these free and factory-installed options, there are also a number of paid programs such as NameDropper (Windows) and Hazel (Mac) that similarly let your computer files reflect the coherence of the documents you viewed in the archive.
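The renaming half of this task can also be scripted. The Python sketch below is a rough illustration and not a stand-in for any of the programs named above. It assumes the photos sit in one folder, that sorting their camera-assigned names reproduces the order in which they were taken, and that you have kept a list of each document and its page count. The function name and the “_p01” page suffix are invented for the example. Combining the pages into a single PDF would require an image library and is not shown.

```python
from pathlib import Path


def rename_pages(folder: Path, documents: list,
                 pattern: str = "IMG_*.jpg") -> list:
    """Rename camera files so each page carries its document's name.

    `documents` is a list of (name, page_count) pairs, listed in the
    order the documents were photographed.
    """
    # Assumes sorted file names match shooting order (same digit count).
    photos = sorted(folder.glob(pattern))
    expected = sum(count for _, count in documents)
    if expected != len(photos):
        raise ValueError(f"expected {expected} photos, found {len(photos)}")

    renamed = []
    remaining = iter(photos)
    for name, count in documents:
        for page in range(1, count + 1):
            photo = next(remaining)
            target = folder / f"{name}_p{page:02d}{photo.suffix}"
            if target.exists():
                # Refuse to overwrite an earlier file with the same name.
                raise FileExistsError(str(target))
            photo.rename(target)
            renamed.append(target)
    return renamed
```

With a three-page letter photographed first, `rename_pages(folder, [("letter-a", 3)])` would turn the three generic camera files into “letter-a_p01.jpg” through “letter-a_p03.jpg”. The page-count check stops the script before anything is renamed if the list and the folder disagree.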

Citation Organization

With your photos, files, and PDFs uploaded and organized, you are now ready for the historical discipline’s other main task: analyzing and interpreting all of this material. Here again digital tools have replaced the notecard, with Word documents and text files now holding the earliest insights scholars have about their sources. For some, creating a file structure for their research notes that follows or parallels their source material may suffice as a digital workflow. For example, I’ve occasionally saved my notes from secondary sources in text files I place in the folders described above. But for those wishing for a more dynamic research repository, the digital age has yielded a number of indexers and citation-management programs that can streamline and even automate a great deal of the research process.

Of particular interest to historians is the plethora of bibliographic-management software that allows researchers to create, sort, save, and even share our reference material. Though the specifics of each program vary, software such as Bookends, EndNote, Mendeley, Sente, and Zotero all provide an interface that allows users to create lists of bibliographic entries that they can then manipulate in numerous ways. Users can attach research notes to each entry, as well as files from their computer, calling up primary sources with ease. Programs such as Zotero and Papers even automate this process, pulling both the bibliographic information and the full-text source from databases such as ProQuest or JSTOR with just a click of your mouse. Once they’re created, these entries can be connected with multiple research projects—allowing the same notecard to be in two places, so to speak. They can also be enhanced with additional metadata such as subject tags or related files, which can then be searched with ease, potentially revealing a connection between sources we might never have noticed in a box of index cards. But perhaps the most appealing quality of these citation programs is the ways they can facilitate the writing process by working with most word processors to automatically fill out and format your footnotes and bibliographies as you write. If you change or remove a footnote in revisions, these programs can then update the rest of the footnotes in accordance with your citation standards.

The major difference between citation-management programs is often the cost. Zotero and Mendeley, for example, are free while EndNote can cost up to $250. Proprietary citation-management software such as EndNote typically offers access to ongoing customer support as well as an occasional proprietary database of journal articles. So if you think you will need regular assistance in developing a digital workflow, proprietary software might be the way to go. But with a little extra effort to master the learning curve, using open-source software can enhance your research process even further. Zotero, for example, has a robust community of users that maintains a forum where individuals can turn for help and collaboratively develop new features for the program. Originally built by historians connected with George Mason University’s Roy Rosenzweig Center for History and New Media, Zotero can now perform a number of basic digital-humanities research methods through a suite of freely available plug-ins.

Given the dynamism of bibliographic software, one might consider turning over the organization of one’s files to the programs altogether, especially when they automatically download files for you. Indeed, alongside citation-management programs there are also a number of indexers, or what a colleague of mine calls “everything buckets,” that can manage all of your personal information at once. Programs such as DEVONthink, Evernote, and SOHO Notes work almost like an annotated hard drive, allowing users to create research notes, embed documents or files, and then search all of these with subject tags. A number of scholars use these programs as their primary workflow. Yet I continue to manually connect entries in my preferred citation software (Zotero) with files so I can preserve the integrity of the file structure I laid out above. I do this because it gives me complete control over the location of the raw data of my research on the off chance that the program I use suddenly becomes unsupported or obsolete. Other historians should similarly consider if and how they can export their data from the tools they use to ensure that they, and not some third-party program, own and control their work.

The final component of every good digital workflow is a plan for the preservation of all of this data. While a spilled glass of water might merely have dampened a few index cards of the senior historian who advised me long ago, the same glass spilled on a hard drive could wipe out an entire research collection. It is therefore imperative for scholars to have a system in place to create multiple copies of their files in multiple locations for safekeeping. Cloud-based storage services such as Dropbox and Google Drive work well for your files, but you should also think about getting a program that backs up your computer’s entire hard drive and so preserves your software settings as well. Macs and Windows PCs come pre-installed with programs called Time Machine and Windows Backup and Restore, respectively, while curiously named services such as Mozy, Backblaze, and SpiderOak can be purchased to back up your computer remotely via an Internet connection.
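The principle of keeping verified copies in more than one place can be illustrated with a short script. The Python sketch below is a simplified illustration and not a replacement for the backup programs named above. It copies new or changed files from a research folder to a second location, such as an external drive, and confirms each copy with a checksum. The function names are invented, the backup folder is assumed to sit outside the research folder, and the script never deletes anything from the backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror(source: Path, backup: Path) -> list:
    """Copy new or changed files from `source` to `backup`.

    Returns the list of backup files written. Assumes `backup` is not
    located inside `source`.
    """
    copied = []
    for item in sorted(source.rglob("*")):
        if not item.is_file():
            continue
        target = backup / item.relative_to(source)
        if target.exists() and sha256(target) == sha256(item):
            continue  # an identical copy is already in the backup
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, target)  # copy2 also keeps file timestamps
        if sha256(target) != sha256(item):
            raise OSError(f"copy of {item} failed verification")
        copied.append(target)
    return copied
```

Running `mirror(Path("Research"), Path("/Volumes/Backup/Research"))`, where the second path is a hypothetical external drive, would reproduce the folder structure on that drive. Later runs copy only files that have been added or changed.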

Back to the Future

We often emphasize the “digital” when talking about developing digital workflows, yet the reality is we are still ultimately performing the very traditional tasks of managing our files and organizing our citations. Digital tools might allow us to do this in new ways and at greater scales, but even the most high-tech workflow begins with a scholar carefully considering how to organize materials for the easiest recall. Nor should we think that transitioning to a digital workflow means the eradication of paper from our workaday lives. Despite my own commitment to building a paperless research archive, I still have piles of printed sources and rolls of papers on my shelves. I do this in part because some documents, such as handwritten manuscripts and newspapers, simply don’t translate well to discrete digital files. And who wants to read tiny print on a laptop screen? But I also continue to keep some of my sources in print simply because at times I prefer to underline, highlight, and jot down thoughts on the page. Again, there is no “right” digital workflow. Every scholar should construct a system that best suits his or her needs.

But the benefits of building a digital research repository are many. For starters, digital files are far more portable than filing cabinets. I have a colleague who tries to do most of her revisions on a beach, the sun and the waves a reward for the drudgery of filling out footnotes. Digital archives also bring a certain kind of serendipity to the research process: as we tag and search our sources, they reveal connections between individuals or organizations that may never have been apparent in stacks of printed files. But to me the greatest benefit of a digital research archive is that digital files can be manipulated in ways print files never could. With digital files, scholars can begin to employ new, born-digital research methods such as text mining, topic modeling, and spatial analysis. Of course, the process of adopting such research methods requires an article in itself. But the process begins by transitioning our sources to digital formats. For as much as organizing digital files might be based upon the principles of old, what we can do with these files is entirely new.

Further Reading:

Hattem, Michael D. “Digital Workflows for Historians.” The Junto: A Group Blog on Early America. June 18, 2013.

Landrum, Shane. “Camera, Laptop, and What Else? Hacking Better Tools for the Short Archival Research Trip.” Cliotropic, 2010.

Posner, Miriam. “Embarrassments of Riches: Managing Research Assets.” Miriam Posner’s Blog. November 28, 2011.

University Library, University of Illinois, Urbana-Champaign. “Digital Historian Series: Using Digital Tools for Archival Research.” LibGuides @ University of Illinois Library. March 9, 2015.