Category: Getting Started in DH

2014-2015 Workshops and Events

We are excited to begin another year of Demystifying Digital Humanities workshops! In addition to the workshops, we will continue to offer quarterly “Play with Your Data” development and feedback sessions and DH office hours every two weeks, and we are adding a new quarterly event: DH Happy Hour! This new event is oriented around our primary objective, which is to facilitate collaboration by highlighting the fantastic resources and training opportunities available on campus and by making more visible the variety of digital projects in which both faculty and graduate students are engaged. Below is a detailed list of the dates and times of the autumn quarter events.

Autumn 2014 Workshops:

Saturday, October 18th: “What is Digital Humanities and Why Does It Matter?” 9:30AM-12:30PM CMU 202 (registration is still open)

Saturday, October 25th: “Managing and Professionalizing Your Online Presence and Identity” 9:30AM-12:30PM, CMU 202 (registration is still open)


Autumn 2014 “Play with Your Data” Session:

Wednesday, October 29th, 11:30AM-1:00PM, Location TBA


Autumn DH Office Hours (Location OUG):

Friday, October 24th, 3:30-5:00PM

Friday, November 7th, 3:30-5:00PM

Friday, November 21st, 3:30-5:00PM

Friday, December 5th, 3:30-5:00PM


DH Happy Hour:

Thursday, October 16th, 5:30-7:30PM, CMU 202








Workshop #1 Slides: What is DH, and Why Does It Matter?

Edited: see below for minor updates.

Here are the slides for the first workshop for 2013-14!

They’re also available as a PDF.


Additions: One of the projects that participants looked at during the workshop was the AutoBlake Twitter account. More information about that Twitter account, created by Roger Whitson, is available through his post (and talk for ICR 2013), “autoblake: repurposing Blake’s approach to critical making as a research methodology”.

Also, here’s the Simpson Center CFP for funding to attend DHSI at the University of Victoria in June 2014. We should also note that funding is available through DHSI itself — info is at the DHSI Scholarship page.

“Doing” DH at University of Victoria’s Digital Humanities Summer Institute

Attending University of Victoria’s Digital Humanities Summer Institute for the second time was a pleasant return. I felt less like an interloper this year and more like what I suspect an attendee should be: a student, a collaborator, a soon-to-be more informed practitioner of DH. Last year I signed up for a course on DH pedagogy, which “provided a ‘best practices’ approach to using digital humanities tools and processes for the purposes of communication, collaboration, and facility of research.” Although I found that experience to be an informative entry into the critical conversations surrounding DH, I returned to UW with only a cursory understanding of a few tools and a feeling that I just wasn’t “doing” DH. (I created a Google site to store and share the content from that course, available here.) Even so, I experimented with a few data-mining programs and designed a text-mining assignment for an English Gateway course, which was received with more interest than resistance (the vain hope of any instructor). This year I signed up for an intensive, tool-specific class on digital spatial analytics, more commonly referred to as Geographical Information Systems (GIS). What follows is a brief overview of what GIS is, a description of what the course entailed, and a brief reflection on the overall experience at DHSI.

Overview: What is GIS?

A geographic information system (GIS) integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information. GIS allows us to view, understand, question, interpret, and most important, to visualize data in many ways that reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts.

Innovative applications of GIS in the Humanities abound, from Google Lit Trips, where we can follow the journey of Stephen Dedalus in Ulysses, to Dan Edelstein and Paula Findlen’s historical project at Stanford on “Mapping the Republic of Letters,” and even in disciplines like Art History, where multi-spectral imaging has been used to photograph paint clusters and specific pigments in the works of Jackson Pollock (The Geography of Art: Imaging the Abstract with GIS). Ultimately, GIS provides new roads for humanists to consider the significant relations of place, space, artifact, and memory visually over time.


The course I took this year, “Geographical Information System in the Digital Humanities,” was taught by Professor Ian Gregory from Lancaster University and aided by Norma Serra, a graduate student in Geography at University of Victoria. In the initial three days of the class we were provided with clear, step-by-step instructions, working through different aspects of ArcGIS, Quantum GIS (an open source program), and Google Earth. Because it remains the single most used program at universities and research institutes, we worked predominantly with ArcGIS, and were provided with two data sets from entirely different disciplines: one from the social sciences, covering infant mortality rates in the UK during the early 20th century; and one from the humanities, made up of place names from the poetry of Thomas Gray, William Wordsworth, and Samuel Taylor Coleridge. Working with data sets first from Geography and then from the humanities helped to convey the powerful transferability of GIS platforms and spatial data to frame and test critical questions through this form of visualization.

Once we successfully navigated the fundamentals of ArcGIS, we were encouraged to map our own data sets in the remaining two days of class. Several students came with data they had compiled over the last year or two, hoping to refine their understanding of GIS and fine-tune the visualization of that data. Others, like me, came without any data, only a few half-baked ideas about how they might incorporate GIS into their work. This, I believe, is one of the great selling points of DHSI: the attendees come from a variety of disciplines and possess varying degrees of technological knowledge, making the classroom sessions productive environments for collaboration and learning. In the end, I’m glad that I did not come with a fully formed project because it allowed me to be more open to the potentialities of GIS for my work and eventually my teaching.

Because my dissertation focuses on the impact of popular visual culture on the “high” literary marketplace in the 1800s, I spent those final two days compiling and plotting a data set of place names of the visual cultural phenomena mentioned in “Book Seven” of William Wordsworth’s autobiographical poem, The Prelude. (I will showcase this map and discuss the process of generating it in more detail at the DMDH Showcase this fall.) In addition to locating the places where these performances occurred, each place name plotted on the map contained some form of metadata: lines from the poem where that place was mentioned and historical information regarding that theater, gallery, or exhibition. What made generating the data challenging, initially, was that in this section of the poem, Wordsworth rarely names specific places. More commonly, he cites only an actor or play by name, or more allusively, simply describes the particular experience shorn of any identifying markers. But this was just one of many challenges I faced regarding data collection.

Once I identified the concrete locations referenced directly or indirectly in the poem, I encountered my next hurdle–using Excel, the program used to compile the data and later export it to ArcGIS. After a few basic questions were kindly answered by my fellow DHSI classmates, I was able to complete the data set and map it. But it is not that simple: once you have the data, you must choose what kind of map best represents the data. In my case, it was a Google Earth image because this was the most readily available image of London I could obtain free of any copyright restrictions. (To provide a greater level of visual authenticity, I am currently working to obtain borrowing rights for a digital reproduction of an 1805 map of London.) And though these visual cultural experiences were from the early 1800s, these locations, fortunately, are still very much a part of the London urban landscape. Once I plotted the data, a clear image of the visual cultural experiences available to Londoners at the turn of the 19th century emerged–as mentioned by one writer, of course. Taken at face value, however, this interactive map does not unlock a missing piece of Wordsworth’s life in general or his poetry in particular. Indeed, GIS is not the Rosetta Stone for the humanist, something geographers know all too well; instead, it will allow me to pose critical questions about an individual in relation to place and time, and to consider the proximity of other significant events (or people) that were left unmentioned and that eventually, in the case of my investigation, bring about the development of a modern cultured individual–to investigate how, in the words of Percy Shelley, culture both makes the individual and is made by them. In a sense, I equate this type of investigation with an astrophysicist’s quest for dark matter, prompting the question: what can we learn from the negative space of these place names?

The Takeaway:

The five days at DHSI opened up many fruitful roads to pursue with respect to my own work and gave me the sense that I was actually “doing” DH, rather than merely thinking (or writing) about its methodologies and goals. Moreover, it has intensified my understanding of the relation between space and time in literature. This happens to be no coincidence, I believe, since the bulk of my work revolves around this dichotomy–the introduction to my dissertation takes an unnecessarily long digression (as most dissertations do) to foreground the cultural importance of Gotthold Ephraim Lessing’s essay “Laocoön, or On the Limits of Painting and Poetry,” which, along with Edmund Burke’s writing on the sublime, places a wedge between the spatial (painting) and temporal arts (poetry). Indeed, space cannot be divorced from the critical conversation when we consider a work of literature as a cultural artifact; cue the “spatial turn” in the humanities. This turn, according to David J. Bodenhamer, “began with the pioneering work of social scientists such as Clifford Geertz, Erving Goffman, and Anthony Giddens and has been advanced in the humanities through the work of Michel Foucault, Michel de Certeau, Edward Said, and others whose investigation of space took the form of a focus on the ‘local’ and on context” (15). (A more detailed article on the “spatial turn” in the humanities is forthcoming.) Spatial analytics allow humanists to explore memory, artifact, and experiences that occur in spaces and across time, but more importantly, I learned that GIS allows one to create a publishable text, and in some cases these digital texts carry a greater academic cultural capital that provides opportunities for junior scholars to publish a digital project in a more timely manner than is typical of academic print journal articles.
This is precisely what I’m working towards with “Mapping the ‘Mighty City’: Wordsworth’s ‘Residence in London,’” so the next step for me is to explore (and exploit) the resources that we have at the university. Fortunately for me and other budding spatial humanists, people like Luke Bergmann in the Geography department were hired for their expertise and innovation with GIS. It’s now a matter of taking that intensive training at DHSI and furthering it here with the resources available to us at UW. This, again, was one of the drawbacks I encountered upon returning from my first year at the institute: what resources were available to me regarding DH pedagogy? Who taught DH pedagogy? I found that after learning about a specific DH tool, I was better prepared to put that knowledge to use and locate the network of resources available to assist me in furthering this project.

But DHSI is much more than a classroom experience; there are unconferences, lightning presentations, and longer, more formal paper/project presentations, not to mention the lunchtime and dinner or tavern conversations. All of these events (both formal and informal) contribute to the uniqueness of DHSI, where the attendee can immerse herself in a constant flow of all things DH, find people working on similar projects, and get inspired by those projects in a friendly, non-hierarchical environment, where dean and graduate student, librarian and programmer come together to learn from each other and return to their home institutions with a new understanding of the resource networks available to them. Moreover, DHSI and DH have taught me that the solitary role of the graduate student in the humanities is a fate to which we do not have to resign ourselves, and I am discovering that some of my best work is coming out of these collaborative experiences and the process of re-imagining different ways to approach literary studies through other disciplines and technologies. But has this technology transformed my scholarship in a way that drastically sets me apart from other humanists? Of course not. DH does not transform the humanist into a Borg-like scholar, but instead can show us the manifold nature of scholarship, and most importantly, it exposes the interdependence and deep integration of the disciplines, especially valuable in an era where the perception of the utility of the humanities is overshadowed by more pragmatic areas of study, such as engineering or business. And though at times I felt like Neo as information was ceaselessly uploaded subject after subject for five days straight, I came away, this time, with a (developing) skill and several future collaborators with whom to conspire, proving DHSI to be an invaluable experience–one I would recommend to any humanist, luddite or not.

Should My DH Project Be the Focus of My Dissertation?

Should I be doing a digital dissertation?

These are two different questions — but both are on the minds of many graduate students who are getting started in and excited by the digital humanities. It’s not surprising, given widespread uncertainty about whether a traditional dissertation is still the best milestone for a doctoral degree, and growing excitement about the potential of the digital humanities.

Amanda Visconti, who’s writing a digital dissertation, provides the best advice I’ve seen on what you need to be aware of at the beginning, and in terms of the actual product, if you’re planning to write a digital dissertation. A lot of her advice is relevant to the question that I’m asked regularly:

Should my digital project be the focus of my dissertation?

In other words, you want to build a brilliant web-based thingy, and write a more traditional dissertation about it that will satisfy your university’s requirements for submitting your dissertation in print or electronic format. At first, this might sound much more manageable than doing a traditional dissertation and a digital project at the same time. Here’s what you need to consider.

Do you know how to build the project?

When I say “know how to build,” I mean that you’ve already become proficient in the technologies that you’ll need (HTML, CSS, MySQL, Ruby, etc.), and that you can make a realistic and steady schedule for development that you can stick to. If this schedule includes the item “Learn how to encode in TEI at DHSI in June,” then your schedule probably isn’t steady. It’s speculative, rather than realistic — you don’t know what your learning curve will be for working with the technologies that you need. You do not want the completion of your dissertation to be dependent upon your successfully building a digital project — especially one which will be technologically innovative or groundbreaking in some way.

Do you have the resources that you need to build, maintain, and publicize the project?

If your project requires specialized software (for mapping or modeling purposes, for example), or if your project is technically advanced enough that you need a developer to help with any glitches or crashes, then you need more resources and support than a graduate student writing a traditional dissertation does. Does your department or school have grants or fellowships that will provide this support? It’s important to find out about this early, and also to find secure funding. You don’t want to start a dissertation project that’s dependent on your successfully competing for a grant that’s two years away. With regard to publicity, while it’s easy to think of it as free, provided you’re good at using Twitter, it’s still a substantial time investment, and thus it, too, is a resource that you need to include in your planning budget.

What will you have to show potential employers after you graduate?

This question is partly about where your project lives. If it’s being hosted on university servers, then you’ll almost certainly need to move it to your own site when you graduate, meaning that you need to budget for web hosting.

If you’re planning to demo the project as part of your job search, bear in mind that you will almost certainly want to give it an aesthetic facelift (in addition to any updates that the platform may need to keep functioning).

However, this question is also about how your potential employers (especially academic ones) will respond, both to your digital project and to your textual dissertation. You may encourage them to examine both, but you need to consider what will happen if they only look at one or the other. How interesting will your dissertation be if it’s a narrative of what it took to make your project a reality? How well will your finished digital project convey the scholarship that went into constructing it?

Writing a dissertation about your digital project might seem easier than having a digital project on the side — but in fact, if you’re doing this, then you’re going to need to be highly thoughtful about how the two meld together.  That popular saying “a dissertation is not a book” refers in part to the depth of your research, and how that depth will change between the two — but it also refers to the smoothness and sophistication that one hopes the book will display. Holding your dissertation and your project to a highly sophisticated standard of presentation will add a lot more work to your schedule.

In short, I would say, no, your DH project shouldn’t be the focus of your dissertation. The humanities PhD is already enough work, and accompanied by enough risks, that you don’t need to add more.

Having your DH project on the side provides a number of benefits:

My Ph.D. doesn’t depend on my project being completed, or successful. That means that working on Visible Prices is far less stressful than it would be otherwise. Working on it has given me a sense of agency that I haven’t always felt as I’ve been writing my dissertation.

When people ask me what my agenda is for my first years as a junior academic, I have an interesting answer. That answer includes applying for a number of grants for which I’ll be eligible as new faculty, and ways that my project might contribute to my teaching both traditional literature classes, and classes that incorporate the technical side of the project, and the creation of metadata as a form of writing.

My project is more quickly understandable (and probably more interesting) to non-academics, or to academics who work in different fields. This has at least two benefits: not only does it make it easier to have a conversation with people that doesn’t immediately become abstruse, and include the phrase, “uh, I’ve never read that,” but also I feel at least three times as comfortable when I’m not having to navigate polite professions of interest in texts that most people are unlikely to read.

Although my project isn’t my dissertation, I’ve still successfully received grant and travel funding that supports it. This includes funding from my own university, and from external organizations. However, at this point, little substantial grant funding is available for project development before the Ph.D. degree is complete.

In the end, the question of how you should handle your digital project and your dissertation is contingent on multiple factors — most of all, what you want, and what your committee, department, and university will permit. And things are changing, and may change faster in the next few years. But even with greater infrastructure support for Ph.D. students doing DH projects, I suspect that you’ll be happier if you go in with your eyes open, having thought about these issues in advance.


What Digital Humanists Do

“The digital humanities is what digital humanists do.”  — Rafael Alvarado, Day of DH (reprinted in “Day of DH: Defining the Digital Humanities” in Debates in the Digital Humanities)

Alvarado’s point — that the field of digital humanities is varied and dynamic — is an excellent one. However, I want to supplement it by clarifying what some of those specific things are. The format of this post is shaped by (and owes a lot to) Miriam Posner’s excellent “How did they make that?”, which really helped me when I was wrestling with the organization of what I wanted to say. This post is meant as a precursor/companion to Miriam’s, because after teaching the DMDH workshops last year, I think that an even more basic introduction to some of the major DH activities can be helpful, especially for people who are a step before getting started — and figuring out whether getting started is something that they want to do.

What I’ve done is outline, briefly, what the project does, why you’d want to do it, and what’s involved in making the project a reality. While Miriam’s post will point you towards specific platforms and training resources, I’m describing major steps/milestones, such as obtaining a useable text.

My reasons for writing the post this way are twofold:

  • I’ve seen too many people obsessing over learning a complex language without giving much thought to where their material will come from; and
  • Obtaining a useable text (or if necessary, creating one) is work, and deserves to be foregrounded as such.

For each type of project, I’ve provided a few examples, with brief comments on the size of the project in question in order to distinguish the projects that are built and maintained by 1-2 individuals, as opposed to those supported by large teams and multiple organizations. It’s worth noting that these categories aren’t strictly separate — several of the projects listed below fit into more than one of them.

A knowledge site
A digital edition of a text or texts
A database
A semi-linear, customizable narrative that includes text, images, audio, and/or video
A large-scale text analysis or topic modeling project
A geographic mapping site
A digital 3D model
An online event
A crowdsourcing project

A knowledge site

What is it?

A collection of primary (and/or secondary) sources and resources for research and/or teaching.

Why do it?

  • To make a set of works in the public domain more accessible
  • To introduce academic or non-academic audiences to a particular subject or specific angle which they may be unaware of, and promote the subject/angle by making it more accessible for use in research or in the classroom
  • To start building up your reputation as being knowledgeable about a particular type of document
  • You’ve discovered an interesting cache of documents, and want to store them in a way that makes it easy for you to work with them, and/or find collaborators

What’s involved?

  • Finding images or texts for your target topic that are in the public domain and can be displayed; if necessary, contacting owners for permission
  • Cleaning and/or proofreading the texts — especially if any of them were produced using OCR
  • If you’re accepting contributions from other people, determining the standards for inclusion, and the methods for submission, including a standard format for contributions
  • Uploading the texts — whether to a ready-made platform like WordPress, Scalar, or Omeka; or to an HTML/CSS site that you build yourself
  • Writing commentary and instructions that help people use the site effectively
  • Publicizing the site, and responding to comments and/or criticisms
  • Adding or removing objects from the site as necessary

How big is this project?

It varies, but usually a good resource site is an ongoing long-term project, requiring the owners to manage the contents, and make sure that the site still works. Making the History of 1989 has 44 different people contributing to it in some fashion, according to its About page, 18 of them listed under the heading “Project Team,” which suggests that they’re involved in adding content to and maintaining the site. There are also 6 collaborators in addition to George Mason University. In short, this isn’t something that you can whack up onto the web on a Sunday afternoon.


Making the History of 1989 (as noted above, this is a large-scale project with substantial labor and funding support)
The Walt Whitman Archive (supported by 20+ people, and multiple grants)
The Triangle Shirtwaist Factory Fire (developed by 4 people, supported by Cornell University’s Kheel Center)
The Salman Rushdie Archive (an individually-run project by a Ph.D. student at George Washington University)

A digital edition of a text, or texts

What is it?

Perhaps you purchase a dusty old pamphlet in a Parisian flea market, and discover that it is not only relevant to your studies, but fascinating, and moreover, not available on the web in any format. Or, alternately, you find a box of texts in your institutional library archives, and ask for and receive permission to put them on the web.

Why do it?

  • You want to make the text(s) widely available in an open and accessible format, and in a way that will put you in contact with users.
  • You have a specific concept for the edition that you think will be particularly illuminating.

What’s involved?

  • Your edition can be very simple (plain text), or much more complicated, allowing users to highlight/extract all the speeches by a single character, all characters of the same gender, and so forth.
  • For a simple, plain-text edition, your work is in prepping the text to make sure that it’s 100% accurate, and that it will display correctly for all users.
  • For a more complex edition involving any of the examples above, you’ll need a method of encoding the text that effectively describes the features that you want to focus on.
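
To make the idea of encoding concrete, here is a minimal sketch in Python. It assumes a simplified, TEI-style markup in which each speech is wrapped in an `sp` element whose `who` attribute names the speaker. The sample fragment, the speaker names, and the function name `speeches_by` are all invented for illustration; a real TEI edition would also involve namespaces, a header, and a far richer schema.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up fragment in a simplified TEI-like style (no namespace).
# Each <sp> is one speech; its "who" attribute identifies the speaker.
SAMPLE = """
<text>
  <sp who="#alice"><speaker>Alice</speaker><l>First line spoken by Alice.</l></sp>
  <sp who="#bob"><speaker>Bob</speaker><l>A reply from Bob.</l></sp>
  <sp who="#alice"><speaker>Alice</speaker><l>Alice speaks again.</l><l>And adds a second line.</l></sp>
</text>
"""


def speeches_by(xml_text, who):
    """Return every speech by one speaker, each as a list of its lines."""
    root = ET.fromstring(xml_text.strip())
    speeches = []
    for sp in root.iter("sp"):
        if sp.get("who") == who:
            speeches.append([line.text for line in sp.findall("l")])
    return speeches


if __name__ == "__main__":
    for speech in speeches_by(SAMPLE, "#alice"):
        print(" / ".join(speech))
```

Once the speaker is recorded in the markup, extracting all speeches by one character is a short query. Extracting by gender or any other feature would work the same way, provided that feature has been encoded somewhere in the text.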

How big is this project?

That depends on the size of the text(s), and the complexity of your encoding framework. If you have more than 10-15 pages of text, then success will probably depend on your making a schedule (1 poem a week? 2 pages per week?) and/or finding collaborators to help.


The Blake Archive (20+ staff, and supported by multiple organizations)
Darwin Online (3 primary project staff members, with contributions from multiple writers and support from numerous organizations)

A database

What is it?

A collection of data (such as bibliographic information), and a search interface that allows users to develop queries and navigate the information. Databases are distinct from archives in that they don’t necessarily contain texts — they may simply provide bibliographic information.

Why do it?

  • You work with a particular type of artefact for which few or no specialized catalogs exist.
  • You want to allow people to navigate artefacts using an unusual attribute that is not normally included or thought of as important.

What’s involved?

  • Obtaining enough information to make the database useful
  • Determining a cataloging structure for your data
  • Building the database (in a program like MySQL)
  • Creating a user interface so that other people can access it (using a scripting language like PHP, and HTML/CSS for styling the site)
  • Determining whether other people will be able to contribute, and if so, standards and formats for contributions
  • Updating, adding, and correcting entries as needed
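
As a rough illustration of the cataloging and building steps, the sketch below uses Python's built-in `sqlite3` module in place of MySQL and PHP, so that it runs without a database server. The table layout, the column names (including `price_pence` as the "unusual attribute"), and the sample rows are all hypothetical placeholders, not the structure of any project mentioned here.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory catalog and load (author, title, year, price_pence) rows."""
    conn = sqlite3.connect(":memory:")
    # The cataloging structure: standard bibliographic fields plus one
    # less common attribute (a price mentioned in the text, in pence).
    conn.execute(
        """
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            author TEXT NOT NULL,
            title TEXT NOT NULL,
            year INTEGER,
            price_pence INTEGER
        )
        """
    )
    conn.executemany(
        "INSERT INTO items (author, title, year, price_pence) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def titles_at_or_below(conn, max_pence):
    """Return titles whose recorded price is at most max_pence, sorted by title."""
    cursor = conn.execute(
        "SELECT title FROM items WHERE price_pence <= ? ORDER BY title",
        (max_pence,),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    # Placeholder entries, invented purely for this example.
    sample = [
        ("Author A", "Example Pamphlet", 1840, 1),
        ("Author B", "Example Novel", 1850, 360),
    ]
    catalog = build_catalog(sample)
    print(titles_at_or_below(catalog, 12))
```

The point of the sketch is that the "unusual attribute" has to be a column in its own right before anyone can search by it, which is why settling the cataloging structure comes before building the interface.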

How big is this project?

The size of this project can vary based on several factors:

  • If you don’t already know MySQL and PHP, then this may be a long-term project, due to the time it will take to learn those platforms
  • If you are building a database that uses standard categories (author, title, and so on) and highlights a particular genre of text, then this project may be executed more quickly, if you are able to borrow an already developed and open framework for your data.
  • If you are trying to develop a database that allows people to navigate using a new or unusual aspect of the texts or subject matter (see: Visible Prices, which will allow people to navigate using prices mentioned in texts), then this may be a long-term project.


Price One Penny (small scale, started and primarily maintained by one person)
Reading Experience Database (large scale, supported by a technical team, and managing and advisory boards)

A semi-linear, customizable narrative that includes text, images, audio, and/or video.

What is it?

An essay, or narrative, that encourages viewers to read its sections in different order, according to their interest or preference.

Why do it?

  • You work with a topic that intersects with other disciplines. A customizable narrative allows readers to process your information in a way that makes sense to them.
  • You want to write about a subject that has been documented with a variety of artefacts, including photographs and YouTube videos. A traditional journal article isn’t feasible. Alternately, you want to write about a topic that is dynamic and ongoing, and which may change dramatically, necessitating edits to your writing.
  • You don’t find the traditional academic essay format of 5,000-6,000 words, or 10-20 pages in a journal to be terrifically effective. Perhaps you see non-linear narratives, where the reader chooses a direction, as promoting livelier interaction between reader and text.

What’s involved?

  • Learning to use the Scalar platform (and some CSS, for styling)
  • Deciding how you’re going to organize your text(s).
  • Obtaining legal versions of any videos or images you want to use, in good quality format

How big is this project?

Scalar has a definite learning curve, and you can expect to feel awkward as you’re first working with it — but then it gets better, and you will most likely be able to do more, faster. Some people put essays in Scalar; others put dissertations. The main condition of scaling down is that if you have too few pieces, then readers will have little to customize with.


Text, Identity, Subjectivity: an individually-produced Scalar book
Teaching and Learning Multimodal Communications: an anthology originating from the UVic Maker Lab, edited by Jentery Sayers

Large-scale text analysis and topic modelling

What is it?

Topic modelling is a particular type of text analysis — and there are many sorts of text analysis — but both tend to promote what Franco Moretti termed “distant reading,” or, as Matthew L. Jockers describes it, macroanalysis. The idea is that working with a high volume of texts using computing techniques allows you to see patterns that are otherwise undetectable to the human eye.

Why do it?

  • To generate ideas that you might explore (perhaps an unexpected pair of words appear to be linked together when tracked through 1,000+ texts)
  • To get a fresh/different perspective on a text (or texts) that you’ve read so many times they feel stale.
  • To see how an author’s style changes over the course of his/her lifetime; or compare the author with others writing at the same time.
  • To find an alternative perspective on what features characterize a particular genre.
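
The first idea above, spotting a pair of words that keep turning up together, can be tried out in a few lines before you commit to a full tool. What follows is a minimal sketch in Python, assuming each text is already a plain string. The function name `cooccurring_pairs`, the `min_texts` threshold, and the sample sentences are all invented for illustration; this is not how any of the tools mentioned in this post work internally.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurring_pairs(texts, min_texts=2):
    """Count how many texts each pair of distinct words appears in together."""
    pair_counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic runs only: a crude tokenizer for illustration.
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        # Each unordered pair is counted once per text, however often it occurs.
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_texts}


# Invented sample sentences standing in for a much larger corpus.
sample_texts = [
    "The river was dark and the forest silent",
    "A dark river ran past the village",
    "The village market was bright",
]
pairs = cooccurring_pairs(sample_texts)
print(pairs[("dark", "river")])  # prints 2
```

On a real corpus, very common words ("the", "and") will dominate the results unless you filter them out, and the number of pairs grows quickly with vocabulary size, so treat this as a way of building intuition rather than a replacement for a dedicated tool.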

What’s involved?

  • Finding or creating reliable versions of the texts that you want to analyze.
  • Cleaning the texts up further so that whatever processor you choose can read them without errors. This usually involves putting the text in plain, unformatted UTF-8 encoding. Microsoft Word formatting will add strange characters throughout, so you’ll want to work with TextEdit, Notepad++, TextWrangler, Oxygen, or similar. Regular Expressions may help you accomplish what you need more quickly.
  • You’ll also need to learn how to work with whatever analysis tool you’re using. With topic modelling in particular, this may take time — but there’s no shortage of commentary written for humanities scholars.
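
To give a sense of what the clean-up step can look like, here is a small Python sketch that uses regular expressions. It assumes you have already pasted or exported your text out of the word processor as a string. The function name `clean_text` and the particular substitutions are illustrative choices, not a standard recipe: whether you want to flatten curly quotes, for instance, depends on the tool you plan to use.

```python
import re
import unicodedata

# Typographic characters that word processors often substitute for plain ones.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u00a0": " ",                  # non-breaking space
}


def clean_text(raw):
    """Return a plainer version of `raw`, ready to be saved as UTF-8 text."""
    # Use one consistent Unicode form for accented characters.
    text = unicodedata.normalize("NFC", raw)
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # Rejoin words hyphenated across a line break, e.g. "dark-\nness".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, but keep line breaks.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


cleaned = clean_text("\u201cThe  horror!\u201d he cried in the dark-\nness.")
print(cleaned)  # prints: "The horror!" he cried in the darkness.
```

Note that the hyphen rule will also join genuine compounds that happen to break at a line end, so it is worth spot-checking the output. When you save the result, open the file with `encoding="utf-8"` so that the encoding is explicit.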

How big is this project?

The size of this project depends on the preparedness of the body of texts you want to explore, and the complexity of the analysis that you want to complete.

There are trade-offs: creating data that will produce good results instead of garbage can be time-consuming, but worthwhile if you know that you’ll be working with it long term. Creating a set of data can also be a good way of meeting other people who are interested in the same things that you are.

As for the analysis part, using topic modelling and getting accurate results takes time, and lots of calibration and adjustment. Fortunately, if you’re curious, there are tools that you can use to get started with distant reading, and which require less of an investment. You’ll (probably) still need to do some polishing to get your texts ready — but once you’ve got clean texts, you can start playing around and having fun, and writing about what you find.

Try it out:

MONK (Metadata Offer New Knowledge)
TAPoR (Text Analysis Portal)
Voyant Tools
ManyEyes (requires a Java-enabled browser, so it won’t work in Chrome)

Quickstart text for experimenting with ManyEyes: Joseph Conrad’s Heart of Darkness (click the Visualize button to get started)

Text analysis is complex enough that I think it’s useful to point to a couple of introductory posts. For a good introduction to text mining, see Ted Underwood’s “Where to start with text mining”; for a good introduction to topic modeling, see Scott Weingart’s “Topic Modeling for Humanists: A Guided Tour.”


Topic Modeling Martha Ballard’s Diary (this is a write-up, rather than an interactive project, but it was created by one person. Comments highly recommended as a way of learning more about what was involved)
Using topic modeling to explore literary history via the Proceedings of the Modern Language Association (PMLA) (a two-person effort)

A geographic mapping site

What is it?

A site presenting maps and geographic data on a particular topic or text.

Why do it?

  • To present information in an alternative format to a traditional, argument-based essay
  • To explore a text (or texts) by grounding them in their geographical landscape, and explore the relationship between text and “real” space

What’s involved?

  • Finding sources for the information that you want to include in your map
  • Finding a map that reflects the landscape you are working with (particularly important if you’re working with pre-20th century texts)
  • If you’re working with historical maps, then it will be important to find software that will allow you to work with them: Neatline (free, used with Omeka) and ArcGIS (expensive) are two of your best options
  • Getting your data into the map (either manually, or through a bulk import)
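
For the bulk-import route, it often helps to keep your places in a spreadsheet and convert them to a format that mapping software can read. The sketch below, in Python, turns CSV rows into GeoJSON, a widely used open format for geographic data. The column names (`name`, `latitude`, `longitude`), the function name `csv_to_geojson`, and the approximate sample coordinates are assumptions made for this example; check your chosen platform’s documentation to see which import formats it actually accepts.

```python
import csv
import io
import json


def csv_to_geojson(csv_text):
    """Convert CSV rows with name, latitude, longitude columns to GeoJSON."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            # GeoJSON lists longitude first, then latitude.
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative rows with approximate coordinates.
sample_csv = """name,latitude,longitude
Grasmere,54.4597,-3.0244
Keswick,54.6013,-3.1347
"""
collection = csv_to_geojson(sample_csv)
print(json.dumps(collection, indent=2))
```

Keeping the spreadsheet as your master copy means you can fix a place name or coordinate once and regenerate the map data, rather than editing points by hand.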

How big is this project?

You can build simple projects using sites like Google Maps, Neatline (with Omeka), or GeoCommons; and if you know a little HTML, you can configure a Simile Timeline site without too much hassle. Any of these is a good place to start, and each has slightly different capabilities and limitations. Depending on your goals, and whether you want collaborators, or have aspirations of making your project something that many people might use, you may want to invest in more sophisticated software (ArcGIS).


The Negro Travelers’ Green Book (supported by multiple departments, but primarily created by two people (Negro Traveler’s Green Book: About the Map))
PoetryBox (nonacademic site, built by two people)
Mapping the Lakes (funded by the British Academy, four collaborators listed)
Map of Early Modern London @ UVic (large scale project supported by 10+ people)

A digital 3D model

What is it?

A detailed model of a particular space or area based on archaeological/historical data, which allows users to explore and/or “walk” into the space or area.

Why do it?

  • To recreate the experience of being in a space that can no longer be visited (or is of limited accessibility) due to modern development and/or fragility/decay.
  • As a component of a particular larger cultural or educational project.
  • To document a space or object for restoration and/or recreation.

What’s involved?

  • Obtaining detailed data about the space you want to recreate.
  • Understanding the data well enough to generate procedural rules, based on attributes of the space. These might include information on how windows are configured, what materials are used, etc.
  • Entering the procedural rules into the program of your choice, configuring it and adjusting/debugging as necessary.
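
If “procedural rules” sounds abstract, here is a toy illustration in Python of what one such rule might look like: given the width of a wall, the width of a window, and a minimum gap, work out where the windows go. The function name `window_positions` and the measurements are hypothetical, and real modeling programs have their own rule syntax, so this only shows the kind of reasoning involved.

```python
def window_positions(wall_width, window_width, min_gap):
    """Return the left-edge x positions of evenly spaced windows on a wall.

    Rule: fit as many windows as possible while keeping at least `min_gap`
    between windows and between a window and either end of the wall.
    """
    # n windows need n * window_width + (n + 1) * min_gap of wall.
    count = int((wall_width - min_gap) // (window_width + min_gap))
    if count <= 0:
        return []
    # Share the leftover wall equally among the n + 1 gaps.
    gap = (wall_width - count * window_width) / (count + 1)
    return [round(gap + i * (window_width + gap), 3) for i in range(count)]


# A 10 m wall with 1.5 m windows and at least 1 m between them.
print(window_positions(wall_width=10.0, window_width=1.5, min_gap=1.0))
# prints [1.375, 4.25, 7.125]
```

The point of writing rules this way is that when your evidence changes (say, a survey suggests the windows were narrower), you adjust one number and regenerate the facade instead of redrawing it.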

How big is this project?

There are small projects, like Marie Saldana’s Digital Magnesia, and much bigger ones, like HyperCities and Digital Karnak, which are supported and funded by large teams and, often, multiple organizations.

Some of the software that you may be working with (ArcGIS, 3D-modeling programs) may be prohibitively expensive, making this a “big” project just in terms of investment, although it may be affordable through educational licenses.

How big this project is will depend in part on your specific intent, and what you expect to do with the model once you build it. Producing models with high accuracy will almost certainly take substantial time and labor, and you’ll want to think about why you’re making that investment.


Digital Magnesia (1-person project)
HyperCities (large-scale project supported by 10+ people and grants from multiple organizations)
Digital Karnak (large-scale project supported by 10+ people and multiple organizations)

An online event

What is it?

A social gathering that takes place online, and invites people to interact and/or collaborate on a particular project or goal.

Why do it?

  • You see a lot of energy and curiosity around a particular topic, and have an idea that will center that energy in one place.
  • Alternatively, you have a topic that people may be less aware of, but there’s an activity that people can participate in that will make them more aware.
  • Or: you have an idea for an activity that you think will lead to interesting results; or a project that can be accomplished if x number of people get involved.
  • Bottom line: you want people to be social, interacting with you, and with each other. You have a clear view of an outcome that will benefit them (either a learning or networking goal, or a sense of having participated in a useful and/or fun activity).

What’s involved?

On the surface of it, this might look like one of the simplest projects possible, in terms of the technical skills needed, but don’t be fooled — running an online course, even an informal one, successfully and professionally is probably the most intense labor commitment of all the projects listed here.

  • Create a schedule for the event (or for a class, a syllabus, reading schedule, and assignments/discussion prompts)
  • Choose discussion software (VanillaForums, ProBoards), and configure it
  • Figure out how you’ll measure participation levels (so that you can report accurately on what took place). For a class, set up a registration process and monitor your enrollment.
  • Publicize your course or event through social media and any relevant forums, making it clear what your goals are, and what the outcomes will be for the participants.
  • Execute your event, starting discussions and refocusing when necessary, moderating forums to make sure that all participants feel respected and safe, and making yourself widely available to answer questions and teach.
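
Measuring participation does not have to be elaborate. As a rough sketch in Python, suppose you can export a list of posts from your discussion software as (username, thread) pairs. Export formats vary from platform to platform, so that shape, the function name `participation_summary`, and the sample names are all assumptions made for this example.

```python
from collections import Counter


def participation_summary(posts):
    """Summarize forum activity from (username, thread) records."""
    per_user = Counter(user for user, _thread in posts)
    return {
        "total_posts": len(posts),
        "active_participants": len(per_user),
        # People who posted only once may need a different kind of follow-up.
        "single_post_participants": sum(1 for n in per_user.values() if n == 1),
        "most_active": per_user.most_common(3),
    }


# Invented sample data.
sample_posts = [
    ("amira", "week-1"), ("ben", "week-1"), ("amira", "week-2"),
    ("chen", "week-2"), ("amira", "week-2"),
]
summary = participation_summary(sample_posts)
print(summary)
```

Counts like these say nothing about the quality of the discussion, but they give you concrete numbers to report afterwards and an early warning if activity is concentrated in one or two people.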

How big is this project?

It depends. You could scale it down considerably, organizing a weekend seminar in which you scheduled specific discussions at particular times. Doing so would allow you to do test-runs for a larger/longer event. And as with events that take place in real-time and real-space, having at least one collaborator will make a huge difference in your ability to maintain energy. Even so, this is a highly social project, and you should probably expect it to be accompanied by at least a few misunderstandings, and possibly trolls.


Global Women Wikipedia Write-In (GWWWI)
DHPoco Summer School (both the GWWWI and Summer School are 2-person projects, run by Adeline Koh and Roopika Risam)
Twitter vs. Zombies (additional commentary by the organizers, Pete Rorabaugh and Jesse Stommel)

A crowdsourcing project

What is it?

A project where the material is obtained and/or processed or curated by a large number of people (i.e., the crowd). While some projects simply have large staffs, crowdsourcing implies that the majority of the people involved are not staff members; they’re just people who have something to contribute. Arguably, the best-known crowdsourced project is Wikipedia.

Why do it?

  • Your subject matter is made up of uncollected narratives, artefacts and memorabilia held by members of the public, and you want to provide a space where these can be displayed together in a digital format.
  • Your subject matter consists of oral histories, and you want to collect them in one space, and make it easy for people to record their experiences.
  • You have an immense amount of material that needs some form of simple processing that does not require scholarly expertise (e.g., proofreading or transcribing).

What’s involved?

For a crowdsourced primary-source archive:

  • Finding a platform that will hold the types of artefacts you expect people to contribute. ( and are popular)
  • If you plan to allow individuals to contribute stories or images, determine the terms and conditions of the ownership and use of their materials.
  • Decide standards for including objects in the archive: what’s the minimum amount of information that is needed? What formats will the project accept?
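
Once you have decided on your standards, it can help to write them down in a form that a computer can check. The Python sketch below is one possible way to do that. The required fields, the accepted file extensions, and the function name `submission_problems` are hypothetical placeholders; your own project’s standards will differ.

```python
# Hypothetical standards for illustration only.
REQUIRED_FIELDS = ("title", "contributor", "date", "rights_statement")
ACCEPTED_EXTENSIONS = (".jpg", ".png", ".pdf", ".mp3")


def submission_problems(record):
    """Return a list of reasons a submission does not yet meet the standards."""
    problems = [
        f"missing {field}"
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]
    filename = str(record.get("filename", ""))
    if not filename.lower().endswith(ACCEPTED_EXTENSIONS):
        problems.append(f"unsupported format: {filename or 'no file'}")
    return problems


# An invented submission that is missing its date and uses an unlisted format.
incomplete = {
    "title": "Ticket stub",
    "contributor": "A. Reader",
    "rights_statement": "Contributor retains copyright",
    "filename": "stub.tiff",
}
print(submission_problems(incomplete))
# prints ['missing date', 'unsupported format: stub.tiff']
```

Returning a list of specific problems, rather than a plain yes or no, makes it easier to tell contributors exactly what to add.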

For a crowdsourced labor project:

  • You need a platform which will allow users to contribute, and feel like their contribution is useful; and which can handle high traffic, if necessary.
  • You need a system of monitoring contributions for accuracy.
  • You need a clear statement on the rights of the contributors, and of the project managers.
  • You need a steady stream of communication letting people know what has been accomplished, and what needs to be done.
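
One common way to monitor accuracy is to have several people work on the same item and compare their answers. The Python sketch below shows that idea for a single transcribed line. It is a simplified illustration, not the method used by any of the projects listed in this post, and the function name `reconcile` and the `min_agreement` threshold are assumptions.

```python
from collections import Counter


def reconcile(transcriptions, min_agreement=2):
    """Return (text, needs_review) for one line transcribed by several people."""
    if not transcriptions:
        return "", True
    # Ignore case and extra whitespace when comparing versions.
    normalized = [" ".join(t.split()).lower() for t in transcriptions]
    text, votes = Counter(normalized).most_common(1)[0]
    # Flag the line for a human check when too few contributors agree.
    return text, votes < min_agreement


print(reconcile(["Dear Sir,", "dear  sir,", "Dear Sin,"]))  # prints ('dear sir,', False)
print(reconcile(["Dear Sir,", "Dear Sin,"]))                # prints ('dear sir,', True)
```

Because the comparison lowercases everything, the returned text loses its capitalization; a fuller version would keep one of the original spellings. The useful part is the flag: lines where contributors disagree are the ones worth a moderator’s time.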

How big is this project?

By definition, true crowdsourced projects are big: they tend to require major infrastructure, and a strong commitment to managing contributors and meeting their needs.

For a recent write-up on designing DH crowdsourcing projects, see Mia Ridge’s notes and slides from the workshop she ran at the DH2013 conference.


Transcribe Bentham (large-scale project, with support from multiple departments at the University of London and grant funding)
University of Minnesota Memorial Stadium (supported by 10+ people, and multiple university offices)
Zooniverse (specific team size not listed on site)
Sindhi Voices Project (a smaller project, created and managed primarily by two people)