September 2, 2013

What Digital Humanists Do

“The digital humanities is what digital humanists do.” — Rafael Alvarado, Day of DH (reprinted in “Day of DH: Defining the Digital Humanities” in Debates in the Digital Humanities)

Alvarado's point — that the field of digital humanities is varied and dynamic — is an excellent one. However, I want to supplement it by clarifying what some of those specific things are. The format of this post is shaped by (and owes a lot to) Miriam Posner's excellent “How did they make that?”, which really helped me when I was wrestling with the organization of what I wanted to say. This post is meant as a precursor/companion to Miriam's, because after teaching the DMDH workshops last year, I think that an even more basic introduction to some of the major DH activities can be helpful, especially for people who are a step before getting started — and figuring out whether getting started is something that they want to do.

What I've done is outline, briefly, what the project does, why you'd want to do it, and what's involved in making the project a reality. While Miriam's post will point you towards specific platforms and training resources, I'm describing major steps/milestones, such as obtaining a useable text.

My reasons for writing the post this way are twofold:

I've seen too many people obsessing over learning a complex language without giving much thought to where their material will come from; and
Obtaining a useable text (or if necessary, creating one) is work, and deserves to be foregrounded as such.

For each type of project, I've provided a few examples, with brief comments on the size of the project in question in order to distinguish the projects that are built and maintained by 1-2 individuals, as opposed to those supported by large teams and multiple organizations. It's worth noting that these categories aren't strictly separate — several of the projects listed below fit into more than one of them.

A knowledge site
A digital edition of a text or texts
A database
A semi-linear, customizable narrative that includes text, images, audio, and/or video
A large-scale text analysis or topic modeling project
A geographic mapping site
A digital 3D model
An online event
A crowdsourcing project

A knowledge site

What is it?

A collection of primary (and/or secondary) sources and resources for research and/or teaching.

Why do it?

To make a set of works in the public domain more accessible
To introduce academic or non-academic audiences to a particular subject or specific angle which they may be unaware of, and promote the subject/angle by making it more accessible for use in research or in the classroom
To start building up your reputation as being knowledgeable about a particular type of document
You've discovered an interesting cache of documents, and want to store them in a way that makes it easy for you to work with them, and/or find collaborators

What's involved?

Finding images or texts for your target topic that are in the public domain and can be displayed; if necessary, contacting owners for permission
Cleaning and/or proofreading the texts — especially if any of them were produced using OCR
If you're accepting contributions from other people, determining the standards for inclusion, and the methods for submission, including a standard format for contributions
Uploading the texts — whether to a ready-made platform like WordPress, Scalar, or Omeka; or to an HTML/CSS site that you build yourself
Writing commentary and instructions that help people use the site effectively
Publicizing the site, and responding to comments and/or criticisms
Adding or removing objects from the site as necessary

How big is this project?

It varies, but usually a good resource site is an ongoing long-term project, requiring the owners to manage the contents, and make sure that the site still works. Making the History of 1989 has 44 different people contributing to it in some fashion, according to its About page, 18 of them listed under the heading “Project Team,” which suggests that they're involved in adding content to and maintaining the site. There are also 6 collaborators in addition to George Mason University. In short, this isn't something that you can whack up onto the web on a Sunday afternoon.

Examples:

Making the History of 1989 (as noted above, this is a large-scale project with substantial labor and funding support)
The Walt Whitman Archive (supported by 20+ people, and multiple grants)
The Triangle Shirtwaist Factory Fire (developed by 4 people, supported by Cornell University's Kheel Center)
The Salman Rushdie Archive (an individually-run project by a Ph.D. student at George Washington University)

A digital edition of a text, or texts

What is it?

Perhaps you purchase a dusty old pamphlet in a Parisian flea market, and discover that it is not only relevant to your studies, but fascinating, and moreover, not available on the web in any format. Or perhaps you find a box of texts in your institutional library archives, and ask for and receive permission to put them on the web.

Why do it?

You want to make the text(s) widely available in an open and accessible format, and in a way that will put you in contact with users.
You have a specific concept for the edition that you think will be particularly illuminating.

What's involved?

Your edition can be very simple (plain text), or much more complicated, allowing users to highlight/extract all the speeches by a single character, all characters of the same gender, and so forth.
For a simple, plain-text edition, your work is in prepping the text to make sure that it's 100% accurate, and that it will display correctly for all users.
For a more complex edition involving any of the examples above, you'll need a method of encoding the text that effectively describes the features that you want to focus on.
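
To make the idea of encoding concrete, here is a minimal sketch in Python. It assumes a tiny, made-up dialogue marked up with tags that loosely follow TEI conventions (`<sp>` for a speech, `<l>` for a line); the `who` and `gender` attribute values and the `speeches` helper are hypothetical, chosen only for illustration, and a real edition would need a properly designed schema.

```python
import xml.etree.ElementTree as ET

# A made-up fragment. The tag names loosely follow TEI conventions, but
# the attribute names and values here are assumptions for illustration.
SAMPLE = """
<play>
  <sp who="ada" gender="f"><l>Where did you put the letter?</l></sp>
  <sp who="ben" gender="m"><l>In the desk, under the ledger.</l></sp>
  <sp who="ada" gender="f"><l>Then fetch it,</l><l>and be quick.</l></sp>
</play>
"""


def speeches(root, **attrs):
    """Return the text of every <sp> whose attributes match all of attrs."""
    found = []
    for sp in root.iter("sp"):
        if all(sp.get(key) == value for key, value in attrs.items()):
            lines = [line.text.strip() for line in sp.findall("l") if line.text]
            found.append(" ".join(lines))
    return found


root = ET.fromstring(SAMPLE.strip())
print(speeches(root, who="ada"))     # every speech by one character
print(speeches(root, gender="m"))    # every speech by characters marked "m"
```

The point of the sketch is that the extraction step is short once the text is encoded; most of the labor is in deciding what to mark up and then marking it up consistently.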

How big is this project?

That depends on the size of the text(s), and the complexity of your encoding framework. If you have more than 10-15 pages of text, then success will probably depend on your making a schedule (1 poem a week? 2 pages per week?) and/or finding collaborators to help.

Examples:

The Blake Archive (20+ staff, and supported by multiple organizations)
Darwin Online (3 primary project staff members, with contributions from multiple writers and support from numerous organizations)

A database

What is it?

A collection of data (such as bibliographic information), and a search interface that allows users to develop queries and navigate the information. Databases are distinct from archives in that they don't necessarily contain texts — they may simply provide bibliographic information.

Why do it?

You work with a particular type of artefact for which few or no specialized catalogs exist.

You want to allow people to navigate artefacts using an unusual attribute that is not normally included or thought of as important.

What's involved?

Obtaining enough information to make the database useful
Determining a cataloging structure for your data
Building the database (in a program like MySQL)
Creating a user interface so that other people can access it (using a scripting language like PHP, and HTML/CSS for styling the site)
Determining whether other people will be able to contribute, and if so, standards and formats for contributions
Updating, adding, and correcting entries as needed
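
As a rough sketch of what "determining a cataloging structure" and "building the database" can look like, the example below uses Python's built-in `sqlite3` module as a stand-in for MySQL and PHP, so that it runs without installing a database server. The table name, column names, sample rows, and the `find_by_max_price` helper are all invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
# (Assumption: SQLite stands in here for a MySQL server.)
conn = sqlite3.connect(":memory:")

# The cataloging structure: one row per text, with the standard
# bibliographic fields plus one unusual attribute (a price in pence).
conn.execute(
    """CREATE TABLE entries (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           author TEXT,
           year INTEGER,
           price_pence INTEGER
       )"""
)

# Made-up sample rows, not real bibliographic data.
rows = [
    ("Made-Up Penny Serial", "Anon.", 1845, 1),
    ("Invented Three-Volume Novel", "A. N. Author", 1860, 378),
    ("Hypothetical Railway Reader", "B. Writer", 1855, 12),
]
conn.executemany(
    "INSERT INTO entries (title, author, year, price_pence) VALUES (?, ?, ?, ?)",
    rows,
)


def find_by_max_price(conn, max_pence):
    """Return (title, price) pairs costing at most max_pence, cheapest first."""
    cur = conn.execute(
        "SELECT title, price_pence FROM entries "
        "WHERE price_pence <= ? ORDER BY price_pence, title",
        (max_pence,),
    )
    return cur.fetchall()


print(find_by_max_price(conn, 12))
```

The `?` placeholders keep user input separate from the SQL itself, which matters as soon as a public search form sits in front of the database.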

How big is this project?

The size of this project can vary based on several factors:

If you don't already know MySQL and PHP, then this may be a long-term project, due to the time it will take to learn those platforms
If you are building a database that uses standard categories (e.g., author, title) and highlights a particular genre of text, then this project may be executed more quickly, especially if you are able to borrow an already developed and open framework for your data.
If you are trying to develop a database that allows people to navigate using a new or unusual aspect of the texts or subject matter (see: Visible Prices, which will allow people to navigate using prices mentioned in texts), then this may be a long-term project.

Examples:

Price One Penny (small scale, started and primarily maintained by one person)
Reading Experience Database (large scale, supported by a technical team, and managing and advisory boards)

A semi-linear, customizable narrative that includes text, images, audio, and/or video.

What is it?

An essay, or narrative, that encourages viewers to read its sections in different order, according to their interest or preference.

Why do it?

You work with a topic that intersects with other disciplines. A customizable narrative allows readers to process your information in a way that makes sense to them.
You want to write about a subject that has been documented with a variety of artefacts, including photographs and YouTube videos. A traditional journal article isn't feasible. Alternately, you want to write about a topic that is dynamic and ongoing, and which may change dramatically, necessitating edits to your writing.
You don't find the traditional academic essay format of 5,000-6,000 words, or 10-20 pages in a journal to be terrifically effective. Perhaps you see non-linear narratives, where the reader chooses a direction, as promoting livelier interaction between reader and text.

What's involved?

Learning to use the Scalar platform (and some CSS, for styling)
Deciding how you're going to organize your text(s).
Obtaining legal versions of any videos or images you want to use, in good quality format

How big is this project?

Scalar has a definite learning curve, and you can expect to feel awkward as you're first working with it — but then it gets better, and you will most likely be able to do more, faster. Some people put essays in Scalar; others put dissertations. The main condition of scaling down is that if you have too few pieces, then readers will have little to customize with.

Examples:

Text, Identity, Subjectivity: an individually-produced Scalar book
Teaching and Learning Multimodal Communications: an anthology originating from the UVic Maker Lab, edited by Jentery Sayers

Large-scale text analysis and topic modelling

What is it?

Topic modelling is a particular type of text analysis — and there are many sorts of text analysis — but both tend to promote what Franco Moretti termed “distant reading,” or, as Matthew L. Jockers describes it, macroanalysis. The idea is that working with a high volume of texts using computing techniques allows you to see patterns that are otherwise undetectable to the human eye.

Why do it?

To generate ideas that you might explore (perhaps an unexpected pair of words appear to be linked together when tracked through 1,000+ texts)
To get a fresh/different perspective on a text (or texts) that you've read so many times they feel stale.
To see how an author's style changes over the course of his/her lifetime; or compare the author with others writing at the same time.
To find an alternative perspective on what features characterize a particular genre.

What's involved?

Finding or creating reliable versions of the texts that you want to analyze.
Cleaning the texts up further so that whatever processor you choose can read them without errors. This usually involves putting the text in plain, unformatted UTF-8 encoding. Microsoft Word formatting will add strange characters throughout, so you'll want to work with TextEdit, Notepad++, TextWrangler, Oxygen, or similar. Regular Expressions may help you accomplish what you need more quickly.
You'll also need to learn how to work with whatever analysis tool you're using. With topic modelling in particular, this may take time — but there's no shortage of commentary written for humanities scholars.
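
The cleanup step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the particular replacements (straightening curly quotes, rejoining words hyphenated across line breaks, collapsing extra whitespace) are choices that suit some projects and not others, and the function names `clean_text` and `word_counts` are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def clean_text(raw):
    """Return a plainer version of raw, ready to be saved as UTF-8 text."""
    # Normalize so that accented characters have one consistent form.
    text = unicodedata.normalize("NFC", raw)
    # Replace word-processor punctuation with plain equivalents.
    # (Assumption: your analysis does not need the curly forms.)
    replacements = {
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u00a0": " ",                  # non-breaking space
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    # Rejoin words split across a line break: "dark-\nness" -> "darkness".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, and runs of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def word_counts(text):
    """Count lowercase words, a first step toward distant reading."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


sample = "The  river\u2019s dark-\nness, the river\u2019s silence."
cleaned = clean_text(sample)
print(cleaned)
print(word_counts(cleaned).most_common(2))

# To save the result, open the file with an explicit encoding, e.g.:
# with open("cleaned.txt", "w", encoding="utf-8") as f:
#     f.write(cleaned)
```

Note that the hyphen rule will also join genuinely hyphenated words that happen to break at a line end, so it is worth spot-checking the output against the source.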

How big is this project?

The size of this project depends on the preparedness of the body of texts you want to explore, and the complexity of the analysis that you want to complete.

There are trade-offs: creating data that will produce good results instead of garbage can be time-consuming, but worthwhile if you know that you'll be working with it long term. Creating a set of data can also be a good way of meeting other people who are interested in the same things that you are.

As for the analysis part, using topic modelling and getting accurate results takes time, and lots of calibration and adjustment. Fortunately, if you're curious, there are tools that you can use to get started with distant reading, and which require less of an investment. You'll (probably) still need to do some polishing to get your texts ready — but once you've got clean texts, you can start playing around and having fun, and writing about what you find.

Try it out:

MONK (Metadata Offer New Knowledge)
TAPoR (Text Analysis Portal for Research)
Voyant Tools
ManyEyes (requires a browser with Java enabled, so it won't work in Chrome)

Quickstart text for experimenting with ManyEyes: Joseph Conrad’s Heart of Darkness (click the Visualize button to get started)

Text analysis is complex enough that I think it's useful to point to a couple of introductory posts. For a good introduction to text mining, see Ted Underwood's “Where to start with text mining”; for a good introduction to topic modeling, see Scott Weingart's “Topic Modeling for Humanists: A Guided Tour.”

Examples:

Topic Modeling Martha Ballard’s Diary (this is a write-up, rather than an interactive project, but it was created by one person. Comments highly recommended as a way of learning more about what was involved)
Using topic modeling to explore literary history via the Proceedings of the Modern Language Association (PMLA) (a two-person effort)

A geographic mapping site

What is it?

A site presenting maps and geographic data on a particular topic or text.

Why do it?

To present information in an alternative format to a traditional, argument-based essay

To explore a text (or texts) by grounding them in their geographical landscape, and explore the relationship between text and “real” space

What's involved?

Finding sources for the information that you want to include in your map
Finding a map that reflects the landscape you are working with (particularly important if you're working with pre-20th century texts)
If you're working with historical maps, then it will be important to find software that will allow you to work with them: Neatline (free, used with Omeka) and ArcGIS (expensive) are two of your best options
Getting your data into the map (either manually, or through a bulk import)
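
For the bulk-import route, one possible sketch is converting a spreadsheet of places into GeoJSON, a format that many mapping tools can read (check what your chosen platform actually accepts). The column names, the sample rows, and the `csv_to_geojson` helper below are assumptions made up for this example.

```python
import csv
import io
import json

# Made-up sample data; in practice this would be exported from a spreadsheet.
CSV_DATA = """name,latitude,longitude,note
Place A,54.45,-3.02,mentioned in chapter 1
Place B,54.60,-3.13,mentioned in chapter 4
"""


def csv_to_geojson(csv_text):
    """Turn rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON puts longitude first, then latitude.
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": {"name": row["name"], "note": row["note"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_geojson(CSV_DATA)
print(json.dumps(collection, indent=2))
```

Keeping the places in a plain spreadsheet first also makes it easier to proofread the coordinates before anything reaches the map.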

How big is this project?

You can build simple projects using sites like Google Maps, Neatline (with Omeka), or GeoCommons; and if you know a little HTML, you can configure a Simile Timeline site without too much hassle. Any of these is a good starting point, and each has slightly different capabilities and limitations. Depending on your goals, and on whether you want collaborators or hope to make your project something that many people might use, you may want to invest in more sophisticated software (ArcGIS).

Examples:

The Negro Travelers’ Green Book (supported by multiple departments, but primarily created by two people (Negro Traveler’s Green Book: About the Map))
PoetryBox (nonacademic site, built by two people)
Mapping the Lakes (funded by the British Academy, four collaborators listed)
Map of Early Modern London @ UVic (large scale project supported by 10+ people)

A digital 3D model

What is it?

A detailed model of a particular space or area based on archaeological/historical data, which allows users to explore and/or “walk” into the space or area.

Why do it?

To recreate the experience of being in a space that can no longer be visited (or is of limited accessibility) due to modern development and/or fragility/decay.
As a component of a particular larger cultural or educational project.
To document a space or object for restoration and/or recreation.

What's involved?

Obtaining detailed data about the space you want to recreate.
Understanding the data well enough to generate procedural rules, based on attributes of the space. These might include information on how windows are configured, what materials are used, etc.
Entering the procedural rules into the program of your choice, configuring it and adjusting/debugging as necessary.

How big is this project?

There are small projects, like Marie Saldana’s Digital Magnesia; and much bigger ones, like HyperCities and Digital Karnak, which are supported and funded by large teams and often, multiple organizations.

Some of the software that you may be working with (ArcGIS, 3D-modeling programs) may be prohibitively expensive at full price, making this a “big” project just in terms of investment, although educational licenses may make it affordable.

How big this project is will depend in part on your specific intent, and what you expect to do with the model once you build it. Producing models with high accuracy will almost certainly take substantial time and labor, and you'll want to think about why you're making that investment.

Examples:

Digital Magnesia (1-person project)
HyperCities (large-scale project supported by 10+ people and grants from multiple organizations)
Digital Karnak (large-scale project supported by 10+ people and multiple organizations)

An online event

What is it?

A social gathering that takes place online, and invites people to interact and/or collaborate on a particular project or goal.

Why do it?

You see a lot of energy and curiosity around a particular topic, and have an idea that will center that energy in one place.
Alternatively, you have a topic that people may be less aware of, but there's an activity that people can participate in that will make them more aware.
Or: you have an idea for an activity that you think will lead to interesting results; or a project that can be accomplished if x number of people get involved.
Bottom line: you want people to be social, interacting with you, and with each other. You have a clear view of an outcome that will benefit them (either a learning or networking goal, or a sense of having participated in a useful and/or fun activity).

What's involved?

On the surface of it, this might look like one of the simplest projects possible, in terms of the technical skills needed, but don't be fooled — running an online course, even an informal one, successfully and professionally is probably the most intense labor commitment of all the projects listed here.

Create a schedule for the event (or for a class, a syllabus, reading schedule, and assignments/discussion prompts)
Choose discussion software (VanillaForums, ProBoards), and configure it
Figure out how you'll measure participation levels (so that you can report accurately on what took place). For a class, set up a registration process and monitor your enrollment.
Publicize your course or event through social media and any relevant forums, making it clear what your goals are, and what the outcomes will be for the participants.
Execute your event, starting discussions and refocusing when necessary, moderating forums to make sure that all participants feel respected and safe, and making yourself widely available to answer questions and teach.

How big is this project?

It depends. You could scale it down considerably, organizing a weekend seminar in which you scheduled specific discussions at particular times. Doing so would allow you to do test-runs for a larger/longer event. And as with events that take place in real-time and real-space, having at least one collaborator will make a huge difference in your ability to maintain energy. Even so, this is a highly social project, and you should probably expect it to be accompanied by at least a few misunderstandings, and possibly trolls.

Examples:

Global Women Wikipedia Write-In (GWWWI)
DHPoco Summer School (both the GWWWI and Summer School are 2-person projects, run by Adeline Koh and Roopika Risam)
Twitter vs. Zombies, additional commentary by the organizers, Pete Rorabaugh and Jesse Stommel

A crowdsourcing project

What is it?

A project where the material is obtained and/or processed or curated by a large number of people (i.e., the crowd). While some projects simply have large staffs, crowdsourcing implies that the majority of the people involved are not staff members: they're just people who have something to contribute. Arguably, the best-known crowdsourced project is Wikipedia.

Why do it?

Your subject matter is made up of uncollected narratives, artefacts and memorabilia held by members of the public, and you want to provide a space where these can be displayed together in a digital format.
Your subject matter consists of oral histories, and you want to collect them in one space, and make it easy for people to record their experiences.
You have an immense amount of material that needs some form of simple processing that does not require scholarly expertise (e.g., proofreading or transcribing).

What's involved?

For a crowdsourced primary-source archive:

Finding a platform that will hold the types of artefacts you expect people to contribute. (Omeka.org and Omeka.net are popular)
If you plan to allow individuals to contribute stories or images, determine the terms and conditions of the ownership and use of their materials.
Decide standards for including objects in the archive: what's the minimum amount of information that is needed? What formats will the project accept?

For a crowdsourced labor project:

You need a platform which will allow users to contribute, and feel like their contribution is useful; and which can handle high traffic, if necessary.
You need a system of monitoring contributions for accuracy.
You need a clear statement on the rights of the contributors, and of the project managers.
You need a steady stream of communication letting people know what has been accomplished, and what needs to be done.
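
One possible way to monitor contributions for accuracy, which is an assumption of this sketch rather than something any of the projects below is known to use, is to have two volunteers transcribe the same page independently and flag the page for review when their versions disagree too much. The function names and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def agreement(first, second):
    """Return a 0-to-1 similarity score, comparing the texts word by word."""
    return SequenceMatcher(None, first.split(), second.split()).ratio()


def needs_review(first, second, threshold=0.9):
    """Flag a pair of transcriptions whose agreement falls below threshold."""
    return agreement(first, second) < threshold


# Two made-up transcriptions of the same imaginary line.
volunteer_a = "the quick brown fox"
volunteer_b = "the quick brown fix"

print(agreement(volunteer_a, volunteer_b))
print(needs_review(volunteer_a, volunteer_b))
```

A check like this only finds disagreements; two volunteers who make the same mistake will still pass, so some human spot-checking remains necessary.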

How big is this project?

By definition, true crowdsourced projects are big: they tend to require major infrastructure, and a strong commitment to managing contributors and meeting their needs.

For a recent write-up on designing DH crowdsourcing projects, see Mia Ridge’s notes and slides from the workshop she ran at the DH2013 conference.

Examples:

Transcribe Bentham (large-scale project, with support from multiple departments at the University of London and grant funding)
University of Minnesota Memorial Stadium (supported by 10+ people, and multiple university offices)
Zooniverse (specific team size not listed on site)
Sindhi Voices Project (a smaller project, created and managed primarily by two people)

2 comments

September 19, 2013 - 8:39 am Pingback: Survey: Is Creating an Online Journal a Digital Humanities Project? | Adeline Koh
September 27, 2013 - 8:28 pm Marshall Abrams

There are a few people who are using computer simulations in new ways within the humanities. This has occurred in philosophy, literary studies, and history, for example. These simulations are not intended primarily as teaching tools or for interaction by the public, although they might get used in this way as well. Such simulations are intended as research tools, first.
