As part of the Digital Humanities Summer Institute training network, the University of Guelph hosted its first series of workshops on topics related to digital humanities research and teaching in Summer 2015. Two of the three initially advertised workshops ran for four days (May 19-22) alongside additional events that took place after class. Highlights of these additional talks and panels included “Emergent Modes of Digital Scholarship”, a talk presented by Susan Brown, Professor of English at the University of Guelph, as well as a keynote address by Jennifer Roberts-Smith, Associate Professor of Drama and Speech Communication at the University of Waterloo, with the attention-grabbing title “Your Mother is Not a Computer: Phenomenologies of the Human for Digital Humanities.” Running simultaneously, the workshops “Developing a Digital Exhibit in Omeka” and “Topic Modeling for Humanities Research” had attracted a great deal of attention. The enrolment numbers would not surprise anyone aware of the buzz that DH has generated in recent years. As expected, some of the participants had attended workshops or talks on digital humanities before, bringing working knowledge and skills to the table. Others, myself included, were getting their feet wet for the first time. Both workshops promised hands-on experience with the field.
On the first day, a large crowd of participants, composed largely of scholars from both inside and outside Canada, gathered in the McLaughlin Library. After Susan Brown’s welcoming remarks, in which she drew our attention to the coincidence that May 19th was also the Day of DH (https://dh.fbk.eu/events/day-dh-2015), we moved to the classrooms where the workshop sessions were held. During the introductory session, Susan Brown also announced a hashtag (#DHatGuelph) that participants with a Twitter account could use on their feeds.
I had signed up for the workshop on topic modeling, which was led by Adam Hammond, Michael Ridley Postdoctoral Fellow in Digital Humanities at the University of Guelph, and Julian Brooke, Postdoctoral Fellow in Computational Linguistics in the Computer Science Department at the University of Toronto. On the first day of the workshop, Adam introduced us to MALLET, an open-source data/text mining toolkit, which we downloaded and installed. He then walked us through this popular topic-modeling package by demonstrating how to issue the commands for building topic models. As a total stranger to coding or any sort of computational technique, I found my first encounter with the mathematics of topic modeling messy and frustrating. Luckily, Adam’s clear and well-paced instructions helped me keep up with the process, which basically consisted of a chain of well-curated commands. Our mission became clear by the end of the first day: we would put together a corpus large enough to benefit from topic modeling. From the start, Adam encouraged us to think about a scholarly project that would harness topic modeling. To explore the toolkit and see how it works, I concentrated on Matthew Arnold’s body of literary criticism, digital copies of which were readily available. On the second day we continued building our corpora with a focus on individual topic assignments. Behind topic modeling, Adam told us, lies a simplifying assumption: that every word token has exactly one topic associated with it. In layman’s terms, it is the assumption that when a writer sits down to write, every word she uses, she uses for a topic. According to this logic, we could expect data-mining packages like MALLET to help us discover hidden thematic patterns in large collections of text by locating words that tend to co-occur across multiple contexts. On the third day Julian and Adam introduced us to RStudio, an environment for the R programming language that supports more advanced topic-modeling procedures.
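The one-topic-per-token assumption Adam described is the core of LDA-style topic modeling. As a rough illustration only, and not MALLET’s actual code, here is a minimal sketch of a collapsed Gibbs sampler in Python; the tiny corpus and all names below are invented for the example:

```python
# Toy sketch of LDA-style topic modeling via collapsed Gibbs sampling.
# Key assumption on display: every word token carries exactly one topic,
# and the sampler repeatedly revises those single assignments.
import random
from collections import defaultdict

def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """docs: list of token lists. Returns (topic assignment per token,
    topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_tw = defaultdict(int)   # (topic, word) counts
    n_dt = defaultdict(int)   # (doc, topic) counts
    n_t = defaultdict(int)    # tokens per topic
    # Initialize: give each token a random single topic.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_tw[t, w] += 1; n_dt[d, t] += 1; n_t[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove this token's current assignment...
                n_tw[t, w] -= 1; n_dt[d, t] -= 1; n_t[t] -= 1
                # ...then resample one topic for it, weighted by how often
                # the word and the document already use each topic.
                weights = [(n_tw[k, w] + beta) / (n_t[k] + V * beta)
                           * (n_dt[d, k] + alpha) for k in range(num_topics)]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_tw[t, w] += 1; n_dt[d, t] += 1; n_t[t] += 1
    return z, n_tw

# Invented mini-corpus: words that co-occur should drift into shared topics.
docs = [["whale", "sea", "ship"], ["sea", "whale", "harpoon"],
        ["poetry", "rhythm", "ear"], ["rhythm", "poetry", "verse"]]
z, n_tw = gibbs_lda(docs, num_topics=2)
```

On a corpus this small the results are noisy, but the mechanics are the same as in a real toolkit: words that keep appearing in the same contexts accumulate in the same topic, which is why the word clusters MALLET reports can surface thematic patterns.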
That day we experimented with RStudio and plowed through commands that demanded our meticulous attention. As promising as it is, topic modeling is also a process fraught with complications that are not always easy to foresee, and it therefore requires a high degree of patience and concentration.
Adam used Moby Dick as a sample corpus during the workshop. Seeing a literary text on the screen as a variable and manipulating it through commands felt alienating at first, but as we moved along, it became clear to me that computational techniques such as topic modeling applied to “big data” could facilitate humanities research. MALLET, RStudio, and similar tools offer new ways of pursuing research in literary studies. They are promising resources that can be tapped with the right dose of curiosity, patience, and attention.
On the last day, the participants from both workshops gathered in one of the classrooms for a show-and-tell event. Some participants shared the word clusters and thematic discoveries they had obtained through the topic modeling process. Interesting questions and ideas worth pursuing were raised during this event. One idea that grabbed my attention, as someone whose research extends into archives, albeit moderately, was applying topic modeling to journals decade by decade in order to trace the prominent topics of each period.
While I was at the University of Guelph, I also took a tour of the library archives, which include a wide range of collections, from a Canadian Cookbook Collection to Landscape Architecture. Historical Collections at the University of Guelph hold at least seven core collections. One of these, the Scottish Studies Collection, is the largest of its kind outside the United Kingdom. In the archives one can also find Lucy Maud Montgomery’s (1874-1942) personal library and belongings, diaries, book manuscripts, and needlework. Montgomery was one of Canada’s most prolific writers, famous for her Anne of Green Gables series. (Google recently commemorated Anne of Green Gables with a doodle on November 30th, Lucy Maud Montgomery’s birthday: https://g.co/doodle/qdp2dr)
With no previous background in digital humanities, I had concerns when I signed up for the Digital Humanities@Guelph Summer Workshops. Although some of those concerns, such as my lack of experience working from the command line, proved valid, my overall experience with the topic modeling process was thought-provoking. When I examined the word clusters I had garnered by topic modeling Matthew Arnold’s works of literary criticism, I found the frequency and proximity of certain words in the clusters quite telling. The word rhythm appearing in the same cluster as the word ear, sharing the same boldness and size, or the word class appearing with aristocratic and middle, might not seem all that unexpected. Nevertheless, considered alongside more bewildering word pairs such as modern and interesting, these occurrences could prove revealing with regard to the discussions surrounding the use and value of poetry, not only in Arnold’s views but also in English literary circles of the late nineteenth century. There is surely a lot of fodder for a comparatist in this scheme.
The workshop was designed to tackle the practical task of getting good results from the topic modeling technique, and it lived up to expectations on that front. Indeed, the Digital part of Digital Humanities was successfully covered by the instructors. Yet the Humanities part was missing, in the sense that the critical implications of computing in the humanities were not integrated into the overall discussions. I believe addressing the impact of computation as a humanities question would put the Digital into better perspective. Apart from that, I highly recommend the DH@Guelph Summer Workshops to humanists who would like to venture into digital humanities work or improve their knowledge of the field.