Yesterday I had the chance to speak on a panel about “The MLA and Its Data: Remix, Reuse, and Research,” which I organized on behalf of the MLA’s Committee on Information Technology. The panel was very successful, due largely to fabulous co-panelists: David Laurence, Ernesto Priego, Chris Zarate, and Lisa Rhody. Ernesto has shared his slides for his presentation on his and Chris’s analysis of tweets from last year’s convention. Unfortunately we missed Jonathan Goodwin, who became ill. Lucky for us, he shared his talk as well.
What follows is the text of my talk, “Constellations at the Convention.” The metaphor of the title suggested itself immediately as I began looking at the network within Gephi, but I couldn’t help but think of Matt Kirschenbaum’s post following the 2011 MLA Convention, “The (DH) Stars Come Out in LA.” I think that the methods I’ve begun deploying here might help us track the star system, if not within the profession, then at least within the convention.
Even though I say it within the talk, it’s critical that I acknowledge up front the assistance of three people. First, Chris Zarate kindly provided the data from the MLA that I asked for. (The MLA itself needs to be thanked for being willing to support this scholarship.) He made suggestions about the sorts of information he could provide me and gave me exactly what I asked for. Unfortunately, since I had never done something like this before, I didn’t quite know what to ask for. So when I discovered the data weren’t quite as I needed them, my colleague Sara Palmer took the raw XML and transformed it with XSLT and Python into a format that I could use. Sara and I then spent several hours playing with the data and talking about the different things that we were seeing. She identified the Midwestern Mafia as a question worth pursuing. Finally, Rebecca Sutton Koeser pointed out the JavaScript exporter plugin for Gephi, which is why you can now play with the data easily.
I appreciated the interest from the crowd and the thoughtful questions about “algorithmic cruelty” and where such work might lead in the future. If you want to play with the data yourself, you can download the Gephi file of the 2014 and 2015 Mark Sample data. I will see what I can do about sharing the MLA data set. But for the moment, you can explore the four different networks that I showed.
As always, my work is Creative Commons-licensed. Let me know what you think!
Constellations at the Convention
First, let me start with an epigraph:
“It never happens that a new constellation suddenly rises out of the east. There is an order, a predictability, a permanence about the stars. In a way, they are almost comforting.” Carl Sagan, Cosmos
Second, let me thank my colleague Sara Palmer at the Emory Center for Digital Scholarship as she was instrumental in preparing the data for this talk.
Third, let me begin. My presentation today started with a simple question:
What could we learn about the field of language and literary studies if we considered the structure and networks formed by the MLA Convention itself?
A simple question, it turns out, doesn’t really have a simple answer.
But it does lead to some pretty pictures.
Let’s take, for example, the digital humanities panels at this year’s convention.
Mark Sample regularly publishes a list of the “more or less digitally-oriented sessions at the upcoming Modern Language Association convention.” Who are the people on these panels and how do they relate to each other? Using Mark’s list, I compiled a 564-line spreadsheet of the panels happening this year and the people speaking on them. I then pulled those data into Gephi, a tool for network visualizations. And this is what I got:
On this visualization and the others that I’ll include here, the individual nodes or points represent people who speak at the MLA. The edges or lines between them mean that they have spoken on a panel together. The thicker the edge, the more often those two people have spoken together. The larger the node, the more people they have spoken with, and the size of the person’s name also corresponds to the number of people they have spoken with. In this particular graph, the nodes are colored by “community detection.” In other words, an algorithm looks for groups of people who are “more densely connected internally than with the rest of the network” (Wikipedia definition of “community structure”).
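For anyone who wants to try this with their own list of panels, here is a minimal sketch of how a table of panels and speakers can become the weighted edges and node sizes described above. It is not the pipeline Sara and I used: the `rows` data, the speaker names, and the function names are all made up for illustration, and Gephi does the layout and coloring.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows of (panel_id, speaker); a real spreadsheet's columns may differ.
rows = [
    ("s100", "Ada"),
    ("s100", "Ben"),
    ("s100", "Cy"),
    ("s200", "Ada"),
    ("s200", "Ben"),
    ("s300", "Dee"),
]

def co_panel_edges(rows):
    """Count how many panels each pair of speakers has shared (edge thickness)."""
    panels = {}
    for panel_id, speaker in rows:
        panels.setdefault(panel_id, set()).add(speaker)
    edges = Counter()
    for speakers in panels.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(speakers), 2):
            edges[pair] += 1
    return edges

def degrees(edges):
    """Count the distinct people each speaker has shared a panel with (node size)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {person: len(others) for person, others in neighbors.items()}

edges = co_panel_edges(rows)
degree = degrees(edges)
```

In this toy data, Ada and Ben share two panels, so their edge is the thickest. Dee speaks alone on a panel and so has no edges at all.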
What we get in this graph is a number of different communities that represent, in some cases, individual panels. Other communities have different panels within them but individuals that pull the community together. What we see here is that while there are a number of digitally focused panels at the Convention this year (7% by Mark’s count), within those panels there is a wide variety of people who are not speaking with one another, literally and potentially figuratively.
But of course, Mark has been compiling his list for more than just this year.
What do we learn if we add in the data from the 78 panels in 2014?
This is starting to get a lot more interesting! It’s a much more dense network and communities are no longer based around people who speak together at this one convention. People can be connected to each other and be in different communities. Yet for all the increased connectivity in the network, there are still a number of groups who aren’t connected in any way to the larger conversation about the digital humanities, or at least the conversation insofar as Mark has sampled it.
What would it take to get the periphery more connected to the center? Or, perhaps a better question, to get the center to look outside?
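The groups that are not connected to the larger conversation can be listed mechanically. The sketch below finds connected components, which is a simpler technique than the community detection Gephi runs: it only asks who can reach whom through co-panelists. The `pairs` data and the names in it are hypothetical.

```python
from collections import deque

def connected_components(edges):
    """Group people who can reach one another through shared panels."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited person.
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            person = queue.popleft()
            group.add(person)
            for other in adjacency[person] - seen:
                seen.add(other)
                queue.append(other)
        components.append(group)
    # Largest component first; everything after it is the periphery.
    return sorted(components, key=len, reverse=True)

# Hypothetical co-panelist pairs.
pairs = [("Ada", "Ben"), ("Ben", "Cy"), ("Dee", "Eve")]
components = connected_components(pairs)
center, periphery = components[0], components[1:]
```

Here Dee and Eve have only ever spoken with each other, so they end up in the periphery.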
But perhaps this has been a bit too much navel-gazing, looking at the digital humanists. You’ll notice, at least, that I spared you from a definition of what DH is.
So let’s move our lens a bit broader. Let’s look at not just one subfield, but the whole of the convention. And since we’ve already learned that things are more interesting when you add more data, let’s go ahead and look at the entire convention across 10 years. Here I have to thank Chris Zarate, my co-panelist and MLA staff member, as well as the MLA in general for being willing to make these data available for me to use.
This represents 16,039 speakers on almost 9,000 panels. There are 58,002 edges among the nodes. This time the nodes / people are colored not by community but by the number of connections each person has.
If we dive down toward the center we can start to see the individual stars that this galaxy is made out of, as well as start to think about the constellations we see.
There are of course communities here as well: 181 of them, as far as the algorithm is concerned. And when we change the colors of the nodes to reflect their community, we start to get a sense of how the Convention fits together.
In fact, with community detection, we can identify a group from the whole 10 years who are probably digital humanists. While they all started as part of a single community identified by the algorithm on the first graph, I’ve pulled them out and run community detection a second time on the subset.
Now that I’ve shown you the galaxy of MLA data, it’s perhaps worth returning to my epigraph.
As I think is clear, “constellations” is a good metaphor for these data. Sagan is of course correct when he says there’s an order and a predictability to the stars, and that’s true here as well. But he’s dead wrong when it comes to constellations. New constellations do suddenly appear—because they depend on human pattern recognition and interpretation. The shapes that we see in the stars tell us far more about ourselves than anything else. Such is the case with these data, I think.
I’m short on time, but let me draw your attention to two places where Sara and I have found some interesting results. The first is what I’m going to call the Federal Peripheral.
Way out on the edge of the graph is a small community of people who don’t connect to anyone else. In a way, this isn’t totally unusual as we see it happening throughout this graph, despite the generally massively connected nature of things. But every single person in this cluster is, as far as I can tell, an employee of the US Federal Government. How these individuals have not ended up speaking with other members of the organization is a question that I’d like to know more about. But for the moment, I want to point to something strange:
The fact that there are two Patrice Shacklefords and three Richard Donovans.
This isn’t some sort of government conspiracy…at least I think it’s not. Instead, this points to some weirdness in the MLA’s data. We isolated individuals in the data by using their MLA member number. But the MLA also grants membership waivers to people who are speaking at the Convention but come from outside the fields of languages and literature. We are getting duplicates of these people because the MLA isn’t always using unique numbers for those who are granted waivers.
The problem of membership numbers for those outside the discipline could perhaps be fixed by the use of something like ORCID, but that’s a whole other ball of wax.
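Duplicates like these can at least be flagged for a human to review. Here is a minimal sketch, assuming records of member number and name. The `records` list, the "W-17" waiver-style number, and the names are all invented for illustration and are not the MLA's actual data or format. Two people can genuinely share a name, so this only produces a list of suspects.

```python
from collections import defaultdict

# Hypothetical (member_number, name) records; not the MLA's real format.
records = [
    ("1001", "Pat Example"),
    ("W-17", "Pat Example"),
    ("1002", "Sam Sample"),
    ("1002", "Sam Sample"),
]

def names_with_multiple_ids(records):
    """Return each name that appears under more than one member number."""
    ids_by_name = defaultdict(set)
    for member_number, name in records:
        # Normalize lightly so stray spaces or capitals don't hide a match.
        ids_by_name[name.strip().lower()].add(member_number)
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }

suspects = names_with_multiple_ids(records)
```

In the toy data only "pat example" is flagged, because that name appears under two different numbers.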
The second anomaly I want to talk about is the Midwestern Mafia.
Overwhelmingly within the 10 years of data, we see any two people speaking with one another only once. But there is a small group where the strongest connections are concentrated.
No one in 10 years is as connected as are David D Anderson and Marilyn Judith Atlas. They have spoken together 10 times across the program. The second strongest connection is between Atlas and Melody M. Zajdel, who have spoken together 7 times.
It’s not surprising that they are part of the same algorithmic community given their connection to one another. But it turns out that they are part of another community: the Society for the Study of Midwestern Literature. These three scholars as well as the others you see in red here are all part of the same society. It makes sense that they have a strong connection and work well with one another.
But what does it say when a Society sends the same people over and over again to the Convention? Without knowing the details of SSML, my hunch is that it might be seeing the number of members decline, or at least the number of those with the financial ability to attend the convention. The result is that there’s a strong likelihood that the strength of connections among these scholars points paradoxically to a corresponding weakness in their particular field.
While the formation of a field depends on people getting to know one another (as in my graphs of DH panels), the long-term health of a field—or of a smaller society—will depend on bringing in new participants to the discussion and in finding ways that individuals can speak to different parts of the Convention than just their own niche.
With more time, we could track the rise and fall of different subfields—at least in connection to their presence at the convention—in this manner.
I’ve run out of time, I’m afraid. But there’s much, much more that can be done with these data. It’s important that the MLA work to make as much of its data as possible still more robust and all the more available for researchers so we can hunt for more shapes. Because it is the finding of these constellations that really marks us as human.
Thanks.