Constellations at the Convention: 10 Years of MLA Data

Title slide with a tree against star trails with caption 'Constellations at the Convention: 10 Years of MLA Data'

Yesterday I had the chance to speak on a panel about “The MLA and Its Data: Remix, Reuse, and Research,” which I organized on behalf of the MLA’s Committee on Information Technology. The panel was very successful, due largely to fabulous co-panelists: David Laurence, Ernesto Priego, Chris Zarate, and Lisa Rhody. Ernesto has shared his slides for his presentation on his and Chris’s analysis of tweets from last year’s convention. Unfortunately we missed Jonathan Goodwin, who became ill. Lucky for us, he shared his talk as well.

What follows is the text of my talk, “Constellations at the Convention.” The metaphor of the title suggested itself immediately as I began looking at the network within Gephi, but I couldn’t help but think of Matt Kirschenbaum’s post following the 2011 MLA Convention, “The (DH) Stars Come Out in LA.” I think that the methods I’ve been able to begin deploying here might help us track the star system, if not within the profession, then at least within the convention.

Even though I say it within the talk, it’s critical that I acknowledge up front the assistance of two people. First, Chris Zarate kindly provided the data from the MLA that I asked for. (The MLA itself needs to be thanked for being willing to support this scholarship.) He made suggestions about the sorts of information he could provide me and gave me exactly what I asked for. Unfortunately, since I had never done something like this before, I didn’t quite know what to ask for. So when I discovered the data weren’t quite as I needed them, my colleague Sara Palmer took the raw XML and transformed it with XSLT and Python into a format that I could use. Sara and I then spent several hours playing with the data and talking about the different things that we were seeing. She identified the Midwestern Mafia as a question worth pursuing. Finally, Rebecca Sutton Koeser pointed out the JavaScript exporter plugin for Gephi, which is why you can now play with the data easily.

I appreciated the interest from the crowd and the thoughtful questions about “algorithmic cruelty” and where such work might lead in the future. If you want to play with the data yourself, you can download the Gephi file of the 2014 and 2015 Mark Sample data. I will see what I can do about sharing the MLA data set. But for the moment, you can explore the four different networks that I showed.

As always, my work is Creative Commons-licensed. Let me know what you think!

Constellations at the Convention

First, let me start with an epigraph:

Epigraph by Carl Sagan against a field of stars: 'It never happens that a new constellation suddenly rises out of the east. There is an order, a predictability, a permanence about the stars. In a way, they are almost comforting. Carl Sagan, Cosmos'

“It never happens that a new constellation suddenly rises out of the east. There is an order, a predictability, a permanence about the stars. In a way, they are almost comforting.” Carl Sagan, Cosmos

A man in a red shirt with his hands in the 'namaste' position.

Second, let me thank my colleague Sara Palmer at the Emory Center for Digital Scholarship as she was instrumental in preparing the data for this talk.

Third, let me begin. My presentation today started with a simple question:

A photograph of metal bars in a power tower shot from below

What could we learn about the field of language and literary studies if we considered the structure and networks formed by the MLA Convention itself?

A simple question, it turns out, doesn’t really have a simple answer.

A photograph of model planets hanging from a ceiling

But it does lead to some pretty pictures.

Let’s take, for example, the digital humanities panels at this year’s convention.

A screenshot of samplereality.com and the 'DH at MLA 2015' post

Mark Sample regularly publishes a list of the “more or less digitally-oriented sessions at the upcoming Modern Language Association convention.” Who are the people on these panels and how do they relate to each other? Using Mark’s list, I compiled a 564-line spreadsheet of the panels happening this year and the people speaking on them. I then pulled those data into Gephi, a tool for network visualizations. And this is what I got:

A network visualization of the digital humanities sessions (defined by Sample) at the 2014 and 2015 MLA conventions

On this visualization and the others that I’ll include here, the individual nodes or points represent people who speak at the MLA. The edges or lines between them mean that they have spoken on a panel together. The thicker the edge, the more often those two people have spoken together. The larger the node, the more people they have spoken with, and the size of the person’s name also corresponds to the number of people they have spoken with. In this particular graph, the nodes are colored by “community detection.” In other words, an algorithm looks for groups of people who are “more densely connected internally than with the rest of the network” (Wikipedia definition of “community structure”).
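For anyone who wants to see how edge thickness and node size fall out of a list of panels, here is a minimal Python sketch. It is not the pipeline used for this talk. The rows and names are made up, and `build_network` is a hypothetical helper. Community detection is left out; Gephi does that part.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows of (panel id, speaker name); real rows would come
# from a spreadsheet of sessions and their speakers.
rows = [
    ("panel-1", "Ada"),
    ("panel-1", "Ben"),
    ("panel-1", "Cleo"),
    ("panel-2", "Ada"),
    ("panel-2", "Ben"),
    ("panel-3", "Dev"),
]


def build_network(rows):
    """Return (edge_weights, degree) for a co-speaking network."""
    # Group speakers by panel, ignoring repeated rows for the same person.
    panels = {}
    for panel, speaker in rows:
        panels.setdefault(panel, set()).add(speaker)

    # Edge weight = number of panels two people have shared.
    edge_weights = Counter()
    for speakers in panels.values():
        for a, b in combinations(sorted(speakers), 2):
            edge_weights[(a, b)] += 1

    # Degree = number of distinct people someone has spoken with.
    degree = Counter()
    for a, b in edge_weights:
        degree[a] += 1
        degree[b] += 1
    return edge_weights, degree


edge_weights, degree = build_network(rows)
for (a, b), weight in edge_weights.most_common():
    print(a, b, weight)
```

In this toy data Ada and Ben share two panels, so their edge would be drawn thicker than the others. Dev, who speaks alone, has no edges at all.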

What we get in this graph is a number of different communities that represent, in some cases, individual panels. Other communities have different panels within them but individuals that pull the community together. What we see here is that while there are a number of digitally focused panels at the Convention this year (7% by Mark’s count), within those panels there is a wide variety of people who are not speaking with one another, literally and potentially figuratively.

But of course, Mark has been compiling his list for more than just this year.

The number '78' in metal on a red brick wall.

What do we learn if we add in the data from the 78 panels in 2014?

A network visualization of the digital humanities sessions (defined by Sample) at the 2014 and 2015 MLA conventions

This is starting to get a lot more interesting! It’s a much more dense network and communities are no longer based around people who speak together at this one convention. People can be connected to each other and be in different communities. Yet for all the increased connectivity in the network, there are still a number of groups who aren’t connected in any way to the larger conversation about the digital humanities, or at least the conversation insofar as Mark has sampled it.
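Groups with no path at all to the rest of the graph are what network analysis calls connected components, which is a different thing from the algorithmic communities above. The following is a minimal Python sketch of how they could be listed, assuming a plain list of edges. The names are made up and `connected_components` is a hypothetical helper, not something taken from Gephi.

```python
from collections import deque

# Hypothetical undirected edges between speakers.
edges = [
    ("Ada", "Ben"),
    ("Ben", "Cleo"),
    ("Cleo", "Dev"),
    ("Eli", "Fay"),
    ("Gus", "Hal"),
]


def connected_components(edges):
    """Return the groups of mutually reachable people, largest first."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            person = queue.popleft()
            group.add(person)
            for other in neighbors[person]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(group)
    return sorted(components, key=len, reverse=True)


components = connected_components(edges)
center, periphery = components[0], components[1:]
print(len(center), "people in the largest group;", len(periphery), "disconnected groups")
```

Treating the largest component as the "center" is an assumption made for illustration. In a real graph the interesting question is who sits in the smaller groups.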

What would it take to get the periphery more connected to the center? Or, perhaps a better question, to get the center to look outside?

An image of the navel on a navel orange

But perhaps this has been a bit too much navel-gazing, looking at the digital humanists. You’ll notice, at least, that I spared you from a definition of what DH is.

A hand holding a camera lens

So let’s move our lens a bit broader. Let’s look at not just one subfield, but the whole of the convention. And since we’ve already learned that things are more interesting when you add more data, let’s go ahead and look at the entire convention across 10 years. Here I have to thank Chris Zarate, my co-panelist and MLA staff member, as well as the MLA in general for being willing to make these data available for me to use.

A network visualization of all MLA speakers from 2004-2014

This represents 16,039 speakers on almost 9,000 panels. There are 58,002 edges among the nodes. This time the nodes / people are colored not by community but by the number of connections she or he has.

A closeup of network visualization of MLA convention speakers

If we dive down toward the center we can start to see the individual stars that this galaxy is made out of, as well as start to think about the constellations we see.

Close-up of network graph of convention speakers

There are of course communities here as well: 181 of them, as far as the algorithm is concerned. And when we change the colors of the nodes to reflect their community, we start to get a sense of how the Convention fits together.

A visualization of digital humanists from the 10 years of MLA data, extracted algorithmically

In fact, with community detection, we can identify a group from the whole 10 years who are probably digital humanists. While they all started as part of a single community identified by the algorithm on the first graph, I’ve pulled them out and run community detection a second time on the subset.

Now that I’ve shown you the galaxy of MLA data, it’s perhaps worth returning to my epigraph.

A slide with the epigraph from earlier in the talk from Carl Sagan

As I think is clear, “constellations” is a good metaphor for these data. Sagan is of course correct when he says there’s an order and a predictability to the stars, and that’s true here as well. But he’s dead wrong when it comes to constellations. New constellations do suddenly appear—because they depend on human pattern recognition and interpretation. The shapes that we see in the stars tell us far more about ourselves than anything else. Such is the case with these data, I think.

I’m short on time, but let me draw your attention to two places where Sara and I have found some interesting results. The first is what I’m going to call the Federal Peripheral.

A visualization of 10 years of MLA convention with one area highlighted

Way out on the edge of the graph is a small community of people who don’t connect to anyone else. In a way, this isn’t totally unusual as we see it happening throughout this graph, despite the generally massively connected nature of things. But every single person in this cluster is, as far as I can tell, an employee of the US Federal Government. How these individuals have not ended up speaking with other members of the organization is a question that I’d like to know more about. But for the moment, I want to point to something strange:

A close-up of one region of the network with individual, repeated names highlighted

The fact that there are two Patrice Shacklefords and three Richard Donovans.

This isn’t some sort of government conspiracy…at least I think it’s not. Instead, this points to some weirdness in the MLA’s data. We isolated individuals in the data by using their MLA member number. But the MLA also grants membership waivers to people who are speaking at the Convention but come from outside the fields of languages and literature. We are getting duplicates of these people because the MLA doesn’t always assign a single, consistent number to each person who is granted a waiver.
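One way to surface this kind of duplication is to group the records by name and flag any name that appears under more than one number. This is a minimal Python sketch with made-up identifiers and placeholder names. `possible_duplicates` is a hypothetical helper, not the check we actually ran.

```python
from collections import defaultdict

# Hypothetical (identifier, name) records; not the MLA's actual data.
records = [
    ("M-001", "Speaker One"),
    ("W-101", "Speaker Two"),
    ("W-102", "Speaker  two"),
    ("M-002", "Speaker Three"),
    ("M-002", "Speaker Three"),
]


def possible_duplicates(records):
    """Map each normalized name to its identifiers, if it has more than one."""
    ids_by_name = defaultdict(set)
    for identifier, name in records:
        # Collapse whitespace and case so trivial variants group together.
        key = " ".join(name.split()).casefold()
        ids_by_name[key].add(identifier)
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


flagged = possible_duplicates(records)
print(flagged)
```

A shared name only marks a candidate. Two different people can have the same name, so anything flagged this way would still need to be checked by hand.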

A photograph of a white orchid

The problem of membership numbers for those outside the discipline could perhaps be fixed by the use of something like ORCID, but that’s a whole other ball of wax.

A photograph of a wheat field

The second anomaly I want to talk about is the Midwestern Mafia.

A visualization of speakers from 2004-2014 with one area highlighted

Overwhelmingly within the 10 years of data, we see people only speaking with one another once. But the strongest connections are concentrated in one small group.

A close-up of the network visualization with tightly connected individuals

No one in 10 years is as connected as are David D Anderson and Marilyn Judith Atlas. They have spoken together 10 times across the program. The second strongest connection is between Atlas and Melody M. Zajdel, who have spoken together 7 times.

It’s not surprising that they are part of the same algorithmic community given their connection to one another. But it turns out that they are part of another community: the Society for the Study of Midwestern Literature. These three scholars as well as the others you see in red here are all part of the same society. It makes sense that they have a strong connection and work well with one another.

But what does it say when a Society sends the same people over and over again to the Convention? Without knowing the details of SSML, my hunch is that it might be seeing a decline in the number of members, or at least in the number with the financial ability to attend the convention. The result is that there’s a strong likelihood that the strength of connections among these scholars points paradoxically to a corresponding weakness in their particular field.

Stormtrooper toys, one about to get its temperature taken

While the formation of a field depends on people getting to know one another (as in my graphs of DH panels), the long-term health of a field—or of a smaller society—will depend on bringing in new participants to the discussion and in finding ways that individuals can speak to different parts of the Convention than just their own niche.

With more time, we could track the rise and fall of different subfields—at least in connection to their presence at the convention—in this manner.

Star trails against a night sky

I’ve run out of time, I’m afraid. But there’s much, much more that can be done with these data. It’s important that the MLA work to make its data as robust and as available as possible for researchers so we can hunt for more shapes. Because it is the finding of these constellations that really marks us as human.

Andrew Whiteman of Broken Social Scene playing a guitar with the word 'Thanks' on the slide

Thanks.