Welcome to Week 5 of digital humanities!
This week, we're going to consider the affordances and limits of data visualization. While mapping, which we discussed last week, is one type of data visualization, there is a range of other kinds that we can use to visualize humanities data.
This week's reading and experimentation will probably be the most challenging so far, but I've been working you up to it!
What we aren't going to do is create datasets; we'll be working with existing ones. If this weren't a six-week course, we would have spent a week on creating datasets. Since many of you are interested in digital humanities in relation to your teaching, it did not seem like the most effective use of our six weeks. If you would be interested in learning how to create datasets (or how to find the data for a dataset in the first place), please make a note in the "Open Forum" section of this week's discussion and I will cover it!
1) Read
- Stéfan Sinclair et al., "Information Visualization for Humanities Scholars," https://dlsanthology.commons.mla.org/information-visualization-for-humanities-scholars/
This essay discusses why information visualization might be useful in the humanities. The authors also give an overview of a range of tools that can be used for information visualization. Their examples demonstrate what these methods look like when applied to literary study.
- Tara Zepel, "Visualization as a Digital Humanities _____?" https://www.hastac.org/blogs/tzepel/2013/05/02/visualization-digital-humanities
This post explores how we can conceptualize what visualization is and why it might be useful for digital humanities. It's a concise and clear look at the "whats" and "whys" of visualization and DH.
- Franco Moretti, "Network Theory, Plot Analysis," http://litlab.stanford.edu/LiteraryLabPamphlet2.pdf
This pamphlet is a dense read, but it goes into detail about applying data visualization (specifically network analysis) to literary studies. Give it a try, and make sure to click through to the figures so you can follow along more easily. I'm not really sure I'm sold on the work the Stanford Literary Lab is doing: I'm a bit of a skeptic and wonder whether it amounts to a dubious quantification of literature. Still, I think it's important to know about and to take seriously as an approach to literature.
Moretti uses Hamlet as an example, so if you haven't read it or aren't familiar with it, you may want to brush up here: https://www.cliffsnotes.com/literature/h/hamlet/play-summary [And if you haven't read Hamlet, please read it at some point. I don't even like Shakespeare very much, and I LOVE Hamlet. Also, read The Tempest. I love The Tempest.]
- Chuck Rybak, "DH Toe Dip: Character Networks in Gephi," http://chuckrybak.com/teaching/dh-toe-dip-character-networks-in-gephi
This post describes a very small data visualization assignment that Rybak did with a class. Rybak was inspired by the Moretti pamphlet above. He describes the process, includes links to useful tutorials for learning it, and discusses the results. Seeing what Rybak does at a small scale helps illustrate what Moretti is talking about.
- Scott Weingart, "Demystifying Networks," http://journalofdigitalhumanities.org/1-1/demystifying-networks-by-scott-weingart/
This essay gives an overview of using networks for literary analysis. It also offers some warnings about network analysis that are worth considering. Finally, it explains key network terminology, specifically "nodes" and "edges," terms you'll see again and again in data viz.
- "Cheat Sheet: Social Network Analysis for Humanists," http://cvcedhlab.hypotheses.org/106
Data visualization terminology can be jargon-y. If you find yourself getting confused, this cheat sheet can help.
2) Experiment
This week's experimentation has two parts: one explores a range of visualization types with a single dataset, and the other focuses specifically on network analysis.
Part I
Follow the instructions in this tutorial for the data visualization tool Palladio to experiment with a dataset of photographs from the Cushman Collection at Indiana University: http://miriamposner.com/blog/getting-started-with-palladio.
The tutorial has step-by-step instructions for how to use Palladio.
The dataset is linked in the tutorial but is buried in some verbiage, so you can also find it here: dataset. [Click on "Download" in the upper right corner and select "Direct Download."]
Use the data visualizations to explore the following questions:
- When did Cushman take the most photographs?
- Where did Cushman take the most photographs?
- Can you connect travels or photographs with events in Cushman’s life? You can read about him here.
- When and where did Cushman take photographs of landscape features, like trees, clouds, and the sky?
- When and where did Cushman tend to take photographs of people?
- Can you map Cushman’s travels to a particular road or interstate highway? How would you do this?
- What other information would you need to fully understand this data? How might you obtain that information?
Part II
This part of the week's experimentation looks at a single type of data visualization, network analysis, using a single tool called Gephi. It is also an example of literary data visualization specifically.
This dataset captures connections within bibliographic data from the modernist literary movement of the late 19th and early 20th centuries. If you aren't familiar with modernism, there's a good description of it here: http://www.online-literature.com/periods/modernism.php. Using network analysis, we can see connections we may not have predicted, between magazines and certain authors or between particular authors and themes. We can also consider how network analysis might be a useful tool for engaging with literary history, and think creatively about how analyzing information can help us ask and answer questions about the humanities.
Step 1: Download Gephi
Gephi is a free piece of software that can be downloaded here: https://gephi.org
Step 2: Import Data into Gephi
A) Download the dataset located here: little-review-1918-09.csv. (Make sure you know where to find it again! Save it to either your downloads folder or your desktop.)
B) Start Gephi and go to File > Open...
C) Browse to the dataset file and open it.
D) A dialogue window will ask you to confirm some import parameters. Make sure Graph Type is set to Directed, and that Auto-scale, Create missing nodes, and Add full graph are all checked. These options should already be set by default.
E) Once the dialogue appears as in this picture, click OK.
[Yours may look slightly different depending on your operating system, but it should be broadly similar.]
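Optional, for the curious: here is a minimal Python sketch of roughly what an import like this does, not Gephi's actual code. It assumes the CSV uses the "adjacency list" layout Gephi accepts, where the first cell of each row is a source node and every later cell is a node it points to; I haven't confirmed the exact layout of little-review-1918-09.csv, so open it in a spreadsheet program to check. The function name read_adjacency_csv and the sample rows are hypothetical, made up for illustration and not taken from the dataset. Like the Create missing nodes option, the sketch turns a name that only appears as a target into a node, and it counts a pair that appears more than once as one heavier edge.

```python
import csv
import io
from collections import Counter


def read_adjacency_csv(text):
    """Read rows of the form source,target1,target2,... into a directed graph.

    Hypothetical helper. Returns (nodes, edges), where edges maps
    (source, target) -> weight. A name that appears only as a target is
    still added to the node set, similar in spirit to "Create missing nodes".
    """
    nodes = set()
    edges = Counter()
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # skip blank lines
        source, targets = cells[0], cells[1:]
        nodes.add(source)
        for target in targets:
            nodes.add(target)
            edges[(source, target)] += 1  # a repeated pair becomes a heavier edge
    return nodes, edges


# Made-up placeholder rows, NOT taken from little-review-1918-09.csv.
sample = """Magazine A,Author 1,Author 2
Author 1,Theme X
Magazine A,Author 1
"""

nodes, edges = read_adjacency_csv(sample)
print(sorted(nodes))  # ['Author 1', 'Author 2', 'Magazine A', 'Theme X']
print(edges[("Magazine A", "Author 1")])  # 2
```

To try it on the real file, you could pass in the file's contents, e.g. read_adjacency_csv(open("little-review-1918-09.csv", encoding="utf-8").read()), assuming the file is UTF-8 and really is in the adjacency-list layout.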
Step 3: Manipulating the Graph and Adding Labels
A) Once the data are imported, you'll see a square-ish rat's nest of black dots and lines, some of which are darker than others. Mouse over the dots to see their immediate neighbors highlighted within the graph. The dots are the nodes in the data (such as author names and magazine titles), and the lines are the edges that link each node to others.
B) You'll notice that Gephi doesn't immediately tell you what these dots are, so we need to make their labels visible. Click on the little rectangular arrow at the bottom right of the Graph area to open some options.
C) Once the options pane opens across the bottom, click on the Labels tab and then on the checkbox next to Node.
D) After a moment, the Graph area will fill with large text, which you should make smaller by clicking on the Font button (under the Node checkbox) and selecting a smaller font size like 8pt.
E) Now that the text is smaller, try zooming in for a closer look or dragging nodes around to get a sense of what's in the network model.
F) To try a different layout, go to the Layout panel in the left sidebar, select one of the options (such as Yifan Hu) and then click on the Run button.
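Both Yifan Hu and Fruchterman-Reingold are force-directed layouts: linked nodes pull toward each other, all nodes push each other apart, and the picture gradually settles. For the curious, here is a minimal Python sketch of the classic Fruchterman-Reingold idea (repulsion of k*k/d between every pair of nodes, attraction of d*d/k along every edge, and a cooling "temperature" that caps how far a node can move per round). It is a simplified illustration, not Gephi's implementation: it ignores edge direction and weight, uses a unit square as the frame, and the cooling schedule (multiplying by 0.95 each round) is my own assumption. Yifan Hu is a different, multilevel algorithm and isn't shown here.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Minimal Fruchterman-Reingold force-directed layout (a sketch, not Gephi's code).

    nodes: list of node names; edges: list of (source, target) pairs whose
    endpoints all appear in nodes. Edge direction is ignored.
    Returns {node: (x, y)} with every position inside the width x height frame.
    """
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # "ideal" distance between nodes
    temperature = width / 10  # maximum distance a node may move in one round

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Nodes joined by an edge attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node along its net displacement, capped by the temperature,
        # then clamp it back inside the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
            pos[n][0] = min(width, max(0.0, pos[n][0]))
            pos[n][1] = min(height, max(0.0, pos[n][1]))
        temperature *= 0.95  # "cool down" so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}
```

For example, fruchterman_reingold(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]) returns an (x, y) position for each of the four nodes. Changing seed changes the random starting positions, so the picture can come out differently even though the forces are the same.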
Step 4: Adding Interpretive Features with Node Size and Color
Gephi can change the visual aspects of its nodes and edges to represent how connected the nodes are. For example, nodes with more connections, such as the magazine BLAST, can be drawn larger or in different colors, making it easier to see at a glance where the dominant nodes in the network lie. These features are based on statistical properties that Gephi computes.
A) In the right sidebar, click on the Statistics tab. If it is not there, add it by clicking on the Window menu at the top of the screen and then selecting Statistics.
B) Next to Average degree, click on the Run button. A report will open showing the degree distribution of the graph, that is, how many connections the nodes have.
C) Now, in the left sidebar, click on the Ranking tab, and then on the Nodes sub-tab.
D) To the right of the Nodes sub-tab, the Color palette should already be selected.
E) Select Degree from the dropdown menu and then click on the Apply button. The nodes should now appear on a scale from red to black (or whatever colors were pre-selected in your version of Gephi).
F) Click on the Weight button (looks like a red diamond) to the right of the color palette.
G) Select Degree again, leave the default size parameters in place, and click on the Apply button.
H) Some of the nodes should now appear larger.
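The numbers behind this step are simple to compute by hand. Here is a minimal Python sketch, assuming the network is a list of (source, target) pairs: a node's degree is its incoming plus outgoing edges, the average degree is the mean of those totals, and node sizes come from a straight-line (linear) rescaling of degree between a smallest and a largest size, which is one common choice. The names degree_table and scale, the sample edges, and the 10 to 50 size range are hypothetical, chosen for illustration. Conventions for averaging degree in a directed graph vary, so the figure Gephi reports may not match this one exactly.

```python
from collections import Counter


def degree_table(edges):
    """Return in-degree, out-degree, and total degree for every node in a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    nodes = set(in_deg) | set(out_deg)
    return {
        n: {"in": in_deg[n], "out": out_deg[n], "degree": in_deg[n] + out_deg[n]}
        for n in nodes
    }


def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] onto [out_min, out_max]."""
    if hi == lo:
        return out_min  # every node has the same degree
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


# Hypothetical placeholder edges, not from the course dataset.
edges = [("Magazine A", "Author 1"), ("Magazine A", "Author 2"), ("Author 1", "Theme X")]

table = degree_table(edges)
degrees = [row["degree"] for row in table.values()]
average_degree = sum(degrees) / len(degrees)  # mean of in + out degree
sizes = {n: scale(row["degree"], min(degrees), max(degrees), 10, 50) for n, row in table.items()}
print(average_degree)  # 1.5
```

The same scale function could drive a color ramp instead of a size ramp by applying it to each red, green, and blue channel separately.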
Step 5: Creating Ego Networks
The large-scale graph can be very interesting. However, we might wish to drill down and look at smaller networks in order to enhance our thinking about a particular author, magazine, or theme. Filtering for an ego network (a network centered on a single node) is one way to do this.
A) In the right sidebar, click on the Filters tab.
B) Expand the Topology folder.
C) Drag Ego Network down to the Queries area and then click on it.
D) In the Node ID field at the bottom, type in a term from our spreadsheet that you wish to examine, such as BLAST.
E) Click on OK and then on Filter to apply the Ego Network filter.
F) You will then see a smaller network for those nodes most immediately connected to BLAST.
G) You can then adjust the number of degrees out from BLAST shown in the network by selecting 1, 2, 3, or Max from the Depth menu under the Node ID field.
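Conceptually, the Ego Network filter gathers every node within a given number of steps of the one you typed in. Here is a minimal Python sketch of that idea using a breadth-first search over (source, target) pairs. Two assumptions to flag: the sketch ignores edge direction (so a node linking to the ego and a node the ego links to both count as neighbors), and it always includes the ego itself; I haven't verified that Gephi's filter makes exactly these choices in every version. The function names ego_network and subgraph and the sample edges are hypothetical, and depth=None plays the role of Max.

```python
from collections import defaultdict, deque


def ego_network(edges, ego, depth=1):
    """Return the set of nodes within `depth` steps of `ego`, including `ego`.

    depth=None means no limit (like "Max"). Edge direction is ignored,
    which is an assumption about how the filter treats a directed graph.
    """
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen = {ego: 0}  # node -> number of steps from the ego
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if depth is not None and seen[node] == depth:
            continue  # don't expand beyond the requested depth
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return set(seen)


def subgraph(edges, keep):
    """Keep only the edges whose two endpoints both survived the filter."""
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical placeholder edges, not from the course dataset.
edges = [
    ("Magazine A", "Author 1"),
    ("Magazine A", "Author 2"),
    ("Author 1", "Theme X"),
    ("Theme X", "Author 3"),
]

one_step = ego_network(edges, "Magazine A", depth=1)
print(sorted(one_step))  # ['Author 1', 'Author 2', 'Magazine A']
print(subgraph(edges, one_step))
```

Raising depth to 2 would also pull in Theme X, and depth=None would reach Author 3, mirroring how choosing a larger Depth widens the ego network.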
Step 6: Use the Network Graph to Consider These Questions
A) Play around with the layouts and other visualization widgets to gain a sense of the graph’s content and structure. Does anything seem to be missing? Do you agree or disagree with any of the genre or topic tag connections? [Refer to the entry on Modernism linked above so you have more context for this material.]
B) Select and run the Yifan Hu layout algorithm. What significant groupings and divisions do you notice among the data? How does this relate or not relate to your understanding of Modernism? [Refer to the entry on Modernism linked above so you have more context for this material.]
C) Select the Fruchterman-Reingold layout algorithm and try moving the nodes around. Does this layout allow you to read the data any differently than Yifan Hu? How so?
D) Does small-scale network visualization strike you as a useful method for literary studies? Why or why not?
3) Post to the discussion board for the week
By Thursday at 11:59pm EST each week, you must post a 250-word response to each question. By Sunday at 11:59pm EST each week, you must post at least two substantive follow-up comments to each question. These comments should engage with the conversations developing in the discussion threads.
Don't forget that you can use the Troubleshooting thread to ask questions, particularly if you run into trouble with Palladio or Gephi.