Stylometry
Day 2 of the program was dedicated to using digital resources to look at authorial style. As discussed in Day 1, programs like Voyant Tools use tokens of text to pick up on word patterns. These patterns, such as average sentence length, the most common stopwords, and the most common unique words, can be used to create what one might call an authorial signature. An author’s style lies not simply in their chosen words but also in the smaller parts of syntax that a human reader often misses. A computer algorithm, however, can recognize subtle and often unique trends in an author’s writing style. It may, for example, show how the author uses contractions or punctuation. The terms and practices of stylometry have been around since the 1850s, just without computer algorithms to help. A person would do the job of manually counting and comparing word frequencies between texts, examining keywords in their context, and looking at unique transcription patterns and abbreviations.
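To make the idea of an authorial signature more concrete, here is a minimal sketch in Python (not the R template we used in class). It counts a few of the features mentioned above: average sentence length, the most common stopwords, and contractions. The function name `style_profile`, the tiny stopword list, and the sample sentence are all assumptions made for illustration; real tools use far larger word lists and more careful sentence splitting.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "it", "is", "was"}

def style_profile(text):
    """Return a few simple stylometric features for a text."""
    # Split into sentences on runs of ., ! or ? and drop empty pieces
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Lowercase word tokens; apostrophes are kept so contractions stay whole
    words = re.findall(r"[a-z']+", text.lower())
    stop_counts = Counter(w for w in words if w in STOPWORDS)
    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "top_stopwords": stop_counts.most_common(3),
        "contractions": sum(1 for w in words if "'" in w),
    }

# Hypothetical sample text, invented for this example
sample = "It was late. The rain didn't stop, and the road was empty."
profile = style_profile(sample)
print(profile)
```

Comparing profiles like this across many texts is, in miniature, the counting that used to be done by hand.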
As a class, we examined how stylometry programs function using R, a programming language specializing in statistical data and visualization. We did not write any code from scratch but instead used a template from the website Posit Cloud. We took a co-authored work and then selected a corpus of individual text samples from each author. The Posit Cloud program then creates a style package for each author and assesses which parts of the co-authored text fit the trends in each author’s package of work. Posit Cloud then produces a color-coded diagram of the suspected authorship of each part of the co-authored text (pictured below).
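The general idea of matching parts of a co-authored text to individual authors can be sketched in a few lines of Python. This is not the class template's actual method, only a simplified illustration under stated assumptions: each author's "style package" is reduced to the relative frequencies of a handful of marker words, the co-authored text is cut into fixed-size windows, and each window is labeled with the author whose profile is closest. The names `MARKERS`, `attribute_windows`, the window size, and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical marker words; an assumption made for this sketch
MARKERS = ["the", "and", "of", "to", "but", "which"]

def tokenize(text):
    """Lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z']+", text.lower())

def profile(words):
    """Relative frequency of each marker word in a list of tokens."""
    counts = Counter(words)
    total = len(words) or 1
    return [counts[m] / total for m in MARKERS]

def distance(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def attribute_windows(coauthored, samples, window=8):
    """Label each consecutive window of the text with the closest author."""
    author_profiles = {name: profile(tokenize(t)) for name, t in samples.items()}
    words = tokenize(coauthored)
    labels = []
    for start in range(0, len(words), window):
        chunk = profile(words[start:start + window])
        closest = min(author_profiles, key=lambda n: distance(chunk, author_profiles[n]))
        labels.append(closest)
    return labels

# Invented sample texts standing in for each author's corpus
samples = {
    "A": "the king and the queen and the court of the land",
    "B": "but which road to take but which door to open",
}
coauthored = "the sword and the shield of the king but which path to choose but which way"
labels = attribute_windows(coauthored, samples)
print(labels)
```

Each label in the result corresponds to one window of the co-authored text, which is roughly what a color-coded authorship diagram displays, one color per author.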

Then, each member of the class presented three to five sources from either a given genre or a given author, and we asked the computer program to cluster them based on style. I chose to examine the genre of epic poetry and again used The Iliad, Odyssey, and Aeneid. I also decided to add two other texts written by Vergil to see if the program would organize by author or genre. To my surprise, The Iliad, Aeneid, and two of Vergil’s poems (the Eclogues and Georgics) were all sorted together, but Homer’s Odyssey was grouped with authors from a markedly different genre that other members of the class had selected (see below). It sparked a slew of questions for me: Why was it so far away? Did it have something to do with a change in style away from the other, more war-driven texts? Was there a computer error? Does this tell us more about the interpretation than the original source? I think about how helpful this might be in looking at historical texts with questionable authorship or in matching fragments of text based on style (such as those of the ancient Greek poet Sappho, whose fragmented text exists alongside many copycats claiming to be her by mimicking her style).

