ETSO: Stylometry applied to the Spanish Golden Age arose from the shared interest of Germán Vega García-Luengos, professor at the University of Valladolid, and Álvaro Cuéllar González, graduate student at the University of Kentucky. We apply computational tools to the numerous authorship problems found in plays of the Spanish Golden Age theater. This portal offers analyses that can shed light on the authorship of the vast theatrical output of the period.
Stylometry is the discipline that compares texts by the frequency of their words; among its many applications, it can help resolve cases of unknown authorship. It rests on a simple but powerful hypothesis, which recent studies have tested successfully: each writer uses some words more frequently than others, so each writer has a ‘particular style’, and stylometry can use it to establish relations of proximity between texts.
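To make the idea of comparing texts by word frequency concrete, here is a minimal Python sketch. It is only an illustration, not the code behind our results: the function name `relative_frequencies` and the two sample sentences are invented for this example.

```python
import re
from collections import Counter


def relative_frequencies(text: str) -> dict[str, float]:
    """Return each word's share of the total number of words in `text`."""
    # Lowercase the text and keep runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


# Two invented sentences standing in for full plays (illustration only).
sample_a = "el rey no sabe que el conde ama a la dama"
sample_b = "la dama no quiere que la vean en la corte"

freq_a = relative_frequencies(sample_a)
freq_b = relative_frequencies(sample_b)

# Compare how often each sample uses a few very common words.
for word in ("el", "la", "que"):
    print(word, round(freq_a.get(word, 0.0), 3), round(freq_b.get(word, 0.0), 3))
```

With real plays, the same table would be built for the few hundred or few thousand most frequent words of the whole corpus, and those columns of numbers are what gets compared.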
So far, the most effective program for this analysis, and the one used for all the results offered here, is Stylo. It is an R package developed by the Computational Stylistics Group, a team formed by members of the Universities of Krakow and Antwerp, headed by researchers Maciej Eder, Jan Rybicki, and Mike Kestemont.
Thanks to stylometry, we can find out which texts in our corpus coincide most closely with a given text in their use of words. Here, for example, we have searched for the texts closest to La vida es sueño (1635), by Calderón de la Barca, in a corpus of more than 1200 comedies by more than 60 authors.
After processing more than 16 million words, stylometry establishes that the 20 works closest to La vida es sueño (those with the lowest distance), out of more than 1200, are all by Calderón de la Barca. The result, therefore, supports the attribution to this dramatist.
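The ranking by distance described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the procedure used for our results, which are produced with Stylo. It uses a distance in the style of Burrows's Delta, a measure widely used in stylometry; the function names, the `__target__` key, and the three tiny "plays" are all invented for the example.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and return its words (runs of letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_by_delta(target, corpus, n_words=5):
    """Rank the titles in `corpus` by a Delta-style distance to `target`.

    Smaller distances mean more similar word usage.
    """
    texts = {**corpus, "__target__": target}  # hypothetical key for the target

    # 1. Pick the most frequent words across all texts.
    pooled = Counter()
    for text in texts.values():
        pooled.update(tokenize(text))
    vocabulary = [word for word, _ in pooled.most_common(n_words)]

    # 2. Relative frequency of each chosen word in each text.
    freqs = {}
    for title, text in texts.items():
        words = tokenize(text)
        counts = Counter(words)
        freqs[title] = [counts[w] / (len(words) or 1) for w in vocabulary]

    # 3. Standardize each word's frequencies (z-scores) across the texts.
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    devs = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    z = {
        title: [(f - m) / s for f, m, s in zip(row, means, devs)]
        for title, row in freqs.items()
    }

    # 4. Distance = mean absolute difference of z-scores to the target.
    target_z = z.pop("__target__")
    distances = {
        title: mean(abs(a - b) for a, b in zip(row, target_z))
        for title, row in z.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Invented miniature corpus (illustration only).
corpus = {
    "play_a": "el rey y la dama y el conde y el rey",
    "play_b": "que no que no que la dama no que no",
    "play_c": "el conde y la dama y el rey y la corte",
}
# Same words as play_c in a different order, so its distance to play_c is zero.
target = "el rey y la corte y el conde y la dama"

ranking = rank_by_delta(target, corpus)
for title, distance in ranking:
    print(f"{title}: {distance:.3f}")
```

Note that word order plays no role here: only the frequencies count. In a real analysis the vocabulary would contain hundreds or thousands of words, and the corpus would be the full collection of plays.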
Imagine the potential of this tool to shed light on texts of unknown authorship. We can study many problematic cases, but we will always need a large corpus against which to compare our text, and the text must first be prepared properly: with modernized spelling, unified lexical variants, and no annotations or character names. We also need the appropriate expertise and protocols so that the results we obtain are as accurate as possible.
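Part of this preparation can be automated. The sketch below is a hedged example that assumes a particular layout, which real editions may not follow: speaker names written in capitals at the start of a line and followed by a period or colon, stage directions in parentheses, and editorial notes in square brackets. The function name `clean_play_text` and the sample text are invented. Modernizing spelling and unifying lexical variants are not covered here; they need dedicated tools and careful revision.

```python
import re

# Assumed layout (hypothetical): "REY." or "DAMA PRIMERA:" opens a speech.
SPEAKER = re.compile(r"^\s*[A-ZÁÉÍÓÚÜÑ]{2,}(?:\s+[A-ZÁÉÍÓÚÜÑ]+)*\s*[.:]\s*")
STAGE_DIRECTION = re.compile(r"\([^)]*\)")   # e.g. (Sale la dama.)
EDITORIAL_NOTE = re.compile(r"\[[^\]]*\]")   # e.g. [1] or [sic]


def clean_play_text(raw: str) -> str:
    """Remove notes, stage directions and speaker names, keeping the spoken text."""
    cleaned_lines = []
    for line in raw.splitlines():
        line = EDITORIAL_NOTE.sub("", line)
        line = STAGE_DIRECTION.sub("", line)
        line = SPEAKER.sub("", line)
        line = re.sub(r"\s+", " ", line).strip()  # collapse leftover spaces
        if line:
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


# Invented fragment (illustration only).
raw = """REY. Hoy llega el conde [1] a la corte.
(Sale la dama.)
DAMA PRIMERA: No lo creo, señor.
"""

cleaned = clean_play_text(raw)
print(cleaned)
```

Rules like these are fragile, since a verse that happens to begin with a word in capitals would be truncated, so the output should always be checked by hand against the source.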
That is the value of ETSO, from which you can benefit and with which you can collaborate. If you write to us (Álvaro Cuéllar or Germán Vega), we can analyze the authorship of a work stylometrically against our corpus, using all the approaches we know. The results should not be taken as conclusive, but as an indication of how close or distant the texts are from one another.
Little by little, our corpus is growing thanks to the addition of texts from many sources. We cannot publish the full texts because they are not our property; they belong to different projects, and some will be published soon in printed editions. We can, however, use their word frequencies in our investigations. If you have any Golden Age work with modernized spelling, or if you know how to access one, please write to us; we would greatly appreciate your collaboration. The larger the corpus, the better and more reliable our results will be.