Text processing

Automatic transcriptions

Automatic transcription workflows.

Cuéllar, Álvaro (2023). «La Inteligencia Artificial al rescate del Siglo de Oro. Transcripción y modernización automática de mil trescientos impresos y manuscritos teatrales». Hipogrifo. Revista de literatura y cultura del Siglo de Oro, vol. 11, no. 1, pp. 101-115.

New workflow

HTR + LLM: page-by-page reviewed transcriptions

Alongside our general HTR models, we are developing a new workflow for complex theatrical manuscripts. An initial automatic reading is checked page by page with language models and direct visual inspection of the facsimile.

The resulting texts are organized by acts, paginated with simple sequential numbers, and prepared in TEI format for consultation in BITESO. Where a reading is uncertain, the doubt is recorded in the text rather than silently resolved.

This process does not replace a critical edition, but it helps turn difficult manuscripts into navigable, reviewable digital texts linked to their images.
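The structure described above (acts, sequential page numbers, flagged uncertain readings) maps onto standard TEI elements. The following is a minimal sketch, not the project's actual encoding: it assumes acts are encoded as `<div type="act">`, page breaks as `<pb n="…"/>`, verse lines as `<l>`, and uncertain readings as `<unclear>`. The function name `build_tei` and the input layout are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def q(tag):
    """Return the tag qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, acts):
    """Build a small TEI tree (hypothetical helper, illustrative only).

    acts  : list of acts
    act   : list of pages
    page  : list of lines
    line  : list of (text, certain) segments; certain=False marks a doubtful reading
    """
    ET.register_namespace("", TEI_NS)
    tei = ET.Element(q("TEI"))

    # Minimal header: title, publication statement, source description.
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, q("titleStmt")), q("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("p")).text = "Illustrative sketch"
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = "Manuscript facsimile"

    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    page_no = 0  # pages are numbered sequentially across the whole play
    for act_no, pages in enumerate(acts, start=1):
        act = ET.SubElement(body, q("div"), {"type": "act", "n": str(act_no)})
        for page in pages:
            page_no += 1
            ET.SubElement(act, q("pb"), {"n": str(page_no)})
            for segments in page:
                line = ET.SubElement(act, q("l"))
                last = None  # most recent <unclear> child of this line, if any
                for text, certain in segments:
                    if not certain:
                        # Keep the doubt visible instead of resolving it silently.
                        last = ET.SubElement(line, q("unclear"))
                        last.text = text
                    elif last is None:
                        line.text = (line.text or "") + text
                    else:
                        last.tail = (last.tail or "") + text
    return tei


# Placeholder content: two acts of one page each, one doubtful word.
sample = [
    [[[("first line of act one", True)]]],
    [[[("second act, ", True), ("doubtful", False), (" word", True)]]],
]
tree = build_tei("Sample play", sample)
print(ET.tostring(tree, encoding="unicode"))
```

Keeping the page counter outside the act loop is what makes the numbering run continuously through the play, so each `<pb>` can later be linked to the matching facsimile image.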

We have recently developed automatic transcription workflows using Transkribus. These workflows have allowed us to automatically transcribe and modernize the spelling of around 1,000 printed books and 350 manuscripts of Golden Age theatre, which are now part of CETSO and TEXORO.

1,000 printed books automatically transcribed and spelling-modernized

350 manuscripts incorporated into the project workflows

Approximately 99% accuracy for printed books

Approximately 90% accuracy for manuscripts

The three models used are public, and anyone can use them through Transkribus.

Spanish Golden Age Prints 1.0 (Transkribus, 2021)

Model trained for automatic transcription of Golden Age theatrical printed books.

Spanish Golden Age Prints (Spelling Modernization) 1.0 (Transkribus, 2021)

Version designed for automatic spelling modernization of already transcribed printed books.

Spanish Golden Age Manuscripts (Spelling Modernization) 1.0 (Transkribus, 2021)

Model focused on theatrical manuscripts, with spelling modernization and detection of relevant features.

These models transcribe theatrical printed books and manuscripts with a high degree of accuracy: approximately 99% for printed books and 90% for manuscripts. They can also modernize spelling according to current standards and detect certain typographic features, such as italics.
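This page does not state how the accuracy figures are measured. HTR accuracy is commonly reported at character level, as one minus the character error rate (CER), where CER is the edit distance between the automatic output and a manually corrected reference, divided by the length of the reference. The sketch below assumes that definition. The function names are hypothetical and the strings are placeholders, not project data.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """Return 1 - CER, with CER = edit distance / length of the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1 - levenshtein(reference, hypothesis) / len(reference)


# Placeholder strings: one wrong character in a ten-character reference.
reference = "abcdefghij"
hypothesis = "abcdefghiX"
print(f"{character_accuracy(reference, hypothesis):.0%}")
```

Under this definition, 99% accuracy corresponds to roughly one character error per hundred characters of reference text, and 90% to roughly ten.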

Figure: Example of automatic transcription applied to a Golden Age theatrical text.

Figure: Second example of automatic transcription and spelling modernization.

If you would like to learn more about the tool, apply our transcription models to your documents, or request a specific transcription of a printed book or manuscript for research, please contact Álvaro Cuéllar.