  • Christopher Mc Carron and Juan Dominguez

Transkribus: How AI is Helping to Expand the Horizons of Historical Research

Updated: Mar 19

Written by: Christopher Mc Carron, Juan Dominguez.


Historical research is often divided into two opposing camps: qualitative and quantitative. This distinction is useful in structuring our thoughts, but it can also serve to limit the horizons of historical research. As AI-driven tools change the way research is done, the gap between these two approaches narrows and the range of opportunities for new research widens. From the point of view of the historian, such tools can often seem impenetrable and arcane, but as these fields become more established we are starting to see a new generation of more usable interfaces, such as Transkribus.


Transkribus is AI-driven text recognition software that, as its name suggests, helps to automate the previously very labour-intensive task of transcribing historical documents. It does this by running through the manuscript and doing its best to identify each individual line of text. At first the results will be quite rough, and our task as researchers is to examine its work and manually correct it. These corrections are fed back into Transkribus, allowing it to learn from its mistakes and improve its model for the next pass. Although today it often seems like we can find everything online, a large number of historical documents remain undigitised, and Transkribus is helping to make those documents accessible to a wider range of researchers and disciplines.


Dr. Daniel R. Curtis’s ‘Positively Shocking!’, a project funded by the NWO VIDI, is one of those studies that directly benefit from this tool. Its main goal is to study the redistributive effects of epidemic diseases in Early Modern Northwestern Europe. The project also seeks to bridge the gap between the quantitative economic measurements and the qualitative societal effects that these epidemics had. During this last term of the academic year, we took part in an internship with this project, where we focused on processing historical manuscripts. As is often the case for this period, these manuscripts form our primary connection to the past. In our case, we were working with Dutch land registries from the 16th to 18th centuries (see Figure 1). Unlike more traditional tools, Transkribus has the ability to learn and grow over the course of a research project, making it particularly useful in our case. Furthermore, the software does not require high-end hardware on the user’s side, as most of the computing is done on Transkribus’s servers.


Transkribus also has its uses outside of the strictly academic world. Because the software benefits most from documents with a consistent structure or pattern, it is well suited to genealogy and other forms of personal research. Documents such as census records might appear impenetrable, but the ability to search the transcribed text makes tasks like finding specific relatives or tracing family lineages much more feasible and less time-consuming.

Dr. Curtis’s project uses Transkribus to study the redistribution of land after epidemics, but the software is currently being used in a wide variety of historical research projects. For instance, the central records office was burned to the ground during the Irish Civil War, and researchers at Trinity College Dublin are using Transkribus to help reconstruct the fragments of records left behind, using the AI to line up sections from the same text. Similarly, researchers at the University of Quebec are using the software to unify and digitise the many different collections of administrative documents left by the colony of New France, which are scattered all across North America. Both projects show the power that Transkribus has to create more complete datasets for historians of all specialisations to use in their research.

The more Transkribus is used and corrected, the better it will work. If you are interested in the software, we encourage you to try it: the transcription of the first 400 pages is completely free.


Figure 1: ‘Screenshot of Morgenboek Voorhout 1660,’ Transkribus.



Bibliography:

Nederlandse Organisatie voor Wetenschappelijk Onderzoek. "Positively Shocking! The Redistributive Impact of Mass Mortality through Epidemic Diseases and Violent Conflict in Early Modern Northwest Europe." Accessed June 23, 2022. https://www.nwo.nl/en/projects/016vidi185046.

Read Coop. "David Brown on Transkribus & the Beyond 2022 Project." Accessed June 23, 2022. https://readcoop.eu/success-stories/david-brown/.

Read Coop. "Transkribus - AI Powered Handwritten Text Recognition." Accessed June 23, 2022. https://readcoop.eu/transkribus/.

Read Coop. "Nouvelle-France numérique: Collaboration and partnership arising from AI." Accessed June 23, 2022. https://readcoop.eu/success-stories/nouvelle-france-numerique-collaboration-and-partnership-arising-from-ai/.

Trinity College Dublin. "Beyond 2022 - Ireland's Virtual Record Treasury." Accessed June 23, 2022. https://beyond2022.ie.
