IMC Leeds Paper: Sending 15th-Century Missives through Algorithms: Testing and Evaluating HTR with 2,200 Documents

[This paper was given at the IMC 2017 in Leeds.] For the discussion on Twitter, see also the Storify.


Abstract

Is it possible to teach algorithms to read medieval handwriting? Does it make sense to have the material prepared by students who are learning to read gothic writing at the same time? These two simple questions lay the groundwork for discussing whether and how handwritten text recognition (HTR) and the teaching of the Middle Ages can be intertwined.

The material used to address these questions consists of 2’200 missives from Thun, a small town in Switzerland. 120 documents were transcribed and used for training. In the process, three difficulties were identified: different and changing hands, difficult layout structures, and abbreviations. These difficulties are typical for such an endeavor. Unfortunately, the results of the recognition are insufficient and cannot be used by scholars. The „small“ amount of training material is one reason for this. Using language models, the results can be improved, although crucial parts such as names and verbs remain only partially identifiable.

At the same time the combination of teaching and the use of cutting-edge technological tools proved engaging. The students involved were highly motivated and welcomed the possibility to take part in a digital research endeavor.


Intro

The teaching of paleography is, in my opinion, one of the core tools for gaining insight into what the Middle Ages, as well as the study of the Middle Ages, are about.

At the same time, technology promises to help us with transcription and, in the future, also with the identification of places, persons, etc.

In the last six months, I tried to bring both aspects – the teaching of paleography as well as the technology – together by teaching students and algorithms to read gothic cursive of the 15th century.

This paper therefore lays out the idea behind the experiment, the material used, the outcomes of the text recognition, and the outcome regarding the teaching of paleography. So this is basically a lab report (and fits as a blog post).

Since the results of the tests with the documents were not sufficient, at the end of the paper I’d like to briefly bring up an approach where handwritten text recognition worked better.

Idea

The idea was as simple as it was fitting: bring together a software called Transkribus and students from the University of Zurich in the endeavor to transcribe missives. The goal was to let the students experience reading handwriting and at the same time provide material to train handwritten text recognition. Of course, we also wanted to try out the HTR in the classroom.

READ_imc-leeds-2017-07-03-1.png

The idea came from teaching with an e-learning tool developed at the University of Zurich, called «ad fontes».

The e-learning tool teaches people to read handwriting (with a focus on medieval and early modern scripts).

There are a lot of advantages to the approach, for example that the user can get tips or receives feedback on words that were not deciphered or deciphered wrongly (i.e. you see that Schultheiss is transcribed incorrectly with a „z“).

The downside of the tool lies in the fact that nothing new can be discovered. Everything is presented in a perfectly streamlined way. In order to give students an impression of how actual research, or at least the transcription of single documents, works, I started using Transkribus as a tool that allows for transcription in the cloud.

The documents used in Transkribus are stored on servers at the University of Innsbruck. Access is given only to those designated by the uploader and/or the owner of a collection. Transkribus can also be used to train recurrent neural networks for Handwritten Text Recognition (or rather to produce a model for text recognition). It was therefore an obvious step to bring both approaches together and use the material provided in order to produce a model specific to fifteenth-century missives.

READ_imc-leeds-2017-07-03-2.png

Since the training of Handwritten Text Recognition, like the training of other neural networks, depends on masses of training material in order to give good results, I added further transcriptions that students had already provided on a wiki (developed some years ago).

But let’s first take a step back and look at the material:

From the first half to the end of the fifteenth century, more than two thousand missives have been preserved containing instructions of the city council of Berne to its bailiffs in Thun, a small town that became part of the Bernese territory at the end of the fourteenth century. These documents shed light on local dimensions of a city’s territorial lordship. Like many other cities subject to similar development, Berne acquired territorial lordship over an extended hinterland, claimed control over minor cities, and used them as district-towns in its territorial administration.

The topics dealt with in the missives are very diverse and sometimes amusing to the modern reader. Skirts were stolen by husbands, or roosters were deemed too loud in the morning. As a consequence, petitioners (male and female) went to Berne and complained in front of the council, and the council then contacted the bailiff using such missives.

The corpus spans more than 100 years; we therefore find a variety of scribes and also an evolving chancery that produced rather irregular scripts.

Four problems, or let’s say elements, influence the results of the HTR:

  1. Different scribes: the students did not have to focus on a single scribe or time frame; the transcriptions produced therefore cover a variety of scribes.
  2. Not very regular scripts (the documents were not made for display but needed to be sent out quickly).
  3. Recognition of layout (or layout analysis, as it’s called) does not work properly: the task is difficult even for regular scripts, and more so for missives (for this reason, layout analysis has not been taken into account).
  4. A missing language model, which is problematic only to some degree (like all vernaculars of the Middle Ages, Early Modern German lacks standardization); as we will see, we can still produce a very simple language model.

From the teaching perspective

Transkribus can be recommended, but only if you are willing to spend some time explaining the tools:

  • it allows for a simple feedback loop thanks to a shared workspace (everything is saved in the cloud)
  • but it offers no real-time help

Every student was able to prepare transcriptions for 2 missives (11 students make 22 missives); they were also responsible for correcting the layout. The process of transcribing led to intense (and heated) discussions about rules for transcription (normalization, dealing with abbreviations). For HTR this is very important: expanding abbreviations without „telling“ the algorithms, for instance, diminishes the quality of recognition.

At the same time, the discussions were a very fruitful way to understand which rules and forms of normalization are common in editions (digital or analogue), but also to raise the question of what makes a „text“.

To produce further training material, I added about 100 already transcribed missives. In the end, the so-called Ground Truth consisted of 21’682 words on 1900 lines.

READ_imc-leeds-2017-07-03-4.png

A look at the training curve shows that the test set (that is, the documents that were prepared but not used for training, in order to determine the quality of the recognition) was recognized with a Character Error Rate of around 26%.

Meaning that more than every fourth character was recognized incorrectly. That’s quite a lot, if you look at the example in the screenshot.
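For readers who have not met the measure before, the following is a minimal sketch of how a Character Error Rate is commonly computed: the edit distance (insertions, deletions, substitutions) between the recognized text and the reference transcription, divided by the length of the reference. The function names and the sample strings are assumptions made for illustration; this is not the evaluation code used by Transkribus.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep, if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance relative to the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Illustrative strings only: a reference word and a recognition with two wrong letters.
reference = "schultheiss"
hypothesis = "schultheizz"
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

In this toy case two of eleven characters differ. A CER of 26% on a whole test set means that, on average, roughly one edit is needed for every four characters of the reference.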

READ_imc-leeds-2017-07-03-5.png

This is one of the „better“ examples with 20% CER. Only one in five letters is identified incorrectly.

As mentioned, there is the possibility to add „a language model“, or rather a vocabulary consisting of the words used in the 120 missives transcribed and prepared. With such a model the Character Error Rate (CER) typically decreased dramatically, by around 7 to 9 percentage points (so we get from 20 to 13%!).
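To give a feeling for why a plain vocabulary helps with formulaic text, here is a small sketch that snaps each recognized word to its closest entry in a word list. It is an illustration only: the helper name, the cutoff value, and the sample words are assumptions, and it says nothing about how Transkribus uses the vocabulary internally.

```python
import difflib


def correct_with_vocabulary(words, vocabulary, cutoff=0.7):
    """Replace each word by its closest vocabulary entry, if one is similar enough."""
    vocab = sorted(set(vocabulary))
    corrected = []
    for word in words:
        if word in vocab:
            corrected.append(word)
            continue
        # get_close_matches ranks candidates by difflib's similarity ratio
        matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


# Hypothetical vocabulary and recognition output, invented for this example.
vocabulary = ["schultheiss", "thun", "bern", "rat"]
recognized = ["schultheizz", "zu", "thvn"]
print(correct_with_vocabulary(recognized, vocabulary))
```

The sketch also shows the limit mentioned in this post: a word that is not in the list, such as a rare name, is either left wrong or pulled towards an unrelated entry, so the gain is concentrated on recurring phrases.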

READ_imc-leeds-2017-07-03-6.png

Of course this is some sort of „cheating“, but still medieval documents tend to be repetitive. The missives are similar to diplomatic documents and are rather formulaic in writing.

The improvement in recognition mostly concerns standard phrases, which leaves us with wrongly recognized named entities and verbs.

From the recognized transcriptions alone, the content is hard to identify. But they can be helpful if you are familiar with the script and only need to take a short look at a document. At the same time, this is a disadvantage if one is unfamiliar with or unable to read the script: the hard-to-read parts are also the ones mostly recognized wrongly (so it’s not very helpful for beginners, and even less so for lay people).

Thanks to the close connection between text and image, you are able to check against the image and make sure that the transcription is correct. You are also able to search through collections within Transkribus without exporting the documents (even fuzzy searches are implemented).
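What a fuzzy search buys you on imperfect transcriptions can be sketched in a few lines: instead of requiring an exact match, every word is compared to the query and accepted above a similarity threshold. The function name, the threshold, and the sample lines are assumptions for illustration; the search built into Transkribus may work quite differently.

```python
import difflib


def fuzzy_search(query, lines, cutoff=0.8):
    """Return (line_index, word) pairs for words that resemble the query."""
    hits = []
    needle = query.lower()
    for index, line in enumerate(lines):
        for word in line.lower().split():
            similarity = difflib.SequenceMatcher(None, needle, word).ratio()
            if similarity >= cutoff:
                hits.append((index, word))
    return hits


# Invented sample lines standing in for recognized text.
recognized_lines = ["schultheizz zu thun", "rat zu bern"]
print(fuzzy_search("schultheiss", recognized_lines))
```

An exact search for „schultheiss“ would miss the misrecognized word in the first line; the fuzzy variant still finds it, which is what makes searching useful even with a high error rate.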

Still, with as little as a hundred-odd very short pages, you get a model that helps by giving a first impression of the content of a document, but not much more.

Dealing with such a broad range of documents (with their different scribes), it would be interesting to apply writer identification in order to determine for which documents a specific model works better. Currently, this is being integrated in Transkribus, and we expect it to be ready in autumn.


One scribe, one cartulary

Since the demonstrated example led to rather insufficient results, let me briefly bring up an example that worked with a similar input of material for training (see also this blog post).

READ_imc-leeds-2017-07-03-7.png

For Königsfelden abbey, a monastery founded in 1309, a cartulary was produced in 1336. The codex was written in a very regular gothic book-script. Copies of charters given to the monastery were entered in the cartulary.

We prepared 25’658 words for training (that’s almost 4300 very short lines). The result is a character error rate of 10% on average (without language models, so it doesn’t matter whether the charter copied was written in Latin or in the vernacular). Even abbreviations and special characters (such as the Latin genitive ending -rum) can be recognized with some reliability.

The difference between the two examples is obvious:

Whilst one corpus consists of different hands from different times, the other is just the writing of two or three scribes (and to be honest, the model only works well for the main hand). Still, it can be said that the more training material for a single hand is available, the better the recognition is going to be.
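To put the two training sets side by side, the figures given in this post can be turned into average words per line. This is plain arithmetic on the numbers quoted above; the line count for the cartulary is the rounded „almost 4300“, so the second value is approximate.

```python
# Word and line counts as quoted in the text; 4300 is a rounded figure.
corpora = {
    "Thun missives": (21_682, 1_900),
    "Königsfelden cartulary": (25_658, 4_300),
}


def words_per_line(words: int, lines: int) -> float:
    """Average number of words on one line of ground truth."""
    return words / lines


for name, (words, lines) in corpora.items():
    print(f"{name}: {words_per_line(words, lines):.1f} words per line")
```

The two sets are similar in size by word count (roughly 21,700 versus 25,700 words), while the cartulary lines hold about 6 words on average against about 11 for the missives. The amount of text is comparable; what differs is the number of hands it is spread over.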

Conclusion

For the teaching experience, I can very much recommend the approach taken. In particular, the high level of motivation among the students bears witness to their interest in using new technology, combined with the ambition to learn the skill set of an „analogue“ paleographer. Nonetheless, it needs to be emphasized that the task of learning to decipher gothic writing is still an arduous one. One of the strengths of the approach was the need to discuss what text means with regard to a medieval document, and also what is necessary in order to use it for further inquiries.

After all, the best-case scenario would be that we get a transcription of a folio or a page according to what we trained an algorithm to do.

Regarding technical aspects, it is in my opinion fair to say that we will shortly have good recognition of medieval handwriting. I’m sure that Character Error Rates around 10% (and even below) will be the standard.

Of course, we need to be aware that this cannot be compared to editions and human-produced transcripts, but we will still be able to access vast amounts of text from the Middle Ages in years to come.

 

[Disclaimer: I’m associated with the project READ as a research associate at the state archives of Zurich]

READ_imc-leeds-2017-07-03-0.png


Interview on Digital Editions

A bit of self-promotion…

During the conference «Edition! Wozu? Wie? Und Wieviele?» last November, Infoclio.ch took the opportunity to conduct a number of interviews with the people involved (see the reporting page with all interviews and recordings of some of the talks). I was fortunate to give short statements together with Christiane Sibille (DODIS) and Gerhard Lauer (Uni Göttingen).

The result:

Interview on Panel 4: T. Hodel, C. Sibille, G. Lauer from infoclio.ch on Vimeo.