
Digitising Finnish history using crowdsourced volunteers

Change management

The National Library of Finland had a huge problem on its hands: how to correct the mistakes that optical character recognition (OCR) software had made while digitizing millions of pages of historical archives. Instead of spending millions of euros and many years having library employees do this, the Library decided to use a voluntary crowdsourced workforce. It partnered with crowdsourcing company Microtask to create the Digitalkoot project.

Digitalkoot broke the work up into small tasks and created an online platform where people could verify the words in a fun and engaging way from their own homes and workplaces. The project relied on the expectation that people, especially Finns, would be interested in helping to preserve Finnish culture – as long as it was easy enough for them to do so.

How did they do it?

The image below shows the original text on the far left and, next to it, the same text after the Library’s OCR software had digitized it. The words in red are those that the software is likely to have recognized poorly. The box on the right shows the text after a person has compared the original with the OCR output and corrected any mistakes.

In practice, the Microtask Platform identified and extracted the problematic words (the red ones) and presented them individually to online volunteers for checking. The platform then automatically collected the answers, verified them, and inserted them back into the digitized newspapers. The process is outlined below.
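The extract-and-reinsert steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Microtask Platform's actual code: the `OcrWord` type, the per-word `confidence` score, the 0.8 threshold and the sample words are all hypothetical, since the case study does not say how doubtful words were detected.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str          # the word as the OCR software read it
    confidence: float  # hypothetical score in [0, 1]; not described in the case study


# Assumed cut-off below which a word is treated as doubtful ("red").
THRESHOLD = 0.8


def extract_suspect_indices(page, threshold=THRESHOLD):
    """Return the positions of words whose OCR confidence is below the threshold."""
    return [i for i, word in enumerate(page) if word.confidence < threshold]


def apply_corrections(page, corrections):
    """Rebuild the page text, replacing each suspect word that has a verified answer."""
    return [corrections.get(i, word.text) for i, word in enumerate(page)]


# Made-up example page: the middle word was misread ("rn" instead of "m").
page = [
    OcrWord("Suomen", 0.97),
    OcrWord("sanornalehti", 0.41),
    OcrWord("historia", 0.93),
]

suspects = extract_suspect_indices(page)

# Verified volunteer answers, keyed by word position on the page.
verified = {1: "sanomalehti"}
corrected = apply_corrections(page, verified)
```

Only the doubtful word is sent out for checking; words the OCR software read confidently are left untouched, which keeps each volunteer task small.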

But the volunteers were not asked to spend time tediously reading through and typing out text that might not interest them. All they had to do was play social computer games, which made helping to correct the errors fun and competitive.

To succeed in these games, players had to accurately type in the difficult-to-read words cut from the original scanned version of the archive. To make sure that the words were entered correctly, the same words were sent to several players simultaneously. Comparing their answers verified the results, ensuring a high level of accuracy.
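The idea of checking a word by sending it to several players can be sketched as a simple agreement rule. The case study does not say how many players saw each word or how disagreements were resolved, so the function name `verify_word` and the parameters `min_answers` and `min_agreement` are assumptions made purely for illustration.

```python
from collections import Counter


def verify_word(answers, min_answers=2, min_agreement=1.0):
    """Return the agreed-upon word, or None if the players' answers are not conclusive.

    answers:       what each player typed for the same word image
    min_answers:   assumed minimum number of players needed before deciding
    min_agreement: assumed share of players that must give the same answer
    """
    # Ignore empty submissions and stray whitespace.
    cleaned = [a.strip() for a in answers if a.strip()]
    if len(cleaned) < min_answers:
        return None

    # Find the most common answer and the share of players who gave it.
    word, votes = Counter(cleaned).most_common(1)[0]
    if votes / len(cleaned) >= min_agreement:
        return word
    return None


# Two players agree, so the word is accepted.
agreed = verify_word(["sanomalehti", "sanomalehti"])

# Two players disagree, so the word would need to go to more players.
disputed = verify_word(["sanomalehti", "sanornalehti"])

# A looser rule: accept a two-thirds majority out of three players.
majority = verify_word(
    ["sanomalehti", "sanomalehti", "sanornalehti"], min_agreement=0.6
)
```

Requiring full agreement favours accuracy over speed, while a lower `min_agreement` accepts words sooner at the cost of occasionally letting a shared mistake through.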
 
The game looks like this:

About this case study
Main Contact

Ville Miettinen

Founder & CEO, Microtask

Email: wili@microtask.com

Twitter: https://twitter.com/#!/wili
Blog: http://blog.microtask.com


Ville Miettinen wrote this case study for Governance International on 1 September 2012.

Copyright © Governance International ®, 2010-2019. All rights reserved