We found this recent comment by a volunteer on a FromThePage project to be fascinating:
“I am sad to report I have found numerous errors, too many to even begin to fix, within these pages… It will be much easier to completely transcribe from the beginning correctly, than try and fix ALL the typos. Would you like me to do this for the Library?”
OCR correction is arguably easier than full transcription, but judging by this volunteer’s comment, it can be less fun and more frustrating.
We spoke to the owner of this particular project, and she’s hoping that high-school-aged volunteers, who are on site and working together, might be a good match for OCR correction. Younger transcribers often have less experience with cursive handwriting, so correcting printed text may suit them better than transcribing manuscripts. I’m curious to see if this idea works.
We’ve seen OCR correction work on projects like Trove — the OCR feeds the search, and the person reading the article is motivated to fix the mistakes when they run into them, because they are emotionally invested in the material. (No one likes mistakes!)
We’ve also seen great projects, like the Alabama Department of Archives and History’s WW1 service cards project, that engage volunteers to transcribe typewritten text.
We also suspect that the proportion of errors in the text makes a difference to the corrector’s experience.
If you are thinking about doing an OCR correction project, we would recommend you think about:
- How interesting is the text to start with? Is it fun to read?
- How engaged are volunteers with the text to start with? Do they have a reason to read it, which might lead to a reason to fix it?
- What’s the proportion of errors in the text? The higher the proportion, the more frustrating the experience, and the more likely it is that transcribing the typewritten text from scratch would make sense.
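If you want a rough number for that last question, one option is to hand-correct a small sample page and compare it with the raw OCR. The sketch below is only an illustration, not something built into FromThePage: it computes a character error rate as the edit distance between the two texts divided by the length of the corrected text. The function names and the sample strings are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, corrected_text: str) -> float:
    """Share of characters that need fixing, relative to the corrected text."""
    if not corrected_text:
        raise ValueError("corrected_text must not be empty")
    return edit_distance(ocr_text, corrected_text) / len(corrected_text)


# Hypothetical sample: raw OCR output and a hand-corrected version of the same line.
ocr_sample = "Tbe quick hrown fox"
corrected_sample = "The quick brown fox"

rate = character_error_rate(ocr_sample, corrected_sample)
print(f"{rate:.1%}")  # prints 10.5%
```

What counts as "too many errors" will depend on your volunteers and your material, so treat the result as a way to compare pages or collections with each other, not as a fixed threshold.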
You may also be interested in this article Ben wrote with a more technical review of OCR correction in FromThePage.
If you’d like to start a transcription — or OCR correction — project in FromThePage, contact us and we’ll get you started.
Source: OCR Correction vs Transcription