Tutorial on Glyph Segmentation with Aletheia
This tutorial describes different approaches to segmenting a document page into glyphs (character objects) using Aletheia.
Efficient OCR Training Data Generation with Aletheia
Poster presented by PRImA at DAS2014, France.
Example Image and Ground truth
These are the example files bundled with Aletheia.
Aletheia Sans is a font derived from DejaVu Sans. It has been enriched with characters that were required by several large-scale European and American digitisation projects for historical documents. Where possible, code points recommended by the Medieval Unicode Font Initiative (MUFI) were used.
Important note: Installing the font permanently will prevent automatic font updates in the Aletheia tool. Future changes will not be visible until the new version of the font is installed manually.
A lightweight web-based version of the Aletheia ground truthing system.
WebAletheia is based on the open source library PRImA-GWT.
A survey of OCR evaluation tools and metrics
In The 6th International Workshop on Historical Document Imaging and Processing (HIP '21). Association for Computing Machinery, New York, NY, USA, 13–18.
Efficient and Effective OCR Engine Training
International Journal on Document Analysis and Recognition (IJDAR), 23(1), 73-88
Quality Prediction System for Large-Scale Digitisation Workflows
Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), Santorini, Greece, April 11-14, 2016
Efficient OCR Training Data Generation with Aletheia
Short Paper Booklet of the 11th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2014), Tours, France, April 2014, pp. 19-20
Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments
Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 48-52
These resources are not maintained by the PRImA Research Lab. We are not responsible for the content of these external resources. If you find that any of the pages linked here is irrelevant or outdated, please let us know so that we can update this section.
How do I segment a document using Tesseract then output the resulting bounding boxes and labels?
Useful answer to a question on Stack Overflow.
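As a rough illustration of the kind of output that question is about, the sketch below reads Tesseract's TSV output and returns a label and bounding box for each recognised word. It is not taken from the linked answer. It assumes the TSV was produced beforehand with a command along the lines of `tesseract page.png out tsv`, and that the file uses the usual columns (`level`, `left`, `top`, `width`, `height`, `conf`, `text`). The function name `parse_tesseract_tsv` and the sample data are hypothetical.

```python
import csv
import io


def parse_tesseract_tsv(tsv_text, level=5):
    """Return (label, (left, top, right, bottom)) tuples for one hierarchy level.

    Assumed level numbering in Tesseract TSV output:
    1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word.
    """
    # QUOTE_NONE keeps quotation marks in the recognised text as plain characters.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    boxes = []
    for row in reader:
        if int(row["level"]) != level:
            continue
        label = (row["text"] or "").strip()
        if not label:
            continue  # skip rows without recognised text
        left, top = int(row["left"]), int(row["top"])
        width, height = int(row["width"]), int(row["height"])
        boxes.append((label, (left, top, left + width, top + height)))
    return boxes


# Hypothetical sample in the assumed column layout (one page row, two word rows).
SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t200\t50\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t60\t20\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t80\t12\t70\t20\t91\tworld\n"
)

if __name__ == "__main__":
    for label, box in parse_tesseract_tsv(SAMPLE):
        print(label, box)
```

The TSV output stops at word level, so glyph-level boxes would need a different output format or a tool such as Aletheia.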