

<a name='VideoTutorials'></a>Video tutorials

Aletheia 3 Tutorial - Introduction to the Aletheia software

Go to our YouTube channel for more video tutorials

Tutorial on Glyph Segmentation with Aletheia

This tutorial describes different approaches to segment a document page into glyphs (character objects) using Aletheia.

Download PDF

Efficient OCR Training Data Generation with Aletheia

Poster presented by PRImA at DAS2014, Tours, France.

Download PDF

Aletheia 3 User Guide

Download PDF

A brief introduction to Aletheia

Download PDF

<a name='ExampleFiles'></a>Example files

Example Image and Ground truth

These are the example files bundled with Aletheia.

Download ZIP [2.38 MB]

<h2><a name='AletheiaSans'></a>Aletheia Sans Font</h2>

Aletheia Sans is a font derived from DejaVu Sans. It has been extended with characters required by several large-scale European and American digitisation projects for historical documents. Where possible, the code points recommended by the Medieval Unicode Font Initiative (MUFI) were used.

Important note: Installing the font permanently will prevent automatic font updates in the Aletheia tool. Future changes will not be visible until the new version of the font is installed manually.

Download the latest version

A lightweight web-based version of the Aletheia ground-truthing system.

WebAletheia is based on the open-source library PRImA-GWT.

Try it

PRImA-GWT on GitHub

<a name='RelatedPublications'></a>Related Publications

A survey of OCR evaluation tools and metrics

C. Neudecker, K. Baierer, C. Clausner, A. Antonacopoulos, S. Pletschacher

In The 6th International Workshop on Historical Document Imaging and Processing (HIP '21). Association for Computing Machinery, New York, NY, USA, 13–18.

Details » Download PDF

Efficient and Effective OCR Engine Training

C. Clausner, A. Antonacopoulos, S. Pletschacher

International Journal on Document Analysis and Recognition (IJDAR), 23(1), 73-88

Details »

Quality Prediction System for Large-Scale Digitisation Workflows

C. Clausner, S. Pletschacher, A. Antonacopoulos

Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), Santorini, Greece, April 11-14, 2016

Details » Download PDF

Efficient OCR Training Data Generation with Aletheia

C. Clausner, S. Pletschacher, A. Antonacopoulos

Short Paper Booklet of the 11th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2014), Tours, France, April 2014, pp. 19-20

Details » Download PDF

Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments

C. Clausner, S. Pletschacher, A. Antonacopoulos

Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 48-52

Details » Download PDF

<a name='CommunityResources'></a>Community resources

These resources are not maintained by the PRImA Research Lab, and we are not responsible for their content. If you find that any of the pages linked here are irrelevant or outdated, please let us know so that we can update this section.

How do I segment a document using Tesseract then output the resulting bounding boxes and labels?

A useful answer to a question on Stack Overflow.

Go to article