A Self-Adaptive Method for Extraction of Document-Specific Alphabets

S. Pletschacher

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 656-660

Abstract

Recognition and encoding of digitized historical documents remains a challenging task. A major problem is the occurrence of unknown glyphs and symbols which might not even exist in modern alphabets. Current pre-trained OCR methods rarely deliver usable results for such documents. This paper describes an alternative approach and framework for handling printed historical documents without restrictions on the contained alphabets or fonts. The basic idea is to derive all information required for encoding directly from the document itself. This is achieved by extracting a document-specific prototype alphabet of locatable glyphs. The core of the system is a customized clustering method which adapts automatically to new documents by ascertaining appropriate threshold parameters based on the special characteristics of glyphs. This way, the system is able to run without manual intervention and can be integrated into automated mass digitization workflows.

Citation

S. Pletschacher, "A Self-Adaptive Method for Extraction of Document-Specific Alphabets", Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 656-660

DOI

10.1109/ICDAR.2009.253
