
Quality Prediction System for Large-Scale Digitisation Workflows

C. Clausner, S. Pletschacher, A. Antonacopoulos

Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), Santorini, Greece, April 11-14, 2016

Abstract

The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that, as an alternative, quality prediction may be used to approximate the success of a given OCR workflow. A new system is thus presented where a classifier is trained using metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Subsequently, only document images are required as input for the numeric prediction of the quality score (no ground truth required). This way, the system can be applied to any number of similar (unseen) documents in order to assess their suitability for being processed using the particular workflow. The usefulness of the system has been validated using a realistic dataset of historical newspaper pages.
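The two-stage idea in the abstract can be sketched in code. In the first stage, a model is fitted on pages whose workflow success rate was measured against ground truth. In the second stage, a numeric score is predicted for unseen pages from image features alone. This is a minimal sketch under stated assumptions. It assumes numeric feature vectors have already been extracted from each page. The example features (text density, noise level, column count) are hypothetical, and k-nearest-neighbour regression is a deliberately simple stand-in. It is not the classifier or feature set used in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical per-page feature vector, e.g. (text_density, noise_level, column_count).
# These features are illustrative assumptions, not the paper's feature set.
Features = Sequence[float]


@dataclass
class KnnQualityPredictor:
    """Predicts a numeric OCR quality score (e.g. a success rate in [0, 1])
    as the mean score of the k most similar training pages.
    k-NN regression is a stand-in technique chosen for brevity."""
    k: int = 3

    def fit(self, features: List[Features], scores: List[float]) -> "KnnQualityPredictor":
        # Training pages: those for which the workflow's success rate was
        # measured against (minimal) ground truth.
        if not features or len(features) != len(scores):
            raise ValueError("need equally many feature vectors and scores")
        n_dims = len(features[0])
        # Per-dimension min/max so that no single feature dominates the distance.
        self._lo = [min(f[i] for f in features) for i in range(n_dims)]
        self._hi = [max(f[i] for f in features) for i in range(n_dims)]
        self._train = [(self._scale(f), s) for f, s in zip(features, scores)]
        return self

    def _scale(self, f: Features) -> List[float]:
        # Min-max scaling to [0, 1]; constant dimensions collapse to 0.
        return [
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(f, self._lo, self._hi)
        ]

    def predict(self, f: Features) -> float:
        # Prediction uses only features derived from the page image (no ground truth).
        q = self._scale(f)
        nearest = sorted(self._train, key=lambda item: math.dist(q, item[0]))[: self.k]
        return sum(score for _, score in nearest) / len(nearest)


# Usage with made-up illustrative data (not results from the paper).
train_x = [(0.8, 0.1, 2), (0.7, 0.2, 3), (0.3, 0.9, 6), (0.2, 0.8, 5)]
train_y = [0.95, 0.90, 0.40, 0.45]
model = KnnQualityPredictor(k=2).fit(train_x, train_y)
print(model.predict((0.75, 0.15, 2)))
```

A clean-looking unseen page lands near the high-scoring training pages, and a noisy multi-column page lands near the low-scoring ones. This mirrors how a predicted score could flag which documents are suitable for a given workflow.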

Citation

C. Clausner, S. Pletschacher, A. Antonacopoulos, "Quality Prediction System for Large-Scale Digitisation Workflows", Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), Santorini, Greece, April 11-14, 2016


