http://www.impact-project.eu

IMPACT - Improving Access to Text

Concept

In the i2010 vision of a European Digital Library, the EU launched an ambitious plan for large-scale digitisation projects transforming Europe’s printed heritage into digitally available resources. The aim of fully integrating intellectual content into the modern information and communication technologies environment can only be achieved by full-text digitisation: transforming digital images of scanned books into electronic text.

Over the last 2-3 years, mass-digitisation has become one of the most prominent issues in the library world. Today, a number of advanced libraries in Europe are scanning millions of pages each year, and large-scale digitisation is a matter of fact, no longer a vision. However, these efforts can tackle only a fraction of the total heritage held by cultural memory organisations. Digitised material is becoming available too slowly, in too small quantities and from too few sources, for three reasons.

  1. There is a lack of institutional knowledge and expertise, which causes inefficiency and ‘re-inventing the wheel’. This is a problem for the vast majority of libraries, museums and archives in Europe.
  2. The cost of producing full-featured electronic text from historical documents is much too high. Cultural heritage institutions will not be able to satisfy their users' need for electronic text rather than mere digital images. Manual keying costs around 1 EUR per page, so a typical book comes to 400, 500 or even 1000 EUR.
  3. Automated text recognition, carried out by Optical Character Recognition (OCR) engines, in many cases does not produce satisfactory results for historical documents. Recognition rates are poor, and the output is sometimes unusable. No commercial or other OCR engine is able to cope satisfactorily with the wide range of printed materials published between the start of the Gutenberg age in the 15th century and the start of the industrial production of books in the middle of the 19th century.

The IMPACT project will remove many of these barriers. The project will push innovation in OCR technology and language technology for historical document processing and retrieval, and share expertise to build capacity in digitisation across Europe. During the project a Centre of Competence will be set up in order to provide a central service entry point for all libraries, archives and museums involved in the digitisation of textual material.

The consortium brings together twenty-six national and regional libraries, research institutions and commercial suppliers, who will share their know-how and best practices, develop innovative tools to enhance the capabilities of OCR engines and the accessibility of digitised text, and lay down the foundations for the mass-digitisation programmes that will take place over the next decade.


Related Publications

Navigating the Storm: IMPACT, eMOP, and Agile Steering Standards

L.C. Mandell, C. Neudecker, A. Antonacopoulos, E. Grumbach, L. Auvil, M.J. Christie, J.A. Heil, T. Samuelson

Digital Scholarship in the Humanities, 2015.


The IMPACT Dataset of Historical Document Images

C. Papadopoulos, S. Pletschacher, C. Clausner, A. Antonacopoulos

Proceedings of the 2013 Workshop on Historical Document Imaging and Processing (HIP2013), Washington DC, USA, August 2013, pp. 123-130


A robust hybrid approach for text line segmentation in historical documents

C. Clausner, A. Antonacopoulos, S. Pletschacher

Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, November 11-15, 2012, IEEE-CS Press, pp. 335-338


Restoration of Arbitrarily Warped Historical Document Images Using Flow Lines

M. Rahnemoonfar, A. Antonacopoulos

Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 905-909


Scenario Driven In-Depth Performance Evaluation of Document Layout Analysis Methods

C. Clausner, S. Pletschacher, A. Antonacopoulos

Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 1404-1408


Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments

C. Clausner, S. Pletschacher, A. Antonacopoulos

Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 48-52


Historical Document Layout Analysis Competition

A. Antonacopoulos, C. Clausner, C. Papadopoulos, S. Pletschacher

Proceedings of the 11th International Conference on Document Analysis and Recognition (ICDAR2011), Beijing, China, September 2011, pp. 1516-1520


Grid-Based Modelling and Correction of Arbitrarily Warped Historical Document Images for Large-Scale Digitisation

P. Yang, A. Antonacopoulos, C. Clausner, S. Pletschacher

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing (HIP2011), Beijing, China, September 2011, pp. 106-111


The PAGE (Page Analysis and Ground-Truth Elements) Format Framework

S. Pletschacher, A. Antonacopoulos

Proceedings of the 20th International Conference on Pattern Recognition (ICPR2010), Istanbul, Turkey, August 23-26, 2010, IEEE-CS Press, pp. 257-260


A New Framework for Recognition of Heavily Degraded Characters in Historical Typewritten Documents Based on Semi-Supervised Clustering

S. Pletschacher, J. Hu, A. Antonacopoulos

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 506-510


Word-Based Adaptive OCR for Historical Books

V. Kluzner, A. Tzadok, Y. Shimony, E. Walach, A. Antonacopoulos

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 501-505


A Realistic Dataset for Performance Evaluation of Document Layout Analysis

A. Antonacopoulos, D. Bridson, C. Papadopoulos, S. Pletschacher

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 296-300


ICDAR2009 Page Segmentation Competition

A. Antonacopoulos, S. Pletschacher, D. Bridson, C. Papadopoulos

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 1370-1374


A Geometric Approach for Accurate and Efficient Performance Evaluation of Layout Analysis Methods

D. Bridson, A. Antonacopoulos

Proceedings of the 19th International Conference on Pattern Recognition (ICPR2008), Tampa, Florida, USA, December 7-11, 2008, IEEE-CS Press
