Competitions

We regularly organise contests open to researchers and others. The aim is to compare state-of-the-art document image analysis methods. We also put the submissions in context with leading commercial and open-source tools.

For 2019 competitions see:

ICDAR2019 Competition on Recognition of Historical Arabic Scientific Manuscripts - RASM2019
ICDAR2019 Competition on Recognition of Early Indian printed Documents - REID2019
ICDAR2019 Competition on Digitised Magazine Article Segmentation - DMAS2019
ICDAR2019 Competition on Recognition of Documents with Complex Layouts - RDCL2019

RDCL Recognition of Documents with Complex Layouts 2015-2019

Evaluation of page segmentation and region classification methods for documents with complex layouts. The images and ground truth were taken from the PRImA Layout Analysis dataset, containing a wide selection of contemporary documents (with complex as well as simple layouts) together with extensive metadata. Emphasis is placed on magazines (mostly) and technical articles, which are likely to be the focus of digitisation efforts.

RDCL is now a continuous competition. The evaluation is integrated in the Aletheia tool and can be used offline. New results can be submitted at any time and will be published once validated.

RDCL2019 Website » Dataset

RDCL2017 Website » Dataset ICDAR Publication »

RDCL2015 Website » Dataset ICDAR Publication »

RASM Recognition of Historical Arabic Scientific Manuscripts 2018-2019

The British Library’s collection of Arabic manuscripts is internationally recognised as one of the largest and finest in Europe and North America, comprising almost 15,000 works in some 14,000 volumes. Since 2012, the Library, in partnership with The Qatar Foundation and Qatar National Library, has digitised and made freely available over 950,000 images and counting, featuring the cultural and historical heritage of the Gulf and wider region, on Qatar Digital Library (QDL).

Ranging from the early eighth century CE to the nineteenth century, the manuscripts are drawn from both Arab countries and other countries with Arab or Muslim communities including India, China, Indonesia, Malaysia, and West Africa, and they display fascinating variations in style and script.

As part of this project we would like to pose a challenge focussing on finding an optimal solution for accurately and automatically transcribing our vast and growing digital archive of historical Arabic scientific handwritten manuscripts within the QDL. Our aim is to improve accessibility of this rich content by enabling full-text search and discovery, as well as enabling large-scale text analysis.

RASM2019 Website »

RASM2018 Website » ICDAR Publication »

REID Recognition of Early Indian printed Documents 2017-2019

The British Library is currently undertaking a groundbreaking project, Two Centuries of Indian Print, to digitise and make available as open access 4,000 early printed Indian books (1713-1914) written in Bengali. Complementary material, the Quarterly Lists, consisting of catalogue records for all books published in India between 1867 and 1967, will also be made openly available through the project.

As part of this project we would like to pose a challenge to find an optimal solution for accurately and automatically transcribing the Bengali books and Quarterly Lists, to form a unique dataset that can be used with computational tools and methods, and to enable full-text search and discovery.

REID2019 Website » Dataset

REID2017 Website » Dataset ICDAR Publication »

HBR Historical Book Recognition 2013

Historical books represent a large proportion of libraries’ holdings and continue to be the focus of large-scale digitisation projects. A number of distortions frequently manifest themselves in scans of historical books, hindering layout analysis and text recognition. The motivation of the competition is to evaluate existing approaches using a realistic dataset and an objective performance analysis system.

HBR2013 followed the successful running of all previous ICDAR Page Segmentation competitions (2001, 2003, 2005, 2007, 2009 and 2011). The competition expanded the scope to historical books with distortions (the historical documents in the ICDAR2011 competition dataset were largely distortion-free, so that the segmentation step could be evaluated on its own). Furthermore, the breadth of the competition was increased to cover recognition as well.

HBR2013 Website » Dataset ICDAR Publication »

HNLA Historical Newspaper Layout Analysis 2013

Historical newspapers pose a series of challenges due to the method of their production (inexpensive paper, inconsistent inking, varying layout etc.) as well as the presence of ageing and use artefacts. Newspapers are increasingly the major focus of large-scale digitisation projects (e.g. Europeana Newspapers) as they contain information of wide interest to the general public and, at the same time, are rapidly deteriorating in storage. The motivation of the competition is to evaluate existing approaches using a realistic dataset and an objective performance analysis system.

HNLA2013 followed the successful running of all previous ICDAR Page Segmentation competitions (2001, 2003, 2005, 2007, 2009 and 2011). The competition expanded the scope to historical newspapers.

HNLA2013 Website » Dataset ICDAR Publication »

Early Competitions 2001-2011

Biennial ICDAR competitions since 2001, providing snapshots of page recognition methods.

2011 - Historical Document Layout Analysis Competition ICDAR Publication »

2009 - Page Segmentation Competition ICDAR Publication »

2007 - Handwriting Segmentation Contest ICDAR Publication »

2007 - Page Segmentation Competition ICDAR Publication »

2005 - Page Segmentation Competition ICDAR Publication »

2003 - Page Segmentation Competition ICDAR Publication »

2001 - First International Newspaper Page Segmentation Contest ICDAR Publication »