
New open source releases


PRImA released two more tools as open source on GitHub.

The Page Viewer is a stand-alone application for viewing the page layout and text content of segmentation ground truth and of results produced by page recognition/OCR systems. Its native file format is PAGE XML, but it can also open ALTO XML, FineReader XML and hOCR files. It runs on Windows, Linux and macOS, and its source code is now available on GitHub.

The PAGE Metadata Scanner is a Java command-line tool that scans a single PAGE XML file (document page layout and text content) and outputs its properties and statistics as comma-separated values. It is also available as open source on GitHub.