PAGE Metadata Scanner
Download the latest version
Overview
PAGE Metadata Scanner is a command-line tool that scans a single PAGE XML file (a format describing the layout and text content of a document page) and outputs its properties and statistics as comma-separated values (CSV).
The following properties are supported:
- Metadata (ID, creator, creation time, modification time, width, height)
- Border and print space (present: true/false)
- Content object counts (per type and sub-type)
- Text content statistics (number of characters and whitespace characters)
- Language and script (semicolon-separated list)
- Reading order and layers (number of region references)
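As a rough illustration of the kinds of properties listed above, here is a minimal Python sketch. It reads a PAGE XML string with the standard-library ElementTree parser and emits one CSV row. It is not the scanner's implementation and does not reproduce its column names or output format. Several things are assumptions for illustration: the function names (`scan_page`, `to_csv`), the column names, the sample document and its namespace version, and the choice to compute text statistics from region-level `TextEquiv` elements only. Layer statistics are omitted for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the namespace: '{http://...}TextRegion' -> 'TextRegion'."""
    return tag.rsplit("}", 1)[-1]


def find_first(root, name):
    """Return the first element (root included) with the given local name, or None."""
    return next((el for el in root.iter() if local(el.tag) == name), None)


def text_of(root, name):
    """Stripped text of the first element with the given local name, or ''."""
    el = find_first(root, name)
    return (el.text or "").strip() if el is not None else ""


def scan_page(xml_text):
    """Collect an illustrative subset of page properties from a PAGE XML string."""
    root = ET.fromstring(xml_text)
    page = find_first(root, "Page")
    row = {
        "id": root.get("pcGtsId", ""),
        "creator": text_of(root, "Creator"),
        "created": text_of(root, "Created"),
        "lastChange": text_of(root, "LastChange"),
        "width": page.get("imageWidth"),
        "height": page.get("imageHeight"),
        "border": find_first(page, "Border") is not None,
        "printSpace": find_first(page, "PrintSpace") is not None,
    }
    # Count region elements per type and, if a 'type' attribute is set, per sub-type.
    counts = Counter()
    for el in page.iter():
        name = local(el.tag)
        if name.endswith("Region"):
            counts[name] += 1
            if el.get("type"):
                counts[name + ":" + el.get("type")] += 1
    row.update(counts)
    # Assumption: text statistics use the first region-level TextEquiv only,
    # so text repeated on nested lines and words is not counted twice.
    text = ""
    for region in page.iter():
        if local(region.tag) != "TextRegion":
            continue
        equiv = next((c for c in region if local(c.tag) == "TextEquiv"), None)
        uni = find_first(equiv, "Unicode") if equiv is not None else None
        if uni is not None and uni.text:
            text += uni.text
    row["chars"] = len(text)
    row["whitespace"] = sum(ch.isspace() for ch in text)
    # Languages and scripts as sorted, semicolon-separated lists of distinct values.
    for attr, key in (("primaryLanguage", "languages"), ("primaryScript", "scripts")):
        values = sorted({el.get(attr) for el in page.iter() if el.get(attr)})
        row[key] = ";".join(values)
    # Number of region references anywhere inside the reading order.
    order = find_first(page, "ReadingOrder")
    row["readingOrderRefs"] = 0 if order is None else sum(
        1 for el in order.iter() if local(el.tag).startswith("RegionRef"))
    return row


def to_csv(row):
    """Write a header line and a single data row as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" pcGtsId="pc-0001">
  <Metadata><Creator>example</Creator><Created>2021-01-01T00:00:00</Created>
    <LastChange>2021-01-02T00:00:00</LastChange></Metadata>
  <Page imageFilename="page.png" imageWidth="1000" imageHeight="1500">
    <Border><Coords points="0,0 1000,0 1000,1500 0,1500"/></Border>
    <ReadingOrder><OrderedGroup id="ro1">
      <RegionRefIndexed index="0" regionRef="r1"/><RegionRefIndexed index="1" regionRef="r2"/>
    </OrderedGroup></ReadingOrder>
    <TextRegion id="r1" type="heading" primaryLanguage="English">
      <Coords points="10,10 990,10 990,100 10,100"/><TextEquiv><Unicode>Title</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r2" type="paragraph" primaryLanguage="German">
      <Coords points="10,120 990,120 990,900 10,900"/><TextEquiv><Unicode>Some body text</Unicode></TextEquiv></TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(to_csv(scan_page(SAMPLE)), end="")
```

The sketch matches on local element names rather than a fixed namespace, so it is not tied to the PAGE schema version declared in the file.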
It is also possible to output statistics on all characters that appear in the text content of a PAGE file.
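Per-character statistics of this kind can be sketched in Python with `collections.Counter`. This is a hypothetical illustration, not the tool's code. The names `char_stats` and `stats_to_csv`, the `level` parameter, and the output columns (character, code point, count) are all assumptions. Only the first direct-child `TextEquiv` of each element at the chosen level is read, so text repeated at nested levels is not counted twice.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the namespace: '{http://...}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def char_stats(xml_text, level="TextRegion"):
    """Count every character in the text of all elements at one hierarchy level."""
    counts = Counter()
    for el in ET.fromstring(xml_text).iter():
        if local(el.tag) != level:
            continue
        # Only the first direct-child TextEquiv is used, so nested text is not counted twice.
        equiv = next((c for c in el if local(c.tag) == "TextEquiv"), None)
        if equiv is None:
            continue
        uni = next((c for c in equiv if local(c.tag) == "Unicode"), None)
        if uni is not None and uni.text:
            counts.update(uni.text)  # iterating a string yields its characters
    return counts


def stats_to_csv(counts):
    """One row per character: the character, its code point, and its frequency."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["char", "codepoint", "count"])
    for ch, n in sorted(counts.items()):
        writer.writerow([ch, f"U+{ord(ch):04X}", n])
    return buf.getvalue()


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="p.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1"><Coords points="0,0 100,0 100,20 0,20"/>
      <TextLine id="l1"><Coords points="0,0 100,0 100,20 0,20"/>
        <TextEquiv><Unicode>Haß und Hass</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>Haß und Hass</Unicode></TextEquiv></TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(stats_to_csv(char_stats(SAMPLE)), end="")
```

Passing `level="TextLine"` (or another level of the PAGE text hierarchy, such as `"Word"` or `"Glyph"`) counts text at that level instead. Which level the scanner itself uses is not stated on this page.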
Related Publications
A survey of OCR evaluation tools and metrics
C. Neudecker, K. Baierer, C. Clausner, A. Antonacopoulos, S. Pletschacher
In The 6th International Workshop on Historical Document Imaging and Processing (HIP '21). Association for Computing Machinery, New York, NY, USA, 13–18.