PAGE Metadata Scanner

Download the latest version

Overview

PAGE Metadata Scanner is a command-line tool that scans a single PAGE XML file (a description of a document page's layout and text content) and outputs the file's properties and statistics as comma-separated values.

The following properties are supported:

  • Metadata (ID, creator, creation time, modification time, width, height)
  • Border and print space (true/false)
  • Content object count (per type and sub-type)
  • Text content statistics (number of characters and whitespace characters)
  • Language and script (semicolon-separated list)
  • Reading order and layers (number of region references)
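
To make the kinds of values listed above more concrete, here is a minimal Python sketch that extracts a few of them from a PAGE XML string and writes one CSV row. It is not the tool's implementation and says nothing about its actual output columns: the function names (`scan_page`, `to_csv`), the column names, and the tiny sample document are assumptions made for illustration. Element and attribute names (`pcGtsId`, `Creator`, `Created`, `LastChange`, `imageWidth`, `imageHeight`, `Border`, `PrintSpace`, `ReadingOrder`, `Layers`) follow the PAGE schema.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample page (illustrative only, not real data).
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" pcGtsId="p1">
  <Metadata>
    <Creator>demo</Creator>
    <Created>2020-01-01T00:00:00</Created>
    <LastChange>2020-01-02T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="p1.png" imageWidth="800" imageHeight="600">
    <Border><Coords points="0,0 800,0 800,600 0,600"/></Border>
    <ReadingOrder>
      <OrderedGroup id="g1">
        <RegionRefIndexed index="0" regionRef="r1"/>
        <RegionRefIndexed index="1" regionRef="r2"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1" type="heading"><Coords points="10,10 90,10 90,30 10,30"/></TextRegion>
    <TextRegion id="r2" type="paragraph"><Coords points="10,40 90,40 90,90 10,90"/></TextRegion>
    <ImageRegion id="r3"><Coords points="100,10 200,10 200,90 100,90"/></ImageRegion>
  </Page>
</PcGts>
"""


def local(tag):
    """Drop the namespace part of a tag so any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def scan_page(xml_text):
    """Return a dict of page properties (hypothetical column names)."""
    root = ET.fromstring(xml_text)
    by_name = {}
    for el in root.iter():
        by_name.setdefault(local(el.tag), []).append(el)

    def first_text(name):
        els = by_name.get(name, [])
        return (els[0].text or "").strip() if els else ""

    def count_refs(container):
        # Count RegionRef / RegionRefIndexed elements below the given container.
        return sum(
            1
            for c in by_name.get(container, [])
            for el in c.iter()
            if local(el.tag).startswith("RegionRef")
        )

    pages = by_name.get("Page", [])
    page_attrs = pages[0].attrib if pages else {}
    row = {
        "id": root.get("pcGtsId", ""),
        "creator": first_text("Creator"),
        "created": first_text("Created"),
        "lastChange": first_text("LastChange"),
        "width": page_attrs.get("imageWidth", ""),
        "height": page_attrs.get("imageHeight", ""),
        "hasBorder": "true" if "Border" in by_name else "false",
        "hasPrintSpace": "true" if "PrintSpace" in by_name else "false",
        "readingOrderRefs": count_refs("ReadingOrder"),
        "layerRefs": count_refs("Layers"),
    }
    # Region counts per type, and per type/sub-type where a "type" attribute exists.
    regions = Counter()
    for name, els in by_name.items():
        if name.endswith("Region"):
            for el in els:
                regions[name] += 1
                if el.get("type"):
                    regions[f"{name}/{el.get('type')}"] += 1
    row.update(sorted(regions.items()))
    return row


def to_csv(row):
    """Serialize one result row as a CSV header line plus a value line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


print(to_csv(scan_page(SAMPLE)))
```

Matching on local element names rather than a fixed namespace is a design choice for the sketch only, so that files from different PAGE schema versions can be read with the same code.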

It is also possible to output statistics on all characters that appear in the text content of a PAGE file.
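
As a rough illustration of what such character statistics could look like, the following Python sketch counts every character found in the region-level text of a PAGE XML string. It is an assumption-laden example, not the tool's method: the function name `char_stats`, the choice to read only the `TextEquiv/Unicode` text directly attached to each `TextRegion` (so that line, word, and glyph text is not counted twice), and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample page (illustrative only, not real data).
SAMPLE_TEXT = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="p1.png" imageWidth="800" imageHeight="600">
    <TextRegion id="r1">
      <Coords points="10,10 90,10 90,30 10,30"/>
      <TextEquiv><Unicode>Hi there</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="10,40 90,40 90,90 10,90"/>
      <TextEquiv><Unicode>aa b</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
"""


def local(tag):
    """Drop the namespace part of a tag name."""
    return tag.rsplit("}", 1)[-1]


def char_stats(xml_text):
    """Count characters in the text attached directly to each TextRegion."""
    counts = Counter()
    root = ET.fromstring(xml_text)
    for region in root.iter():
        if local(region.tag) != "TextRegion":
            continue
        # Look only at direct children so nested lines/words are not double counted.
        for equiv in region:
            if local(equiv.tag) != "TextEquiv":
                continue
            for uni in equiv:
                if local(uni.tag) == "Unicode" and uni.text:
                    counts.update(uni.text)
    return counts


counts = char_stats(SAMPLE_TEXT)
total = sum(counts.values())
whitespace = sum(n for ch, n in counts.items() if ch.isspace())

# One line per character: code point, count.
for ch, n in sorted(counts.items()):
    print(f"U+{ord(ch):04X},{n}")
print(f"total,{total}")
print(f"whitespace,{whitespace}")
```

Printing code points instead of the characters themselves keeps whitespace and other non-printing characters visible in the output.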

Access the latest source code