This command-line tool converts page layout files to the latest PAGE XML format. It accepts all previous versions of the PAGE format as well as ALTO XML and FineReader XML. In addition to conversion, it can validate files against a set of ground-truthing rules and guidelines.
Additional features include:
A basic Java version for conversion only.
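Because the converter accepts several input formats, it can help to check which one a file uses before converting it. The following is only a rough sketch and is not part of the tool: the function name `detect_layout_format` is hypothetical, and the root element names and namespace patterns it checks (`PcGts` for PAGE, `alto` for ALTO, `document` with an ABBYY FineReader namespace) are assumptions based on commonly published schemas, so they may not cover every version.

```python
import xml.etree.ElementTree as ET


def detect_layout_format(xml_text: str) -> str:
    """Guess the layout format of an XML document from its root element.

    Hypothetical helper for illustration; the checks below are assumptions
    about typical root elements and namespaces, not the converter's logic.
    """
    root = ET.fromstring(xml_text)

    # ElementTree writes qualified tags as "{namespace}LocalName".
    tag = root.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
    else:
        namespace, local = "", tag
    ns = namespace.lower()

    # PAGE XML: root <PcGts>; the schema version is assumed to be the
    # last path segment of the namespace URI.
    if local == "PcGts" and "/page/gts/pagecontent/" in ns:
        version = namespace.rstrip("/").rsplit("/", 1)[-1]
        return f"PAGE XML ({version})"

    # ALTO XML: root <alto>, with or without a namespace.
    if local == "alto":
        return "ALTO XML"

    # FineReader XML: root <document> in an ABBYY FineReader namespace.
    if local == "document" and "finereader" in ns:
        return "FineReader XML"

    return "unknown"


if __name__ == "__main__":
    sample = (
        '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/'
        'pagecontent/2019-07-15"><Page/></PcGts>'
    )
    print(detect_layout_format(sample))
```

A check like this only inspects the root element. It does not validate the file against any schema or against the ground-truthing guidelines mentioned above.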
A survey of OCR evaluation tools and metrics
In The 6th International Workshop on Historical Document Imaging and Processing (HIP '21). Association for Computing Machinery, New York, NY, USA, 13–18.