From Author to Reader: Challenges for the Digital Content Chain, Proceedings of the 9th ICCC International Conference on Electronic Publishing, Leuven-Heverlee, Belgium, June 2005, pp. 35-41
This paper describes a general approach to automatically preparing digitised documents for storage and processing on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) methods but exhibit regular structures in the form of text-like blocks. By extracting a document-immanent alphabet, preserving the graphical representations by means of vectorisation, and then encoding the original document on the basis of these steps, it is possible to obtain the benefits of encoded text without the effort and the possible mistakes that arise from recognition methods. The use of the Extensible Markup Language (XML) for structural descriptions and Scalable Vector Graphics (SVG) for graphical representations enables seamless integration into style-sheet-based output workflows for producing system-specific layouts.
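The pipeline outlined in the abstract (alphabet extraction, vectorised glyphs, encoding of the document) might be sketched roughly as follows. This is only an illustration under strong assumptions and not the paper's implementation: each glyph is assumed to be already vectorised into an SVG path string, identical glyphs are assumed to yield identical strings (a real system would need tolerant shape matching), and the names `build_alphabet`, `to_svg` and `advance` are hypothetical.

```python
from xml.etree import ElementTree as ET

def build_alphabet(glyphs):
    """Assign one symbol id per distinct glyph shape and encode the sequence.

    `glyphs` is a list of SVG path strings (assumed input, see lead-in).
    Returns (alphabet, encoded): alphabet maps shape -> id, encoded is the
    document expressed as a list of symbol ids.
    """
    alphabet = {}
    encoded = []
    for shape in glyphs:
        if shape not in alphabet:
            alphabet[shape] = f"g{len(alphabet)}"  # new alphabet entry
        encoded.append(alphabet[shape])
    return alphabet, encoded

def to_svg(alphabet, encoded, advance=10):
    """Emit SVG: each alphabet entry is defined once, then referenced per occurrence."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "xmlns:xlink": "http://www.w3.org/1999/xlink",
    })
    defs = ET.SubElement(svg, "defs")
    for path_data, sym_id in alphabet.items():
        sym = ET.SubElement(defs, "symbol", {"id": sym_id})
        ET.SubElement(sym, "path", {"d": path_data})
    # Hypothetical layout: glyphs placed left to right at a fixed advance.
    for i, sym_id in enumerate(encoded):
        ET.SubElement(svg, "use", {
            "xlink:href": f"#{sym_id}",
            "x": str(i * advance),
            "y": "0",
        })
    return ET.tostring(svg, encoding="unicode")

if __name__ == "__main__":
    glyphs = ["M0 0L5 0", "M0 0L0 5", "M0 0L5 0"]
    alphabet, encoded = build_alphabet(glyphs)
    print(encoded)
    print(to_svg(alphabet, encoded))
```

Because each shape is stored once and referenced by id, the encoded sequence can be searched or compared like text even though no character is ever recognised, and the SVG output can be passed to a style-sheet-based workflow.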
S. Pletschacher, "OCR Alternatives for Electronic Publishing of Digitised Documents", From Author to Reader: Challenges for the Digital Content Chain, Proceedings of the 9th ICCC International Conference on Electronic Publishing, Leuven-Heverlee, Belgium, June 2005, pp. 35-41