Handwritten and Machine Printed Text Separation in Document Images using the Bag of Visual Words Paradigm

K. Zagoris, I. Pratikakis, A. Antonacopoulos, B. Gatos, N. Papamarkos

Proceedings of the 13th International Conference on Frontiers in Handwriting Recognition (ICFHR2012), Bari, Italy, September 2012, pp. 103-108

Abstract

In many types of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may be present in the same document image, giving rise to significant issues within a digitisation and recognition pipeline. It is therefore necessary to separate the two types of text before applying different recognition methodologies to each. In this paper, a new approach is proposed for identifying and separating handwritten from machine printed text using the Bag of Visual Words (BoVW) paradigm. Initially, blocks of interest are detected in the document image. For each block, a descriptor is calculated based on the BoVW. The final characterization of each block as Handwritten, Machine Printed or Noise is made by a Support Vector Machine classifier. The promising performance of the proposed approach is shown using a consistent evaluation methodology which couples meaningful measures with a new dataset.
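
The per-block descriptor step can be illustrated with a minimal sketch of generic BoVW encoding. This is not the authors' implementation: the abstract does not state which local features, codebook size or distance measure the paper uses, so the two-dimensional toy descriptors, the three-word codebook, the Euclidean distance and the function names below are all assumptions made for illustration. In a full pipeline, histograms like this one would be passed to a classifier such as a Support Vector Machine.

```python
import math
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to `descriptor` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))


def bovw_histogram(local_descriptors, codebook):
    """Encode one block as an L1-normalised histogram of visual-word counts."""
    if not local_descriptors:
        # A block with no local features gets an all-zero descriptor.
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(d, codebook) for d in local_descriptors)
    total = len(local_descriptors)
    return [counts[i] / total for i in range(len(codebook))]


# Hypothetical toy data: a 3-word codebook and 4 local descriptors from one block.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
block_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

histogram = bovw_histogram(block_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Each block is reduced to a fixed-length vector regardless of how many local descriptors it contains, which is what allows blocks of different sizes to be compared by a single classifier.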

Citation

K. Zagoris, I. Pratikakis, A. Antonacopoulos, B. Gatos, N. Papamarkos, "Handwritten and Machine Printed Text Separation in Document Images using the Bag of Visual Words Paradigm", Proceedings of the 13th International Conference on Frontiers in Handwriting Recognition (ICFHR2012), Bari, Italy, September 2012, pp. 103-108

DOI

10.1109/ICFHR.2012.207