This dataset contains scans of index cards from the UK's Natural History Museum Lepidoptera index [1][2]. The text is typewritten, with handwritten annotations. The associated ground truth is in PAGE format. The dataset was created and used in the work reported in the following paper:
K. Zagoris, I. Pratikakis, A. Antonacopoulos, B. Gatos and N. Papamarkos, "Handwritten and Machine Printed Text Separation in Document Images using the Bag of Visual Words Paradigm", Proceedings of the 13th International Conference on Frontiers in Handwriting Recognition (ICFHR2012), Bari, Italy, September 2012, pp. 103-108.
References
[1] Beccaloni, G. W., Scoble, M. J., Robinson, G. S., Downton, A. C. & Lucas, S. M. 2003. Chapter 10: Computerising unit-level data in natural history card archives. In: Scoble, M. J. (Ed.). ENHSIN: The European Natural History Specimen Information Network. London: The Natural History Museum. 176pp.
[2] http://www.nhm.ac.uk/research-curation/research/projects/lepindex/aboutproject.html