Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study

C. Clausner, A. Antonacopoulos, C. Henshaw, J. Hayes

Proceedings of Third International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019), Brussels, Belgium, 08 - 10 May 2019

Abstract

Numerical data of considerable significance is present in historical documents in tabular form. Due to the challenges involved in the extraction of this data from the scanned documents, it is not available to researchers in a useful representation that unlocks the underlying statistical information. This paper sets out to create a better understanding of the problem of extracting and representing statistical information from numerical tables, in order to enable the creation of appropriate technical solutions and also for collection holders to appropriately plan their digitisation projects to better serve their readers. To that effect, after an initial overview of current practices in digitisation and representation of historical numerical data, the authors’ findings are presented from a scoping exercise of the Wellcome Library’s high-profile collection of the Medical Officer of Health reports. In addition to users’ perspectives and a detailed examination of the nature and structure of the data in the reports, a study of the extraction and integration of the data is also described.

Citation

C. Clausner, A. Antonacopoulos, C. Henshaw, J. Hayes, "Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study", Proceedings of Third International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2019), Brussels, Belgium, 08 - 10 May 2019
