The Challenge

Death certificates are an invaluable source of mortality statistics, which can be used for surveillance and early warning of increases in disease activity, and to support the development and monitoring of prevention or response strategies. However, this value can be realised only if timely, accurate and quantitative data are extracted from death certificates, an aim hampered by both the volume of certificates and the variable nature of their natural-language text.

Our Solution

In partnership with the Centre for Epidemiology and Evidence within the NSW Ministry of Health and the Cancer Institute of NSW, we developed systems that automatically enhance the use of death data to provide up-to-date information on deaths of high public health relevance in the community.

Medtex was adapted and developed to provide a high-quality method for automatically coding the cause of death (ICD-10) from the text information contained in death certificates.

  • Cancer, influenza, pneumonia, diabetes and HIV were classified at the disease name level with very high precision (PPV) and recall (sensitivity), in the mid-to-high 90% range [1-2].
  • ICD-10 cause of death codes were classified with high precision and recall (mid-90%) for causes of death with high prevalence, and with moderate precision and recall (mid-80%) for those with lower prevalence [2].
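
For readers unfamiliar with the two measures quoted above, the following is a minimal sketch of how precision (PPV) and recall (sensitivity) can be computed for a single disease label. It is an illustration only and is not taken from Medtex or from the cited evaluations; the function name `precision_recall` and the sample labels are hypothetical.

```python
def precision_recall(gold, predicted, target):
    """Return (precision, recall) for one target label.

    gold      -- reference labels assigned by human coders (hypothetical data)
    predicted -- labels assigned by an automatic classifier (hypothetical data)
    target    -- the label being evaluated, e.g. "cancer"
    """
    pairs = list(zip(gold, predicted))
    # True positives: classifier and coder both assign the target label.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # False positives: classifier assigns the target, coder does not.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # False negatives: coder assigns the target, classifier misses it.
    fn = sum(1 for g, p in pairs if g == target and p != target)

    # Precision (PPV): share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for six certificates, for illustration only.
gold = ["cancer", "influenza", "cancer", "diabetes", "cancer", "pneumonia"]
predicted = ["cancer", "influenza", "diabetes", "diabetes", "cancer", "cancer"]

precision, recall = precision_recall(gold, predicted, "cancer")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, two of the three certificates predicted as cancer are correct, and two of the three true cancer certificates are found, so both measures come out at about 0.67.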

For more information, contact Dr. Anthony Nguyen.

[Screenshot: death records mortality statistics]

  1. Butt L, Zuccon G, Nguyen A, Bergheim A, Grayson N. Classification of Cancer-related Death Certificates using Machine Learning. AMJ, 6(5):292-300, 2013.
  2. Koopman B, Karimi S, Nguyen A, McGuire R, Muscatello D, Kemp M, Truran D, Zhang M, Thackway S. Automatic classification of diseases from free-text death certificates for real-time surveillance. BMC Medical Informatics and Decision Making, 15:53, 2015.