Medical Free Text Retrieval and Analytics

The majority of health data is recorded as unstructured free text: clinical examination reports, nursing notes, discharge summaries and death certificates are just some examples. This data contains information that is valuable for secondary use, such as population health monitoring and reporting. However, the clinical importance and large volume of this data hinder manual analysis. As a consequence, clinical data is often analysed retrospectively, with delays that potentially undermine effective population health monitoring and reporting.

Our research has developed advanced natural language processing, information retrieval, and machine learning approaches to overcome the problems of “understanding and reasoning with clinical data”. In addition, we emphasise the use of standard clinical terminologies grounded in description logic.

We have delivered effective automated health monitoring and reporting solutions, including:

  • the analysis of pathology reports and death certificates for the timely assessment of cancer incidence and the associated mortality rates,
  • the analysis of radiology reports to support the reconciliation of radiology findings with emergency department discharge records,
  • the analysis of medical reports to provide capability for medical record searching and analytics,
  • the analysis of medical forums to identify adverse drug reactions.

Software

Medtex, RadSearch, CADEminer

Our solutions have been developed in partnership with healthcare practitioners from Cancer Registries and from hospital radiology and emergency medicine departments. Working with health industry stakeholders allows Medtex to provide informed decision support by extracting greater value from their clinical narrative reports. For more information, contact Dr. Anthony Nguyen.

Medtex

Reading and processing narrative-based clinical reports is an extremely labour-intensive and time-consuming process. To ease the workload of clinical staff and aid the computer processing of these reports, we have developed Medtex, a smart clinical natural language processing software. The software extracts meaningful information from free-text data to aid decision support and take the weight off clinical staff.

An extensive amount of clinical data is still stored as unstructured free-text and the information is often trapped within the language used in these reports. The reports are in the form of unstructured, ungrammatical, and often fragmented free-text.

Clinical information abstraction from patients' clinical data relies on manual inspection and experience-based judgement by clinical staff. The effort required for information abstraction is extremely labour- and time-intensive, prone to human error and ineffective.

A simple and consistent way of using free-text clinical data can improve health outcomes for patients, boost the efficiency of the health system and provide a rich data set for further research.

Medtex, a semantic medical text analysis software, is a tool for informing clinical decision making by analysing free-text clinical documents.

Medtex works by “learning” what statements to look for, and uses SNOMED CT, the internationally defined set of clinical terms, to unify and reason with the language across information sources. It incorporates domain knowledge to bridge the gap between natural language and the use of clinical terminology semantics for automatic medical text inference and reasoning.

Analysis engines using the Medtex technology [1] have been developed to:

  • standardise the free text by identifying medical concepts, abbreviations and acronyms, shorthand terms, dimensions and relevant legacy codes;
  • relate key medical concepts, terms and codes using contextual information and report substructure; and
  • use formal semantics to reason with the clinical concepts; inferring complex clinical notions relevant to a health application.
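
As a rough illustration of the first two steps listed above (standardising text to concepts, then using sentence context to qualify them), the following minimal Python sketch maps surface forms to concept identifiers and marks negated mentions. It is not the Medtex implementation: the lexicon, the placeholder identifiers (which are not real SNOMED CT codes), the negation cues and the function names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: surface form -> (placeholder concept ID, preferred term).
# The IDs are illustrative placeholders, NOT real SNOMED CT identifiers.
LEXICON = {
    "adenocarcinoma": ("X001", "Adenocarcinoma"),
    "carcinoma": ("X002", "Carcinoma"),
    "ca": ("X002", "Carcinoma"),  # shorthand mapped to the same concept
    "metastasis": ("X003", "Metastasis"),
}

# Assumed negation cues; real systems use far richer context rules.
NEGATION_CUES = ("no", "not", "without", "negative for")


@dataclass
class Mention:
    concept_id: str
    term: str
    negated: bool


def annotate(report: str) -> list[Mention]:
    """Find lexicon terms sentence by sentence and flag negated mentions."""
    mentions = []
    for sentence in re.split(r"[.;\n]+", report.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token not in LEXICON:
                continue
            # Simplification: any cue earlier in the same sentence negates the term.
            preceding = " ".join(tokens[:i])
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", preceding)
                for cue in NEGATION_CUES
            )
            concept_id, term = LEXICON[token]
            mentions.append(Mention(concept_id, term, negated))
    return mentions


if __name__ == "__main__":
    for m in annotate("Lung: adenocarcinoma present. No evidence of metastasis."):
        print(m)
```

The third step, reasoning over the identified concepts with formal semantics, is not covered by this sketch.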

Medtex scales to large amounts of unstructured data and has been integrated within a highly distributed computational framework. It turns the medical narrative into structured data that can be easily stored, queried or rendered by most systems for use in their health application.

Medtex has been used to deliver the following solutions to healthcare practitioners from Cancer Registries, and hospital radiology and emergency medicine departments:

  • the analysis of pathology reports and death certificates for the timely assessment of cancer incidence and the associated mortality rates,
  • the analysis of radiology reports to support the reconciliation of radiology findings with emergency department discharge records,
  • the analysis of medical reports to provide capability for medical record searching and analytics.

For more information, contact Dr. Anthony Nguyen.

  1. Nguyen A, Lawley M, Hansen D, Colquist S. A simple pipeline application for identifying and negating SNOMED clinical terminology in free text. Health Informatics Conference, 2009;188-193.

 
Automated Cancer Notifications

Pathology notification for a Cancer Registry is regarded as the most valid information for the confirmation of a diagnosis of cancer. The development of a clinical decision support system to unlock information from medical free-text can significantly reduce costs arising from manual processes and enable improved decision support, enhancing efficiency and timeliness of cancer information for Cancer Registries.

Medtex aids Cancer Registry tasks with the notification of cancer reports and the coding of notifications data. The system automatically scans HL7 messages and analyses the free-text reports for terms and concepts relevant to cancer.
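
To make the scanning step concrete, here is a minimal Python sketch that pulls the narrative text out of a message and checks it for cancer-related terms. It assumes pipe-delimited HL7 v2 messages with the report text carried in the OBX-5 (observation value) field; the source does not state the HL7 version or message layout, and the sample message, term list and function names are invented for illustration.

```python
# Assumed, illustrative term list; a real system would use terminology-based matching.
CANCER_TERMS = ("carcinoma", "malignant", "lymphoma", "melanoma")


def extract_report_text(hl7_message: str) -> str:
    """Concatenate the OBX-5 (observation value) fields of an HL7 v2 message."""
    lines = []
    # HL7 v2 separates segments with carriage returns; tolerate newlines as well.
    for segment in hl7_message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 5:
            lines.append(fields[5])
    return "\n".join(lines)


def mentions_cancer(text: str) -> bool:
    """Naive substring check; it ignores negation and HL7 escape sequences."""
    lowered = text.lower()
    return any(term in lowered for term in CANCER_TERMS)


# Made-up sample message, for illustration only.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|REGISTRY|REG|202001010000||ORU^R01|MSG0001|P|2.4",
    "PID|1||000000",
    "OBR|1|||PATH^Pathology report",
    "OBX|1|TX|PATH^Report||Sections show invasive ductal carcinoma.",
    "OBX|2|TX|PATH^Report||Margins are clear.",
])

if __name__ == "__main__":
    text = extract_report_text(SAMPLE)
    print(text)
    print("cancer-related:", mentions_cancer(text))
```

A plain substring match would also fire on negated statements such as "no carcinoma seen", which is why concept identification with negation handling matters in practice.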

Our automated classification of pathology reports as notifiable cancers is highly effective, with a sensitivity of 98% and a specificity of 96% [1]. Specific cancer notification items such as basis of diagnosis, histological type and grade, primary site and laterality can also be extracted accurately (80% accuracy [2-3]). For lung cancer staging, a formal trial on lung cancer cases comparing the stages assigned by the system with those given by expert pathologists produced positive results [4-5]. Medtex also allows for detailed tumour stream synoptic and stage reporting [6].
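
For readers unfamiliar with the two headline measures, the short Python sketch below shows how sensitivity and specificity are computed from gold-standard and predicted labels. The function name and the toy labels are assumptions for the example and are unrelated to the evaluation data behind the figures quoted above.

```python
def sensitivity_specificity(gold, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    sensitivity = TP / (TP + FN): share of truly notifiable reports that were flagged.
    specificity = TN / (TN + FP): share of non-notifiable reports correctly left unflagged.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels: True means "notifiable cancer report".
    gold = [True, True, True, True, False, False, False, False]
    predicted = [True, True, True, False, False, False, False, True]
    print(sensitivity_specificity(gold, predicted))
```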

This software has been developed in conjunction with the Queensland Cancer Control Analysis Team, Queensland Health. For more information, contact Dr. Anthony Nguyen.

The Medtex software processes narrative reports and generates structured data to aid clinical staff in abstraction tasks.

  1. Nguyen A, Moore J, Zuccon G, Lawley M, Colquist S. Classification of Pathology Reports for Cancer Notifications. Studies in health technology and informatics, 2012; 150-156.
  2. Nguyen A, Moore J, Lawley M, Hansen D, Colquist S. Automatic Extraction of Cancer Characteristics from Free-Text Pathology Reports for Cancer Notifications. Studies in health technology and informatics, 2011; 117-124.
  3. Nguyen A, Moore, O’Dwyer J, Philpot. Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports. AMIA 2015 Annual Symposium, 2015.
  4. Nguyen A, Lawley M, Hansen D, et al. Symbolic rule-based classification of lung cancer stages from free-text pathology reports. J Am Med Inform Assoc. 2010;17(4):440-445.
  5. McCowan I, Moore D, Nguyen A, Bowman R, Clarke B, Duhig E, et al. Collection of cancer stage data by classifying free-text medical reports. J Am Med Inform Assoc. 2007 Nov/Dec;14(6):736–745.
  6. Nguyen A, Lawley M, Hansen D, Colquist S. Structured Pathology Reporting for Cancer from Free Text: Lung Cancer Case Study. eJHI, 7(1): e8, 2012.

Death Certificate Coding

Death certificates provide an invaluable source for mortality statistics, which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, this value can be realised only if timely, accurate and quantitative data is extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language.

In partnership with the Centre for Epidemiology and Evidence within NSW Ministry of Health and the Cancer Institute of NSW, we developed systems to automatically enhance the use of death data to provide up-to-date information on deaths of high public health relevance in the community.

Medtex was adapted and developed to provide a high-quality method to automatically code the cause of death (ICD-10) from text information contained in death certificates.

  • Cancer, influenza, pneumonia, diabetes and HIV were classified at the disease-name level with very high precision (PPV) and recall (sensitivity), in the mid-to-high 90% range [1-2].
  • ICD-10 cause of death codes were classified with high precision and recall (mid-90%) for causes of death with high prevalence, and with moderate precision and recall (mid-80%) for those with lower prevalence [2].
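
The Python sketch below gives a simplified picture of disease-name-level classification and of how per-disease precision and recall are measured. It uses hand-written keyword patterns as a stand-in; the cited work also uses machine learning, and the patterns, function names and toy data here are assumptions rather than the deployed method.

```python
import re

# Assumed keyword patterns per disease; deliberately small and incomplete.
DISEASE_PATTERNS = {
    "cancer": re.compile(r"\b(cancer|carcinoma|malignan\w*|metasta\w*)\b"),
    "influenza": re.compile(r"\binfluenza\b"),
    "pneumonia": re.compile(r"\b(broncho)?pneumonia\b"),
    "diabetes": re.compile(r"\bdiabet\w*\b"),
    "hiv": re.compile(r"\b(hiv|human immunodeficiency virus)\b"),
}


def classify(certificate_text: str) -> set[str]:
    """Return the set of diseases whose pattern occurs in the certificate text."""
    text = certificate_text.lower()
    return {name for name, pattern in DISEASE_PATTERNS.items() if pattern.search(text)}


def precision_recall(disease, gold, predicted):
    """Precision (PPV) and recall (sensitivity) for one disease over paired label sets."""
    tp = sum(1 for g, p in zip(gold, predicted) if disease in g and disease in p)
    fp = sum(1 for g, p in zip(gold, predicted) if disease not in g and disease in p)
    fn = sum(1 for g, p in zip(gold, predicted) if disease in g and disease not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up certificate text, for illustration only.
    text = "I(a) Bronchopneumonia (b) Metastatic carcinoma of lung II Type 2 diabetes mellitus"
    print(sorted(classify(text)))
```

Keyword rules of this kind miss spelling variants and ignore negation, which is part of what a full text-analysis pipeline has to handle.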

For more information, contact Dr. Anthony Nguyen.

  1. Butt L, Zuccon G, Nguyen A, Bergheim A, Grayson N. Classification of Cancer-related Death Certificates using Machine Learning. AMJ, 6(5):292-300, 2013.
  2. Koopman B, Karimi S, Nguyen A, McGuire R, Muscatello D, Kemp M, Truran D, Zhang M and Thackway S. Automatic classification of diseases from free-text death certificates for real-time surveillance. BMC Medical Informatics and Decision Making, 15:53, 2015.

Checking radiology reports to prevent missed fractures

The checking of X-ray reports to ensure limb fractures are not missed and that patients receive appropriate follow-up once discharged from the Emergency Department (ED) is an essential but laborious task.

In partnership with the Emergency Medicine Departments of the Royal Brisbane and Women’s Hospital and Gold Coast Hospital, we developed a system that reliably identifies limb fractures documented in radiology reports and links them with each patient’s disposition recorded in the Emergency Department Information System. This provides decision support for the checking process, which is currently manual [1-2].
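
The reconciliation idea can be sketched in a few lines of Python: detect a fracture in the report text, look up the patient's ED record, and flag cases where a fracture was reported but the patient was discharged without follow-up. The record fields, disposition values and the pattern-based fracture detector are hypothetical simplifications; the cited work uses machine learning classifiers and SNOMED CT rather than this rule.

```python
import re
from dataclasses import dataclass

# Assumed negation pattern, e.g. "no fracture", "no evidence of acute fracture".
NEGATED_FRACTURE = re.compile(r"\bno (evidence of |acute |definite )*fractures?\b")
FRACTURE = re.compile(r"\bfractures?\b")


@dataclass
class RadiologyReport:
    patient_id: str
    text: str


@dataclass
class EDRecord:  # hypothetical view of an ED information system record
    patient_id: str
    disposition: str  # e.g. "admitted" or "discharged"
    follow_up_booked: bool


def reports_fracture(text: str) -> bool:
    """True if a fracture mention remains after negated mentions are removed."""
    remaining = NEGATED_FRACTURE.sub("", text.lower())
    return FRACTURE.search(remaining) is not None


def flag_for_review(reports, ed_records):
    """Patient IDs with a reported fracture, discharged, and no follow-up booked."""
    by_patient = {record.patient_id: record for record in ed_records}
    flagged = []
    for report in reports:
        record = by_patient.get(report.patient_id)
        if record is None:
            continue  # no ED record to reconcile against
        if (
            reports_fracture(report.text)
            and record.disposition == "discharged"
            and not record.follow_up_booked
        ):
            flagged.append(report.patient_id)
    return flagged
```

In this sketch the output is a worklist for a clinician to review, which matches the decision-support role described above rather than replacing the check.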

For more information, contact Dr. Anthony Nguyen.

  1. Zuccon G, Wagholikar A, Nguyen A, Butt L, Chu K, Greenslade J, Martin S. Automatic Classification of Free-Text Radiology Reports to Identify Limb Fractures using Machine Learning Algorithms and SNOMED CT. AMIA CRI, 2013.
  2. Koopman B, Zuccon G, Wagholikar A, Chu K, O’Dwyer J, Nguyen A, Keijzers G. Automated Reconciliation of Radiology Reports and Discharge Summaries. AMIA 2015 Annual Symposium, 2015.

Medical Record Search & Analytics

Search technologies are critical to enable clinical staff to rapidly and effectively access patient information contained in free-text medical records. Medical search is challenging as it suffers from the semantic gap problem: the mismatch between the raw data and the way a human being interprets it. Valuable domain knowledge explicitly represented in structured knowledge resources such as ontologies (e.g. SNOMED CT) can potentially be leveraged to support such semantic inferences.

The focus of our research is on medical record searching and analytics using text [1], concepts [2], annotations [3], and SNOMED CT subsumption and relation querying [4-6].
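
Subsumption querying can be illustrated with a small Python sketch: a query for a general concept is expanded to all of its descendants in an is-a hierarchy, so that records annotated with more specific concepts are also retrieved. The hierarchy below is a hand-made toy using descriptive labels, not actual SNOMED CT content, and the function names and index layout are assumptions.

```python
from collections import defaultdict, deque

# Toy is-a hierarchy (child -> parents); illustrative only, not SNOMED CT content.
IS_A = {
    "fracture of radius": {"fracture of forearm"},
    "fracture of ulna": {"fracture of forearm"},
    "fracture of forearm": {"fracture of upper limb"},
    "fracture of upper limb": {"fracture of bone"},
    "fracture of femur": {"fracture of lower limb"},
    "fracture of lower limb": {"fracture of bone"},
}


def descendants(concept: str) -> set[str]:
    """All concepts subsumed by `concept` (excluding the concept itself)."""
    children = defaultdict(set)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].add(child)
    found, queue = set(), deque([concept])
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def search(query_concept: str, index: dict) -> list[str]:
    """Return IDs of records annotated with the query concept or any descendant."""
    wanted = descendants(query_concept) | {query_concept}
    return sorted(doc_id for doc_id, concepts in index.items() if concepts & wanted)


if __name__ == "__main__":
    index = {
        "r1": {"fracture of radius"},
        "r2": {"fracture of femur"},
        "r3": {"fracture of upper limb"},
    }
    print(search("fracture of upper limb", index))
```

A plain keyword search for "upper limb" would miss record r1, which is the kind of semantic gap that terminology-based inference aims to close.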

For more information, contact Dr. Anthony Nguyen.

  1. Koopman B, Nguyen A. RadSearch: A search and analysis engine for free-text radiology reports. Health Informatics Conference, 2015.
  2. Koopman B, Zuccon G, Bruza P, Sitbon L, Lawley M. An Evaluation of Corpus-driven Measures of Medical Concept Similarity for Information Retrieval. CIKM, pg. 2439-2442, 2012.
  3. Metke A. ASE: A Search Engine for Semantically Annotated Documents. SNOMED CT Implementation Showcase 2014
  4. Koopman B, Zuccon G, Nguyen A, Vickers D, Butt L, Bruza P. Exploiting SNOMED CT Concepts & Relationships for Clinical Information Retrieval: AEHRC and QUT at the TREC Medical Track. TREC, 2012.
  5. Zuccon G, Koopman B, Nguyen A, Vickers D, Butt L. Exploiting Medical Hierarchies for Concept-based Information Retrieval. ADCS, pg. 111-114, 2012.
  6. Koopman B. Semantic Search as Inference: Applications in Health Informatics. PhD thesis, Queensland University of Technology, Brisbane, Australia, 2014.

Adverse Drug Reaction Identification in Medical Forums

Adverse Drug Reactions (ADRs), also known as drug side effects, are a major concern for public health, costing health care systems worldwide millions of dollars. Social media has been identified as a source of information that could be used to find signals of potential ADRs to supplement traditional passive approaches from regulatory agencies.

CSIRO has developed the CADEminer (CSIRO Adverse Drug Event miner) system that mines online forums containing drug ratings and reviews for active surveillance. It uses machine learning techniques to identify mentions of drugs and ADRs in the forums and then uses data mining algorithms to discover signals of potential ADRs that are currently unknown and warrant further review.
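
The source does not say which data mining algorithms CADEminer uses for signal discovery. As one well-known example of the general idea, the Python sketch below computes the proportional reporting ratio (PRR), a standard disproportionality measure, over drug mentions already extracted from forum posts. The data layout, function name and toy posts are assumptions for illustration.

```python
def proportional_reporting_ratio(posts, drug, reaction):
    """PRR = [a / (a + b)] / [c / (c + d)] over (drug, reactions) pairs.

    a: posts about `drug` that mention `reaction`
    b: posts about `drug` that do not
    c: posts about other drugs that mention `reaction`
    d: posts about other drugs that do not
    Returns None when the ratio is undefined.
    """
    a = b = c = d = 0
    for post_drug, post_reactions in posts:
        has_reaction = reaction in post_reactions
        if post_drug == drug:
            a += has_reaction
            b += not has_reaction
        else:
            c += has_reaction
            d += not has_reaction
    if a + b == 0 or c == 0:
        return None
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Made-up extracted mentions: (drug named in the post, reactions mentioned).
    posts = [
        ("drugA", {"nausea"}),
        ("drugA", {"nausea"}),
        ("drugA", {"headache"}),
        ("drugA", set()),
        ("drugB", {"nausea"}),
        ("drugB", {"rash"}),
        ("drugB", set()),
        ("drugB", set()),
    ]
    print(proportional_reporting_ratio(posts, "drugA", "nausea"))
```

A ratio well above 1 suggests the reaction is mentioned disproportionately often for that drug; any such signal would still need expert review before being treated as a potential ADR.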
  1. Karimi S, Wang C, Metke A, Gaire R, Paris C, Harvey B. Text and Data Mining Techniques in Adverse Drug Reaction Detection. ACM Computing Surveys. 47(4), 56, 2015.
  2. Karimi S, Metke A, Kemp M, Wang C. CADEC: A Corpus of Adverse Drug Event Annotations. Journal of Biomedical Informatics. 55:73–81, 2015.
  3. Metke A, Karimi S, Paris C. Evaluation of Text-Processing Algorithms for Adverse Drug Event Extraction from Social Media. International Workshop on Social Media Retrieval and Analysis. pp. 15-20, 2014.