Project Aim: to develop computer-aided clinical information processing and reporting systems that support consistent and timely data collection and facilitate comparative analysis, benchmarking, and reporting.

Medical Free Text Processing

Unlocking information from medical free text is important for quality control, decision support, and management and planning. CSIRO’s automatic semantic text analysis services provide seamless and reliable information extraction while substantially reducing the effort required for manual abstraction.

2013 Queensland iAwards Winner - Health Category

The MEDTEX Technology

The semantic medical text analysis (MEDTEX) service is a research software platform, developed at the Australian e-Health Research Centre, for building clinical language engineering analysis engines that support data-driven analytic tasks. MEDTEX incorporates domain knowledge to bridge the gap between natural language and clinical terminology semantics, enabling automatic inference and reasoning over medical text.

Analysis engines using the MEDTEX technology have been developed to:

  • standardise the free text by identifying medical concepts, abbreviations and acronyms, shorthand terms, dimensions and relevant legacy codes;
  • relate key medical concepts, terms and codes using contextual information and report substructure; and
  • use formal semantics to reason with the clinical concepts, inferring complex clinical notions relevant to a health application.

The semantic inference and reasoning techniques exploit the report substructure and the semantics encoded in SNOMED CT concepts. Medical narrative is thereby turned into structured data that most systems can easily store, query or render for use in their health applications.
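The three steps described above (standardise, relate to concepts, reason over concept semantics) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the lookup tables, the `CONCEPT_*` identifiers and the function names are hypothetical and are neither the MEDTEX API nor real SNOMED CT codes. A real system would draw on a full clinical terminology and would also handle context such as negation, which this sketch ignores.

```python
import re

# Hypothetical shorthand table (illustration only).
ABBREVIATIONS = {"ca": "carcinoma", "lul": "left upper lobe"}

# Hypothetical term-to-concept map; the identifiers are placeholders,
# not real SNOMED CT concept IDs.
CONCEPTS = {
    "carcinoma": "CONCEPT_CARCINOMA",
    "left upper lobe": "CONCEPT_LEFT_UPPER_LOBE",
}

# Hypothetical "is a" hierarchy: child concept -> parent concept.
IS_A = {
    "CONCEPT_CARCINOMA": "CONCEPT_MALIGNANT_NEOPLASM",
    "CONCEPT_MALIGNANT_NEOPLASM": "CONCEPT_NEOPLASM",
}


def standardise(text):
    """Lower-case the text and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def find_concepts(normalised):
    """Return concept identifiers whose term occurs in the normalised text."""
    return [cid for term, cid in CONCEPTS.items() if term in normalised]


def is_subsumed_by(concept, ancestor):
    """Walk up the "is a" hierarchy to test whether ancestor subsumes concept."""
    while concept != ancestor and concept in IS_A:
        concept = IS_A[concept]
    return concept == ancestor


def analyse(text):
    """Turn a free-text fragment into a small structured record."""
    normalised = standardise(text)
    concepts = find_concepts(normalised)
    malignant = any(
        is_subsumed_by(c, "CONCEPT_MALIGNANT_NEOPLASM") for c in concepts
    )
    return {"normalised": normalised, "concepts": concepts, "malignant": malignant}


if __name__ == "__main__":
    print(analyse("Ca LUL"))
```

The inference step is the point of the sketch: the text never mentions malignancy, but the record is flagged as malignant because the hypothetical carcinoma concept is subsumed by the hypothetical malignant neoplasm concept.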

Improving the Manual Process

Currently, patient information is gathered by manually scanning and reading reports from information systems to identify key relevant information. An extensive amount of clinical information may need to be abstracted, and it is often trapped in the language of these reports, which take the form of unstructured, ungrammatical and often fragmented free text. The process relies mainly on manual inspection and experience-based judgement by clinical coders, so information abstraction is extremely labour- and time-intensive, prone to human error, and ineffective. A semi-automated system lessens the reliance on expert clinical staff, improving the efficiency and availability of health information.

Cancer Notifications and Staging

CSIRO is partnering with the Queensland Cancer Control Analysis Team (QCCAT) to provide semantic medical text analysis services for cancer notifications, cancer staging and synoptic reporting. For lung cancer staging, the system performs within the bounds of human staging accuracy as observed in studies of registry data.

Stage   Cases   Accuracy % (95% CI)
T       718     72 (69–75)
N       718     78 (75–81)
M       718     94 (92–96)

Table: Accuracy of lung cancer staging system with respect to database of pathologic TNM staging decisions.
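The source does not state how the 95% confidence intervals in the table were computed. As a hedged illustration, the sketch below applies the normal-approximation (Wald) interval for a proportion to the reported accuracies and case counts. The choice of method and the function name `wald_interval` are assumptions. Because the accuracies are taken from the rounded percentages in the table, the result is approximate.

```python
import math


def wald_interval(accuracy_pct, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, in percent.

    accuracy_pct: reported accuracy as a percentage (rounded in the table)
    n:            number of cases
    z:            standard normal quantile (1.96 for a 95% interval)
    """
    p = accuracy_pct / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


# Values copied from the table: stage -> (cases, accuracy %).
REPORTED = {"T": (718, 72), "N": (718, 78), "M": (718, 94)}

if __name__ == "__main__":
    for stage, (cases, accuracy) in REPORTED.items():
        low, high = wald_interval(accuracy, cases)
        print(f"{stage}: {accuracy}% ({low:.1f} to {high:.1f})")
```

Under this assumption the computed bounds, rounded to whole percentages, agree with the intervals shown in the table for all three stages.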

The cancer notification services also show that notifiable items such as basis of diagnosis, histological type and grade, cancer site and laterality can be reliably extracted and coded, with an overall accuracy of 80%.

Other Applications

The semantic medical text analysis services are also currently used to identify patients for advanced prostate cancer clinical trials, and to check radiology reports so that limb fractures are not missed and patients are suitably followed up after Emergency Department discharge.

Last Updated on Thursday, 13 June 2013 13:28


Dr Anthony Nguyen

 +61 7 3253 3637