Australian e-Health Research Centre

Cancer Stage Interpretation System (CSIS)

Project Objective

The CSIS project aims to generate an accurate cancer stage from histo-pathology reports, and to use this information to help improve cancer management, both for individual patients and at a population level. The project builds on the Australian e-Health Research Centre's research capability in multimedia content analysis in the health domain.

Cancer Staging

The "stage" of a cancer is a categorisation of its progression in the body, and describes the extent of the primary tumour and any spreading to local or distant body sites. Staging has a fundamental role in cancer management, but because of the expertise and time required and the multi-disciplinary nature of the task, a definitive stage is not always recorded for cancer patients. By automating the collation, analysis, summarisation and classification of relevant patient data, the reliance on expert clinical staff can be reduced, improving the efficiency and availability of cancer staging.

The CSIS Software

The CSIS technology uses the same guidelines that clinicians use when assigning a stage based on histo-pathology reports. The pathology report is divided into statements (sentences or parts of sentences), while the guidelines are divided into factors. Each statement in the report is then evaluated against each of the guideline factors for relevance and for whether it is a positive or negative reference. Machine learning techniques are used to teach the CSIS engine which sets of statements in a pathology report indicate that the report describes a particular guideline factor.

In addition, an extract is produced consisting of sentences that were found to contribute to the final staging decision, and their relationship to criteria from the formal staging guidelines for lung cancer.
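
The statement-by-factor evaluation described above can be illustrated with a minimal Python sketch. It is not the CSIS implementation: the factor names, keyword cues and negation list are hypothetical, and simple keyword matching stands in for the trained machine learning classifiers. It only shows the shape of the process: split the report into statements, test each statement against each factor for relevance and polarity, and keep the contributing statements as an extract.

```python
import re
from dataclasses import dataclass

# Hypothetical guideline factors with illustrative keyword cues.
# In CSIS the link between statements and factors is learned from training
# reports; these keyword lists are only a stand-in for that model.
FACTOR_CUES = {
    "pleural_invasion": ["pleura"],
    "lymph_node_metastasis": ["lymph node", "nodal"],
}

# Simple negation cues (an assumption made for this sketch).
NEGATION = re.compile(r"\b(no|not|without|negative for|free of)\b", re.IGNORECASE)


@dataclass
class Evidence:
    factor: str      # guideline factor the statement is relevant to
    statement: str   # text of the contributing statement
    positive: bool   # True for a positive reference, False for a negative one


def split_statements(report: str) -> list[str]:
    """Split a report into statements at full stops and semicolons."""
    parts = re.split(r"(?<=[.;])\s+", report.strip())
    return [p.strip(" .;") for p in parts if p.strip(" .;")]


def evaluate(report: str) -> list[Evidence]:
    """Evaluate every statement against every factor; return the extract."""
    evidence = []
    for statement in split_statements(report):
        lowered = statement.lower()
        for factor, cues in FACTOR_CUES.items():
            if any(cue in lowered for cue in cues):
                negated = NEGATION.search(statement) is not None
                evidence.append(Evidence(factor, statement, not negated))
    return evidence


def factor_summary(evidence: list[Evidence]) -> dict[str, bool]:
    """A factor counts as present if any relevant statement is positive."""
    summary: dict[str, bool] = {}
    for item in evidence:
        summary[item.factor] = summary.get(item.factor, False) or item.positive
    return summary


if __name__ == "__main__":
    # Invented example text, not taken from a real pathology report.
    report = ("Tumour invades the visceral pleura. "
              "No lymph node metastasis identified. Margins are clear.")
    for item in evaluate(report):
        print(item.factor, "|", item.statement, "|", item.positive)
```

A real system would replace the keyword and negation rules with classifiers trained on annotated reports, and would map the resulting factor decisions to a stage using the formal staging guidelines.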

Clinical Trial: Lung Cancer

In collaboration with the Queensland Cancer Control Analysis Team (QCCAT), the prototype software was evaluated in a clinical trial. The CSIS engine was trained on a set of 710 pathology reports describing surgical resections of the lung. The aim was to produce an accurate pathological T and N stage.

The system was then formally trialled in a clinical setting on a previously unseen set of 179 lung cancer cases. The trial compared the automatic stage decisions from CSIS to the stages assigned by two expert pathologists.
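
As a rough illustration of this kind of comparison, the Python sketch below computes the proportion of cases in which an automatic stage matches an expert-assigned stage. The source does not state which agreement measure the trial used, so simple percentage agreement is an assumption, and the stage labels in the example are invented placeholders rather than trial data.

```python
def agreement_rate(automatic: list[str], expert: list[str]) -> float:
    """Return the fraction of cases where the two stage lists agree."""
    if len(automatic) != len(expert):
        raise ValueError("both lists must cover the same cases")
    if not automatic:
        raise ValueError("at least one case is required")
    matches = sum(a == e for a, e in zip(automatic, expert))
    return matches / len(automatic)


if __name__ == "__main__":
    # Invented placeholder labels for four hypothetical cases.
    automatic_t = ["T1", "T2", "T2", "T3"]
    expert_t = ["T1", "T2", "T3", "T3"]
    print(agreement_rate(automatic_t, expert_t))
```

With two expert pathologists, the same function could be applied to each expert in turn, and to the two experts against each other as a baseline for how often human stagers agree.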

The trial showed that the automated stages were accurate enough for population-level research and for indicative staging of pathology reports prior to multi-disciplinary team meetings.

The results of the trial and a description of the software have been published in the Journal of the American Medical Informatics Association (I. McCowan, D. Moore, A. Nguyen, R. Bowman, B. Clarke, E. Duhig, M. Fry. Collection of Cancer Stage Data by Classifying Free-text Medical Reports. Journal of the American Medical Informatics Association; 2007; 14:736-745).

Current Use

The first version of the software has now been installed at Queensland Health, in the Queensland Cancer Control Analysis Team (QCCAT).

Current and Future Work

Current research focuses on using clinical terminologies to improve accuracy and on extending the model to stage breast, bowel and prostate cancers.

Future work will include:

  • Extension to other sorts of medical free text, such as radiology reports (for M staging).
  • Improvements to the accuracy of the staging, in particular through the use of large ontologies to better understand complex medical terms.
  • Extension to the development of synoptic reports from free text medical reports.
  • Classifying cancer characteristics other than stage. The techniques used to classify cancer stage may be extended to other tasks, such as filtering of patient data, for example, screening for cancer / non-cancer, or classification of cancer types.