Project Objective
The objective of the CSIS project is to generate accurate cancer stages from histopathology reports and to use this information to improve cancer management, both for individual patients and at the population level. The project builds on the Australian e-Health Research Centre's research capability in multimedia content analysis in the health domain.
Cancer Staging
The "stage" of a cancer categorises its progression in the body, describing the extent of the primary tumour and any spread to local or distant body sites. Although staging plays a fundamental role in cancer management, a definitive stage is not always recorded for cancer patients, because the task requires expertise and time and involves multiple clinical disciplines. Automating the collation, analysis, summarisation and classification of relevant patient data can reduce the reliance on expert clinical staff, improving the efficiency and availability of cancer staging.
The CSIS Software
CSIS uses the same staging guidelines that clinicians use to assign a stage from histopathology reports. The pathology report is divided into statements (sentences or parts of sentences), and the guidelines are divided into factors. Each statement in the report is then evaluated against each guideline factor to determine whether it is relevant and, if so, whether it is a positive or negative reference. Machine learning techniques are then used to train the CSIS engine to recognise the statements in a pathology report that indicate the report is describing a particular guideline factor.
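The statement-and-factor evaluation described above can be illustrated with a minimal Python sketch. It assumes three things. Statements are split at full stops, semicolons and line breaks. Relevance to each guideline factor is learned by a separate multinomial naive Bayes classifier. A statement counts as a negative reference when it contains a simple negation cue. The factor names (`pleural_invasion`, `nodal_metastasis`), the negation cue list, the training statements and all function names are hypothetical. This is not a description of how the published CSIS engine is implemented.

```python
import math
import re
from collections import Counter

# Hypothetical negation cues for illustration; a real system would need a
# much richer, scope-aware negation detector.
NEGATION_CUES = {"no", "not", "without", "absent", "negative"}


def split_statements(report: str) -> list[str]:
    """Split a report into statements at sentence ends, semicolons and newlines.

    A full stop only ends a statement when followed by whitespace or the end
    of the text, so decimals such as "2.5 cm" are kept intact.
    """
    parts = re.split(r"\.(?:\s+|$)|[;\n]+", report)
    return [p.strip() for p in parts if p.strip()]


def tokens(statement: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", statement.lower())


def polarity(statement: str) -> str:
    """Mark a statement as a negative reference if it contains a negation cue."""
    return "negative" if NEGATION_CUES & set(tokens(statement)) else "positive"


class FactorClassifier:
    """Multinomial naive Bayes deciding whether a statement is relevant to one factor."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, statements: list[str], labels: list[bool]) -> "FactorClassifier":
        self.priors = Counter(labels)
        self.counts = {True: Counter(), False: Counter()}
        for statement, label in zip(statements, labels):
            self.counts[label].update(tokens(statement))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, toks: list[str], label: bool) -> float:
        denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for t in toks:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[label][t] + self.alpha) / denom)
        return score

    def is_relevant(self, statement: str) -> bool:
        toks = tokens(statement)
        return self._log_score(toks, True) > self._log_score(toks, False)


def evaluate_report(report: str, classifiers: dict[str, FactorClassifier]):
    """Evaluate every statement against every factor and keep the relevant pairs."""
    findings = []
    for statement in split_statements(report):
        for factor, clf in classifiers.items():
            if clf.is_relevant(statement):
                findings.append((statement, factor, polarity(statement)))
    return findings


# Toy, hand-labelled training statements invented for this sketch.
OTHER = ["bronchial margin clear", "tumour measures 25 mm"]
classifiers = {
    "pleural_invasion": FactorClassifier().fit(
        ["tumour invades the visceral pleura", "visceral pleural invasion present",
         "pleura not involved by tumour", "lymph nodes contain metastatic carcinoma"] + OTHER,
        [True, True, True, False, False, False]),
    "nodal_metastasis": FactorClassifier().fit(
        ["lymph nodes contain metastatic carcinoma", "hilar node shows metastasis",
         "no lymph node metastases", "tumour invades the visceral pleura"] + OTHER,
        [True, True, True, False, False, False]),
}

report = ("Tumour measures 30 mm. The visceral pleura is not involved; "
          "two hilar lymph nodes contain metastatic carcinoma.")
for finding in evaluate_report(report, classifiers):
    print(finding)
```

Running it prints two findings. The pleural statement is reported as a negative reference to `pleural_invasion`, and the lymph node statement as a positive reference to `nodal_metastasis`. The tumour size statement is judged irrelevant to both factors. With six training statements per factor this is only a toy. A practical system would train on many labelled reports and use stronger features and negation handling.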
In addition, CSIS produces an extract of the sentences found to contribute to the final staging decision, showing how each relates to criteria in the formal staging guidelines for lung cancer.
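Here is a sketch of one way such an extract might be assembled, assuming the factor-level findings are already available. The `CRITERIA` table is a hypothetical simplification for illustration only, and so are its factor names and the rule that the highest supported T category wins. None of this is a transcription of the lung cancer staging guidelines. Statements that positively support the chosen category are listed as supporting it. Negative references to higher categories are listed as ruling those categories out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified criteria table for illustration only:
# factor name -> (T category it supports, short criterion text).
# Not taken from any staging manual.
CRITERIA = {
    "tumour_over_3cm": ("T2", "primary tumour greater than 3 cm"),
    "visceral_pleura_invasion": ("T2", "tumour invades visceral pleura"),
    "chest_wall_invasion": ("T3", "tumour invades chest wall"),
}
T_ORDER = ["T1", "T2", "T3"]


@dataclass
class Finding:
    statement: str
    factor: str
    positive: bool  # False means the statement is a negative reference


def decide_t_stage(findings: list[Finding]) -> Optional[str]:
    """Highest T category supported by any positive finding, or None if there is none."""
    supported = [CRITERIA[f.factor][0] for f in findings if f.positive]
    return max(supported, key=T_ORDER.index, default=None)


def build_extract(findings: list[Finding]) -> tuple[Optional[str], list[str]]:
    """Return the stage plus the statements that determined it, linked to criteria."""
    stage = decide_t_stage(findings)
    rank = T_ORDER.index(stage) if stage else -1
    extract = []
    for f in findings:
        category, criterion = CRITERIA[f.factor]
        if f.positive and category == stage:
            extract.append(f'supports {category} ({criterion}): "{f.statement}"')
        elif not f.positive and T_ORDER.index(category) > rank:
            extract.append(f'rules out {category} ({criterion}): "{f.statement}"')
    return stage, extract


findings = [
    Finding("Tumour measures 45 mm", "tumour_over_3cm", True),
    Finding("The visceral pleura is not involved", "visceral_pleura_invasion", False),
    Finding("The chest wall is free of tumour", "chest_wall_invasion", False),
]
stage, extract = build_extract(findings)
print(stage)
for line in extract:
    print(line)
```

In this example the tumour size statement supports T2, and the chest wall statement is listed as ruling out T3. The pleural statement is omitted because a negative reference to a T2 criterion does not change a T2 decision. Tying each extract line to a named criterion lets a reviewer check the automatic decision against the guidelines instead of trusting a bare stage label.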
Clinical Trial: Lung Cancer
In collaboration with the Queensland Cancer Control Analysis Team (QCCAT), the prototype system was evaluated in a clinical trial. The CSIS engine was trained on 710 pathology reports describing surgical resections of the lung, with the aim of producing accurate pathological T (primary tumour) and N (regional lymph node) stages.
The system was then formally trialled in a clinical setting on a previously unseen set of 179 lung cancer cases. The trial compared the automatic stage decisions from CSIS to the stages assigned by two expert pathologists.
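Agreement between automatic and expert stages is commonly summarised as raw agreement together with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below computes both measures for CSIS against each expert, and between the two experts. The stage labels are invented for illustration and are not data from the trial. The statistics reported in the published study may differ.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of cases where the two raters assigned the same stage."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters independently pick the same category.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented pathological T stages for ten cases (not trial data).
csis = ["T1", "T2", "T2", "T3", "T1", "T2", "T4", "T2", "T1", "T3"]
expert1 = ["T1", "T2", "T2", "T3", "T1", "T1", "T4", "T2", "T1", "T3"]
expert2 = ["T1", "T2", "T3", "T3", "T1", "T2", "T4", "T2", "T2", "T3"]

for name, (a, b) in {
    "CSIS vs expert 1": (csis, expert1),
    "CSIS vs expert 2": (csis, expert2),
    "expert 1 vs expert 2": (expert1, expert2),
}.items():
    print(f"{name}: agreement={percent_agreement(a, b):.2f}, kappa={cohens_kappa(a, b):.2f}")
```

Reporting the expert-versus-expert figure alongside the CSIS figures shows how much disagreement exists even between specialists, which is a useful yardstick for an automatic system.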
The trial showed that the automatically generated stages were accurate enough for population-level research and for indicative staging of pathology reports before multidisciplinary team meetings.
The results of the trial and a description of the software have been published in the Journal of the American Medical Informatics Association (I. McCowan, D. Moore, A. Nguyen, R. Bowman, B. Clarke, E. Duhig, M. Fry. Collection of Cancer Stage Data by Classifying Free-text Medical Reports. Journal of the American Medical Informatics Association. 2007;14:736-745).
Current Use
The software is now installed at Queensland Health, within the Queensland Cancer Control Analysis Team (QCCAT).
Current and Future Work
This work has been extended to other kinds of medical free text, such as radiology reports (for M, or distant metastasis, staging), and to the automatic population of synoptic reports for lung cancer.
Future research will focus on using clinical terminologies to improve accuracy and on extending the model. This work will be carried out in the new CIPAR Project.


