Automated Cancer Stage Classification from Free-text Histology Reports
Authors: I. McCowan, D. Moore, and M. Fry
Date: August 2006
Abstract:
Objectives: This article describes a system that automatically classifies the cancer stage of lung cancer patients from text analysis of their histology reports. Methods: The system uses machine learning techniques to train a statistical classifier, specifically a support vector machine, for each TNM stage category based on word occurrences in a corpus of histology reports for staged patients. New reports can then be classified according to the most likely stage, facilitating the collection and analysis of population staging data. While the system could in principle be applied to other cancer types, the current work focuses on staging lung cancer due to data availability. Results: The article presents initial experiments quantifying system performance on a corpus of reports from more than 1000 lung cancer patients. The system achieves an average sensitivity of 0.72 and an average specificity of 0.87 for pathologic staging based on histology report text.
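The sensitivity and specificity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes each stage category is scored one-vs-rest and that "average" means an unweighted mean over categories; the abstract does not state the averaging scheme, so this is an assumption. The labels and function names below are hypothetical toy values, not data or code from the paper.

```python
def per_category_rates(true_labels, predicted_labels):
    """Return {category: (sensitivity, specificity)}, scoring each category one-vs-rest."""
    rates = {}
    for cat in sorted(set(true_labels)):
        tp = fp = tn = fn = 0
        for truth, pred in zip(true_labels, predicted_labels):
            if truth == cat and pred == cat:
                tp += 1   # correctly assigned to this category
            elif truth == cat:
                fn += 1   # belongs to this category but was missed
            elif pred == cat:
                fp += 1   # wrongly assigned to this category
            else:
                tn += 1   # correctly kept out of this category
        sensitivity = tp / (tp + fn)  # tp + fn > 0: cat occurs in true_labels
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        rates[cat] = (sensitivity, specificity)
    return rates


def unweighted_average(rates):
    """Average sensitivity and specificity over categories (assumed scheme)."""
    n = len(rates)
    avg_sens = sum(sens for sens, _ in rates.values()) / n
    avg_spec = sum(spec for _, spec in rates.values()) / n
    return avg_sens, avg_spec


# Hypothetical toy labels for six reports (not results from the paper).
true_stage = ["T1", "T1", "T2", "T2", "T3", "T3"]
predicted_stage = ["T1", "T2", "T2", "T2", "T3", "T1"]

rates = per_category_rates(true_stage, predicted_stage)
avg_sens, avg_spec = unweighted_average(rates)
print(rates)
print(round(avg_sens, 3), round(avg_spec, 3))
```

Scoring one-vs-rest matches the abstract's description of a separate classifier per TNM stage category, but a weighted average over category frequencies would be an equally plausible reading of "average".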
© 2006 HISA Ltd.
