Unsupervised machine learning of integrated health and social care data from the Macmillan Improving the Cancer Journey service in Glasgow

Kean Lee Kang, Margaret Greer, James Bown, Janice Preston, Judith Mabelis, Leigh-Anne Hepburn, Miriam Fisher, Ruth E. Falconer, Sandra McDermott, Stuart Deed

    Research output: Contribution to journal › Meeting Abstract › peer-review



    Background: Improving the Cancer Journey (ICJ) was launched in 2014 by Glasgow City Council and Macmillan Cancer Support. As part of routine service, data is collected on ICJ users including demographic and health information, results from holistic needs assessments and quality of life scores as measured by EQ-5D health status. There is also data on the number and type of referrals made and feedback from users on the overall service. By applying artificial intelligence and interactive visualization technologies to this data, we seek to improve service provision and optimize resource allocation.

    Method: An unsupervised machine-learning algorithm was deployed to cluster the data. The classical k-means algorithm was extended with the k-modes technique for categorical data, and the gap heuristic automatically identified the number of clusters. The resulting clusters are used to summarize complex data sets and produce three-dimensional visualizations of the data landscape. Furthermore, the traits of new ICJ clients are predicted by approximately matching their details to the nearest existing cluster center.
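
    The clustering and nearest-center matching described above can be illustrated with a minimal sketch. It covers only the categorical (k-modes) part, takes the number of clusters as given rather than choosing it with the gap heuristic, and is not the authors' implementation. The function names, the use of Hamming distance, and the toy records are all assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count mismatching fields; fields that are None (unknown) are skipped."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def column_modes(records):
    """Most frequent category in each column (the 'mode' of a cluster)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, n_iter=20, seed=0):
    """Plain k-modes: assign each record to its nearest center, then
    replace each center with the column-wise modes of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(records)), k)  # distinct starting centers
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for r in records:
            idx = min(range(k), key=lambda i: hamming(r, centers[i]))
            clusters[idx].append(r)
        new_centers = [column_modes(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def predict_trait(partial, centers, trait_index):
    """Predict an unknown trait from the nearest cluster center."""
    nearest = min(centers, key=lambda c: hamming(partial, c))
    return nearest[trait_index]


# Hypothetical toy records: (marital status, employment status, housing type)
records = [
    ("married", "employed", "owner"),
    ("married", "employed", "owner"),
    ("married", "retired", "owner"),
    ("single", "unemployed", "renter"),
    ("single", "unemployed", "renter"),
    ("single", "student", "renter"),
]
centers = k_modes(records, k=2)
housing = predict_trait(("single", None, None), centers, trait_index=2)
```

    Skipping unknown fields in the distance is one simple way to match a new client "approximately" when only some details are known; the abstract does not specify how the authors handle missing values.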

    Results: Cross-validation showed the model’s effectiveness over a wide range of traits. For example, the model can predict marital status, employment status and housing type with an accuracy between 2.4 and 4.8 times that of random selection. One of the most interesting preliminary findings is that area deprivation, as measured by the Scottish Index of Multiple Deprivation (SIMD), is a better predictor of an ICJ client’s needs than primary diagnosis (cancer type).
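
    One way to read an accuracy figure expressed as a multiple of random selection is as a ratio between observed accuracy and the accuracy of a uniform random guess, which is 1/m for a trait with m categories. The abstract does not state the exact baseline used, so the sketch below assumes a uniform baseline; the function name and the example labels are hypothetical and are not study data.

```python
def lift_over_random(y_true, y_pred, n_categories):
    """Observed accuracy divided by the accuracy of uniform random guessing.

    Uniform guessing over n_categories is right 1/n_categories of the time,
    so the ratio equals accuracy * n_categories.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy * n_categories


# Illustrative labels only: 3 of 5 predictions correct, 4 possible categories
y_true = ["owner", "renter", "renter", "owner", "social"]
y_pred = ["owner", "renter", "owner", "owner", "renter"]
lift = lift_over_random(y_true, y_pred, n_categories=4)
```

    If the categories are unevenly distributed, a baseline that always guesses the most common category would be stricter than the uniform one assumed here.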

    Conclusion: A key strength of this system is its ability to rapidly ingest new data on its own and derive new predictions from those data. This means the model can guide service provision by forecasting demand based on actual or hypothesized data. The aim is to provide intelligent person-centered recommendations. The machine-learning model described here is part of a prototype software tool currently under development for use by the cancer support community.

    Disclosure: Funded by Macmillan Cancer Support

    Original language: English
    Article number: 4
    Number of pages: 1
    Journal: British Journal of Cancer
    Publication status: Published - 8 Nov 2018
    Event: 2018 NCRI Cancer Conference - SEC Centre, Glasgow, United Kingdom
    Duration: 4 Nov 2018 to 6 Nov 2018


