Multimodal Data Fusion for Improved Diagnosis of Hepatocellular Carcinoma

Thursday 23 January 2025


The quest for a more accurate diagnosis of hepatocellular carcinoma (HCC), the most common primary liver cancer, has led researchers to explore machine learning algorithms in combination with multimodal clinical data. A recent study, released as a preprint, presents a baseline model for multi-modal data classification on an open dataset of HCC patients.


The study’s authors collected a dataset of both image and tabular data from 100 patients with HCC. The image data include contrast-enhanced CT and MRI scans, while the tabular data comprise clinical laboratory test reports and case report forms. The goal is to predict the TNM stage of each patient from these multimodal sources.


The authors extracted radiomics features from the segmented lesions in both CT and MRI images, then combined them with the preprocessed tabular data. They used mutual information to select the most relevant features for the machine learning model. XGBoost was chosen as the classifier: it is an efficient implementation of gradient-boosted decision trees (GBDTs) and has repeatedly been shown to perform well on tabular data.
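The pipeline described here, mutual-information feature selection followed by a GBDT classifier, can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the authors' code: the feature matrix, labels, and the number of selected features (k=10) are all invented for the example, and sklearn's GradientBoostingClassifier stands in for XGBoost.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 patients, 50 fused features (radiomics + tabular)
X = rng.normal(size=(100, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical binary staging label

# Mutual-information filter keeps the k features most informative about y
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=10)
X_sel = selector.fit_transform(X, y)

# GBDT classifier (sklearn's GradientBoostingClassifier as an XGBoost stand-in)
clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X_sel, y, cv=5)
print(f"selected shape: {X_sel.shape}, mean CV accuracy: {scores.mean():.2f}")
```

With real data, k and the classifier's hyperparameters would be tuned inside the cross-validation loop to avoid selection bias.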


The results show that combining both image and clinical laboratory data leads to higher prediction accuracy than using either modality alone. The addition of radiomics features extracted from CT and MRI images further improves the performance of the model. The authors found that the combination of all three modalities (tabular, image, and radiomics) achieved the highest prediction accuracy.
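The fusion strategy underlying these comparisons is early (feature-level) fusion: each patient's tabular features and radiomics features are concatenated into a single vector before classification. A minimal sketch, with invented dimensions for each modality:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # patients

# Hypothetical per-modality feature matrices, one row per patient
tabular = rng.normal(size=(n, 20))        # lab tests + case report forms
ct_radiomics = rng.normal(size=(n, 30))   # radiomics from CT lesions
mri_radiomics = rng.normal(size=(n, 30))  # radiomics from MRI lesions

# Early fusion: concatenate modalities column-wise into one feature vector
fused = np.hstack([tabular, ct_radiomics, mri_radiomics])
print(fused.shape)  # → (100, 80)
```

Ablating a modality then amounts to leaving its block out of the concatenation, which is how single-modality and combined models can be compared on equal footing.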


The study also highlights the importance of feature selection in achieving accurate results. By filtering out features with less than 80% non-missing values, removing columns with identical values across all samples, and handling features with high similarity rates, the authors were able to reduce noise and improve data quality.
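The three filtering steps above translate naturally into pandas operations. The sketch below uses a tiny invented DataFrame (the column names and the 0.95 similarity cutoff are assumptions for illustration, not the paper's values):

```python
import numpy as np
import pandas as pd

# Toy clinical table: each column illustrates one filtering case
df = pd.DataFrame({
    "afp": [10.0, None, 50.0, 10.0, 50.0],   # 80% non-missing: kept
    "mostly_missing": [None, None, None, None, 1.0],  # 20% non-missing: dropped
    "constant": [1, 1, 1, 1, 1],             # identical everywhere: dropped
    "alt": [35, 40, 22, 55, 18],
    "alt_copy": [35, 40, 22, 55, 18],        # duplicate of "alt": dropped
})

# 1. Keep columns with at least 80% non-missing values
df = df.loc[:, df.notna().mean() >= 0.8]

# 2. Drop columns with a single identical value across all samples
df = df.loc[:, df.nunique(dropna=False) > 1]

# 3. For highly similar column pairs, keep one and drop the other
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
df = df.drop(columns=to_drop)

print(list(df.columns))  # → ['afp', 'alt']
```

Here "similarity" is approximated by absolute pairwise correlation; the paper's exact similarity criterion may differ.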


In addition to the technical insights, the study underscores the significance of multimodal data fusion in achieving accurate diagnostic results. By combining different types of data sources, researchers can leverage their unique strengths to create a more comprehensive understanding of complex diseases like HCC.


The findings have implications for the development of AI-powered diagnosis systems that can integrate multiple data modalities to improve accuracy and patient outcomes. As the field of medical imaging continues to evolve, such studies will play a crucial role in shaping the future of healthcare.


Cite this article: “Multimodal Data Fusion for Improved Diagnosis of Hepatocellular Carcinoma”, The Science Archive, 2025.


Hepatocellular Carcinoma, Machine Learning, Multimodal Data, Clinical Laboratory Test Reports, Case Report Forms, Radiomics Features, XGBoost Classifier, Gradient-Boosted Decision Trees, Feature Selection, AI-Powered Diagnosis Systems


Reference: Binwu Wang, Isaac Rodriguez, Leon Breitinger, Fabian Tollens, Timo Itzel, Dennis Grimm, Andrei Sirazitdinov, Matthias Frölich, Stefan Schönberg, Andreas Teufel, et al., “A baseline for machine-learning-based hepatocellular carcinoma diagnosis using multi-modal clinical data” (2025).
