TY - GEN
T1 - An Evaluation of Machine Learning Models coupled with Powerful Big Data Techniques in the Case of Pancreatic Cancer
AU - Kouremenou, Eleftheria
AU - Manias, George
AU - Syed-Abdul, Shabbir
AU - Kyriazis, Dimosthenis
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - This study proposes an optimized machine learning (ML) methodology and workflow to examine pancreatic cancer risk factors, taking advantage of real-world data collected from three different hospitals. The proposed processing and analysis pipeline incorporates data transformation, cleaning, and mapping techniques, such as translating specific values into a common language and calculating average blood test results per patient. The ML models used in this work are supervised learning techniques, such as Random Forest, LightGBM, XGBoost, SVM, and Gradient Boosting, which also consider and analyze various risk factors such as demographic characteristics, drug use, surgeries, organ removal, blood values, and disease history of the patient. The models were evaluated and compared in terms of performance, considering important characteristics such as age, marital status, gender, and pre-existing diseases as risk factors for pancreatic cancer. The results indicate that ML models offer a robust and comprehensive solution for pancreatic cancer risk prediction, considering a broad range of variables and risk factors. These models enhance the understanding and identification of the key risk factors associated with the development and progression of this rare type of cancer and can act as powerful tools in the hands of healthcare professionals in the fight against pancreatic cancer.
AB - This study proposes an optimized machine learning (ML) methodology and workflow to examine pancreatic cancer risk factors, taking advantage of real-world data collected from three different hospitals. The proposed processing and analysis pipeline incorporates data transformation, cleaning, and mapping techniques, such as translating specific values into a common language and calculating average blood test results per patient. The ML models used in this work are supervised learning techniques, such as Random Forest, LightGBM, XGBoost, SVM, and Gradient Boosting, which also consider and analyze various risk factors such as demographic characteristics, drug use, surgeries, organ removal, blood values, and disease history of the patient. The models were evaluated and compared in terms of performance, considering important characteristics such as age, marital status, gender, and pre-existing diseases as risk factors for pancreatic cancer. The results indicate that ML models offer a robust and comprehensive solution for pancreatic cancer risk prediction, considering a broad range of variables and risk factors. These models enhance the understanding and identification of the key risk factors associated with the development and progression of this rare type of cancer and can act as powerful tools in the hands of healthcare professionals in the fight against pancreatic cancer.
KW - big data processing
KW - machine learning
KW - pancreatic cancer risk prediction
KW - risk factors analysis
UR - http://www.scopus.com/inward/record.url?scp=85186770107&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186770107&partnerID=8YFLogxK
U2 - 10.1109/ICAMCS59110.2023.00014
DO - 10.1109/ICAMCS59110.2023.00014
M3 - Conference contribution
AN - SCOPUS:85186770107
T3 - Proceedings - 2023 International Conference on Applied Mathematics and Computer Science, ICAMCS 2023
SP - 42
EP - 49
BT - Proceedings - 2023 International Conference on Applied Mathematics and Computer Science, ICAMCS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Applied Mathematics and Computer Science, ICAMCS 2023
Y2 - 8 August 2023 through 10 August 2023
ER -