Application Study of Machine Learning in Bioinformatics for Disease Marker Identification

Journal: Journal of Clinical Medicine Research
DOI: 10.32629/jcmr.v6i3.4431

Shengyuan Zhang

Cornell University, Ithaca, New York, 14853, United States

Abstract

This study explores the application of machine learning to disease marker identification, focusing on a comparison of several algorithms on cancer datasets in simulation experiments. Gene expression data from publicly available databases were preprocessed, normalized, and reduced in dimensionality to ensure data quality. Models including support vector machines, decision trees, and random forests were trained and evaluated on accuracy, precision, recall, and F1-score. Random forest performed best on every evaluation criterion, with an accuracy of 95%, a precision of 94%, a recall of 96%, and an F1-score of 0.95. Feature selection analysis identified TP53, BRCA1, and EGFR, genes closely associated with cancer, as potential markers. These results indicate that machine learning methods, and the random forest algorithm in particular, have considerable application value in disease marker identification.
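
The workflow summarized in the abstract (normalization, dimensionality reduction, comparison of three classifiers on four metrics, and ranking of candidate marker genes) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's actual code: the data are synthetic, the gene labels (`gene_0`, `gene_1`, ...) are hypothetical placeholders, and every hyperparameter, the 70/30 split, and the use of PCA for dimensionality reduction are assumptions rather than details reported by the author.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a gene expression matrix (samples x genes).
# A real analysis would load expression data from a public database instead.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, n_redundant=0, random_state=0
)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]  # hypothetical labels

# Hold out a stratified test set (the 70/30 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling and PCA are fitted inside the SVM pipeline so that no test data
# leaks into preprocessing. The tree-based models are trained on the
# original gene features so that importances map back to individual genes.
models = {
    "svm": make_pipeline(
        StandardScaler(), PCA(n_components=20, random_state=0), SVC(kernel="rbf")
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record the four metrics named in the abstract.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Rank genes by the random forest's impurity-based feature importance.
importances = models["random_forest"].feature_importances_
top_idx = np.argsort(importances)[::-1][:10]
top_genes = [gene_names[i] for i in top_idx]

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("Top candidate features:", top_genes)
```

On synthetic data the scores and the ranking carry no biological meaning; the sketch only shows how the comparison and the marker ranking fit together. Impurity-based importance is one of several possible feature selection criteria and is used here as an assumption.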

Keywords

machine learning, cancer markers, support vector machines, random forests

Copyright © 2025 Shengyuan Zhang

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License