A Feature Selection Method Based on Graph Theory for Cancer Classification
- Authors: Zhou K.1, Yin Z.2, Gu J.1, Zeng Z.3
- Affiliations:
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
- School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
- Issue: Vol 27, No 5 (2024)
- Pages: 650-660
- Section: Chemistry
- URL: https://vietnamjournal.ru/1386-2073/article/view/644786
- DOI: https://doi.org/10.2174/1386207326666230413085646
- ID: 644786
Abstract
Objective: Gene expression profiles are a valuable data source for studying tumors, but the data are high-dimensional and highly redundant. Gene selection is therefore an important step in microarray data classification.
Method: This paper proposes a feature selection method based on the maximal information coefficient (MIC) and graph theory. Each feature of the gene expression data is treated as a vertex of a graph, and the MIC between each pair of genes measures the relationship between the corresponding vertices, which yields an undirected graph. Core and coritivity theory is then applied to this graph to determine the feature subset.
Results: To avoid reliance on any single classifier or evaluation metric, we evaluated classification performance with three different classification models and three evaluation metrics: accuracy, F1-score, and AUC. The experimental results on six different types of genetic data show that the proposed algorithm achieves high accuracy and robustness compared with other advanced feature selection methods.
Conclusion: The method considers the importance and the correlation of features at the same time, and so addresses the problem of gene selection in microarray data classification.
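The graph construction and core search described in the Method paragraph can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: absolute Pearson correlation stands in for the maximal information coefficient, an edge is kept when the association reaches a hypothetical `threshold`, and coritivity follows the common definition h(G) = max over vertex cuts S of (number of components of G − S) − |S|, with the core being a cut that attains the maximum. The function names are illustrative, the search is brute force and only feasible for tiny graphs, and the abstract does not say how the core is mapped to the final feature subset, so the sketch simply returns the core.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here as a stand-in for MIC (assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no association
    return abs(cov) / sqrt(var_x * var_y)


def build_graph(features, threshold):
    """Each feature index is a vertex; an undirected edge joins two features
    whose association is at least `threshold` (hypothetical parameter)."""
    adj = {i: set() for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        if abs_pearson(features[i], features[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_components(adj, removed):
    """Number of connected components after deleting the vertices in `removed`."""
    seen = set(removed)
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def core_and_coritivity(adj):
    """Brute-force search over vertex cuts S of a connected graph, maximising
    components(G - S) - |S|. Returns (None, None) if no vertex cut exists."""
    best_set, best_score = None, None
    vertices = list(adj)
    for k in range(1, len(vertices) - 1):
        for subset in combinations(vertices, k):
            comps = count_components(adj, subset)
            if comps < 2:
                continue  # removing this set does not disconnect the graph
            score = comps - k
            if best_score is None or score > best_score:
                best_set, best_score = set(subset), score
    return best_set, best_score


# Usage: a star graph, where removing the hub leaves four isolated leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(core_and_coritivity(star))  # ({0}, 3)
```

In practice the number of candidate cuts grows exponentially with the number of genes, so any real implementation would need a heuristic or approximate core search rather than this enumeration.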
About the authors
Kai Zhou
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
Email: info@benthamscience.net
Zhixiang Yin
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
Author for correspondence.
Email: info@benthamscience.net
Jiaying Gu
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
Email: info@benthamscience.net
Zhiliang Zeng
School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science
Email: info@benthamscience.net
