ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION

Document Type: Original Manuscript

Authors

1 Computer Science Department, Federal University Wukari, P.M.B 1020, Katsina-Ala Road, Wukari, Taraba State, Nigeria

2 Department of Physical Sciences, Computer Science Programme, Al-Hikmah University, P.M.B 1601, Adewole Housing Estate, Ilorin, Kwara State, Nigeria

3 Department of Computer Science, University of Ilorin, P.M.B. 1515, Ilorin-Nigeria

Abstract

Advances in technology have made it possible to collect many different tumor features for Breast Cancer (BC) diagnosis, but handling such large data sets poses challenges, including high storage requirements and the time required to access and process the data. The objective of this paper is to classify BC based on extracted tumor features. To extract useful information and diagnose the tumor, an AdaBoost ensemble model is developed. Both homogeneous and heterogeneous ensemble classifiers (the latter combining two different classifiers) were implemented, and the Synthetic Minority Over-Sampling Technique (SMOTE), a data mining pre-processing method, was used to deal with class imbalance and noise in the dataset. The proposed method has two steps. The first step applies SMOTE to reduce the effect of class imbalance in the dataset. The second step performs classification using decision tree algorithms (ADTree, CART, REPTree and Random Forest), Naïve Bayes and their ensembles. The experiments were carried out in the WEKA Explorer (Weka 3.6). Experimental results show that AdaBoost-Random Forest classifies better than the other classification algorithms with 82.52% accuracy, followed by AdaBoost-REPTree and AdaBoost-CART with 77.62% accuracy, while AdaBoost-Naïve Bayes is the lowest with 35.66% accuracy.
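
The first step described above, SMOTE, can be illustrated with a minimal sketch. This is not the WEKA filter used in the experiments; it is a plain-Python illustration of the general idea of interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameters `n_new`, `k` and `seed`, and the toy data are all assumptions made for illustration and do not come from the paper.

```python
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration; needs at least 2 minority samples.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy data (hypothetical, not from the paper): 6 majority vs 3 minority samples.
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.9], [2.8, 3.3]]

# Generate just enough synthetic samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a line segment between two existing minority samples, the minority class grows without exact duplication; the balanced data would then be passed to the classifiers in the second step.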