FFS: An F-DBSCAN Clustering-Based Feature Selection for Classification Data

Authors

1 Department of Computer Engineering, Rouzbahan University, Sari, Iran

2 Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran

Abstract

Feature selection is an important step in most classification problems: it selects a subset of features that increases learning accuracy and reduces computation time. In this paper, we propose a new feature-clustering-based method for feature selection (FFS) in classification problems. The FFS algorithm works in two steps. In the first step, features are divided into clusters using F-DBSCAN, a novel clustering method that uses mutual information to measure the dependencies between features. In the second step, the most representative feature is selected from each cluster by a new criterion function. This criterion accounts for both the dependency of each feature on the target class and the redundancy between the selected features in each cluster. Experimental results on different datasets show that the proposed algorithm is more effective for feature selection in classification problems: compared with the other methods, FFS improves the average classification accuracy of C4.5, KNN, and Naïve Bayes by 8.05, 8.36, and 4.63 percent, respectively. The results also demonstrate that the FFS algorithm produces small feature subsets with a very high classification rate.
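
The two-step idea can be illustrated with a minimal sketch. It is not the F-DBSCAN algorithm or the criterion function of the paper, neither of which is specified in the abstract. As assumptions made only for illustration, the grouping step links features whose pairwise mutual information reaches a hypothetical `threshold` (a simple density-style stand-in for F-DBSCAN), and the representative of each cluster is chosen by a hypothetical score: relevance to the target minus mean redundancy with the other cluster members. The function names and the toy data are likewise invented for this example.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def group_features(features, threshold):
    """Step 1 (stand-in for F-DBSCAN, an assumption): link features whose
    pairwise mutual information is >= threshold and return linked groups."""
    names = list(features)
    unvisited = set(names)
    clusters = []
    for start in names:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        cluster, frontier = [start], [start]
        while frontier:
            f = frontier.pop()
            neighbours = [g for g in names if g in unvisited and
                          mutual_information(features[f], features[g]) >= threshold]
            for g in neighbours:
                unvisited.discard(g)
                cluster.append(g)
                frontier.append(g)
        clusters.append(cluster)
    return clusters


def pick_representative(cluster, features, target):
    """Step 2 (hypothetical criterion): relevance to the target minus
    mean mutual information with the other features in the cluster."""
    def score(f):
        relevance = mutual_information(features[f], target)
        others = [g for g in cluster if g != f]
        if not others:
            return relevance
        redundancy = sum(mutual_information(features[f], features[g])
                         for g in others) / len(others)
        return relevance - redundancy
    return max(cluster, key=score)


# Toy data: "b" is a noisy copy of "a"; "c" is independent of "a".
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 0, 1, 1, 0, 0, 1, 1]

clusters = group_features(features, threshold=0.3)
selected = [pick_representative(c, features, target) for c in clusters]
print(clusters)
print(selected)
```

On this toy data, the dependent features "a" and "b" fall into one group and only the one more informative about the target is kept, which is the redundancy-reduction effect the abstract describes. A real implementation would also need to handle continuous features, for example by discretizing them before estimating mutual information.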

Keywords