NSE: An effective model for investigating the role of pre-processing using ensembles in sentiment classification

Document Type: Original Manuscript


1 Department of Computer Engineering, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran

2 Senior Lecturer, School of Computing, National University of Singapore, 119613, Singapore


With the widespread use of Internet applications, review sentiment classification has attracted increasing interest among text mining experts. Traditional bag-of-words approaches do not capture the relationships between words, while the pre-processing phase and data reduction techniques can make a large difference in classification performance. This study proposes an efficient model for multi-class sentiment classification that combines sampling techniques, feature selection methods, and ensemble supervised classification to improve text classification performance. The feature selection phase of our model applies n-grams, a computational method that improves the selection of candidate features by extracting features based on the relationships between words. The proposed model classifies the sentiment of tweets and online reviews through ensemble methods, including boosting, bagging, stacking, and voting, in conjunction with supervised methods. In addition, two sampling techniques are applied in the pre-processing phase. In the experimental study, a comprehensive range of comparative experiments was conducted to assess the effectiveness of our model against the best existing works in the literature on well-known movie review and Twitter datasets. Our model achieved its highest accuracy and F-measure of 92.95% and 92.65% on the movie review dataset and 90.61% and 87.73% on the Twitter dataset, respectively.
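
As a rough illustration of two ideas named in the abstract, the sketch below extracts word n-grams from a text and combines the labels of several classifiers by majority voting. It is not the study's implementation: the function names, the whitespace tokenizer, the n-gram range, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of sizes n_min..n_max (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption for illustration, not the pre-processing used in the study.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def majority_vote(predictions):
    """Combine class labels from several classifiers by hard voting.

    Ties are broken in favor of the label that appears first in the
    input list (an assumed rule for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Bigrams such as "not good" keep word relationships that single
# words (a bag of words) would lose.
features = word_ngrams("not good at all")
label = majority_vote(["negative", "positive", "negative"])
```

Here `features` contains both the unigrams and the bigrams of the sentence, and `label` is the class chosen by most of the three hypothetical classifiers. Boosting, bagging, and stacking, which the abstract also lists, combine base learners in other ways and are not shown.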


Main Subjects