Building Semantic Kernel for Persian Text Classification with a Small Amount of Training Data

Authors

1 Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

2 Department of Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran

Abstract

The original idea of semantic kernels is to use semantic features instead of the terms that appear in a text document. In this article, documents are transformed into a new k-dimensional feature space by applying Singular Value Decomposition to the term-document matrix and extracting the k eigenvectors with the highest energy. The suggested semantic kernel severely reduces the number of dimensions, which has two main consequences. First, the computational complexity of the classifier is severely reduced. Second, the trained classifier is less sensitive to the individual input terms and can therefore classify documents effectively. Experiments on Persian documents indicate the absolute superiority of the suggested semantic kernel over the well-known vector space (Bag-of-Words) kernel, especially when external semantic resources are not available and the amount of available training data is insufficient.
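The transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-document matrix, the function names (`fit_semantic_projection`, `project`, `semantic_kernel`), and the choice of a plain linear kernel in the reduced space are all assumptions made for illustration. The sketch keeps the k left singular vectors with the largest singular values and computes kernel values between the projected documents.

```python
import numpy as np

def fit_semantic_projection(term_doc, k):
    """Return the k left singular vectors with the largest singular values.

    term_doc has shape (n_terms, n_docs). np.linalg.svd returns singular
    values in descending order, so the first k columns of U are the
    directions with the highest energy.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k]

def project(term_doc, U_k):
    """Map documents (columns) from term space to the k-dimensional space."""
    return U_k.T @ term_doc  # shape (k, n_docs)

def semantic_kernel(A, B, U_k):
    """Linear kernel between the projected documents of A and B (assumed form)."""
    return project(A, U_k).T @ project(B, U_k)

# Hypothetical toy data: 5 terms (rows) by 4 documents (columns).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U_k = fit_semantic_projection(X, k=2)
K = semantic_kernel(X, X, U_k)  # (4, 4) Gram matrix in the reduced space

# With k equal to the number of documents, nothing is discarded and the
# kernel reduces to the ordinary Bag-of-Words linear kernel X^T X.
U_full = fit_semantic_projection(X, k=4)
K_full = semantic_kernel(X, X, U_full)
```

A Gram matrix such as `K` could be passed to any classifier that accepts a precomputed kernel. Because each document is represented by k values rather than one value per vocabulary term, the cost of kernel evaluation no longer depends on the vocabulary size once the projection has been computed.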

Keywords