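Abstract:
Text categorization is one of the key technologies in information retrieval and text mining. A multi-class text categorization algorithm based on support vector data description (SVDD) is proposed. In training, SVDD is used to find the minimal hypersphere enclosing the samples of each class while maximizing the classification margin. In testing, a decision criterion based on the average distance to the k nearest neighbors in kernel space is introduced to determine the class of a sample. Experimental results show that the method has good generalization ability and good time performance.
Keywords: information retrieval; text mining; text categorization; support vector data description; multi-class classifier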
A multi-class text categorization algorithm based on maximal classification margin SVDD
LUO Qi
Abstract:
Text categorization is one of the key technologies for information retrieval and text mining. This paper proposes a multi-class text categorization algorithm based on maximal classification margin SVDD (Support Vector Data Description). The algorithm trains on multi-class samples with support vector data description, computing for each class a minimal hypersphere that encloses its samples while maximizing the margin between classes. In the testing phase, the algorithm classifies samples using an average-distance criterion based on KNN (K-Nearest Neighbor) in kernel space. Experimental results show that the algorithm has good generalization capability and good time efficiency for text categorization.
Key words: information retrieval; text mining; text categorization; support vector data description (SVDD); multi-class classifier
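
The test-phase criterion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the RBF kernel, the parameter values `gamma` and `k`, the function names, and the toy two-dimensional "document vectors" are all hypothetical, and the reference points of each class stand in for whatever samples (for example, the support vectors found by SVDD training) the paper actually uses. SVDD training itself is not shown.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2). The kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(x, y, gamma=0.5):
    """Distance between x and y in the kernel-induced feature space.

    Uses ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y).
    """
    d2 = rbf_kernel(x, x, gamma) - 2.0 * rbf_kernel(x, y, gamma) + rbf_kernel(y, y, gamma)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def knn_average_distance(x, points, k=3, gamma=0.5):
    """Average kernel-space distance from x to its k nearest points (points must be non-empty)."""
    dists = sorted(kernel_distance(x, p, gamma) for p in points)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def classify(x, class_points, k=3, gamma=0.5):
    """Assign x to the class with the smallest k-NN average distance in kernel space."""
    scores = {
        label: knn_average_distance(x, pts, k, gamma)
        for label, pts in class_points.items()
    }
    return min(scores, key=scores.get)


# Hypothetical toy data: two classes of 2-D feature vectors.
class_points = {
    "sports": [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0)],
    "finance": [(0.1, 0.9), (0.2, 0.8), (0.0, 1.0)],
}
predicted = classify((0.85, 0.15), class_points, k=2)
```

Averaging over the k nearest reference points, rather than using only the single nearest one, makes the decision less sensitive to an isolated outlier in a class; whether the paper combines this distance with the hypersphere radii is not stated in the abstract.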