A short-text similarity algorithm incorporating part of speech and its application to text classification
黄贤英,李沁东,刘英涛
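(College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Abstract:
To address the low accuracy of semantics-based short-text similarity methods when they are used for short-text classification, a short-text similarity algorithm incorporating part of speech (GCSSA) is proposed. Building on a short-text similarity method based on HowNet ("知网") semantics, the method incorporates category feature words and adds part-of-speech analysis of keywords. Keywords are assigned different weight coefficients according to whether they are category feature words and according to their part of speech, which distinguishes how much terms of differing contribution matter in the short-text similarity computation. Experiments show that, when the similarity computed by this algorithm is applied to short-text classification, macro-averaged and micro-averaged accuracy improve by about 4% over the HowNet-based short-text classification algorithm, effectively improving short-text classification accuracy.
Keywords:  short-text classification  short-text similarity  part of speech  HowNet semantics  classification accuracy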
DOI:
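Funding: National Natural Science Foundation of China (11547148); Science and Technology Program of the Chongqing Municipal Education Commission (16SKGH133); Chongqing Social Science Planning Doctoral Project (2015BS059)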
A grammatical category-combined short-text similarity algorithm and its application in text categorization
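HUANG Xianying, LI Qindong, LIU Yingtao
(College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)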
Abstract:
To address the low accuracy of HowNet-based short-text similarity methods in short-text categorization, a grammatical category-combined short-text similarity algorithm (GCSSA) is proposed. Building on the HowNet semantic similarity method for short texts, the algorithm incorporates category feature words and adds an analysis of each keyword's grammatical category (part of speech). It assigns different weights to keywords according to whether they are category feature words and according to their grammatical category, so that terms with different contributions carry different importance in the short-text similarity calculation. Experiments show that, compared with the HowNet-based short-text categorization algorithm, the proposed method improves macro-averaged and micro-averaged accuracy by about 4% in short-text categorization, effectively improving categorization accuracy.
Key words:  short-text categorization  short-text similarity  grammatical category  HowNet semantics  categorization accuracy
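
The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give GCSSA's formula or coefficients, so everything below is an assumption made for illustration: the weight values in `POS_WEIGHTS`, the `FEATURE_WORD_BOOST` factor, the best-match aggregation, and the `toy_word_sim` lookup table, which merely stands in for a HowNet-based word similarity.

```python
from typing import Callable, Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)
WordSim = Callable[[str, str], float]

# Hypothetical coefficients; the abstract does not state the paper's values.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 0.8, "a": 0.5}
DEFAULT_POS_WEIGHT = 0.3   # assumed weight for any other part of speech
FEATURE_WORD_BOOST = 1.5   # assumed extra factor for category feature words


def token_weight(token: Token, feature_words: Set[str]) -> float:
    """Weight a keyword by its part of speech and category-feature status."""
    word, pos = token
    weight = POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT)
    if word in feature_words:
        weight *= FEATURE_WORD_BOOST
    return weight


def directed_similarity(src: List[Token], dst: List[Token],
                        word_sim: WordSim, feature_words: Set[str]) -> float:
    """Weighted average of each src keyword's best match in dst."""
    if not src or not dst:
        return 0.0
    weighted_sum = 0.0
    total_weight = 0.0
    for token in src:
        weight = token_weight(token, feature_words)
        best = max(word_sim(token[0], other[0]) for other in dst)
        weighted_sum += weight * best
        total_weight += weight
    return weighted_sum / total_weight


def short_text_similarity(a: List[Token], b: List[Token],
                          word_sim: WordSim, feature_words: Set[str]) -> float:
    """Symmetric similarity: mean of the two directed scores."""
    return 0.5 * (directed_similarity(a, b, word_sim, feature_words)
                  + directed_similarity(b, a, word_sim, feature_words))


def toy_word_sim(w1: str, w2: str) -> float:
    """Stand-in for a HowNet-based word similarity (toy lookup table)."""
    if w1 == w2:
        return 1.0
    table = {frozenset(("football", "soccer")): 0.9}
    return table.get(frozenset((w1, w2)), 0.0)


if __name__ == "__main__":
    text_a = [("football", "n"), ("quickly", "d")]
    text_b = [("soccer", "n"), ("slowly", "d")]
    print(short_text_similarity(text_a, text_b, toy_word_sim, set()))
    print(short_text_similarity(text_a, text_b, toy_word_sim,
                                {"football", "soccer"}))
```

In this sketch, marking "football" and "soccer" as category feature words raises the similarity of the two example texts, because the well-matched nouns then dominate the poorly matched adverbs. That is the kind of effect the abstract attributes to differentiated weights, though the actual algorithm may combine the weights differently.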