An Artificial Intelligence-Generated Text Recognition and Detection Method
WANG Yuxin, LIU Kefei, LI Xuelian, WANG Hongjun
(1. College of Information Science and Engineering, Hohai University, Changzhou 213200, China; 2. Guangling College of Yangzhou University, Yangzhou 225000, China; 3. College of Electronic Engineering, National University of Defense Technology, Hefei 230031, China)
Abstract:
To address problems such as academic misconduct, copyright infringement, privacy protection, and public opinion monitoring that arise from the misuse of artificial intelligence (AI)-generated text, a recognition and detection algorithm based on natural language processing (NLP) is proposed. The algorithm first converts the words of a text into word vectors using the continuous bag-of-words (CBOW) model of Word2vec and sums the word vectors to obtain a text vector. It then applies the softmax function to obtain the probability distribution of the text vector, analyzes the basic patterns of AI-generated text through statistical visualization, and determines the text type using cosine similarity. Next, support vector machine recursive feature elimination (SVM-RFE) is used to determine whether the text was generated by AI. For AI-generated text, the K-nearest neighbor (KNN) algorithm estimates how many times the text has been regenerated, which indicates the extent of AI involvement and further refines the granularity of detection. Simulation experiments verify the algorithm's effectiveness, with a recognition accuracy of 80% or above.
Key words:  AI-generated text detection  text vector  cosine similarity  support vector machine (SVM)  K-nearest neighbor (KNN) algorithm
DOI:10.20079/j.issn.1001-893x.240727001
Funding: General Program of the National Natural Science Foundation of China (61971473)
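The first stage described in the abstract (word vectors summed into a text vector, softmax over that vector, then a cosine-similarity comparison) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy three-dimensional embeddings stand in for trained CBOW vectors, and the names `TOY_VECTORS`, `text_vector`, `softmax`, `cosine_similarity`, and `classify` are hypothetical. Comparing a text against one reference profile per class is also an assumption about how cosine similarity might be used to judge text type. The SVM-RFE and KNN stages are not covered.

```python
import math

# Hypothetical toy embeddings standing in for trained CBOW word vectors.
TOY_VECTORS = {
    "model": [0.9, 0.1, 0.3],
    "generates": [0.8, 0.2, 0.1],
    "text": [0.4, 0.8, 0.2],
    "human": [0.1, 0.7, 0.9],
    "writes": [0.2, 0.5, 0.6],
}


def text_vector(tokens, vectors):
    """Sum the word vectors of all known tokens into one text vector."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary words
        for i, value in enumerate(vec):
            total[i] += value
    return total


def softmax(values):
    """Turn a vector into a probability distribution (numerically stable)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    norm = sum(exps)
    return [e / norm for e in exps]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, references, vectors):
    """Return the label of the reference profile most similar to the text.

    Assumption: each class is represented by a single reference token list.
    """
    query = softmax(text_vector(tokens, vectors))
    scores = {
        label: cosine_similarity(query, softmax(text_vector(ref, vectors)))
        for label, ref in references.items()
    }
    return max(scores, key=scores.get)


# Hypothetical reference profiles, one per text type.
REFERENCES = {
    "ai-like": ["model", "generates", "text"],
    "human-like": ["human", "writes", "text"],
}

label = classify(["model", "generates"], REFERENCES, TOY_VECTORS)
print(label)
```

In practice the embeddings would come from a CBOW model trained on a real corpus, and the reference profiles would be built from many labeled texts rather than one token list per class.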