An Algorithm for Identifying and Detecting AI-Generated Text
WANG Yuxin, LIU Kefei, LI Xuelian, WANG Hongjun
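(1. College of Information Science and Engineering, Hohai University, Changzhou, Jiangsu 213200; 2. Guangling College of Yangzhou University, Yangzhou, Jiangsu 225000; 3. College of Electronic Engineering, National University of Defense Technology, Hefei 230031)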

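Abstract:

To address problems such as academic misconduct, copyright infringement, privacy protection, and public opinion monitoring caused by the misuse of artificial intelligence (AI)-generated text, an algorithm based on natural language processing is proposed for identifying and detecting AI-generated text. The algorithm first uses the continuous bag-of-words model of the Word2vec method to convert the words of a text into word vectors, and sums the word vectors to obtain a text vector. It then uses the softmax function to obtain the probability distribution of the text vector, analyzes the basic patterns of AI-generated text through statistical visualization, and uses cosine similarity to judge the text type. Next, a support vector machine recursive feature elimination algorithm determines whether a text was generated by AI, and the K-nearest neighbor algorithm estimates how many times the text has been regenerated, further refining the granularity of text detection. Simulation experiments verify the effectiveness of the algorithm; the results show a recognition accuracy of 80% or higher.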

Keywords: AI-generated text detection; text vector; cosine similarity; support vector machine (SVM); K-nearest neighbor (KNN) algorithm

DOI: 10.20079/j.issn.1001-893x.240727001

Funding: General Program of the National Natural Science Foundation of China (61971473)

An Artificial Intelligence-generated Text Recognition and Detection Method

WANG Yuxin, LIU Kefei, LI Xuelian, WANG Hongjun

(1. College of Information Science and Engineering, Hohai University, Changzhou 213200, China; 2. Guangling College of Yangzhou University, Yangzhou 225000, China; 3. College of Electronic Engineering, National University of Defense Technology, Hefei 230031, China)

Abstract:

To address issues such as academic dishonesty, copyright infringement, privacy protection, and public opinion monitoring that stem from the misuse of artificial intelligence (AI)-generated texts, a recognition and detection algorithm based on natural language processing (NLP) is proposed. The algorithm first converts words into vectors using the continuous bag-of-words (CBOW) model within Word2vec and accumulates them into text vectors. It then applies softmax to obtain their probability distribution, analyzes the fundamental patterns of AI-generated texts with statistical visualization, and determines the type of text using cosine similarity. Next, support vector machine recursive feature elimination (SVM-RFE) is used to determine whether the text is generated by AI. For AI-generated texts, the K-nearest neighbor (KNN) algorithm estimates the extent of AI involvement, further refining the granularity of text detection. Finally, simulation experiments show the algorithm's effectiveness, with a recognition accuracy of 80% or above.

Key words: AI-generated text detection; text vector; cosine similarity; support vector machine (SVM); K-nearest neighbor (KNN) algorithm
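
The first stages named in the abstract (summing word vectors into a text vector, applying softmax, and comparing texts by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional `EMBEDDINGS` table is a hypothetical stand-in for trained CBOW vectors, the reference texts and their labels are invented for illustration, and the function names are assumptions. The SVM-RFE and KNN stages are not shown.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for trained CBOW vectors.
EMBEDDINGS = {
    "model": [0.9, 0.1, 0.0],
    "text": [0.7, 0.2, 0.1],
    "generates": [0.8, 0.0, 0.2],
    "coffee": [0.0, 0.9, 0.3],
    "rainy": [0.1, 0.8, 0.4],
    "morning": [0.1, 0.7, 0.5],
}


def text_vector(tokens):
    """Sum the word vectors of all tokens; unknown tokens contribute zeros."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(EMBEDDINGS.get(token, [0.0] * dim)):
            total[i] += value
    return total


def softmax(vec):
    """Turn a vector into a probability distribution (numerically stable)."""
    peak = max(vec)
    exps = [math.exp(x - peak) for x in vec]
    norm = sum(exps)
    return [e / norm for e in exps]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def distribution(tokens):
    """Text vector followed by softmax, as the abstract describes."""
    return softmax(text_vector(tokens))


def classify_by_cosine(query, references):
    """Return the label of the reference distribution most similar to query."""
    best = max(references, key=lambda item: cosine_similarity(query, item[0]))
    return best[1]


# Invented reference texts with illustrative labels (assumptions, not paper data).
references = [
    (distribution(["model", "text"]), "ai"),
    (distribution(["model", "generates"]), "ai"),
    (distribution(["coffee", "morning"]), "human"),
    (distribution(["rainy", "morning"]), "human"),
]
query = distribution(["model", "generates", "text"])
label = classify_by_cosine(query, references)
print(label)
```

In practice the embeddings would come from a Word2vec model trained in CBOW mode, and the cosine-based judgment would feed the later classification stages rather than serve as the final decision.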