An Open Source Intelligence Analysis Method Based on Generative LLM
CHENG Leifeng, LUO Ji, WANG Lei, ZHU Min, TAO Sitong
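(1. College of Computer Science, Sichuan University, Chengdu 610065, China; 2. Southwest China Institute of Electronic Technology, Chengdu 610036, China; 3. China Telecom Digital Intelligence Technology Co., Ltd., Beijing 100001, China)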
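Abstract:
To address web page information extraction and question answering in open-source intelligence analysis, a method is proposed that integrates a generative large language model (LLM), XPath, and retrieval-augmented generation (RAG), built around a dynamic templated prompting strategy and multi-granularity semantic retrieval. The dynamic templates generate prompts constrained by domain knowledge according to the intelligence type, improving entity extraction precision. The multi-granularity retrieval builds a three-level document-paragraph-entity hierarchy and uses the BERT-Top-k algorithm to optimize information localization in long texts. Entities are aligned with the OpenKG knowledge base to construct a three-dimensional attribute-relation-event network, strengthening the logical analysis of complex events. On the ClueWeb22 and TAC-KBP2022 datasets, the method achieves an extraction rate of 0.85 and an answer accuracy of 0.78, an improvement of 18% to 31% over conventional RAG. In practical application, the key-fact accuracy of hot-event briefings reaches 92%, at an overall cost of only 12% of that of GPT-4.
Keywords: open-source intelligence analysis; web information extraction; generative large language model; retrieval-augmented generation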
DOI:10.20079/j.issn.1001-893x.250204001
Funding:
An Open Source Intelligence Analysis Method Based on Generative LLM
CHENG Leifeng, LUO Ji, WANG Lei, ZHU Min, TAO Sitong
(1. College of Computer Science, Sichuan University, Chengdu 610065, China; 2. Southwest China Institute of Electronic Technology, Chengdu 610036, China; 3. China Telecom Digital Intelligence Technology Co., Ltd., Beijing 100001, China)
Abstract:
The authors propose a method integrating generative large language models (LLMs), XPath, and retrieval-augmented generation (RAG) for web page information extraction in open-source intelligence analysis. Key innovations include a dynamic templated prompting strategy and multi-granularity semantic retrieval. The dynamic templates generate domain-constrained prompts based on intelligence types (events/persons/organizations), enhancing entity extraction accuracy. The multi-granularity retrieval establishes a document-paragraph-entity hierarchy and uses the BERT-Top-k algorithm to locate information fragmented across long texts. By aligning entities with OpenKG, a three-dimensional attribute-relation-event network is constructed to strengthen complex event analysis.
Key words: open source intelligence analysis; web information extraction; generative large language model; retrieval-augmented generation