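Abstract: |
With the rapid development of Internet technology, people can obtain large volumes of news text in a timely manner. There is an urgent need to automatically extract key information from news and convert the valuable information it contains into structured data, so that useful knowledge can be acquired quickly and effectively. Entity relationship extraction is one way to obtain such key information, but little work has so far addressed entity relationship extraction for Chinese. Because Chinese entity recognition models based on long short-term memory (LSTM) networks have difficulty extracting long-distance dependency features and syntactic features, this paper proposes using a bidirectional tree-structured LSTM neural network to extract structural features from the dependency syntax tree. On top of the extracted features, a conditional random field is used to determine entity categories and boundaries, and an attention mechanism is added to the entity recognition model to improve its performance. The model is trained on the People's Daily dataset and the ACE 2005 corpus, which verifies its effectiveness. |
Keywords: news text information; entity relationship extraction; long short-term memory network; shortest dependency path; conditional random field; attention mechanism |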
DOI: |
|
Funding: |
|
Chinese Entity Relationship Extraction for Journalism |
WANG Bo, WANG Kan, WANG Chenggang, LIU Ran, LIU Weipeng, HUANG Huirong |
(1. Beijing Information Technology Research Institution, Beijing 100093, China; 2. Southwest China Institute of Electronic Technology, Chengdu 610036, China; 3. School of Cyber Science and Engineering, Northwestern Polytechnical University, Xi'an 710129, China) |
Abstract: |
With the rapid development of Internet technology, people can obtain large amounts of news text without delay. Automatically extracting key information from news and converting it into structured data, so that useful knowledge can be acquired quickly and effectively, has become an urgent need. Entity relationship extraction is one method for obtaining such key information, but research on entity relationship extraction for Chinese remains limited. Chinese entity recognition models based on long short-term memory (LSTM) networks have difficulty capturing long-distance dependency features and syntactic features. To address this, this paper proposes a bidirectional tree-structured LSTM neural network that extracts structural features from the dependency syntax tree. Based on the extracted features, a conditional random field determines the category and boundary of each entity, and an attention mechanism is added to the entity recognition model to improve its performance. The model is trained on the People's Daily dataset and the ACE 2005 corpus to verify its effectiveness. |
Key words: news text information; entity relationship extraction; long short-term memory neural network; shortest dependency path; conditional random field; attention mechanism |
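The abstract states that a conditional random field determines entity categories and boundaries from the extracted features. As a rough illustration of that decoding step only, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF over BIO tags. The function name `viterbi_decode`, the tag set, and all scores are hypothetical and are not taken from the paper; in the described model the per-token scores would presumably come from the tree-structured LSTM and attention layers, which are not reproduced here.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag] is the (hypothetical) score of `tag` at token t;
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`
    (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best previous tag
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative scores only: a strongly negative transition keeps "I-PER"
# from following "O", which is how a CRF can enforce entity boundaries.
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {("O", "I-PER"): -1e4}
EMISSIONS = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.5},
    {"O": 0.2, "B-PER": 0.3, "I-PER": 1.5},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.4},
]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

The transition table is where boundary constraints live: per-token scores alone could label a token `I-PER` directly after `O`, whereas decoding over whole sequences penalizes that path. The sketch does not constrain the first tag; a fuller version would add start and end transition scores.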