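Abstract:
Identifying unknown communication protocols from bit streams is an important component of modern electronic reconnaissance. This paper first analyzes the Aho-Corasick (AC) fast statistical algorithm and the position-difference-based long-sequence splicing algorithm, and points out the shortcomings of both. An array is used in place of a binary tree to store the position information of pattern sequences in the bit stream, and a relational expression is constructed so that each array subscript stays consistent with the corresponding binary tree node value, which effectively reduces the time complexity of counting and filtering pattern sequences. Furthermore, the array elements are rearranged from left to right according to the order in which the corresponding frequent sequences appear, yielding an improved position-difference-based feature sequence mining algorithm. Finally, 100 Address Resolution Protocol (ARP) packets captured with Wireshark are used for simulation verification. The results show that, compared with the original algorithm, the improved algorithm reduces the time complexity by at least one order of magnitude.
Keywords: unknown communication protocol identification; feature sequence mining; data frame structure; intelligence information; bit stream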
A fast algorithm for feature sequence mining based on position difference |
PAN Xiangrong, WANG Ting
(Southwest China Institute of Electronic Technology, Chengdu 610036, China; National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611730, China)
Abstract:
Identifying unknown communication protocols from bit streams is an important part of modern electronic reconnaissance technology. The Aho-Corasick (AC) fast statistical algorithm and the long-sequence splicing algorithm based on position difference are analyzed, and their shortcomings are pointed out. An array is used instead of a binary tree to store the position information of pattern sequences in the bit stream, and a relational expression is constructed to keep each array subscript consistent with the value of the corresponding binary tree node, which effectively reduces the time complexity of counting and filtering pattern sequences. Furthermore, the array elements are rearranged from left to right according to the order in which the corresponding frequent sequences appear, and an improved feature sequence mining algorithm based on position difference is proposed. Finally, 100 Address Resolution Protocol (ARP) packets captured with Wireshark are used for simulation verification. The results show that the time complexity of the improved algorithm is at least one order of magnitude lower than that of the original algorithm.
Key words: unknown communication protocol identification; feature sequence mining; data frame structure; intelligence information; bit stream
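
The abstract does not state the relational expression that ties array subscripts to binary tree node values. The sketch below is therefore only one plausible reading, not the authors' method: it assumes the standard implicit (heap-style) numbering in which the empty pattern is node 1 and appending bit b moves from node i to node 2i + b. The function names `heap_index`, `collect_positions`, and `frequent_patterns`, and the parameters `max_len` and `min_count`, are hypothetical and chosen for illustration.

```python
def heap_index(pattern):
    """Map a bit pattern (string of '0'/'1') to its slot in a flat array.

    Assumption: implicit binary tree numbering, root (empty pattern) = 1,
    child of node i along bit b = 2*i + b. Equivalent to reading
    '1' + pattern as a binary number.
    """
    return int("1" + pattern, 2)


def collect_positions(bits, max_len):
    """Record the start offsets of every pattern of length 1..max_len.

    Returns a flat list; slot heap_index(p) holds the offsets where
    pattern p starts. No tree nodes or pointers are allocated.
    """
    table = [[] for _ in range(1 << (max_len + 1))]
    for start in range(len(bits)):
        idx = 1  # root of the implicit tree
        for offset in range(start, min(start + max_len, len(bits))):
            bit = 1 if bits[offset] == "1" else 0
            idx = (idx << 1) | bit  # descend one level
            table[idx].append(start)
    return table


def frequent_patterns(table, length, min_count):
    """List (pattern, count, first_offset) for patterns of one length.

    Keeps patterns seen at least min_count times and orders them by
    the offset of their first occurrence (left to right in the stream).
    """
    base = 1 << length  # first slot holding patterns of this length
    hits = []
    for value in range(base):
        offsets = table[base + value]
        if len(offsets) >= min_count:
            hits.append((offsets[0], format(value, "0{}b".format(length)), len(offsets)))
    hits.sort()
    return [(pattern, count, first) for first, pattern, count in hits]


table = collect_positions("0110110", 3)
print(table[heap_index("011")])        # offsets where "011" starts
print(frequent_patterns(table, 2, 2))  # frequent 2-bit patterns, by first offset
```

Under this assumed mapping, looking up or counting a pattern is a single array access, and ordering the frequent patterns by first offset prepares them for comparing position differences between neighbours. The splicing step itself is not sketched here because the abstract gives no details of it.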