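Abstract:
In opportunistic spectrum access systems, to address channel selection when no prior knowledge of the channel environment is available, an improved UCB (Upper Confidence Bound) index selection strategy based on the multi-armed bandit (MAB) model is proposed. The strategy introduces the reward variance into the confidence factor of the UCB index to adjust the exploration of the unknown channel environment and thereby reduce the exploration cost. Theoretical analysis proves that the strategy converges quickly and that its learning regret grows slowly, approximately logarithmically in the number of time slots. Simulation results show that, compared with the original UCB strategy and the greedy algorithm, the proposed strategy adapts better in selecting channels with higher availability, effectively reduces the learning regret and speeds up its convergence, and thus improves system throughput.
Key words: cognitive radio; opportunistic spectrum access; channel selection; multi-armed bandit model; UCB index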
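DOI:

Funding: National Natural Science Foundation of China (61102062); Key Project of Science and Technology Research of the Ministry of Education (212145); Natural Science Foundation of the Chongqing Science and Technology Commission (KJ120503); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ120530)
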
Channel selection based on multi-armed bandit |
ZHU Jiang, CHEN Hongcui, XIONG Jiahao
Abstract:
In opportunistic spectrum access (OSA) systems, to solve the problem of channel selection without prior statistical information about the channels, a channel selection strategy is proposed that applies an improved upper confidence bound (UCB) index based on the multi-armed bandit (MAB) model. By adding the reward variance to the confidence factor of the UCB index, the proposed strategy adjusts the exploration of the unknown channel environment and reduces the cost of exploration. It is proved theoretically that the proposed strategy converges faster and that its learning regret grows slowly, approximately logarithmically in the number of time slots. Simulation results show that, compared with the original UCB index algorithm and the greedy algorithm, the proposed strategy adaptively selects channels with better availability, effectively reduces the learning regret and accelerates its convergence, and thus improves system throughput.
Key words: cognitive radio; opportunistic spectrum access; channel selection; multi-armed bandit model; UCB index
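
The abstract does not state the exact form of the improved index, so the following is only a minimal sketch of the general idea of placing a variance term inside the UCB confidence factor. It substitutes the well-known UCB1-Tuned index of Auer, Cesa-Bianchi and Fischer (2002), which may differ from the authors' index. The function names, the Bernoulli idle/busy channel model, and the idle probabilities are illustrative assumptions, not taken from the paper.

```python
import math
import random


def ucb_tuned_index(mean, mean_sq, pulls, t):
    """Variance-aware UCB index for one channel (UCB1-Tuned form).

    Assumption: this stands in for the paper's improved index,
    whose exact formula is not given in the abstract.
    """
    variance = max(mean_sq - mean * mean, 0.0)
    # Upper bound on the variance estimate after `pulls` observations.
    v = variance + math.sqrt(2.0 * math.log(t) / pulls)
    # 1/4 is the largest possible variance of a reward in [0, 1].
    return mean + math.sqrt(math.log(t) / pulls * min(0.25, v))


def simulate(idle_probs, slots, seed=0):
    """Sense one channel per slot; reward is 1 if the channel is idle.

    Returns the per-channel selection counts and the cumulative
    learning regret relative to always using the best channel.
    """
    rng = random.Random(seed)
    k = len(idle_probs)
    pulls = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    best = max(idle_probs)
    regret = 0.0
    for t in range(1, slots + 1):
        if t <= k:
            ch = t - 1  # initial round: try every channel once
        else:
            ch = max(
                range(k),
                key=lambda j: ucb_tuned_index(
                    sums[j] / pulls[j], sq_sums[j] / pulls[j], pulls[j], t
                ),
            )
        reward = 1.0 if rng.random() < idle_probs[ch] else 0.0
        pulls[ch] += 1
        sums[ch] += reward
        sq_sums[ch] += reward * reward
        regret += best - idle_probs[ch]
    return pulls, regret


if __name__ == "__main__":
    counts, total_regret = simulate([0.2, 0.5, 0.9], slots=5000)
    print("selections per channel:", counts)
    print("cumulative regret:", round(total_regret, 2))
```

Running the script prints how often each hypothetical channel was selected and the accumulated regret. Under these assumptions the selections should concentrate on the channel with the highest idle probability, while the regret grows far more slowly than the number of slots.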