Well, for that, I think it is best to use a linearly annealed epsilon-greedy policy that updates epsilon based on the number of steps taken:

    EXPLORE = 3000000        # number of time steps over which epsilon is annealed
    FINAL_EPSILON = 0.001    # final value of epsilon
    INITIAL_EPSILON = 1.0    # starting value of epsilon

    epsilon = INITIAL_EPSILON

    # run this once per time step, after choosing the action
    if epsilon > FINAL_EPSILON:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
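
To show how the annealed epsilon could feed into action selection, here is a minimal sketch. The helper names `annealed_epsilon` and `select_action` and the `q_values` list are my own assumptions for illustration and are not part of the snippet above. The schedule is written in closed form, which should match the step-by-step decrement up to floating-point rounding.

```python
import random

INITIAL_EPSILON = 1.0   # starting value of epsilon
FINAL_EPSILON = 0.001   # final value of epsilon
EXPLORE = 3000000       # number of time steps over which epsilon is annealed


def annealed_epsilon(step):
    """Return epsilon after `step` time steps (hypothetical helper)."""
    # Fraction of the annealing period completed, capped at 1.0.
    fraction = min(step / EXPLORE, 1.0)
    return INITIAL_EPSILON - fraction * (INITIAL_EPSILON - FINAL_EPSILON)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of Q-values (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a training loop you would call `select_action(q_values, annealed_epsilon(t))` at each time step `t`, so the agent explores almost always at the start and almost never once `t` reaches `EXPLORE`.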