This program combines a TensorFlow-based DQN algorithm with the CPLEX solver in MATLAB.
The purpose is to find an optimal operation control strategy for energy storage using deep reinforcement learning.
DQN controls the energy storage system.
CPLEX solves the microgrid planning and scheduling problem (a mixed-integer linear program).
The main problem is that the program iterates very quickly at the beginning, but at around EPISODE = 100 the running speed starts to drop significantly, so the final desired result cannot be obtained.
For the DQN algorithm, I directly used a program framework written by someone else.
Its original application scenario is the CartPole-v0 game in OpenAI Gym.
The code URL is: https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/dqn.py
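For context, my main loop relies on only two methods of that DQN class; below is a bare skeleton of the interface (the full implementation, including create_Q_network, the replay buffer, and the training step, is in the linked file):

class DQN():
    def __init__(self):
        # Builds the Q-network, the replay buffer, and the TensorFlow session
        # (see the linked dqn.py for the full implementation).
        ...

    def egreedy_action(self, state):
        # Returns a discrete action index using an epsilon-greedy policy.
        ...

    def perceive(self, state, action, reward, next_state, done):
        # Stores the transition in the replay buffer and trains the Q-network
        # once enough samples have been collected.
        ...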
In my Python program, the matlab.engine module is used to call the CPLEX solver in MATLAB to solve the microgrid planning problem.
The returned values are then stored and used by the DQN algorithm.
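For reference, here is a minimal sketch of this call pattern, assuming the MILP_1 .m file is on the MATLAB path; the numeric arguments below are purely illustrative placeholders:

import matlab
import matlab.engine

# Start one MATLAB session and reuse it for every call.
engine = matlab.engine.start_matlab()

# Python lists are wrapped in matlab.double before being passed to MATLAB;
# nargout tells the engine how many return values to expect.
P_b_station_pre, P_s_station_pre = engine.MILP_1(
    0.5,                        # initial battery SoC (placeholder)
    0.5,                        # initial microgrid SoC (placeholder)
    matlab.double([0.0] * 24),  # load forecast (placeholder data)
    matlab.double([0.0] * 24),  # PV forecast
    matlab.double([0.0] * 24),  # WT forecast
    matlab.double([0.0] * 24),  # buying price forecast
    matlab.double([0.0] * 24),  # selling price forecast
    nargout=2)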
Each episode (for episode in range(EPISODE + 1):) advances STEP steps (for step in range(1, STEP + 1):).
Each episode first calls the MATLAB function engine.MILP_1 for the initial plan, and then, at every step, calls the MATLAB function engine.Reward_MILP2 for the plan adjustment; the exact call sites can be seen in the main code below.
It is worth noting that I previously replaced CPLEX in this program with MATLAB's power flow solver MATPOWER to calculate a 33-node power flow (for another project).
In that case, the entire program maintained a reasonably good speed throughout and iterated to EPISODE = 20000 without any problem.
Methods I have tried:
- Adjusting the CPLEX program written in MATLAB. This part of the program was already relatively simple, and it solves quickly.
- Using the gc module, adding gc.disable() before and gc.enable() after the main for-loop.
- Adding reset_default_graph() in the create_Q_network function, roughly at the position shown in the sketch after this list.
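Since the original figure is not reproduced here, this is a minimal sketch of where reset_default_graph() was placed; the two-layer network layout follows the linked framework, while the hidden size (20) and the initialisers are illustrative assumptions:

import tensorflow as tf

def create_Q_network(self):
    # Attempted fix: clear the default graph before (re)building the network.
    tf.reset_default_graph()
    # Input layer.
    self.state_input = tf.placeholder(tf.float32, [None, self.state_dim])
    # Two-layer Q-network (hidden size 20 is illustrative).
    W1 = tf.Variable(tf.truncated_normal([self.state_dim, 20], stddev=0.01))
    b1 = tf.Variable(tf.constant(0.01, shape=[20]))
    W2 = tf.Variable(tf.truncated_normal([20, self.action_dim], stddev=0.01))
    b2 = tf.Variable(tf.constant(0.01, shape=[self.action_dim]))
    h_layer = tf.nn.relu(tf.matmul(self.state_input, W1) + b1)
    self.Q_value = tf.matmul(h_layer, W2) + b2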
However, none of the above methods has solved the problem, whether used separately or in combination.
Is there any way to keep this program running at a good speed? Many thanks!
Here is my main code:
import gc
import random

import numpy as np
import xlrd
import xlwt
import matlab
import matlab.engine

# The DQN agent class and the data-initialisation helper ini() are defined
# elsewhere in this script; the DQN class follows the linked framework.

# One MATLAB session is started here and reused for every CPLEX call.
engine = matlab.engine.start_matlab()

# Hyper Parameters
EPISODE = 3000  # Episode limitation
STEP = 24       # Step limitation in an episode
TEST = 24       # Number of experiment tests every 100 episodes
T = 24


def main():
    agent = DQN()

    # Build the discrete action set for the storage power in MATLAB.
    P_sc_max = 500
    k1 = 11
    P_be_max = 500
    k2 = 51
    P_sc_ini1, P_be_ini1 = engine.CHESS_discrete(P_sc_max, k1, P_be_max, k2, nargout=2)
    P_be_ini = P_be_ini1[0]

    workbook_test = xlwt.Workbook(encoding='utf-8')
    worksheet_test = workbook_test.add_sheet('test_data')

    for episode in range(EPISODE + 1):
        total_reward = 0.0
        load_MG_pre, PV, PV_pre, WT, WT_pre, P_wt, P_wt_pre, pric_b, pric_b_pre, \
            pric_s, pric_s_pre, pric_ob, pric_os = ini()
        SoC_mg_0 = 0.5
        SoC_be_0 = 0.1 * random.randint(3, 7)
        state = np.concatenate(([1], [SoC_be_0], [pric_b_pre[0]], [0]))

        # Initial day-ahead plan from the CPLEX MILP (MATLAB).
        P_b_station_pre, P_s_station_pre = engine.MILP_1(SoC_be_0,
                                                         SoC_mg_0,
                                                         matlab.double(load_MG_pre[0:24]),
                                                         matlab.double(PV_pre[0:24]),
                                                         matlab.double(WT_pre[0:24]),
                                                         matlab.double(pric_b_pre[0:24]),
                                                         matlab.double(pric_s_pre[0:24]),
                                                         nargout=2)
        state[3] = P_b_station_pre - P_s_station_pre
        SoC_be = SoC_be_0
        P_b_station = P_b_station_pre
        P_s_station = P_s_station_pre
        SoC_mg = SoC_mg_0

        gc.disable()
        num_step = 0
        for step in range(1, STEP + 1):
            action = agent.egreedy_action(state)
            P_be = P_be_ini[action]

            # Plan adjustment and reward from the CPLEX MILP (MATLAB).
            reward, P_station, SoC_be_new, P_b_station_new, P_s_station_new, SoC_mg_new, done = engine.Reward_MILP2(
                P_be, SoC_be, P_b_station, P_s_station,
                pric_b[step - 1], pric_s[step - 1],
                pric_ob[step - 1], pric_os[step - 1],
                SoC_mg,
                matlab.double(load_MG_pre[step - 1:step + 23]),
                matlab.double(PV_pre[step - 1:step + 23]),
                matlab.double(WT_pre[step - 1:step + 23]),
                matlab.double(pric_b_pre[step - 1:step + 23]),
                matlab.double(pric_s_pre[step - 1:step + 23]),
                nargout=7)
            SoC_be = SoC_be_new
            P_b_station = P_b_station_new
            P_s_station = P_s_station_new
            P_MG = P_b_station_new - P_s_station_new
            SoC_mg = SoC_mg_new

            next_state = np.concatenate(([1 + step], [SoC_be], [pric_b[step]], [P_MG]))
            agent.perceive(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward
            num_step += 1
            if done:
                break
        gc.enable()

        if episode % 100 == 0:
            ave_reward = total_reward / TEST
            worksheet_test.write(0, int(episode / 100), ave_reward)
            worksheet_test.write(1, int(episode / 100), num_step)
            workbook_test.save('Test_data.xls')

    workbook_test.save('Test_data.xls')
    x2 = xlrd.open_workbook("Test_data.xls")
    sheet2 = x2.sheet_by_name("test_data")
    for i in range(0, int(EPISODE / 100) + 1):
        print('episode: ', i, 'Evaluation Average Reward:', sheet2.row(0)[i].value)


if __name__ == '__main__':
    main()
Asked by whysohard; translated from Stack Overflow.