
python - DQN + CPLEX: my program is getting slower as the iterations go on

This program combines a TensorFlow-based DQN algorithm with the CPLEX solver called from MATLAB.

The purpose is to find an optimal operation control strategy for energy storage based on deep reinforcement learning.

DQN is used to control the energy storage system.

CPLEX solves the microgrid planning and scheduling problem (mixed-integer linear programming).

The main problem is that the program iterates very quickly at the beginning, but at around EPISODE = 100 the running speed starts to drop significantly, so the final desired result cannot be obtained.
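
(For reference, the per-episode slowdown can be quantified with a simple timer; this instrumentation is illustrative and not part of the original script:)

import time

EPISODE = 3000  # same limit as in the main code below

t0 = time.time()
for episode in range(EPISODE + 1):
    # ... run one full episode here (engine.MILP_1 plus 24 calls to engine.Reward_MILP2) ...
    t1 = time.time()
    print('episode %d took %.2f s' % (episode, t1 - t0))
    t0 = t1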

For the DQN algorithm, I directly used a program framework written by someone else.

Its original application scenario is the CartPole-v0 game in OpenAI Gym.

The code is here: https://github.com/ljpzzz/machinelearning/blob/master/reinforcement-learning/dqn.py

In my Python program, the matlab.engine module is used to call the CPLEX solver in MATLAB to solve the microgrid planning problem.

The returned values are then stored and used by the DQN algorithm.
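
For reference, this is roughly how a MATLAB function is called through matlab.engine; the scalar arguments and 24-hour profiles here are placeholders, and the real calls appear in the main code below (this assumes MATLAB with the Engine API for Python installed and MILP_1.m on the MATLAB path):

import matlab
import matlab.engine

# Start one MATLAB session and reuse it for every CPLEX call
engine = matlab.engine.start_matlab()

# Python lists are wrapped as matlab.double before being passed to MATLAB;
# nargout tells the engine how many return values to collect.
profile_24h = matlab.double([100.0] * 24)          # placeholder 24-hour profile
P_b, P_s = engine.MILP_1(0.5, 0.5,                 # placeholder initial SoC values
                         profile_24h, profile_24h, profile_24h,
                         profile_24h, profile_24h,
                         nargout=2)
print(P_b, P_s)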

Each episode (for episode in range(EPISODE + 1):) advances through STEP steps (for step in range(1, STEP + 1):).

Each episode first calls the MATLAB function engine.MILP_1 for the initial plan, and then calls the MATLAB function engine.Reward_MILP2 at every step for plan adjustment; the exact call positions are shown below.

[screenshots of the code showing where engine.MILP_1 and engine.Reward_MILP2 are called]

It is worth noting that I previously replaced CPLEX in this program with MATLAB's power-flow solver MATPOWER to compute a 33-bus power flow (for another project).

In that case the whole program maintained a reasonably good speed throughout and iterated to EPISODE = 20000 without any problem.

Methods I have tried:

  1. Adjusted the CPLEX program written in MATLAB.

    This part of the program was fairly simple to begin with, and it solves quickly.

  2. Called the gc module, adding gc.disable() and gc.enable() before and after the main for-loop.

  3. Added tf.reset_default_graph() in the create_Q_network function, as shown in the figure below (a sketch of this placement follows the figure placeholder).

[screenshot of the modified create_Q_network function]
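
A minimal sketch of where tf.reset_default_graph() was placed (method 3), assuming the TensorFlow 1.x network-construction style of the linked dqn.py; layer sizes and variable names are illustrative:

import tensorflow as tf

class DQN():
    # Only the network-construction method is sketched; the rest of the class
    # follows the linked dqn.py.
    def create_Q_network(self):
        tf.reset_default_graph()  # clear the default graph before (re)building the network
        # input layer
        self.state_input = tf.placeholder("float", [None, self.state_dim])
        # hidden layer
        W1 = tf.Variable(tf.truncated_normal([self.state_dim, 20]))
        b1 = tf.Variable(tf.constant(0.01, shape=[20]))
        h_layer = tf.nn.relu(tf.matmul(self.state_input, W1) + b1)
        # output layer: one Q value per discrete action
        W2 = tf.Variable(tf.truncated_normal([20, self.action_dim]))
        b2 = tf.Variable(tf.constant(0.01, shape=[self.action_dim]))
        self.Q_value = tf.matmul(h_layer, W2) + b2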

However, none of the above methods solved the problem, whether applied separately or in combination.

Is there any way to keep this program running at a good speed?

Many thanks!

Here is my main code:

# Imports (the DQN class, ini(), and the MATLAB engine handle `engine`
# are defined/created elsewhere in the project)
import gc
import random
import numpy as np
import xlwt
import xlrd
import matlab
import matlab.engine

# Hyper Parameters
EPISODE = 3000  # Episode limitation
STEP = 24  # Step limitation in an episode
TEST = 24  # The number of experiment tests every 100 episodes
T = 24

def main():
    agent = DQN()
    P_sc_max = 500
    k1 = 11
    P_be_max = 500
    k2 = 51
    # Discrete power set-points computed once in MATLAB;
    # P_be_ini maps a DQN action index to a battery power level
    P_sc_ini1, P_be_ini1 = engine.CHESS_discrete(P_sc_max, k1, P_be_max, k2, nargout=2)
    P_be_ini = P_be_ini1[0]

    workbook_test = xlwt.Workbook(encoding='utf-8')

    worksheet_test= workbook_test.add_sheet('test_data')

    for episode in range(EPISODE+1):
        total_reward = 0.0

        # Load, renewable-generation and price profiles (actual and forecast) for this episode
        load_MG_pre, PV, PV_pre, WT, WT_pre, P_wt, P_wt_pre, pric_b, pric_b_pre, pric_s, pric_s_pre, pric_ob, pric_os = ini()
        SoC_mg_0 = 0.5
        SoC_be_0 = 0.1 * random.randint(3, 7)  # random initial battery SoC between 0.3 and 0.7
        state = np.concatenate(([1], [SoC_be_0], [pric_b_pre[0]], [0]))
        # Initial plan for the next 24 hours, solved by CPLEX through MATLAB
        P_b_station_pre, P_s_station_pre = engine.MILP_1(SoC_be_0,
                                                         SoC_mg_0,
                                                         matlab.double(load_MG_pre[0:24]),
                                                         matlab.double(PV_pre[0:24]),
                                                         matlab.double(WT_pre[0:24]),
                                                         matlab.double(pric_b_pre[0:24]),
                                                         matlab.double(pric_s_pre[0:24]),
                                                         nargout=2)
        state[3] = P_b_station_pre - P_s_station_pre

        SoC_be = SoC_be_0
        P_b_station = P_b_station_pre
        P_s_station = P_s_station_pre
        SoC_mg = SoC_mg_0

        gc.disable()  # tried method 2: suspend garbage collection during the step loop

        num_step = 0
        for step in range(1,STEP+1):
            action = agent.egreedy_action(state)  # epsilon-greedy action selection
            P_be = P_be_ini[action]               # map the action index to a battery power

            # Plan adjustment over a rolling 24-hour window, solved by CPLEX through MATLAB
            reward, P_station, SoC_be_new, P_b_station_new, P_s_station_new, SoC_mg_new, done = engine.Reward_MILP2(
                P_be, SoC_be,P_b_station,P_s_station,
                pric_b[step - 1],pric_s[step - 1],
                pric_ob[step - 1],pric_os[step - 1],
                SoC_mg,
                matlab.double(load_MG_pre[step - 1:step + 23]),
                matlab.double(PV_pre[step - 1:step + 23]),
                matlab.double(WT_pre[step - 1:step + 23]),
                matlab.double(pric_b_pre[step - 1:step + 23]),
                matlab.double(pric_s_pre[step - 1:step + 23]),
                nargout=7)

            SoC_be = SoC_be_new
            P_b_station = P_b_station_new
            P_s_station = P_s_station_new
            P_MG = P_b_station_new - P_s_station_new
            SoC_mg = SoC_mg_new

            next_state = np.concatenate(([1 + step], [SoC_be], [pric_b[step]], [P_MG]))
            agent.perceive(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward
            num_step += 1
            if done:
                break
        gc.enable()  # re-enable garbage collection after the step loop

        # Log average reward and step count every 100 episodes
        if episode % 100 == 0:
            ave_reward = total_reward / TEST
            worksheet_test.write(0,int(episode / 100), ave_reward)
            worksheet_test.write(1, int(episode / 100), num_step)
            workbook_test.save('Test_data.xls')

    workbook_test.save('Test_data.xls')
    x2 = xlrd.open_workbook("Test_data.xls")
    sheet2 = x2.sheet_by_name("test_data")

    for i in range(0, int(EPISODE / 100) + 1):
        print('episode: ', i, 'Evaluation Average Reward:', sheet2.row(0)[i].value)

  Asked by whysohard, translated from Stack Overflow.


1 Reply

Waiting for an expert to reply.
