JuliaML/Reinforce.jl: Abstractions, algorithms, and utilities for reinforcement learning in Julia


Project name: JuliaML/Reinforce.jl

Repository: https://github.com/JuliaML/Reinforce.jl

Language: Julia 100.0%

Project description:

DEPRECATED

This package is discontinued. Please check ReinforcementLearning.jl, POMDPs.jl or AlphaZero.jl instead.

Reinforce


Reinforce.jl is an interface for Reinforcement Learning. It is intended to connect modular environments, policies, and solvers with a simple interface.


Packages which build on Reinforce include AtariAlgos.jl (Atari game environments via ArcadeLearningEnvironment) and OpenAIGym.jl (a wrapper for OpenAI's gym).

Environment Interface

New environments are created by subtyping AbstractEnvironment and implementing a few methods:

  • reset!(env) -> env
  • actions(env, s) -> A
  • step!(env, s, a) -> (r, s′)
  • finished(env, s′) -> Bool

and optional overrides:

  • state(env) -> s
  • reward(env) -> r

which fall back to env.state and env.reward respectively when not overridden.

  • ismdp(env) -> Bool

An environment may be fully observable (MDP) or partially observable (POMDP). In the case of a partially observable environment, the state s is really an observation o. To maintain consistency, we call everything a state, and assume that an environment is free to maintain additional (unobserved) internal state. The ismdp query returns true when the environment is an MDP, and false otherwise.

  • maxsteps(env) -> Int

An episode terminates when either maxsteps(env) is reached or finished(env, s′) returns true. The default value of maxsteps is 0, which means there is no step limit.


A minimal example for testing purposes is test/foo.jl.
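As a rough illustration (not part of the package; the FooWalk name, goal position, and reward scheme are made up), a tiny one-dimensional walk environment implementing the required methods might look like this:

using Reinforce
import Reinforce: reset!, actions, step!, finished, maxsteps

# Hypothetical environment: walk along a line until reaching position 5.
mutable struct FooWalk <: AbstractEnvironment
    state::Int        # current position (used by the default state(env))
    reward::Float64   # last reward (used by the default reward(env))
end
FooWalk() = FooWalk(0, 0.0)

reset!(env::FooWalk) = (env.state = 0; env.reward = 0.0; env)
actions(env::FooWalk, s) = [-1, 1]            # step left or right
function step!(env::FooWalk, s, a)
    env.state = s + a
    env.reward = env.state == 5 ? 1.0 : 0.0   # reward only at the goal
    env.reward, env.state                     # (r, s′)
end
finished(env::FooWalk, s′) = s′ == 5
maxsteps(env::FooWalk) = 100                  # optional: cap episodes at 100 steps

Because the struct stores its fields as state and reward, the default state(env) and reward(env) overrides described above work without further definitions.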

TODO: more details and examples

Policy Interface

Agents/policies are created by subtyping AbstractPolicy and implementing action. The built-in random policy is a short example:

struct RandomPolicy <: AbstractPolicy end
action(π::RandomPolicy, r, s, A) = rand(A)

Where A is the action space. The action method maps the last reward and current state to the next chosen action: (r, s) -> a.

  • reset!(π::AbstractPolicy) -> π
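As a further sketch (EpsGreedy is a made-up name, and it assumes the wrapped policy implements action and reset! as described above), a policy can also wrap another policy, for example to add ε-greedy exploration on top of it:

using Reinforce
import Reinforce: action, reset!

# Hypothetical wrapper: explore with probability ϵ, otherwise defer to the base policy.
struct EpsGreedy{P<:AbstractPolicy} <: AbstractPolicy
    ϵ::Float64
    base::P
end

action(π::EpsGreedy, r, s, A) = rand() < π.ϵ ? rand(A) : action(π.base, r, s, A)
reset!(π::EpsGreedy) = (reset!(π.base); π)

With ϵ = 0.1, such a wrapper takes a uniformly random action on roughly 10% of steps and defers to the wrapped policy otherwise.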

Episode Iterator

Iterate through episodes using the Episode iterator. A 4-tuple (s,a,r,s′) is returned from each step of the episode:

ep = Episode(env, π)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter

There is also a convenience method, run_episode; the following is equivalent to the previous example:

R = run_episode(env, π) do
    # anything you want... this section is called after each step
end
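To tie the pieces together, here is a quick end-to-end sketch using the hypothetical FooWalk environment and EpsGreedy wrapper from the examples above, together with the built-in RandomPolicy:

using Reinforce

env = FooWalk()                        # hypothetical environment sketched earlier
π   = EpsGreedy(0.1, RandomPolicy())   # hypothetical wrapper around the built-in RandomPolicy

ep = Episode(env, π)
for (s, a, r, s′) in ep
    println("s=$s a=$a r=$r s′=$s′")   # inspect each transition
end
println("total reward: ", ep.total_reward, " over ", ep.niter, " steps")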

Author: Tom Breloff



