andrewcooke/ParserCombinator.jl: A parser combinator library for Julia

原作者: [db:作者] 来自: 网络收藏邀请

开源软件名称：

andrewcooke/ParserCombinator.jl

开源软件地址：

https://github.com/andrewcooke/ParserCombinator.jl

开源编程语言：

Julia 100.0%

开源软件介绍：

ParserCombinator

Example
Install
Manual
Parsers
Design
Releases

A parser combinator library for Julia, similar to those in other languages, like Haskell's Parsec or Python's pyparsing. It can parse any iterable type (not just strings) (except for regexp matchers, of course).

ParserCombinator's main advantage is its flexible design, which separates the matchers from the evaluation strategy. This makes it easy to "plug in" memoization, or debug traces, or to restrict backtracking in a similar way to Parsec - all while using the same grammar.

It also contains pre-built parsers for Graph Modelling Language and DOT.

Example

using ParserCombinator


# the AST nodes we will construct, with evaluation via calc()

abstract type Node end
Base.:(==)(n1::Node, n2::Node) = n1.val == n2.val
calc(n::Float64) = n
struct Inv<:Node val end
calc(i::Inv) = 1.0/calc(i.val)
struct Prd<:Node val end
calc(p::Prd) = Base.prod(map(calc, p.val))
struct Neg<:Node val end
calc(n::Neg) = -calc(n.val)
struct Sum<:Node val end
calc(s::Sum) = Base.sum(map(calc, s.val))


# the grammar (the combinators!)

sum = Delayed()
val = E"(" + sum + E")" | PFloat64()

neg = Delayed()       # allow multiple (or no) negations (eg ---3)
neg.matcher = val | (E"-" + neg > Neg)

mul = E"*" + neg
div = E"/" + neg > Inv
prd = neg + (mul | div)[0:end] |> Prd

add = E"+" + prd
sub = E"-" + prd > Neg
sum.matcher = prd + (add | sub)[0:end] |> Sum

all = sum + Eos()


# and test

# this prints 2.5
calc(parse_one("1+2*3/4", all)[1])

# this prints [Sum([Prd([1.0]),Prd([2.0])])]
parse_one("1+2", all)

Some explanation of the above:

I used rather a lot of "syntactic sugar". You can use a more verbose, "parser combinator" style if you prefer. For example, Seq(...) instead of +, or App(...) instead of >.
The matcher E"xyz" matches and then discards the string "xyz".
Every matcher returns a list of matched values. This can be an empty list if the match succeeded but matched nothing.
The operator + matches the expressions to either side and appends the resulting lists. Similarly, | matches one of two alternatives.
The operator |> calls the function to the right, passing in the results from the matchers on the left.
> is similar to |> but interpolates the arguments (ie uses ...). So instead of passing a list of values, it calls the function with multiple arguments.
Delayed() lets you define a loop in the grammar.
The syntax [0:end] is a greedy repeat of the matcher to the left. An alternative would be Star(...), while [3:4] would match only 3 or 4 values.

And it supports packrat parsing too (more exactly, it can memoize results to avoid repeating matches).

Still, for large parsing tasks (eg parsing source code for a compiler) it would probably be better to use a wrapper around an external parser generator, like Anltr.

Note: There's an issue with the Compat library which means the code above (the assignment to Delayed.matcher) doesn't work with 0.3. See calc.jl for the uglier, hopefully temporary, 0.3 version.

Install

julia> Pkg.add("ParserCombinator")

Evaluation

Once you have a grammar (see below) you can evaluate it against some input in various ways:

parse_one() - a simple, recursive decent parser with backtracking, but no memoization. Returns a single result or throws a ParserException.
parse_all() - a packrat parser, with memoization, that returns an iterator (evaluated lazily) over all possible parses of the input.
parse_lines() - a parser in which the source is parsed line by line. Pre-4.0.0 Julia copies strings that are passed to regex, so this reduces memory use when using regular expressions.
parse_try() - similar to Haskell's Parsec, with backtracking only inside the Try() matcher. More info here.
parse_dbg() - as parse_one(), but also prints a trace of evaluation for all of the matchers that are children of a Trace() matchers. Can also be used with other matchers via the keword delegate; for example parse_dbg(...; delegate=Cache) will provide tracing of the packrat parser (parse_all() above). More info here.

These are all implemented by providing different Config subtypes. For more information see Design, types.jl and parsers.jl.

Basic Matchers

In what follows, remember that the power of parser combinators comes from how you combine these. They can all be nested, refer to each other, etc etc.

Equality

julia> parse_one("abc", Equal("ab"))
1-element Array{Any,1}:
 "ab"

julia> parse_one("abc", Equal("abx"))
ERROR: ParserCombinator.ParserException("cannot parse")

This is so common that there's a corresponding string literal (it's "e" for `Equal(), the corresponding matcher).

julia> parse_one("abc", e"ab")
1-element Array{Any,1}:
 "ab"

Sequences

Matchers return lists of values. Multiple matchers can return lists of lists, or the results can be "flattened" a level (usually more useful):

julia> parse_one("abc", Series(Equal("a"), Equal("b")))
2-element Array{Any,1}:
 "a"
 "b"

julia> parse_one("abc", Series(Equal("a"), Equal("b"); flatten=false))
2-element Array{Any,1}:
 Any["a"]
 Any["b"]

julia> parse_one("abc", Seq(Equal("a"), Equal("b")))
2-element Array{Any,1}:
 "a"
 "b"

julia> parse_one("abc", And(Equal("a"), Equal("b")))
2-element Array{Any,1}:
 Any["a"]
 Any["b"]

julia> parse_one("abc", e"a" + e"b")
2-element Array{Any,1}:
 "a"
 "b"

julia> parse_one("abc", e"a" & e"b")
2-element Array{Any,1}:
 Any["a"]
 Any["b"]

Where Series() is implemented as Seq() or And(), depending on the value of flatten (default true).

Warning - The sugared syntax has to follow standard operator precedence, where | binds more tightly that +. This means that

   matcher1 + matcher2 | matcher3

is almost always an error because it means:

   matcher1 + (matcher2 | matcher3)

while what was intended was:

   (matcher1 + matcher2) | matcher3

Empty Values

Often, you want to match something but then discard it. An empty (or discarded) value is an empty list. This may help explain why I said flattening lists was useful above.

julia> parse_one("abc", And(Drop(Equal("a")), Equal("b")))
2-element Array{Any,1}:
 Any[]
 Any["b"]

julia> parse_one("abc", Seq(Drop(Equal("a")), Equal("b")))
1-element Array{Any,1}:
 "b"

julia> parse_one("abc", ~e"a" + e"b")
1-element Array{Any,1}:
 "b"

julia> parse_one("abc", E"a" + e"b")
1-element Array{Any,1}:
 "b"

Note the ~ (tilde / home directory) and capital E in the last two examples, respectively.

Alternates

julia> parse_one("abc", Alt(e"x", e"a"))
1-element Array{Any,1}:
 "a"

julia> parse_one("abc", e"x" | e"a")
1-element Array{Any,1}:
 "a"

Warning - The sugared syntax has to follow standard operator precedence, where | binds more tightly that +. This means that

   matcher1 + matcher2 | matcher3

is almost always an error because it means:

   matcher1 + (matcher2 | matcher3)

while what was intended was:

   (matcher1 + matcher2) | matcher3

Regular Expressions

julia> parse_one("abc", Pattern(r".b."))
1-element Array{Any,1}:
 "abc"

julia> parse_one("abc", p".b.")
1-element Array{Any,1}:
 "abc"

julia> parse_one("abc", P"." + p"b.")
1-element Array{Any,1}:
 "bc"

As with equality, a capital prefix to the string literal ("p" for "pattern" by the way) implies that the value is dropped.

Note that regular expresions do not backtrack. A typical, greedy, regular expression will match as much of the input as possible, every time that it is used. Backtracking only exists within the library matchers (which can duplicate regular expression functionality, when needed).

Repetition

julia> parse_one("abc", Repeat(p"."))
3-element Array{Any,1}:
 "a"
 "b"
 "c"

julia> parse_one("abc", Repeat(p".", 2))
2-element Array{Any,1}:
 "a"
 "b"

julia> collect(parse_all("abc", Repeat(p".", 2, 3)))
2-element Array{Any,1}:
 Any["a","b","c"]
 Any["a","b"]

julia> parse_one("abc", Repeat(p".", 2; flatten=false))
2-element Array{Any,1}:
 Any["a"]
 Any["b"]

julia> collect(parse_all("abc", Repeat(p".", 0, 3)))
4-element Array{Any,1}:
 Any["a","b","c"]
 Any["a","b"]
 Any["a"]
 Any[]

julia> collect(parse_all("abc", Repeat(p".", 0, 3; greedy=false)))
4-element Array{Any,1}:
 Any[]
 Any["a"]
 Any["a","b"]
 Any["a","b","c"]

You can also use Depth() and Breadth() for greedy and non-greedy repeats directly (but Repeat() is more readable, I think).

The sugared version looks like this:

julia> parse_one("abc", p"."[1:2])
2-element Array{Any,1}:
 "a"
 "b"

julia> parse_one("abc", p"."[1:2,:?])
1-element Array{Any,1}:
 "a"

julia> parse_one("abc", p"."[1:2,:&])
2-element Array{Any,1}:
 Any["a"]
 Any["b"]

julia> parse_one("abc", p"."[1:2,:&,:?])
1-element Array{Any,1}:
 Any["a"]

Where the :? symbol is equivalent to greedy=false and :& to flatten=false (compare with + and & above).

There are also some well-known special cases:

julia> collect(parse_all("abc", Plus(p".")))
3-element Array{Any,1}:
 Any["a","b","c"]
 Any["a","b"]
 Any["a"]

julia> collect(parse_all("abc", Star(p".")))
4-element Array{Any,1}:
 Any["a","b","c"]
 Any["a","b"]
 Any["a"]
 Any[]

Full Match

To ensure that all the input is matched, add Eos() to the end of the grammar.

julia> parse_one("abc", Equal("abc") + Eos())
1-element Array{Any,1}:
 "abc"

julia> parse_one("abc", Equal("ab") + Eos())
ERROR: ParserCombinator.ParserException("cannot parse")

Transforms

Use App() or > to pass the current results to a function (or datatype constructor) as individual values.

julia> parse_one("abc", App(Star(p"."), tuple))
1-element Array{Any,1}:
 ("a","b","c")

julia> parse_one("abc", Star(p".") > string)
1-element Array{Any,1}:
 "abc"

The action of Appl() and |> is similar, but everything is passed as a single argument (a list).

julia> type Node children end

julia> parse_one("abc", Appl(Star(p"."), Node))
1-element Array{Any,1}:
 Node(Any["a","b","c"])

julia> parse_one("abc", Star(p".") |> x -> map(uppercase, x))
3-element Array{Any,1}:
 "A"
 "B"
 "C"

Lookahead And Negation

Sometimes you can't write a clean grammar that just consumes data: you need to check ahead to avoid something. Or you need to check ahead to make sure something works a certain way.

julia> parse_one("12c", Lookahead(p"\d") + PInt() + Dot())
2-element Array{Any,1}:
 12
   'c'

julia> parse_one("12c", Not(Lookahead(p"[a-z]")) + PInt() + Dot())
2-element Array{Any,1}:
 12
   'c'

More generally, Not() replaces any match with failure, and failure with an empty match (ie the empty list).

Other

Backtracking

By default, matchers will backtrack as necessary.

In some (unusual) cases, it is useful to disable backtracking. For example, see PCRE's "possessive" matching. This can be done here on a case-by-case basis by adding backtrack=false to Repeat(), Alternatives() and Series(), or by appending ! to the matchers that those functions generate: Depth!, Breadth!, Alt!, Seq! and And!.

For example,

collect(parse_all("123abc", Seq!(p"\d"[0:end], p"[a-z]"[

鲜花

握手

雷人

路过

鸡蛋

该文章已有0人参与评论

请发表评论

全部评论

专题导读

More+

10-27 六六分期app的软件客服如何联系？(六六分期

11-06 可心卡盟:win10系统火狐flash插件崩溃怎么

11-06 亲亲特价:怎么删除回收站图标

11-06 济南大学虚拟社区:鲁大师节能降温的具体办

11-06 xlueops.exe:无线网络安装向导

11-06 女斗合众国:win7系统cf与主机连接不稳定怎

11-06 0xc000022-[cf烟雾头]cf怎么调烟雾头

11-06 qizideyouhuo:应用程序无法正常启动0xc0000

11-06 ipz-185:win7系统vcf文件怎么打开

11-06 傻哥蹦迪:win10系统s4怎么打开usb调试

11-06 八神浩树gtaste:回收站清空了怎么恢复

11-06 妖尾之黑色守护:win10系统电脑没有1440x900

11-06 校园至尊魔王小说:win7系统浏览网页时字体

11-06 女斗合众国:win10系统访问共享文件夹提示请

11-06 tokyo hot n0654:恢复win7系统默认字体一招

11-06 雨酷仙境:设置win7系统转移临时文件夹腾出

11-06 阿穆纳伊之杖:win7系统开始菜单在右边还原

11-06 tunespotting:win10系统火狐flash插件总是

11-06 甘尔葛分析师：计谋网站seo关键词暴涨有什

11-06 蔡贵霖: 计谋网站seo关键词暴涨有什么秘密

11-06 博益网首页:ao3网页版进入不了解决方法

11-06 漏斗子专栏: 网站数据分析小白易懂精华篇

11-06 见证双虹怎么做:win7系统开启telnet命令的

11-06 颾狐蝶蜋:系统资源不足无法完成请求的服务

11-06 国光中学校歌:提交网站到alexa查询详细步骤

11-06 西安有情天:静态网页和动态网页的区别

11-06 红木雅尚斋:外部链接构造对网站的好处

11-06 前官礼遇：防止域名劫持–增强域安全性的10

11-06 密传二转答案: 中文分词算法有哪些

11-06 金泉家园邮编:百度快照劫持的表现及应对方

sdwfrost/Gillespie.jl: Stochastic Gillespie-type simulations using Julia发布时间：2022-07-09

rdeits/RegionTrees.jl: Quadtrees, Octrees, and more in Julia发布时间：2022-07-09

剪的笔顺,诠释剪的笔画,认识剪的部首

1 六六分期app的软件客服如何联系？(六六分期

六六分期app的软件客服如何联系？不知道吗？加qq群【895510560】即可！标题：六六分期

阅读：18341|2023-10-27

2 可心卡盟:win10系统火狐flash插件崩溃怎么

今天小编告诉大家如何处理win10系统火狐flash插件总是崩溃的问题，可能很多用户都不知

阅读：9706|2022-11-06

3 亲亲特价:怎么删除回收站图标

今天小编告诉大家如何对win10系统删除桌面回收站图标进行设置，可能很多用户都不知道

阅读：8195|2022-11-06

4 济南大学虚拟社区:鲁大师节能降温的具体办

今天小编告诉大家如何对win10系统电脑设置节能降温的设置方法，想必大家都遇到过需要

阅读：8563|2022-11-06

5 xlueops.exe:无线网络安装向导

我们在使用xp系统的过程中,经常需要对xp系统无线网络安装向导设置进行设置，可能很多

阅读：8474|2022-11-06

6 女斗合众国:win7系统cf与主机连接不稳定怎

今天小编告诉大家如何处理win7系统玩cf老是与主机连接不稳定的问题，可能很多用户都不

阅读：9417|2022-11-06

7 0xc000022-[cf烟雾头]cf怎么调烟雾头

电脑对日常生活的重要性小编就不多说了，可是一旦碰到win7系统设置cf烟雾头的问题，很

阅读：8447|2022-11-06

8 qizideyouhuo:应用程序无法正常启动0xc0000

我们在日常使用电脑的时候，有的小伙伴们可能在打开应用的时候会遇见提示应用程序无法

阅读：7878|2022-11-06

9 ipz-185:win7系统vcf文件怎么打开

今天小编告诉大家如何对win7系统打开vcf文件进行设置，可能很多用户都不知道怎么对win

阅读：8430|2022-11-06

10 傻哥蹦迪:win10系统s4怎么打开usb调试

今天小编告诉大家如何对win10系统s4开启USB调试模式进行设置，可能很多用户都不知道怎

阅读：7405|2022-11-06

客服电话

电子邮件

andrewcooke/ParserCombinator.jl: A parser combinator library for Julia

开源软件名称：

开源软件地址：

开源编程语言：

开源软件介绍：

ParserCombinator

Example

Install

Manual

Evaluation

Basic Matchers

Equality

Sequences

Empty Values

Alternates

Regular Expressions

Repetition

Full Match

Transforms

Lookahead And Negation

Other

Backtracking

请发表评论

全部评论

上一篇：

下一篇：

CVE-2022-2211

naumari/UI-Library: Fat Ge UI-Library

librespeed/speedtest: Self-hosted Speedt

avehtari/BDA_m_demos: Bayesian Data Anal

四维彩超怎么看性别？四维看男孩女孩诀窍

剪的笔顺,诠释剪的笔画,认识剪的部首

六六分期app的软件客服如何联系？(六六分期

florent37/ViewAnimator: A fluent Android

florent37/Shrine-MaterialDesign2: implem

CVE-2020-36276

SimpleSoftwareIO/simple-sms: Send and re

关于我们

产品与服务

解决方案

139-2527-9053