regex - How does the regular expression ‘(?<=#)[^#]+(?=#)’ work?

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have the following regex in a C# program, and have difficulties understanding it:

(?<=#)[^#]+(?=#)

I'll break it down to what I think I understood:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So my problem is with the ?<= and ?= parts. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:24:21+0000

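They are called lookarounds. A lookaround is a zero-width assertion: it tests whether a pattern matches at the current position without consuming any characters, so the text it checks does not become part of the match. There are 4 basic lookarounds: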

Positive lookarounds: see if we CAN match the pattern...
- (?=pattern) - ... to the right of the current position (lookahead)
- (?<=pattern) - ... to the left of the current position (lookbehind)

Negative lookarounds: see if we can NOT match the pattern...
- (?!pattern) - ... to the right of the current position (negative lookahead)
- (?<!pattern) - ... to the left of the current position (negative lookbehind)

As an easy reminder, for a lookaround:

= is positive, ! is negative
< is look behind, otherwise it's look ahead
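
To see the four forms side by side, here is a minimal sketch in Python, whose re module accepts the same lookaround syntax as .NET for patterns this simple. The sample string and variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample text: three letter+digit pairs.
text = "a1 b2 c3"

# (?=...)  positive lookahead: a letter that IS followed by "2"
ahead = re.findall(r"[a-z](?=2)", text)

# (?!...)  negative lookahead: a letter that is NOT followed by "2"
not_ahead = re.findall(r"[a-z](?!2)", text)

# (?<=...) positive lookbehind: a digit that IS preceded by "b"
behind = re.findall(r"(?<=b)\d", text)

# (?<!...) negative lookbehind: a digit that is NOT preceded by "b"
not_behind = re.findall(r"(?<!b)\d", text)

print(ahead, not_ahead, behind, not_behind)
# prints: ['b'] ['a', 'c'] ['2'] ['1', '3']
```

In every case the returned match contains only the letter or digit itself; the character tested by the lookaround is never included.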

References

regular-expressions.info/Lookarounds

But why use lookarounds?

One might argue that the lookarounds in the pattern above aren't necessary, and that #([^#]+)# will do the job just fine (extracting the string captured by group 1 to get the non-# part).

Not quite. The difference is that since a lookaround doesn't consume the #, the same # can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.

Consider the following input string:

and #one# and #two# and #three#four#

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
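
The two results above can be reproduced with a short Python sketch. Python is an assumption here (the question uses C#), but its re module treats these two patterns the same way. The variable names are illustrative only.

```python
import re

text = "and #one# and #two# and #three#four#"

# Consuming version: both # characters are part of each match, so the #
# between "three" and "four" is used up and "four" is never found.
consuming = re.findall(r"#([a-z]+)#", text)

# Lookaround version: the # characters are only asserted, not consumed,
# so the # after "three" can also serve as the # before "four".
lookaround = re.findall(r"(?<=#)[a-z]+(?=#)", text)

print(consuming)   # ['one', 'two', 'three']
print(lookaround)  # ['one', 'two', 'three', 'four']
```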

Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
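
The lookahead-only variant can be checked the same way. This is a sketch in Python (an assumed language, not the asker's C#) that prints both the full match and capture group 1 for each hit.

```python
import re

text = "and #one# and #two# and #three#four#"

# The leading # is consumed, but the trailing # is only asserted by the
# lookahead, so it stays available as the leading # of the next match.
matches = list(re.finditer(r"#([a-z]+)(?=#)", text))

full = [m.group(0) for m in matches]    # whole match, including leading #
inner = [m.group(1) for m in matches]   # capture group 1, without the #

print(full)   # ['#one', '#two', '#three', '#four']
print(inner)  # ['one', 'two', 'three', 'four']
```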

References

regular-expressions.info/Flavor Comparison
