Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - Why does this regex only work with an end-of-string symbol?

Given these strings:

'module1'

'{ module2 }'

'{ module3 as module4 }'

and needing a regular expression to capture the (sub)strings 'module1', 'module2' & 'module4', this works:

/({ )?(.*as )?(.+?)( })?$/

Which breaks down to:

({ )?       // Optional opening brace
(.*as )?    // Optional 'blah as '
(.+?)       // The important bit
( })?$      // Optional closing brace, EOS

Why does it fail to match if the end of string character $ is omitted?

(Also, I'm aware that the unneeded capture groups can be made into non-capturing groups, but I'm keeping them as they are so the pattern is easier to read...)

question from:https://stackoverflow.com/questions/65861645/why-does-this-regex-only-work-with-an-end-of-string-symbol


1 Reply


The problem

Let's play the two regex patterns out:

Pattern 1

Regex                        : String
/({ )?(.*as )?(.+?)( })?$/   : module1

1. Checks for "{ "    >> but it's optional and doesn't exist so pass.
2. Checks for ".*as " >> but it's optional and doesn't exist so pass.
3. Checks for ".+?"   >> matches any character 1 or more times until the next item it HAS to match (in a non-greedy manner)
4. Checks for " }"    >> but it's optional and doesn't exist so pass.
5. Checks for "$"     >> #3 has to match up to this point, otherwise there won't be a match!
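
To check the walkthrough above, here is a minimal sketch that runs the anchored pattern against the three strings from the question. The helper name captureWithAnchor is made up for illustration; only the regex itself comes from the question.

```javascript
// Pattern 1 from the question, including the end-of-string anchor.
const anchored = /({ )?(.*as )?(.+?)( })?$/;

// Hypothetical helper: returns capture group 3 ("the important bit"),
// or null when the pattern does not match at all.
function captureWithAnchor(input) {
    const match = input.match(anchored);
    return match ? match[3] : null;
}

['module1', '{ module2 }', '{ module3 as module4 }'].forEach(function (s) {
    console.log(JSON.stringify(s), '=>', captureWithAnchor(s));
});
```

Because "$" must be reached, group 3 keeps growing one character at a time until the rest of the pattern can end the string, so it should come out as module1, module2 and module4 respectively.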

Pattern 2

Regex                        : String
/({ )?(.*as )?(.+?)( })?/    : module1

1. Checks for "{ "    >> but it's optional and doesn't exist so pass.
2. Checks for ".*as " >> but it's optional and doesn't exist so pass.
3. Checks for ".+?"   >> matches any character 1 or more times until the next item it HAS to match (in a non-greedy manner)
4. Checks for " }"    >> but it's optional and doesn't exist so pass.

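The problem is that ".+?" is non-greedy: it consumes as few characters as it can. In pattern 2 nothing after it is required (the closing " }" is optional and there is no "$"), so it stops after a single character. The regex still matches, but the important group captures only "m" instead of the whole module name. With "$" in place, ".+?" has to keep expanding until the rest of the pattern can reach the end of the string.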
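
The same effect can be seen by running the pattern without the anchor. This is only a sketch: the helper name captureWithoutAnchor and the shape of its return value are assumptions for illustration.

```javascript
// Pattern 2: the question's regex with the trailing $ removed.
const unanchored = /({ )?(.*as )?(.+?)( })?/;

// Hypothetical helper: returns the whole match and capture group 3,
// or null when the pattern does not match at all.
function captureWithoutAnchor(input) {
    const match = input.match(unanchored);
    return match ? { whole: match[0], group3: match[3] } : null;
}

['module1', '{ module2 }', '{ module3 as module4 }'].forEach(function (s) {
    console.log(JSON.stringify(s), '=>', JSON.stringify(captureWithoutAnchor(s)));
});
```

In all three cases group 3 should hold just the single character "m": the lazy quantifier is satisfied after one character and nothing later in the pattern forces it to go on.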

A solution

This is tricky without knowing what the values for "module" may be (i.e. letters, spaces, numbers)...

However something like...

(\w+)(?:\s*})?$
(                : Start of capturing group
 \w+             : Matches [a-zA-Z0-9_] one or more times
    )            : End of capturing group
     (?:         : Start of non-capturing group
        \s*}     : Matches 0 or more white space characters followed by a "}"
            )?   : End of non-capturing group and make it optional
              $  : Matches the end of the string

...will extract the modules without capturing the surrounding spaces and braces etc.

N.B.

This only works if the module is:

  • The last word in the string
  • One word only
  • Only made up of letters, numbers, and underscores
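
To make these limits concrete, here is a small sketch that feeds the pattern a few inputs that break the assumptions. The helper name lastWord and the three test strings are invented for illustration and do not come from the question.

```javascript
// The suggested pattern: last run of word characters, optionally
// followed by whitespace and a closing brace, at the end of the string.
const pattern = /(\w+)(?:\s*})?$/;

// Hypothetical helper: returns the captured word, or null on no match.
function lastWord(input) {
    const match = input.match(pattern);
    return match ? match[1] : null;
}

// A hyphen is not a \w character, so only the part after it is captured.
console.log(lastWord('{ my-module }'));

// The module is no longer the last word, so the trailing word wins.
console.log(lastWord('{ module3 as module4 } extra'));

// A trailing semicolon means nothing matches at the end of the string.
console.log(lastWord('{ module3 as module4 };'));
```

If inputs like these can occur, the character class or the tail of the pattern would need to be widened to suit the real data.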

Example

$strings = [
    'module1',
    '{ module2 }',
    '{ module3 as module4 }'
];

foreach($strings as $string){
    preg_match('/(\w+)(?:\s*})?$/', $string, $match);
    var_dump($match[1]);
}

/*
Output

string(7) "module1"
string(7) "module2"
string(7) "module4"

*/

Second example

Because I realised this question was asked for JS, not PHP!

var strings = [
    'module1',
    '{ module2 }',
    '{ module3 as module4 }'
];
var pattern = /(\w+)(?:\s*})?$/;
for (var i = 0; i < strings.length; i++) {
    console.log(strings[i].match(pattern)[1]);
}

/*
Output

module1
module2
module4

*/
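
One caveat with the loop above: match() returns null when nothing matches, so indexing [1] would throw on such input. A possible guard is sketched below; the function name extractModules and the extra '{ }' input are assumptions added for illustration.

```javascript
const modulePattern = /(\w+)(?:\s*})?$/;

// Hypothetical helper: maps each string to its module name,
// or to null when the pattern finds nothing to capture.
function extractModules(strings) {
    return strings.map(function (s) {
        const match = s.match(modulePattern);
        return match === null ? null : match[1];
    });
}

console.log(extractModules([
    'module1',
    '{ module2 }',
    '{ module3 as module4 }',
    '{ }' // contains no word characters, so this entry becomes null
]));
```

Whether null, an empty string or an exception is the right response to a non-matching string depends on the calling code.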
