Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - How many capture groups does the pcre2_substitute() function support?

I am using the pcre2_substitute() function in my C++ project to perform a regex replace:

int ret = pcre2_substitute(
  re,                    /* Points to the compiled pattern */
  subject,               /* Points to the subject string */
  subject_length,        /* Length of the subject string, in code units */
  0,                     /* Offset in the subject at which to start matching */
  rplopts,               /* Option bits, e.g. PCRE2_SUBSTITUTE_GLOBAL */
  nullptr,               /* Match data block, or NULL */
  nullptr,               /* Match context, or NULL */
  replace,               /* Points to the replacement string */
  replace_length,        /* Length of the replacement string */
  output,                /* Points to the output buffer */
  &outlengthptr          /* In: size of the output buffer; out: length of the result */
);
/* ret >= 0 is the number of substitutions made; ret < 0 is a PCRE2 error code */

The man page of the function doesn't say how many capture groups are possible. I have tested that $01, ${6} and $12 work, but what is the limit?

I also checked whether there is a digit limit like the one in C++ std::regex, but there isn't: $000000000000001 works as $1, whereas std::regex would read it as $00 and treat the rest as literal text.
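To make the observed behaviour concrete, here is a minimal C++ sketch of a replacement-string reader that works the way the question describes PCRE2: after $ (or inside ${...}) it consumes every digit, so leading zeros are harmless, and it rejects a number larger than the highest group in the pattern. read_group_ref and GroupRef are hypothetical names; this is not PCRE2's source, and it ignores named references such as ${name}. Real PCRE2 treats a reference to a non-existent group as an error unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Capture-group limit quoted from pcre2limits.
constexpr std::uint32_t kMaxCaptureGroups = 65535;

// One parsed reference such as "$12" or "${6}" (hypothetical type).
struct GroupRef {
    std::uint32_t group;  // referenced group number
    std::size_t next;     // index just past the reference
};

// Hypothetical reader mimicking the behaviour described in the question:
// all digits after '$' (or inside "${...}") are consumed, so leading zeros
// do not matter, and a number above the pattern's highest group is rejected.
// Precondition: repl[pos] == '$'. Named references are not handled.
std::optional<GroupRef> read_group_ref(const std::string& repl,
                                       std::size_t pos,
                                       std::uint32_t highest_group) {
    assert(pos < repl.size() && repl[pos] == '$');
    if (highest_group > kMaxCaptureGroups) highest_group = kMaxCaptureGroups;

    std::size_t i = pos + 1;
    const bool braced = i < repl.size() && repl[i] == '{';
    if (braced) ++i;

    const std::size_t first_digit = i;
    std::uint32_t group = 0;
    while (i < repl.size() && repl[i] >= '0' && repl[i] <= '9') {
        group = group * 10 + static_cast<std::uint32_t>(repl[i] - '0');
        if (group > highest_group) return std::nullopt;  // unknown group
        ++i;
    }
    if (i == first_digit) return std::nullopt;  // '$' not followed by digits
    if (braced) {
        if (i == repl.size() || repl[i] != '}') return std::nullopt;  // no '}'
        ++i;
    }
    return GroupRef{group, i};
}
```

For example, with a 12-group pattern, "$000000000000001" yields group 1 and "${6}" yields group 6, while "$13" is rejected. Checking the running value against the highest group on every digit also keeps the accumulator from overflowing, however many zeros precede the number.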

The code I am using for testing is this one. You will need the pcre2 library to run it.


He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Reply

The maximum number of capturing groups is 65,535, and this is also the highest group number that can be back-referenced in the pattern or in the replacement string.
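A rough way to see how close a pattern is to these limits before compiling it is to count its capturing groups. The sketch below is a hypothetical, simplified scanner (count_captures and within_capture_limits are illustrative names, not PCRE2 functions): it skips backslash escapes and [...] classes, counts plain (...) groups plus the named forms (?<name>...), (?'name'...) and (?P<name>...), and compares the totals with the 65,535 capture limit and the 10,000 named-group limit from pcre2limits. For a pattern that has already been compiled, pcre2_pattern_info() with PCRE2_INFO_CAPTURECOUNT gives the exact number instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxCaptures = 65535;  // pcre2limits
constexpr std::size_t kMaxNamed = 10000;     // pcre2limits

struct CaptureCount {
    std::size_t total = 0;  // all capturing groups, named or not
    std::size_t named = 0;  // named groups among them
};

// Hypothetical, simplified scanner. Handles backslash escapes and [...]
// classes only; ignores \Q...\E, comments, and the renumbering done by
// (?|...) branch-reset groups, so treat the result as an estimate.
CaptureCount count_captures(const std::string& p) {
    CaptureCount c;
    bool in_class = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const char ch = p[i];
        if (ch == '\\') { ++i; continue; }  // skip the escaped character
        if (in_class) {
            if (ch == ']') in_class = false;
            continue;
        }
        if (ch == '[') { in_class = true; continue; }
        if (ch != '(') continue;

        const std::string next = p.substr(i + 1, 3);  // up to 3 chars after '('
        const bool plain = next.empty() || (next[0] != '?' && next[0] != '*');
        const bool named =
            (next.rfind("?<", 0) == 0 && next.size() == 3 &&
             next[2] != '=' && next[2] != '!') ||  // (?<name>, not lookbehind
            next.rfind("?P<", 0) == 0 ||            // (?P<name>
            next.rfind("?'", 0) == 0;               // (?'name'
        if (plain || named) ++c.total;
        if (named) ++c.named;
    }
    return c;
}

// True if the estimated counts stay within the documented limits.
bool within_capture_limits(const CaptureCount& c) {
    return c.total <= kMaxCaptures && c.named <= kMaxNamed;
}
```

Non-capturing constructs such as (?:...), lookarounds and (*VERB) are deliberately not counted, which is why only plain groups and the three named forms increase the total.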

In practice, however, a match will probably hit some other limit before you get anywhere near that many groups: for example, the maximum length of the subject string, or the number of times the internal match() function is called (in total or recursively), although the match limits can be raised. For details about match limits, see "The match context" in pcre2api.


From the pcre2limits man page:

There is no limit to the number of parenthesized subpatterns, but there can be no more than 65,535 capturing subpatterns.

There is, however, a limit to the depth of nesting of parenthesized subpatterns of all kinds. This is imposed in order to limit the amount of system stack used at compile time. The limit can be specified when PCRE2 is built; the default is 250.
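This limit is on depth rather than count, so a pattern with thousands of sibling groups can be fine while one that nests 251 levels deep is rejected at compile time under the default. A hypothetical helper (max_paren_depth, an illustrative name) that measures the deepest nesting of parentheses of any kind might look like the sketch below; it only handles backslash escapes and [...] classes. 250 is the default quoted above, and the value for a particular build can be read with pcre2_config(PCRE2_CONFIG_PARENSLIMIT, ...).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Default nesting limit quoted from pcre2limits; a build may differ.
constexpr std::size_t kDefaultParensLimit = 250;

// Hypothetical helper: deepest nesting of parentheses of any kind.
// Skips backslash escapes and [...] classes; does not handle \Q...\E
// or comments.
std::size_t max_paren_depth(const std::string& p) {
    std::size_t depth = 0, deepest = 0;
    bool in_class = false;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const char ch = p[i];
        if (ch == '\\') { ++i; continue; }  // skip the escaped character
        if (in_class) {
            if (ch == ']') in_class = false;
            continue;
        }
        if (ch == '[') {
            in_class = true;
        } else if (ch == '(') {
            if (++depth > deepest) deepest = depth;
        } else if (ch == ')' && depth > 0) {
            --depth;
        }
    }
    return deepest;
}
```

For instance, "(a(b)(c(d)))" has a depth of 3 even though it contains four groups.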

and

The maximum number of named subpatterns is 10,000.

By Philip Hazel. Last updated: 25 November 2014. - *As of PCRE2 version 10.20


Size limitations in PCRE and PCRE2

PCRE and PCRE2 have the same limits:

  • All values in repeating quantifiers are limited to 65,535.

  • Unlimited number of parenthesized subpatterns
    (though the depth of nesting of parenthesized subpatterns of all kinds is limited).

  • 65,535 capturing subpatterns.

  • 10,000 named subpatterns.

  • The default maximum depth of nested parentheses is 250
    (value of PCRE2_CONFIG_PARENSLIMIT).

  • The maximum length of a name for a named subpattern is 32 code units.
    A character is represented by one or more code units, depending on the encoding; e.g. in UTF-8 "Ç" takes 2 code units: 0xC3 0x87.

  • There is no limit to the number of backward references.

  • The limit to the number of forward references to subsequent subpatterns is around 200,000.

  • Names used in control verbs are limited to 255 code units in the 8-bit library and 65,535 in the 16-bit and 32-bit libraries.

  • The default value for PCRE2_CONFIG_MATCHLIMIT is 10,000,000 (10m).

  • The default value for PCRE2_CONFIG_RECURSIONLIMIT is 10,000,000 (10m).
    (This limit only applies if it is set smaller than MATCH_LIMIT.)

  • The maximum length of a compiled pattern is 64K code units if PCRE2 is built with the default internal link size of 2 (see the pcre2build documentation for details).

  • The maximum length (in code units) of a subject string is one less than the largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned integer type, usually defined as size_t, so on 64-bit systems the limit is about 1.8E+19.
    However, the available stack space may limit the size of a subject string that can be processed by certain patterns.
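Because the 32-unit name limit is counted in code units, the number of characters it allows depends on which PCRE2 library (8-, 16- or 32-bit) is in use. The sketch below (code_units and name_fits_utf8 are hypothetical names) counts the code units of a well-formed UTF-8 string in all three widths; it assumes valid input and does not check whether the characters are actually allowed in a group name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Group-name length limit quoted above, in code units.
constexpr std::size_t kMaxNameCodeUnits = 32;

struct CodeUnits {
    std::size_t utf8 = 0;   // bytes (8-bit library)
    std::size_t utf16 = 0;  // 16-bit units (16-bit library)
    std::size_t utf32 = 0;  // 32-bit units = code points (32-bit library)
};

// Counts the code units of a well-formed UTF-8 string in each width.
// Input validation is out of scope for this sketch.
CodeUnits code_units(const std::string& utf8) {
    CodeUnits n;
    n.utf8 = utf8.size();
    for (unsigned char b : utf8) {
        if ((b & 0xC0) == 0x80) continue;  // continuation byte, not a new code point
        ++n.utf32;
        n.utf16 += (b >= 0xF0) ? 2 : 1;    // 4-byte sequences need a surrogate pair
    }
    return n;
}

// Hypothetical check for the 8-bit library: does the name fit the limit?
bool name_fits_utf8(const std::string& name) {
    return code_units(name).utf8 <= kMaxNameCodeUnits;
}
```

With it, "Ç" (0xC3 0x87) counts as 2 UTF-8 code units but 1 UTF-16 or UTF-32 unit, so 16 such characters fit within 32 units in the 8-bit library while 17 do not.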

