Get index of each capture in a JavaScript regex

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:

"a" at index = 0
"b" at index = 2
"cc" at index = 3

How can I do this? String.prototype.match returns the list of matched substrings and the index of the start of the complete match, but not the index of each capture.

Edit: A test case which wouldn't work with plain indexOf

regex: /(a).(.)/
string: "aaa"
expected result: "a" at 0, "a" at 2
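To make the problem with this test case concrete, here is a minimal sketch of the naive approach. The helper name `naiveCaptureIndices` is hypothetical and only used for illustration; it looks up each captured substring with `indexOf`, which always finds the first occurrence in the string.

```javascript
// Hypothetical helper: locates each capture by searching for its text.
// This is the approach that does NOT work in general.
function naiveCaptureIndices(re, str) {
  const match = re.exec(str);
  if (match === null) return [];
  // match[0] is the full match, so captures start at position 1.
  return match.slice(1).map((text) => ({ text, index: str.indexOf(text) }));
}

console.log(naiveCaptureIndices(/(a).(.)/, "aaa"));
// → [ { text: 'a', index: 0 }, { text: 'a', index: 0 } ]
// The second capture really starts at index 2, but indexOf reports 0.
```

Both captures have the same text, so the lookup cannot tell them apart.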

Note: The question is similar to Javascript Regex: How to find index of each subexpression?, but I cannot modify the regex to make every subexpression a capturing group.

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:10:24+0000

There is currently a proposal (stage 3) to implement this in native JavaScript:

RegExp Match Indices for ECMAScript

ECMAScript RegExp Match Indices provide additional information about the start and end indices of captured substrings relative to the start of the input string.

...We propose the adoption of an additional indices property on the array result (the substrings array) of RegExp.prototype.exec(). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.

Here's an example of how things would work:

const re1 = /a+(?<Z>z)?/d;

// indices are relative to start of the input string:
const s1 = "xaaaz";
const m1 = re1.exec(s1);
m1.indices[0][0] === 1;
m1.indices[0][1] === 5;
s1.slice(...m1.indices[0]) === "aaaz";

m1.indices[1][0] === 4;
m1.indices[1][1] === 5;
s1.slice(...m1.indices[1]) === "z";

m1.indices.groups["Z"][0] === 4;
m1.indices.groups["Z"][1] === 5;
s1.slice(...m1.indices.groups["Z"]) === "z";

// capture groups that are not matched return `undefined`:
const m2 = re1.exec("xaaay");
m2.indices[1] === undefined;
m2.indices.groups["Z"] === undefined;

So, for the code in the question, we could do:

const re = /(a).(b)(c.)d/d;
const str = 'aabccde';
const result = re.exec(str);
// indices[0], like result[0], describes the indices of the full match
const matchStart = result.indices[0][0];
result.forEach((matchedStr, i) => {
  const [startIndex, endIndex] = result.indices[i];
  console.log(`${matchedStr} from index ${startIndex} to ${endIndex} in the original string`);
  console.log(`From index ${startIndex - matchStart} to ${endIndex - matchStart} relative to the match start
-----`);
});

Output:

aabccd from index 0 to 6 in the original string
From index 0 to 6 relative to the match start
-----
a from index 0 to 1 in the original string
From index 0 to 1 relative to the match start
-----
b from index 2 to 3 in the original string
From index 2 to 3 relative to the match start
-----
cc from index 3 to 5 in the original string
From index 3 to 5 relative to the match start
-----

Keep in mind that the indices array contains the indices of the matched groups relative to the start of the string, not relative to the start of the match.
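As a rough sketch of how this could be wrapped up for the question's use case, the hypothetical helper below (the name `captureIndices` and the `-1` placeholders for unmatched groups are assumptions, not part of the proposal) copies the regex with the `d` flag added, so the pattern itself does not need to be modified. It relies on an engine that supports the `d` flag.

```javascript
// Hypothetical helper: returns one entry per capture group.
// Group 0 (the full match) is skipped.
function captureIndices(re, str) {
  // Add the `d` flag if it is missing so that exec() populates `indices`.
  const flags = re.flags.includes("d") ? re.flags : re.flags + "d";
  const match = new RegExp(re.source, flags).exec(str);
  if (match === null) return [];
  return match.slice(1).map((text, i) => {
    // indices[i + 1] is undefined for groups that did not participate.
    const span = match.indices[i + 1];
    return span === undefined
      ? { text: undefined, start: -1, end: -1 }
      : { text, start: span[0], end: span[1] };
  });
}

console.log(captureIndices(/(a).(b)(c.)d/, "aabccde"));
// → a at 0..1, b at 2..3, cc at 3..5

console.log(captureIndices(/(a).(.)/, "aaa"));
// → a at 0..1, a at 2..3
```

The `end` value is exclusive, so `str.slice(start, end)` gives back the captured text.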

When this answer was written, the proposal was at stage 3, which indicates that the specification text is complete and everyone in TC39 who needs to approve it has done so; all that remained was for environments to start shipping it so that final tests could be done. It has since reached stage 4 and is part of the ECMAScript 2022 standard.

A polyfill is available here.
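Whether the `d` flag is available depends on the engine, so it may be worth checking before choosing between the native feature and a polyfill. The following is a minimal sketch; the function name `supportsMatchIndices` is hypothetical. It assumes that an engine without support rejects the unknown flag with a SyntaxError.

```javascript
// Hypothetical feature check for the `d` (match indices) flag.
function supportsMatchIndices() {
  try {
    // Engines that do not know the `d` flag throw when constructing the regex.
    const match = new RegExp("a", "d").exec("a");
    return match !== null && Array.isArray(match.indices);
  } catch {
    return false;
  }
}

console.log(supportsMatchIndices() ? "native match indices available" : "fall back to a polyfill");
```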
