Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Java - Best way to grab ALL Strings between two Strings? (regex?)

This question has been bugging me for a long time now, but essentially I'm looking for the most efficient way to grab all Strings between two Strings.

The way I have been doing it for many months now is through a bunch of temporary indices, strings, and substrings, and it's really messy. (Why does Java not have a native method such as String substring(String start, String end)?)

Say I have a String:

abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff

The end goal would be to output foo and bar (which will later be added to a JList).
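For comparison, the index-juggling approach described above can at least be confined to one reusable method. This is a minimal sketch: `substringsBetween` is a hypothetical name (the JDK has no such method), and it assumes both markers are non-empty and that an unclosed start marker simply ends the search.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfBetween {

    // Hypothetical helper (not part of the JDK): returns every substring lying
    // between an occurrence of `start` and the next occurrence of `end`.
    // Assumes `start` and `end` are non-empty; otherwise the loop may not advance.
    static List<String> substringsBetween(String text, String start, String end) {
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = text.indexOf(start, from);
            if (open < 0) {
                break; // no more start markers
            }
            int contentStart = open + start.length();
            int close = text.indexOf(end, contentStart);
            if (close < 0) {
                break; // start marker never closed
            }
            results.add(text.substring(contentStart, close));
            from = close + end.length(); // resume searching after the end marker
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        System.out.println(substringsBetween(text, "[pattern1]", "[pattern2]")); // [foo, bar]
    }
}
```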

I've been trying to incorporate regex into .split() but haven't been successful. I've tried syntax using *'s and .'s, but I don't think that's quite what I intend, especially since .split() only takes one argument to split against.
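As a sketch of how far .split() can be pushed: it does take a single argument, but that argument is a regex, so alternation (`|`) lets it split on either delimiter. The resulting pieces then alternate between outside text and wanted text. This assumes the delimiters strictly alternate (start, end, start, end, ...) and that no wanted string is empty at the very end of the input, since split() drops trailing empty strings; `oddPieces` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

class SplitBetween {

    // Hypothetical helper: splits on EITHER delimiter (regex alternation with |)
    // and keeps the odd-indexed pieces, which are the text between the delimiters.
    // Assumes the delimiters strictly alternate: start, end, start, end, ...
    static List<String> oddPieces(String text, String eitherDelimiterRegex) {
        String[] parts = text.split(eitherDelimiterRegex);
        List<String> result = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        // [ and ] are regex metacharacters, so they are escaped.
        // Pieces: "abcabc ", "foo", " abcdefg ", "bar", " morestuff"
        System.out.println(oddPieces(text, "\\[pattern1\\]|\\[pattern2\\]")); // [foo, bar]
    }
}
```

This works for well-formed input, but it is fragile compared with Pattern and Matcher, because it relies on counting pieces rather than on matching each start/end pair.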

Otherwise, I think another way is to use the Pattern and Matcher classes, but I'm really fuzzy on the appropriate procedure.

Question from: https://stackoverflow.com/questions/11255353/java-best-way-to-grab-all-strings-between-two-strings-regex

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply

You can construct the regex to do this for you:

// pattern1 and pattern2 are String objects
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);

This treats pattern1 and pattern2 as literal text, and the text between them is captured in the first capturing group. You can remove Pattern.quote() if you want pattern1 and pattern2 to be interpreted as regexes, but I don't guarantee anything if you do that.
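To see why Pattern.quote() matters here, the sketch below builds the regex both ways for the question's input. The helper `firstBetween` is hypothetical; the point is that, unquoted, `[pattern1]` is a character class matching a single character rather than the literal text "[pattern1]".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {

    // Hypothetical helper: returns group(1) of the first match, or null if none.
    static String firstBetween(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        String pattern1 = "[pattern1]";
        String pattern2 = "[pattern2]";

        // Pattern.quote wraps each delimiter in \Q...\E, so [ and ] are literal.
        String quoted = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);
        System.out.println(quoted);                     // \Q[pattern1]\E(.*?)\Q[pattern2]\E
        System.out.println(firstBetween(quoted, text)); // foo

        // Unquoted, "[pattern1]" matches ONE of the characters p, a, t, e, r, n, 1,
        // so the first match is "abca" at the very start and group(1) is "bc".
        String raw = pattern1 + "(.*?)" + pattern2;
        System.out.println(firstBetween(raw, text));    // bc
    }
}
```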

You can customize how the match should occur by adding flags to regexString.

  • If you want Unicode-aware case-insensitive matching, add (?iu) at the beginning of regexString, or supply the Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE flags to the Pattern.compile method.
  • If you want to capture the content even when the two delimiting strings are on different lines, add (?s) before (.*?), i.e. "(?s)(.*?)", or supply the Pattern.DOTALL flag to the Pattern.compile method.
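A sketch of both flag options passed to Pattern.compile, using a made-up input with mixed-case delimiters and one match that spans a line break. The `between` helper is hypothetical; the embedded forms (?iu) and (?s) described above should behave the same as these flag constants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {

    // Hypothetical helper: collects group(1) of every match of
    // quote(start) + "(.*?)" + quote(end), compiled with the given flags.
    static List<String> between(String text, String start, String end, int flags) {
        Pattern p = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), flags);
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input: mixed-case delimiters and a match spanning a line break.
        String text = "<b>one</b> <B>two</B> <b>three\nfour</b>";

        // No flags: case-sensitive, and . does not match \n, so only "one" is found.
        System.out.println(between(text, "<b>", "</b>", 0));

        // Unicode-aware case-insensitive + DOTALL: "one", "two" and "three\nfour".
        System.out.println(between(text, "<b>", "</b>",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL));
    }
}
```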

Then compile the regex, obtain a Matcher object, iterate through the matches and save them into a List (or any Collection, it's up to you).

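import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile(regexString);
// text contains the full text that you want to extract data from
Matcher matcher = pattern.matcher(text);

List<String> results = new ArrayList<>();
while (matcher.find()) {
  String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
  results.add(textInBetween); // or insert into any other Collection
}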

Testing code:

String pattern1 = "hgb";
String pattern2 = "|";
String text = "sdfjsdkhfkjsdf hgb sdjfkhsdkfsdf |sdfjksdhfjksd sdf sdkjfhsdkf | sdkjfh hgb sdkjfdshfks|";

Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
while (m.find()) {
  System.out.println(m.group(1));
}

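Do note that if you search for the text between foo and bar in the input foo text foo text bar text bar with the method above, you will get one match, " text foo text " (surrounding spaces included): matching begins at the first foo, and the lazy (.*?) stops at the first bar after it.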
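If you want the innermost text instead (here " text "), one common option, which is not part of the answer above, is a "tempered" lazy token, (?:(?!start).)*?, that refuses to step onto another start delimiter. A minimal sketch, with a hypothetical `between` helper taking an `innermost` switch; like (.*?), the tempered token does not cross line breaks unless DOTALL is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnermostBetween {

    // Hypothetical helper: innermost == false uses the plain lazy (.*?);
    // innermost == true uses a tempered token that never consumes the start of `start`.
    static List<String> between(String text, String start, String end, boolean innermost) {
        String s = Pattern.quote(start);
        String e = Pattern.quote(end);
        String body = innermost ? "((?:(?!" + s + ").)*?)" : "(.*?)";
        Matcher m = Pattern.compile(s + body + e).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo text foo text bar text bar";
        System.out.println(between(text, "foo", "bar", false)); // [ text foo text ]
        System.out.println(between(text, "foo", "bar", true));  // [ text ]
    }
}
```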

...