
java - Issue with the snippet below using the boundary matcher regex (\b)

My input:

 1. end 
 2. end of the day or end of the week 
 3. endline
 4. something 
 5. "something" end

Based on earlier discussions, when I delete a single word using this snippet, it successfully removes that word from each line:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class DeleteTest {

    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

            for (String line; (line = reader.readLine()) != null;) {
                // "\\b" in a Java string literal is the regex word boundary \b
                line = line.replaceAll("\\b" + delete + "\\b", "");
                writer.println(line);
            }
            reader.close();
            writer.close();
        } catch (Exception e) {
            System.out.println("Something went Wrong");
        }
    }
}

My output when I use the above snippet (this is also my expected output):

 1.  
 2. of the day or of the week
 3. endline
 4. something
 5. "something"

But when I want to delete more words and use a Set for that purpose, I use the code snippet below:

public static void main(String[] args) {

    // TODO Auto-generated method stub
    try {

    File file = new File("C:/Java samples/myfile.txt");
    File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
    PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

        Set<String> toDelete = new HashSet<>();
        toDelete.add("end");
        toDelete.add("something");

    for (String line; (line = reader.readLine()) != null;) {
        line = line.replaceAll("\\b"+toDelete+"\\b", "");
        writer.println(line);
    }
    reader.close();
    writer.close();
    }
    catch (Exception e) {
        System.out.println("Something went Wrong");
    }
}

I get this output instead (it just removes the spaces):

 1. end
 2. endofthedayorendoftheweek
 3. endline
 4. something
 5. "something" end 

Can you help me with this?


1 Reply

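Concatenating the Set into the string calls its toString(), which yields something like [end, something], and the regex engine reads that as a character class matching a single character. You need to create an alternation group out of the set with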

String.join("|", toDelete)

and use it as

line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");

The pattern will look like

(?:end|something)

Here, (?:...) is a non-capturing group: it groups several alternatives without storing the match in a capture group, which you do not need since you are removing the matches.
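As a rough illustration of the difference, the sketch below prints the pattern produced by concatenating the Set directly and the one produced by joining it into an alternation. The class name AlternationDemo and the helpers buildRegex and strip are made up for this example, and it assumes the words contain no regex metacharacters. A LinkedHashSet is used only to make the printed order predictable.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class AlternationDemo {

    // Joins the words into \b(?:w1|w2|...)\b.
    // Assumption: the words contain no regex metacharacters.
    static String buildRegex(Set<String> words) {
        return "\\b(?:" + String.join("|", words) + ")\\b";
    }

    // Removes every whole-word occurrence of the given words from the line.
    static String strip(String line, Set<String> words) {
        return line.replaceAll(buildRegex(words), "");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        // Concatenating the Set calls toString(): \b[end, something]\b
        // The brackets form a character class, so only single characters
        // (such as a space between two words) can match.
        System.out.println("\\b" + toDelete + "\\b");

        // Joining with "|" gives the intended pattern: \b(?:end|something)\b
        System.out.println(buildRegex(toDelete));

        System.out.println(strip("end of the day or end of the week", toDelete));
        System.out.println(strip("endline", toDelete)); // unchanged: "end" is not a whole word here
    }
}
```

Note that removing a word leaves the surrounding spaces in place, so a line may keep a leading or doubled space.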

Or, better, compile the regex before entering the loop:

Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
    line = pat.matcher(line).replaceAll("");
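A self-contained sketch of the precompiled variant, run against the sample lines from the question instead of a file, might look like this. The class PrecompiledDemo and the method stripAll are hypothetical names for this example. The point of compiling up front is that String.replaceAll compiles its regex argument on every call.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrecompiledDemo {

    // Strips the given whole words from every line.
    // Assumption: the words contain no regex metacharacters.
    static List<String> stripAll(List<String> lines, Set<String> words) {
        // Compile once, before processing any line.
        Pattern pat = Pattern.compile("\\b(?:" + String.join("|", words) + ")\\b");
        return lines.stream()
                .map(line -> pat.matcher(line).replaceAll(""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("end", "something"));
        List<String> input = Arrays.asList(
                "end",
                "end of the day or end of the week",
                "endline",
                "something",
                "\"something\" end");
        // Brackets make leftover whitespace visible in the printed result.
        for (String line : stripAll(input, toDelete)) {
            System.out.println("[" + line + "]");
        }
    }
}
```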

UPDATE:

To allow matching whole "words" that may contain special chars, you need to Pattern.quote those words to escape the special chars, and then use unambiguous word boundaries: the negative lookbehind (?<!\w) instead of the initial \b to make sure there is no word char before the match, and the negative lookahead (?!\w) instead of the final \b to make sure there is no word char after it.

In Java 8, you may use this code:

Set<String> nToDel = toDelete.stream()
    .map(Pattern::quote)
    .collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";

The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
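To see the quoted pattern at work, here is a minimal sketch that uses the two example words +end and something-. The class QuotedWordsDemo and the method build are hypothetical names, and the sample strings are made up for illustration. A plain \b would not work here, because there is no word boundary between a space and "+": both are non-word characters.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWordsDemo {

    // Quotes each word so its characters are literal, then wraps the
    // alternation in lookarounds that forbid an adjacent word character.
    static Pattern build(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Set<String> toDelete = new LinkedHashSet<>(Arrays.asList("+end", "something-"));
        Pattern pat = build(toDelete);

        System.out.println(pat.pattern());

        // Matched: "+end" is surrounded by spaces.
        System.out.println("[" + pat.matcher("a +end b").replaceAll("") + "]");
        // Not matched: "x" is a word char directly before "+end".
        System.out.println("[" + pat.matcher("x+end").replaceAll("") + "]");
        // Not matched: "e" is a word char directly after "something-".
        System.out.println("[" + pat.matcher("something-else").replaceAll("") + "]");
    }
}
```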

