Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

string - Find phrases from a text file in Java

I am reading in a text file, finding certain words, replacing them, and writing the result to a new text file. My code works well for single words, but if I try to replace a phrase that contains a space, it doesn't work. I keep the strings I need to search for as the keys of a HashMap.

    HashMap<String, Integer> hm = new HashMap<>();

    hm.put("null", 0);
    hm.put("max", 1);
    hm.put("Do not repeat", 2);
    hm.put("names", 3);

I then iterate through the HashMap and replace each key with "WRONG" wherever the text contains it.

    for (String key : hm.keySet()) {
        String check = key;
        System.out.println(check);

        text = text.toLowerCase(Locale.ROOT).replaceAll(check, "WRONG");
    }
    // "new" is a reserved word in Java, so the result needs another name
    String newText = text;

This doesn't work when the key contains a space, as in "Do not repeat". It completely skips over the phrases and outputs the new file with only the single words replaced. How can I get this to work for phrases and not just single words?

question from:https://stackoverflow.com/questions/65891451/find-phrases-from-a-text-file-in-java

1 Reply

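It has nothing to do with the spaces; the problem is the uppercase D.

    text.toLowerCase(Locale.ROOT)

returns a string containing only lowercase letters, so the key "Do not repeat", which still has its uppercase D, will never be found in it. The single-word keys only work because they happen to be all lowercase.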
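To see the mismatch in isolation, here is a minimal sketch that applies the question's loop body to one key at a time. The class name, method name, and sample sentence are made up for illustration; only the toLowerCase/replaceAll line comes from the question.

```java
import java.util.Locale;

public class LowerCaseMismatchDemo {
    // Same operation as the loop body in the question, for a single key.
    public static String replaceAfterLowerCasing(String text, String key) {
        return text.toLowerCase(Locale.ROOT).replaceAll(key, "WRONG");
    }

    public static void main(String[] args) {
        String text = "Do not repeat the max value"; // hypothetical sample input

        // The all-lowercase key matches the lower-cased text.
        System.out.println(replaceAfterLowerCasing(text, "max"));
        // prints: do not repeat the WRONG value

        // The key keeps its uppercase D, the text no longer has one: no match.
        System.out.println(replaceAfterLowerCasing(text, "Do not repeat"));
        // prints: do not repeat the max value
    }
}
```

Note that the output is also entirely lower-cased, which is probably not what you want in the file you write out.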

Instead of lower-casing the text, you can make replaceAll case-insensitive by prefixing the pattern with the embedded flag (?i):

    text = text.replaceAll("(?i)" + check, "WRONG");

Note that replaceAll treats its first argument as a regular expression, so you might run into problems with metacharacters in the strings you are searching for. If a key might contain, e.g., a period (.), which matches any character in a regex, you should quote check:

    text = text.replaceAll("(?i)" + Pattern.quote(check), "WRONG");
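Putting the two points together, the loop could look roughly like the sketch below. It uses Pattern.CASE_INSENSITIVE, which is the compiled-flag equivalent of (?i), and Matcher.quoteReplacement in case the replacement ever contains $ or \. The class name, method name, and sample text are assumptions for illustration, and the keys are passed as a plain list rather than the question's HashMap. By default this flag only folds ASCII case; for non-ASCII keys you would probably also want Pattern.UNICODE_CASE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseReplacer {
    // Replaces every case-insensitive, literal occurrence of each phrase.
    public static String replacePhrases(String text, Iterable<String> phrases, String replacement) {
        // Treat the replacement literally ("$" and "\" are special in replaceAll).
        String safeReplacement = Matcher.quoteReplacement(replacement);
        for (String phrase : phrases) {
            // Pattern.quote makes the phrase a literal, so "." etc. are not regex syntax.
            Pattern p = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
            text = p.matcher(text).replaceAll(safeReplacement);
        }
        return text;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("null", "max", "Do not repeat", "names");
        String text = "Do not repeat the MAX value. do NOT repeat names."; // hypothetical input
        System.out.println(replacePhrases(text, phrases, "WRONG"));
        // prints: WRONG the WRONG value. WRONG WRONG.
    }
}
```

Because the text itself is never lower-cased, everything outside the matches keeps its original capitalization.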

Also, because you're not considering word boundaries, you might run into the Scunthorpe problem: a key such as "max" will also match inside longer words like "maximum".
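One common way to avoid that is to wrap the quoted key in \b word-boundary anchors. The following is a sketch under the assumption that every key starts and ends with a word character (a letter, digit, or underscore); for keys that begin or end with punctuation, \b will not behave the way you expect. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplacer {
    // Replaces the phrase only where it is not part of a longer word.
    public static String replaceWholeWord(String text, String phrase, String replacement) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(phrase) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "The maximum is not the Max."; // hypothetical input
        System.out.println(replaceWholeWord(text, "max", "WRONG"));
        // prints: The maximum is not the WRONG.
    }
}
```

Without the \b anchors, the same call would turn "maximum" into "WRONGimum".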

