Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

iOS - Use regex to match emojis as well as text in a string

I am trying to find the range of specific substrings of a string. Each substring begins with a hashtag and can have any character it likes within it (including emojis). Duplicate hashtags should be detected at distinct ranges. A kind user from here suggested this code:

var str = "The range of #hashtag should be different to this #hashtag"
let regex = try NSRegularExpression(pattern: "(#[A-Za-z0-9]*)", options: [])
let matches = regex.matchesInString(str, options:[], range:NSMakeRange(0, str.characters.count))
for match in matches {
    print("match = \(match.range)")
}

However, this code does not work for emojis. What regular expression would include emojis? Is there a way to match a # followed by any characters up to the next space or line break?

1 Reply

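As in the related question "Swift extract regex matches", you have to pass an NSRange to the match functions, and the returned ranges are NSRanges as well. An NSRange counts UTF-16 code units, whereas str.characters.count counts characters, so the two disagree as soon as the string contains emojis. The correct length can be obtained by converting the given text to an NSString.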
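
To illustrate why the NSString length matters, here is a minimal sketch assuming a current Swift toolchain (Swift 5) with Foundation. The sample string is made up for illustration. A Swift String counts user-perceived characters, while NSRange counts UTF-16 code units, and an emoji such as U+1F436 takes two of those:

```swift
import Foundation

// Illustrative sample: "#dog" followed by a dog-face emoji (U+1F436).
let sample = "#dog\u{1F436}"

// Number of user-perceived characters: #, d, o, g, and the emoji.
let characterCount = sample.count

// Number of UTF-16 code units; the emoji is a surrogate pair and counts twice.
let utf16Count = sample.utf16.count

// An NSRange covering the whole string, built from String indices.
let fullRange = NSRange(sample.startIndex..<sample.endIndex, in: sample)

print(characterCount, utf16Count, fullRange.length)  // prints "5 6 6"
```

A range built from the character count would therefore be one unit too short here and would cut the emoji in half.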

The #\S+ pattern matches a # followed by one or more non-whitespace characters.

let text = "The 😀range of #hashtag🐶 should 👺 be 🇩🇪 different to this #hashtag🐮"

// NSString.length is measured in UTF-16 code units, as NSRange requires.
let nsText = text as NSString
// The backslash must be escaped inside a Swift string literal.
let regex = try NSRegularExpression(pattern: "#\\S+", options: [])
for match in regex.matchesInString(text, options: [], range: NSRange(location: 0, length: nsText.length)) {
    print(match.range)
    print(nsText.substringWithRange(match.range))
}

Output:

(15,10)
#hashtag🐶
(62,10)
#hashtag🐮
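
On Swift 3 and later, matchesInString is spelled matches(in:options:range:), so the same approach might look like the sketch below. The helper name hashtagMatches and the sample string are illustrative assumptions, not part of the original answer:

```swift
import Foundation

// Hypothetical helper: returns each "#..." token with its NSRange (in UTF-16 units).
func hashtagMatches(in text: String) throws -> [(range: NSRange, tag: String)] {
    // "#\\S+" in Swift source is the regex #\S+ : a # followed by non-whitespace.
    let regex = try NSRegularExpression(pattern: "#\\S+", options: [])
    // Build the search range from String indices so its length is in UTF-16 units.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    var results: [(range: NSRange, tag: String)] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        // Convert the NSRange back to String indices to extract the substring.
        if let range = Range(match.range, in: text) {
            results.append((range: match.range, tag: String(text[range])))
        }
    }
    return results
}

// Illustrative input; \u{1F436} is a dog-face emoji (two UTF-16 code units).
let hashtagSample = "#one\u{1F436} and #one\u{1F436}"
for item in try hashtagMatches(in: hashtagSample) {
    print(item.range.location, item.range.length, item.tag)
}
```

With this input the duplicate tag is reported at two distinct ranges, starting at UTF-16 offsets 0 and 11, each of length 6.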

You can also convert between NSRange and Range<String.Index> using the methods from the related question "NSRange to Range<String.Index>".
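
As a rough sketch of that conversion, assuming Swift 4 or later, Foundation provides Range(_:in:) and NSRange(_:in:) initializers for this purpose. The sample string and offsets below are made up for illustration:

```swift
import Foundation

// Illustrative text: "café " is 5 UTF-16 units, so the # sits at offset 5.
let line = "caf\u{E9} #tag\u{1F436} end"

// NSRange -> Range<String.Index>; returns nil if the NSRange does not fit the string.
let nsRange = NSRange(location: 5, length: 6)
let swiftRange = Range(nsRange, in: line)
let extracted = swiftRange.map { String(line[$0]) }

// Range<String.Index> -> NSRange, here for the same substring found by search.
let found = line.range(of: "#tag\u{1F436}")
let roundTrip = found.map { NSRange($0, in: line) }

print(extracted ?? "no match", roundTrip?.location ?? -1, roundTrip?.length ?? -1)
```

Because Range(_:in:) is failable, an NSRange that points outside the string or splits a surrogate pair yields nil instead of crashing.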

Remark: As @WiktorStribiżew correctly noted, the above pattern will include trailing punctuation (commas, periods, etc.). If that is not desired, then

let regex = try NSRegularExpression(pattern: "#[^[:punct:][:space:]]+", options: [])

would be an alternative.
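
The difference between the two patterns can be compared with a small sketch like the following, again assuming Swift 5 with Foundation. The helper name tags(in:pattern:) and the sample sentence are hypothetical:

```swift
import Foundation

// Hypothetical helper: returns the substrings matched by `pattern` in `text`.
func tags(in text: String, pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var found: [String] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        if let range = Range(match.range, in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

// Illustrative sentence with punctuation directly after each hashtag.
let sentence = "#swift\u{1F436}, and #regex."

// Any non-whitespace run: trailing comma and period are part of the match.
let greedy = try tags(in: sentence, pattern: "#\\S+")

// Stops at punctuation or whitespace: trailing comma and period are excluded.
let trimmed = try tags(in: sentence, pattern: "#[^[:punct:][:space:]]+")

print(greedy)
print(trimmed)
```

One trade-off to keep in mind: the second pattern stops at punctuation anywhere in the tag, not only at the end, so a tag such as #foo_bar would be cut to #foo because the underscore counts as punctuation.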

