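There appears to be a problem with the regex word boundary \b when the word starts with a character outside the ASCII range. In JavaScript, \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore, so a letter such as ä does not count as a word character and no boundary is found in front of it.
Instead of using \b, try using (?:^|\s):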
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work with \b
var searchterm = "äl";
// Does not work with \b
//var searchterm = "ää";
// Works with \b
//var searchterm = "wi";
// The backslash is doubled so that the regex receives \s rather than a plain "s"
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
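To make the difference concrete, here is a minimal sketch that runs both patterns against the same sentence. It assumes the sample text is the Finnish sentence with its ä and ö characters intact; the variable names are only illustrative.

```javascript
// Assumed sample: the Finnish sentence from the example, with ä/ö intact.
const sample = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

// \b needs a \w character on exactly one side. Both the space and "ä" are
// outside \w, so there is no boundary in front of "älkää".
const withBoundary = new RegExp("\\b" + "äl", "i").test(sample);

// The start-or-whitespace group does not depend on \w at all.
const withGroup = new RegExp("(?:^|\\s)" + "äl", "i").test(sample);

// An ASCII term such as "wi" is found with \b as usual.
const asciiBoundary = new RegExp("\\b" + "wi", "i").test(sample);

console.log(withBoundary, withGroup, asciiBoundary); // false true true
```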
Breakdown:
(?:
Parentheses () form a capture group in a regex. Parentheses that open with a question mark and colon, (?:, form a non-capturing group: they only group the terms together.
^
The caret matches the beginning of the string.
|
The bar is the "or" operator.
\s
Matches whitespace (written as \\s in the string literal because the backslash itself has to be escaped).
)
Closes the group.
So instead of using \b, which matches word boundaries but does not treat non-ASCII characters as part of a word, we use a non-capturing group that matches the beginning of the string OR whitespace.
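If the search term comes from user input, it may also be worth escaping regex metacharacters before building the pattern. The following is a sketch only: escapeRegExp and startsWord are hypothetical helper names, not part of any library, and the "g" flag is left out because test() does not need it.

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `term` occurs at the start of `text`
// or directly after a whitespace character (case-insensitive).
function startsWord(text, term) {
  const pattern = new RegExp("(?:^|\\s)" + escapeRegExp(term), "i");
  return pattern.test(text);
}

console.log(startsWord("älkää ihmetelkö", "äl"));     // true
console.log(startsWord("tämä on ääkköstesti", "ää")); // true
console.log(startsWord("tämä on ääkköstesti", "kk")); // false
console.log(startsWord("price (approx.)", "(a"));     // true
```

Note that this only anchors the start of the term; it says nothing about where the matched word ends.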