Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Azure search not behaving as expected for dashes

I'm having an issue when using Azure Search with the following example data set: abc-123-456, abc-123-457, abc-123-458, etc. When searching for abc-123-456, I expected only one result, but instead I get all results containing abc-123-... Is there a setting or some other way to change this behavior?

Current search settings:

TheSearchIndex.TokenFilters.Add(new EdgeNGramTokenFilter("frontEdgeNGram")
{
    Side = EdgeNGramTokenFilterSide.Front,
    MinGram = 3,
    MaxGram = 20
});

TheSearchIndex.Analyzers.Add(new CustomAnalyzer("FrontEdgeNGram", LexicalTokenizerName.Whitespace)
{
    TokenFilters =
    {
        TokenFilterName.Lowercase,
        new TokenFilterName("frontEdgeNGram"),
        TokenFilterName.Classic,
        TokenFilterName.AsciiFolding
    }
});

SearchOptions UsersSearchOptions = new SearchOptions
{
    QueryType = SearchQueryType.Simple,
    SearchMode = SearchMode.All,
};

Using azure.search.documents version 11.1.1.

Edit: Searching for abc-123-456* (with the asterisk) returns the single expected result. How can I make this the default behavior?

To add to this:

The portal version is 2020-06-30, and the SDK version we use is azure.search.documents 11.1.1.

  1. abc-123-456 does NOT work as expected
  2. "abc-123-456" does NOT work as expected
  3. "abc-123-456"* does NOT work
  4. "abc-123-456*" does NOT work

If we append an asterisk to the end of the search text, outside a phrase, it works as expected. For example, abc-123-456* works as expected, and (abc-123-456* | abc-123-457*) also works as expected.

Why is the asterisk required? How can we make this work within a phrase?

Question from: https://stackoverflow.com/questions/65830648/azure-search-not-behaving-as-expected-for-dashes


1 Reply


This is expected behavior when using the EdgeNGramTokenFilter inside the custom analyzer configuration. The text "abc-123-456" is broken into smaller tokens such as "abc", "abc-", "abc-1", "abc-12", "abc-123", …, "abc-123-456". Use the Analyze API to see the full list of tokens a particular analyzer generates.
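To make that token list concrete, here is a small C# sketch that approximates what the question's FrontEdgeNGram analyzer emits for one whitespace-delimited value: lowercase it, then emit every prefix of 3 to 20 characters. It is a local simulation under stated assumptions: the Classic and AsciiFolding filters are ignored because they do not change values like these, and the class and method names are hypothetical. The Analyze API remains the authoritative way to inspect real analyzer output.

```csharp
using System;
using System.Collections.Generic;

// Local approximation of the question's "FrontEdgeNGram" analyzer for a single
// whitespace-delimited token (lowercase filter, then front edge n-grams).
// Assumption: Classic and AsciiFolding are skipped; they leave "abc-123-456" unchanged.
public static class EdgeNGramSketch
{
    public static List<string> FrontEdgeNGrams(string token, int minGram, int maxGram)
    {
        string lowered = token.ToLowerInvariant();
        var grams = new List<string>();

        // Emit each prefix whose length lies in [minGram, min(maxGram, token length)].
        int longest = Math.Min(maxGram, lowered.Length);
        for (int length = minGram; length <= longest; length++)
        {
            grams.Add(lowered.Substring(0, length));
        }
        return grams;
    }
}
```

For example, FrontEdgeNGrams("abc-123-456", 3, 20) yields nine tokens, from "abc" and "abc-" up to "abc-123-456". Every one of the sample values shares the shorter grams, which is why they all look alike to the index.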

For the query abc-123, if the default analyzer is used, the query terms are abc and 123, and the query matches every document that contains these terms.
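The same reasoning for the default analyzer, as a hedged sketch: the helper below approximates the standard analyzer on hyphenated codes by lowercasing and splitting on non-alphanumeric characters, then applies SearchMode.All (every query term must be present), which is the mode set in the question. The names are hypothetical, and the splitting rule is a simplification of the real standard tokenizer, which follows Unicode word-break rules.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Rough stand-in for the default (standard) analyzer on inputs like "abc-123":
// lowercase, then split on any run of characters that are not letters or digits.
// Assumption: this only mirrors the real tokenizer's effect on hyphenated codes.
public static class StandardAnalyzerSketch
{
    public static string[] Terms(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(term => term.Length > 0)
             .ToArray();

    // SearchMode.All semantics: every query term must appear among the document's terms.
    public static bool MatchesAll(string documentValue, string query)
    {
        var documentTerms = new HashSet<string>(Terms(documentValue));
        return Terms(query).All(documentTerms.Contains);
    }
}
```

Here Terms("abc-123") gives abc and 123, so "abc-123-457" matches the query abc-123 because it contains both terms, while a value such as "abc-124-457" does not.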

A prefix query, on the other hand, is not analyzed: it looks for documents that contain the prefix "abc-123" as is. Because a prefix search bypasses full-text analysis and looks for verbatim matches, the correct result comes back. Full-text search runs over tokens in inverted indexes; everything else (filters, fuzzy, regex, prefix/wildcard, etc.) runs over verbatim strings in a separate unprocessed/internal index.
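The contrast between the two query paths can be simulated locally. This sketch assumes that the n-grams produced from a single query token are treated as alternatives (a document matches if it shares any of them), which is consistent with the behavior described in the question, and that a prefix term is only lowercased, never split into n-grams. All names are hypothetical; nothing here calls the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryModeSketch
{
    // Same expansion as the question's analyzer: lowercase, then prefixes of length 3..20.
    // Assumption: Classic and AsciiFolding are ignored.
    static HashSet<string> Grams(string value)
    {
        string lowered = value.ToLowerInvariant();
        var grams = new HashSet<string>();
        for (int length = 3; length <= Math.Min(20, lowered.Length); length++)
        {
            grams.Add(lowered.Substring(0, length));
        }
        return grams;
    }

    // Analyzed query: the query is expanded into n-grams as well, and a document
    // counts as a hit if it shares ANY of them (assumed alternative semantics).
    public static List<string> AnalyzedMatches(IEnumerable<string> docs, string query)
    {
        var queryGrams = Grams(query);
        return docs.Where(doc => Grams(doc).Overlaps(queryGrams)).ToList();
    }

    // Prefix query ("abc-123-456*"): the term is compared as one whole string.
    // Assumption: both sides are lowercased to mirror the index-side lowercase filter.
    public static List<string> PrefixMatches(IEnumerable<string> docs, string prefix)
    {
        string lowered = prefix.ToLowerInvariant();
        return docs
            .Where(doc => doc.ToLowerInvariant().StartsWith(lowered, StringComparison.Ordinal))
            .ToList();
    }
}
```

With the three sample values, AnalyzedMatches(docs, "abc-123-456") returns all three because they share grams such as "abc-123", while PrefixMatches(docs, "abc-123-456") returns only "abc-123-456".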

Another option is to set only the search analyzer on the field to keyword (keeping the edge n-gram analyzer as the index analyzer), so that the query text is not broken into tokens.
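A minimal sketch of that configuration with azure.search.documents 11.x, assuming a hypothetical field named code and the question's custom analyzer name FrontEdgeNGram: index-time analysis keeps the edge n-grams, while the search analyzer is set to the built-in keyword analyzer so the query text stays one term. It needs the Azure.Search.Documents package and only builds the field definition.

```csharp
using Azure.Search.Documents.Indexes.Models;

public static class KeywordSearchAnalyzerSketch
{
    // "code" is a hypothetical field name; "FrontEdgeNGram" must match the
    // custom analyzer already registered on the index (as in the question).
    public static SearchField CreateCodeField() =>
        new SearchField("code", SearchFieldDataType.String)
        {
            IsSearchable = true,
            // Index time: emit front edge n-grams so partial codes can still match.
            IndexAnalyzerName = new LexicalAnalyzerName("FrontEdgeNGram"),
            // Query time: treat the whole query text, e.g. "abc-123-456", as a single term.
            SearchAnalyzerName = LexicalAnalyzerName.Keyword
        };
}
```

It would be added like any other field, e.g. TheSearchIndex.Fields.Add(KeywordSearchAnalyzerSketch.CreateCodeField()). With this split, a query for abc-123-456 is matched against the single indexed gram "abc-123-456", while a shorter query such as abc-123 still matches every code that starts with it. One caveat: the keyword analyzer does not lowercase, whereas the index analyzer does, so mixed-case queries would need to be lowercased client-side or handled by a custom search analyzer built from the keyword tokenizer plus the lowercase filter.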

