Azure search not behaving as expected for dashes

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I'm having an issue when using Azure Search with the following example data set: abc-123-456, abc-123-457, abc-123-458, etc. When I search for abc-123-456, I expect only one result, but instead I get every result containing abc-123-... Is there a setting or another way to change this behavior?

Current search settings:

TheSearchIndex.TokenFilters.Add(new EdgeNGramTokenFilter("frontEdgeNGram")
{
    Side = EdgeNGramTokenFilterSide.Front,
    MinGram = 3,
    MaxGram = 20
});

TheSearchIndex.Analyzers.Add(new CustomAnalyzer("FrontEdgeNGram", LexicalTokenizerName.Whitespace)
{
    TokenFilters =
    {
        TokenFilterName.Lowercase,
        new TokenFilterName("frontEdgeNGram"),
        TokenFilterName.Classic,
        TokenFilterName.AsciiFolding
    }
});

SearchOptions UsersSearchOptions = new SearchOptions
{
    QueryType = SearchQueryType.Simple,
    SearchMode = SearchMode.All,
};

Using Azure.Search.Documents version 11.1.1.

Edit: Searching for abc-123-456* (with the asterisk) returns the one result, as expected. How can I make this the default behavior?

Just to add to this:

The portal version is 2020-06-30. The SDK version we use is Azure.Search.Documents 11.1.1.

abc-123-456 does NOT work as expected
"abc-123-456" does NOT work as expected
"abc-123-456"* does NOT work
"abc-123-456*" does NOT work

If we append an asterisk to the end of the search text, and it is not within a phrase, it works as expected. For example, abc-123-456* works as expected, and (abc-123-456* | abc-123-457*) works as expected.

Why is the asterisk required? How can we make this work within a phrase?

question from:https://stackoverflow.com/questions/65830648/azure-search-not-behaving-as-expected-for-dashes

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:36:47+0000

This is expected behavior when using the EdgeNGramTokenFilter inside the custom analyzer configuration. The text "abc-123-456" is broken into smaller tokens such as "abc", "abc-1", "abc-12", "abc-123", ..., "abc-123-456". Use the Analyze Text API to see the full list of tokens generated by a particular analyzer.
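To make the token list concrete, here is a minimal sketch that approximates the analyzer from the question (whitespace tokenizer, lowercase, front edge n-grams with MinGram = 3 and MaxGram = 20) in plain C#. The class and method names are hypothetical, and this is only a local simulation, not the service's actual implementation. It also prints the tokens that a query for abc-123-456 would share with an indexed abc-123-457, assuming the same analyzer is applied to the query text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdgeNGramSketch
{
    // Hypothetical helper: splits on whitespace, lowercases each word,
    // then emits every prefix whose length is between minGram and maxGram.
    public static List<string> Analyze(string text, int minGram = 3, int maxGram = 20)
    {
        var tokens = new List<string>();
        var separators = new[] { ' ', '\t', '\r', '\n' };
        foreach (var word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            var lower = word.ToLowerInvariant();
            int longest = Math.Min(maxGram, lower.Length);
            for (int length = minGram; length <= longest; length++)
            {
                tokens.Add(lower.Substring(0, length));
            }
        }
        return tokens;
    }

    public static void Main()
    {
        var indexed = Analyze("abc-123-457");
        var query = Analyze("abc-123-456");

        // Prints: abc, abc-, abc-1, abc-12, abc-123, abc-123-, abc-123-4, abc-123-45, abc-123-456
        Console.WriteLine(string.Join(", ", query));

        // Tokens the two values have in common (everything except the full 11-character strings).
        Console.WriteLine(string.Join(", ", query.Intersect(indexed)));
    }
}
```

In this simulation the two values share eight of their nine tokens, which is consistent with a search for abc-123-456 also matching abc-123-457 when the query is analyzed the same way as the indexed text.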

For a query such as abc-123, if the default analyzer is used, the query terms will be abc and 123, and the query will match all documents that contain these terms.
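One way to check this against your own service is to ask it to analyze the text. The sketch below assumes the `SearchIndexClient.AnalyzeText` method and the `AnalyzeTextOptions(string, LexicalAnalyzerName)` constructor from Azure.Search.Documents; the endpoint, API key, and index name are hypothetical placeholders, and the exact overloads may differ between SDK versions.

```csharp
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

public static class AnalyzeSketch
{
    public static void Main()
    {
        // Hypothetical placeholders: substitute your own endpoint, admin key, and index name.
        var endpoint = new Uri("https://example.search.windows.net");
        var credential = new AzureKeyCredential("your-admin-api-key");
        const string indexName = "your-index";

        var indexClient = new SearchIndexClient(endpoint, credential);

        // Ask the service how the standard Lucene analyzer (the default) tokenizes the text.
        var options = new AnalyzeTextOptions("abc-123", LexicalAnalyzerName.StandardLucene);
        var tokens = indexClient.AnalyzeText(indexName, options).Value;

        foreach (AnalyzedTokenInfo token in tokens)
        {
            Console.WriteLine($"{token.Position}: {token.Token}");
        }
    }
}
```

Passing `new LexicalAnalyzerName("FrontEdgeNGram")` instead should list the tokens produced by the custom analyzer from the question, provided that analyzer is defined on the index.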

A prefix query, on the other hand, is not analyzed and looks for documents that contain the prefix as is ("abc-123"). A prefix search bypasses full-text search and looks for verbatim matches, which is why the correct result comes back. Full-text search runs over tokens in inverted indexes. Everything else (filters, fuzzy, regex, prefix/wildcard, etc.) runs over verbatim strings in a separate unprocessed/internal index.

Another option is to set only the search analyzer on the field to keyword, so that the input query is not broken into tokens.
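As a rough sketch of that suggestion, a field can use one analyzer at indexing time and another at query time through the `IndexAnalyzerName` and `SearchAnalyzerName` properties of `SearchField`. The field name "code" and the wrapper class are hypothetical; "FrontEdgeNGram" is the custom analyzer name from the question.

```csharp
using Azure.Search.Documents.Indexes.Models;

public static class FieldSketch
{
    // Hypothetical field definition: edge n-grams at index time, keyword at query time.
    public static SearchField BuildCodeField()
    {
        return new SearchField("code", SearchFieldDataType.String)
        {
            IsSearchable = true,
            // Indexed text is still expanded into prefixes by the custom analyzer.
            IndexAnalyzerName = new LexicalAnalyzerName("FrontEdgeNGram"),
            // The query text is kept as a single token instead of being split into n-grams.
            SearchAnalyzerName = LexicalAnalyzerName.Keyword
        };
    }
}
```

Note that the keyword analyzer does not lowercase the query, so with this setup a query may need to be lowercased by the caller to match the lowercased index tokens. `AnalyzerName` should be left unset when the two separate analyzer properties are used.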
