I am trying to figure out the correct analyzer configuration for my Solr/Lucidworks setup.
The results I see on the Solr Analysis screen suggest that I should be getting matches, but when I run the query in Solr (natively or through the Lucidworks UI), no results are returned.
The relevant fragments from the schema are:
<field name="content" indexed="true" multiValued="false" required="false" stored="true" type="dlowe_text_en"/>
<dynamicField indexed="true" name="*_txt_en_dlowe_split_tight" stored="true" type="dlowe_text_en"/>
<fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="dlowe_text_en" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
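Because the untyped `<analyzer>` (used at index time here) and the `type="query"` analyzer above are identical, the same field type can, as far as I understand, be written with a single untyped analyzer that Solr applies in both phases. This is only an equivalent, shorter form of the configuration above, not a fix. With this chain, both "Administrator's" and "Administrator" should reduce to the token `administrator`.

```xml
<fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="dlowe_text_en" positionIncrementGap="100">
  <!-- No type attribute: this single chain is used at both index and query time -->
  <analyzer>
    <!-- Split on whitespace only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip a trailing 's, e.g. Administrator's becomes Administrator -->
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <!-- Lowercase, e.g. Administrator becomes administrator -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```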
I have indexed some content that contains the string:
Administrator's Guide
When I run this string through the Solr Analysis screen, these are the results I get:
My understanding is that if any of the results are highlighted, this represents a match, but when I search in Solr for "Administrator", no results are found:
If I search on:
Administrator's
I do get the expected result.
Am I totally misunderstanding how the analysis tool should work?
What I am trying to achieve is a search index that supports a lot of technical items that will only match on exact values. For example:
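One thing worth ruling out, assuming the failing query was sent as a bare term (`q=Administrator`) rather than `content:Administrator`: Solr's standard query parser searches bare terms against the field named by the `df` parameter, and in Solr's `_default` configset that is `_text_`, whose field type has its own, different analyzer. The Analysis screen only shows the chain for the field or field type you pick, so it can show a match while the real query is analyzed against another field. Querying `content:Administrator` directly is the quickest check. The fragment below is a hypothetical `solrconfig.xml` change for a plain Solr setup (Lucidworks may route queries differently) that makes `content` the default field for `/select`.

```xml
<!-- Hypothetical solrconfig.xml fragment: bare query terms sent to /select
     are searched against the "content" field instead of the configset default -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">content</str>
  </lst>
</requestHandler>
```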
- V-123-1231-1231
- WILL_NOT_CHANGE
- /mnt/abc/Drivers/
- 4040:5050
So the WhitespaceTokenizer seems to make the most sense, but I also need stemming on the non-technical strings. The technical strings can be recognized by characters such as periods (.), dashes (-), underscores (_), slashes (\ or /), etc.
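A possible sketch of that idea, under the assumption that "technical" means "contains one of . - _ / \ :": keep the WhitespaceTokenizer, mark matching tokens as keywords with `KeywordMarkerFilterFactory` (its `pattern` regex has to match the whole token), and put a stemmer after it, since Lucene stemmers such as `PorterStemFilterFactory` skip tokens flagged as keywords. The field type name `dlowe_text_en_stem` and the exact character class are illustrative assumptions, not a tested configuration.

```xml
<!-- Hypothetical field type; the regex defining "technical" tokens is an assumption -->
<fieldType class="solr.TextField" name="dlowe_text_en_stem" positionIncrementGap="100">
  <analyzer>
    <!-- Split only on whitespace so values like V-123-1231-1231 stay whole -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Flag any token containing . - _ / \ or : as a keyword (full-token match) -->
    <filter class="solr.KeywordMarkerFilterFactory" pattern=".*[-._/\\:].*"/>
    <!-- The stemmer leaves keyword-flagged tokens untouched, so only plain words are stemmed -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

One side effect of this pattern is that a word followed by sentence punctuation (for example "guide.") is also treated as technical and left unstemmed. Changes to the index-time chain only apply to documents indexed after the change, so existing content would need reindexing.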
Any insight / suggestions would be greatly appreciated.
question from:
https://stackoverflow.com/questions/66054578/lucene-tokenization-filters-not-working-as-expected-solr-analysis-confusion