Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


lucene - Azure Cognitive Search, how to configure analyzer to support "startsWith"?

I have a field in Azure Cognitive Search that has special characters in it.
The values look like this: some_id: 'SOME*STUFF*123'
I'm trying to run a "startsWith" query, but it returns nothing as soon as the regex tries to match anything past the *. After a bit of googling I found out that the cause is the analyzer, which possibly breaks strings apart at '*'.
So I changed the analyzer to "keyword", since I read multiple times that it is the analyzer you are supposed to use for this.

The new config looks like this:

{
 "name": "some_id",
 "type": "Edm.String",
 "facetable": false,
 "filterable": true,
 "key": false,
 "retrievable": true,
 "searchable": true,
 "sortable": true,
 "analyzer": "keyword",
 "indexAnalyzer": null,
 "searchAnalyzer": null,
 "synonymMaps": [],
 "fields": []
 },

My request looks like this:

{
    "count": true,
    "skip": 0,
    "top": 5,
    "searchMode": "any",
    "queryType": "full",
    "search": "some_id:/SO(.*)/" // SOME\*S(.*) also doesnt work
}

I get zero matches.

With the standard analyzer, I started getting no matches as soon as I had a \* in my regex (I escaped each * with \).

Clarification on requirements: I cannot change any data; the values (including the *) must stay as they are. I want the whole field to be indexed as a single token that I can run startsWith against.

For example, the regex /SOME\*ST(.*)/ is supposed to return exactly the entries whose full value matches the regex. No magic with separators or tokens, simply the whole value as a single token that I can run startsWith on. In other words, I want the exact same results you would get in JavaScript from string.startsWith(value).
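To pin down the semantics being asked for, here is a minimal Python sketch of a literal prefix test and its regex equivalent. It only illustrates the expected results and is not Azure Cognitive Search code; every value in the list except 'SOME*STUFF*123' is a made-up example, and the function names are hypothetical.

```python
import re

# Only the first value comes from the question; the rest are hypothetical.
values = ["SOME*STUFF*123", "SOME*OTHER*456", "SOMETHING*ELSE", "XSOME*STUFF"]


def starts_with(value: str, prefix: str) -> bool:
    """Literal prefix test, the Python counterpart of JavaScript's startsWith."""
    return value.startswith(prefix)


def starts_with_regex(value: str, prefix: str) -> bool:
    """Same test as a regex that must match the whole value."""
    # re.escape turns '*' into '\*' so it is matched as a literal asterisk.
    pattern = re.compile(re.escape(prefix) + r".*")
    return pattern.fullmatch(value) is not None


prefix = "SOME*ST"
literal_hits = [v for v in values if starts_with(v, prefix)]
regex_hits = [v for v in values if starts_with_regex(v, prefix)]
print(literal_hits, regex_hits)
```

Both lists should contain only the values that begin with the literal characters SOME*ST, which is the behaviour the query is expected to reproduce.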

I'm guessing there is something wrong with either my config or my requests. Can anyone help me?

question from: https://stackoverflow.com/questions/65898075/azure-cognitive-search-how-to-configure-analyzer-to-support-startswith


1 Reply


IMHO, you should work with a different separator. For example:

Field1 (FROM)   | Field2 (TO)
SOME*STUFF*123  | SOME||STUFF||123

Then use a custom analyzer to split terms at every ||. Additionally, you can configure a tokenizer to emit a token for every 3 characters.

Samples:

SOM
OME
STU
TUF
UFF
123

Then search using:

SOM*

and it should return the data you're looking for. It would be better if you could provide more details about your content and give us samples, but this answer should point you to the result you're looking for.
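As a rough illustration of the tokenization described above, the following Python sketch splits a value on || and then emits 3-character tokens per term. It simulates the idea locally and is not an Azure Cognitive Search analyzer definition; the function name and its parameters are hypothetical.

```python
def trigram_tokens(value: str, separator: str = "||", size: int = 3) -> list[str]:
    """Split on the separator, then emit overlapping tokens of `size` characters."""
    tokens = []
    for term in value.split(separator):
        if not term:
            continue  # skip empty terms produced by leading or trailing separators
        if len(term) <= size:
            tokens.append(term)  # short terms are kept whole
        else:
            tokens.extend(term[i:i + size] for i in range(len(term) - size + 1))
    return tokens


tokens = trigram_tokens("SOME||STUFF||123")

# A prefix query such as SOM* matches any token that starts with "SOM".
matches = [t for t in tokens if t.startswith("SOM")]
print(tokens, matches)
```

Note that a prefix query against such tokens matches any term containing the 3-character sequence, so it is looser than a true startsWith over the whole field value.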

