Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

nlp - How can I add custom annotations to the default ANNIE gazetteer?

I'm using the GATE *SDK* and would like to modify the default ANNIE Gazetteer to include a simple annotation based on a new list definition I have created.

  • I've added my list definition to GATE-HOME\plugins\ANNIE\resources\gazetteer
  • I've added an entry in the lists.def file to point to my new list file. E.g. *open_source_software:opensouce*
  • I've created an annotation schema and added it to GATE-HOME\plugins\ANNIE\resources\schema
  • When I load ANNIE and run the application, it does not identify the annotation automatically. However, when I hover over a word that exists in the new list definition, ANNIE highlights the word and suggests the correct annotation

Is it possible to make this automatic so I don't have to train ANNIE, and so that I can do it programmatically?

1 Reply

By default the gazetteer creates annotations of type Lookup with majorType and minorType features. For example, an entry in the .def file of

oss.lst:software:open_source

would create Lookups with majorType "software" and minorType "open_source" for entries in the list. The usual approach then would be to write JAPE rules that process the Lookup annotations and create the final annotations.
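To make the mapping from a .def line to Lookup features concrete, here is a minimal sketch in plain Java. It does not use the GATE API at all; the class and method names are made up for illustration, and it assumes the minor type may be left out of a line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of GATE: it only illustrates how the
// colon-separated fields of a .def line map onto Lookup features.
final class DefLineSketch {

    /** Returns the features a default Lookup annotation would carry for this .def line. */
    static Map<String, String> lookupFeatures(String defLine) {
        // Fields: list file name, major type, then (assumed optional) minor type.
        String[] fields = defLine.trim().split(":", -1);
        if (fields.length < 2 || fields[0].isEmpty() || fields[1].isEmpty()) {
            throw new IllegalArgumentException("expected at least file:majorType, got: " + defLine);
        }
        Map<String, String> features = new LinkedHashMap<>();
        features.put("majorType", fields[1]);
        if (fields.length > 2 && !fields[2].isEmpty()) {
            features.put("minorType", fields[2]);
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {majorType=software, minorType=open_source}
        System.out.println(lookupFeatures("oss.lst:software:open_source"));
    }
}
```

A JAPE rule would then match on these features (for example on majorType "software") to build the final annotation.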

It is possible to create other annotation types directly from the gazetteer, by adding more fields to the .def line:

oss.lst:software:open_source::Software

would create annotations of type Software instead of Lookup (the fields are list file name, major type, minor type, language, and annotation type). But generally I'd recommend sticking with Lookup and then creating your final annotations with JAPE, so you can add additional rules as necessary. The gazetteer blindly annotates every mention of anything in the list, so you often need heuristics to filter this down: for example, "Apache" might be considered software most of the time, but not when followed by the word "License".
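The "Apache" / "License" heuristic can be sketched as follows. This is plain Java over a list of tokens, not JAPE and not the GATE API; in a real pipeline the same check would normally live in a JAPE rule over the Lookup annotations. The class name, method name, and sample sentence are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: filters raw gazetteer hits with a simple context check.
final class LookupFilterSketch {

    /** Returns the token positions where "Apache" is kept as a software mention. */
    static List<Integer> softwareMentions(List<String> tokens) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals("Apache")) {
                continue; // not a gazetteer hit
            }
            // Drop the hit when the next token is the word "License".
            boolean followedByLicense =
                    i + 1 < tokens.size() && tokens.get(i + 1).equalsIgnoreCase("License");
            if (!followedByLicense) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens =
                List.of("Apache", "is", "released", "under", "the", "Apache", "License");
        // prints [0]: the first "Apache" is kept, the one before "License" is dropped
        System.out.println(softwareMentions(tokens));
    }
}
```

The point of keeping this step separate from the gazetteer is that rules like this can be added or changed without touching the list files.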

Finally, if you want to add your own gazetteer lists and/or JAPE rules then we recommend you don't edit the files under plugins/ANNIE directly. Instead create your own lists.def somewhere else, and load that into a separate instance of the gazetteer PR, inserted at the appropriate place in the pipeline.
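Since the question asks about doing this programmatically, here is a rough sketch of the "create your own lists.def somewhere else" step using only the Java standard library. The directory, the list entries, and the class name are placeholders. It deliberately stops short of calling GATE: the returned URL is what you would hand to the separate gazetteer PR instance (as far as I know the relevant init parameter is called listsURL, but check that against your GATE version).

```java
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: writes a gazetteer list and its lists.def outside plugins/ANNIE.
final class CustomGazetteerFiles {

    /** Writes oss.lst and lists.def into dir and returns the URL of lists.def. */
    static URL writeGazetteer(Path dir, List<String> entries) throws IOException {
        Files.createDirectories(dir);
        // One gazetteer entry per line.
        Files.write(dir.resolve("oss.lst"), entries, StandardCharsets.UTF_8);
        // The .def line from the answer: list file, major type, minor type.
        Path def = dir.resolve("lists.def");
        Files.write(def, List.of("oss.lst:software:open_source"), StandardCharsets.UTF_8);
        return def.toUri().toURL();
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for wherever you keep your own resources.
        Path dir = Files.createTempDirectory("my-gazetteer");
        URL listsUrl = writeGazetteer(dir, List.of("Apache", "Linux")); // sample entries
        System.out.println(listsUrl);
    }
}
```

Keeping these files in your own directory means an ANNIE upgrade will not overwrite them, and the stock gazetteer stays untouched.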

