Java RuleBasedBreakIterator Class Code Examples

This article collects typical usage examples of com.ibm.icu.text.RuleBasedBreakIterator in Java. If you are wondering how the RuleBasedBreakIterator class is used in practice, or are looking for working examples of it, the curated snippets below may help.



The RuleBasedBreakIterator class belongs to the com.ibm.icu.text package. Twenty code examples of the class are shown below, sorted by popularity by default.
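
For orientation, here is a minimal sketch of the boundary-iteration loop that the examples below build on. It uses java.text.BreakIterator from the JDK as a stand-in so that it runs without ICU4J on the classpath; com.ibm.icu.text.BreakIterator offers members with the same names (getWordInstance, setText, first, next, DONE). The class name WordBoundaryDemo and the helper method words are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; not part of ICU or of any project quoted in this article.
final class WordBoundaryDemo {

    // Returns the segments between word boundaries that contain a letter or digit.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments that consist only of whitespace or punctuation.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world 42"));
    }
}
```

With ICU4J, the letter-or-digit check would typically be replaced by a call to getRuleStatus() after each next(), which is what several of the examples below rely on.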

Example 1: EmojiTokenizerFactory

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
public EmojiTokenizerFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
    super(indexSettings, name, settings);

    config = new DefaultICUTokenizerConfig(true, true) {
        @Override
        public BreakIterator getBreakIterator(int script) {
            // Load the ICU default rules
            RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator)
                    BreakIterator.getWordInstance(Locale.getDefault());
            String defaultRules = rbbi.toString();

            // Customize the rules to add EmojiNRK as first class word
            defaultRules = defaultRules.replace(
                "!!forward;",
                "!!forward;\n$EmojiNRK {200};"
            );

            defaultRules = defaultRules.replace(
                "| $ZWJ)*;",
                "| $ZWJ)* {200};"
            );

            return new RuleBasedBreakIterator(defaultRules);
        }
    };
}
 
Developer: jolicode, Project: emoji-search, Lines of code: 27, Source: EmojiTokenizerFactory.java
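
Example 1 patches the default rule text with String.replace, which returns the input unchanged when the target string is absent. Whether the default rules contain exactly "!!forward;" and "| $ZWJ)*;" depends on the ICU version in use, so a guard can make a mismatch visible. The following is a minimal sketch of such a guard; RulePatchSketch and replaceRequired are hypothetical names, not part of ICU or of the emoji-search project.

```java
// Hypothetical helper; it only wraps String.contains and String.replace.
final class RulePatchSketch {

    // Replaces target with replacement, failing loudly if target does not occur.
    static String replaceRequired(String rules, String target, String replacement) {
        if (!rules.contains(target)) {
            throw new IllegalStateException("Rule text not found: " + target);
        }
        return rules.replace(target, replacement);
    }
}
```

Used in place of the two replace calls above, this would throw as soon as the expected rule fragments are missing, instead of silently building an iterator from unmodified rules.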


Example 2: calcStatus

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private int calcStatus(int current, int next) {
    if (current == BreakIterator.DONE || next == BreakIterator.DONE) {
        return RuleBasedBreakIterator.WORD_NONE;
    }
    int begin = start + current;
    int end = start + next;
    int codepoint;
    for (int i = begin; i < end; i += UTF16.getCharCount(codepoint)) {
        // Read the code point at the current offset i so the scan advances through the range.
        codepoint = UTF16.charAt(text, 0, end, i);
        if (UCharacter.isDigit(codepoint)) {
            return RuleBasedBreakIterator.WORD_NUMBER;
        } else if (UCharacter.isLetter(codepoint)) {
            return RuleBasedBreakIterator.WORD_LETTER;
        }
    }
    return RuleBasedBreakIterator.WORD_NONE;
}
 
Developer: jprante, Project: elasticsearch-icu, Lines of code: 18, Source: BreakIteratorWrapper.java


Example 3: calcStatus

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private int calcStatus(int current, int next) {
  if (current == BreakIterator.DONE || next == BreakIterator.DONE)
    return RuleBasedBreakIterator.WORD_NONE;

  int begin = start + current;
  int end = start + next;

  int codepoint;
  for (int i = begin; i < end; i += UTF16.getCharCount(codepoint)) {
    // Read the code point at the current offset i so the scan advances through the range.
    codepoint = UTF16.charAt(text, 0, end, i);

    if (UCharacter.isDigit(codepoint))
      return RuleBasedBreakIterator.WORD_NUMBER;
    else if (UCharacter.isLetter(codepoint)) {
      // TODO: try to separately specify ideographic, kana? 
      // [currently all bundled as letter for this case]
      return RuleBasedBreakIterator.WORD_LETTER;
    }
  }

  return RuleBasedBreakIterator.WORD_NONE;
}
 
Developer: europeana, Project: search, Lines of code: 23, Source: BreakIteratorWrapper.java
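
The calcStatus method scans a UTF-16 range one code point at a time and reports the first digit or letter it finds. Below is a stand-alone sketch of the same idea that uses only java.lang.Character, so it runs without ICU4J. The class name StatusSketch is hypothetical, and the three constants are local copies of the values of RuleBasedBreakIterator.WORD_NONE, WORD_NUMBER and WORD_LETTER.

```java
// Hypothetical stand-alone version of calcStatus, written against the JDK only.
final class StatusSketch {
    // Local copies of the ICU4J rule-status values (assumed: 0, 100, 200).
    static final int WORD_NONE = 0;
    static final int WORD_NUMBER = 100;
    static final int WORD_LETTER = 200;

    // Classifies text[begin, end) by the first digit or letter code point it contains.
    static int calcStatus(char[] text, int begin, int end) {
        int i = begin;
        while (i < end) {
            // Decode the code point at offset i; surrogate pairs yield one code point.
            int codepoint = Character.codePointAt(text, i, end);
            if (Character.isDigit(codepoint)) {
                return WORD_NUMBER;
            } else if (Character.isLetter(codepoint)) {
                return WORD_LETTER;
            }
            // Advance by one or two chars depending on the code point's width.
            i += Character.charCount(codepoint);
        }
        return WORD_NONE;
    }
}
```

Because the offset advances by Character.charCount, a range such as "--42" is classified by its first digit rather than by the leading punctuation.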


Example 4: maybeLoad

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private LineBreakIterator maybeLoad(Reporter reporter) {
    LineBreakIterator iterator = this.iterator;
    if (iterator != null)
        return iterator;
    else {
        BreakIterator bi = null;
        InputStream is = null;
        try {
            URL rulesLocator = getRulesLocator(name, RULES_BINARY_EXT);
            if (rulesLocator != null) {
                is = rulesLocator.openStream();
                bi = RuleBasedBreakIterator.getInstanceFromCompiledRules(is);
                reporter.logInfo(reporter.message("*KEY*", "Loaded rules based break iterator from ''{0}''.", rulesLocator.toString()));
            } else
                bi = BreakIterator.getCharacterInstance();
        } catch (IOException e) {
            // Loading failed: bi stays null, so this method returns null.
        } finally {
            IOUtil.closeSafely(is);
        }
        if (bi != null) {
            return this.iterator = new LineBreakIterator(bi);
        } else
            return null;
    }
}
 
Developer: skynav, Project: ttt, Lines of code: 26, Source: LineBreaker.java


Example 5: parseRules

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private BreakIterator parseRules(String filename, Environment env) throws IOException {

        final Path path = env.configFile().resolve(filename);
        String rules = Files.readAllLines(path)
            .stream()
            .filter((v) -> v.startsWith("#") == false)
            .collect(Collectors.joining("\n"));

        return new RuleBasedBreakIterator(rules.toString());
    }
 
Developer: justor, Project: elasticsearch_my, Lines of code: 11, Source: IcuTokenizerFactory.java


Example 6: getType

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
@Override
public String getType(int script, int ruleStatus) {
    switch (ruleStatus) {
        case RuleBasedBreakIterator.WORD_IDEO:
            return WORD_IDEO;
        case RuleBasedBreakIterator.WORD_KANA:
            return script == UScript.HIRAGANA ? WORD_HIRAGANA : WORD_KATAKANA;
        case RuleBasedBreakIterator.WORD_LETTER:
            return script == UScript.HANGUL ? WORD_HANGUL : WORD_LETTER;
        case RuleBasedBreakIterator.WORD_NUMBER:
            return WORD_NUMBER;
        default: /* some other custom code */
            return "<OTHER>";
    }
}
 
Developer: jprante, Project: elasticsearch-icu, Lines of code: 16, Source: DefaultIcuTokenizerConfig.java
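
The getType method above switches on exact status values. ICU4J pairs each word-status constant with a matching _LIMIT constant (for example WORD_LETTER and WORD_LETTER_LIMIT), which suggests that categories are meant to be read as half-open ranges, so a custom rule tag such as {250} would still count as a letter word. The sketch below maps a status to a category by range. The class name, the method name and the returned labels are hypothetical, and the numeric constants are local copies of the ICU4J values.

```java
// Hypothetical range-based classifier; labels are illustrative only.
final class RuleStatusRanges {
    // Local copies of the ICU4J constants of the same names (assumed values).
    static final int WORD_NONE = 0;
    static final int WORD_NUMBER = 100;
    static final int WORD_LETTER = 200;
    static final int WORD_KANA = 300;
    static final int WORD_IDEO = 400;
    static final int WORD_IDEO_LIMIT = 500;

    // Maps a rule status to the category whose half-open range contains it.
    static String category(int ruleStatus) {
        if (ruleStatus < WORD_NONE || ruleStatus >= WORD_IDEO_LIMIT) return "OTHER";
        if (ruleStatus >= WORD_IDEO) return "IDEO";
        if (ruleStatus >= WORD_KANA) return "KANA";
        if (ruleStatus >= WORD_LETTER) return "LETTER";
        if (ruleStatus >= WORD_NUMBER) return "NUMBER";
        return "NONE";
    }
}
```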


Example 7: readBreakIterator

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private static RuleBasedBreakIterator readBreakIterator(String filename) {
    // try-with-resources closes the stream even if loading the compiled rules fails
    try (InputStream is = DefaultIcuTokenizerConfig.class.getResourceAsStream(
            "/org/apache/lucene/analysis/icu/segmentation/" + filename)) {
        return RuleBasedBreakIterator.getInstanceFromCompiledRules(is);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
 
Developer: jprante, Project: elasticsearch-icu, Lines of code: 11, Source: DefaultIcuTokenizerConfig.java


Example 8: wrap

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
/**
 * If it's a RuleBasedBreakIterator, the rule status can be used for the token type. If it's
 * any other BreakIterator, the rule status method is not available, so treat
 * it like a generic BreakIterator.
 */
static BreakIteratorWrapper wrap(BreakIterator breakIterator) {
    if (breakIterator instanceof RuleBasedBreakIterator) {
        return new RBBIWrapper((RuleBasedBreakIterator) breakIterator);
    } else {
        return new BIWrapper(breakIterator);
    }
}
 
Developer: jprante, Project: elasticsearch-icu, Lines of code: 13, Source: BreakIteratorWrapper.java


Example 9: wrap

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
/**
 * If it's a RuleBasedBreakIterator, the rule status can be used for the token type. If it's
 * any other BreakIterator, the rule status method is not available, so treat
 * it like a generic BreakIterator.
 */
static BreakIteratorWrapper wrap(BreakIterator breakIterator) {
  if (breakIterator instanceof RuleBasedBreakIterator)
    return new RBBIWrapper((RuleBasedBreakIterator) breakIterator);
  else
    return new BIWrapper(breakIterator);
}
 
Developer: europeana, Project: search, Lines of code: 12, Source: BreakIteratorWrapper.java


Example 10: parseRules

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private BreakIterator parseRules(String filename, ResourceLoader loader) throws IOException {
  StringBuilder rules = new StringBuilder();
  InputStream rulesStream = loader.openResource(filename);
  BufferedReader reader = new BufferedReader
      (IOUtils.getDecodingReader(rulesStream, StandardCharsets.UTF_8));
  String line = null;
  while ((line = reader.readLine()) != null) {
    if ( ! line.startsWith("#"))
      rules.append(line);
    rules.append('\n');
  }
  reader.close();
  return new RuleBasedBreakIterator(rules.toString());
}
 
Developer: europeana, Project: search, Lines of code: 15, Source: ICUTokenizerFactory.java
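
The parseRules method above drops lines that start with "#" but still appends a newline for each of them, presumably so that line numbers in a later syntax error still match the rules file. The following sketch shows the same line-preserving filter with streams. RuleTextSketch and stripCommentLines are hypothetical names, and the sketch works on an in-memory string rather than on a ResourceLoader.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper illustrating a comment filter that keeps the line count.
final class RuleTextSketch {

    // Blanks out lines that start with '#'; every other line is kept verbatim.
    static String stripCommentLines(String source) {
        return Arrays.stream(source.split("\n", -1))
                .map(line -> line.startsWith("#") ? "" : line)
                .collect(Collectors.joining("\n"));
    }
}
```

By contrast, the stream-based variant in Example 5 removes comment lines entirely, so the line count of the resulting rule text can differ from that of the file.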


Example 11: getType

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
@Override
public String getType(int script, int ruleStatus) {
  switch (ruleStatus) {
    case RuleBasedBreakIterator.WORD_IDEO:
      return WORD_IDEO;
    case RuleBasedBreakIterator.WORD_KANA:
      return script == UScript.HIRAGANA ? WORD_HIRAGANA : WORD_KATAKANA;
    case RuleBasedBreakIterator.WORD_LETTER:
      return script == UScript.HANGUL ? WORD_HANGUL : WORD_LETTER;
    case RuleBasedBreakIterator.WORD_NUMBER:
      return WORD_NUMBER;
    default: /* some other custom code */
      return "<OTHER>";
  }
}
 
Developer: europeana, Project: search, Lines of code: 16, Source: DefaultICUTokenizerConfig.java


Example 12: readBreakIterator

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private static RuleBasedBreakIterator readBreakIterator(String filename) {
  // try-with-resources closes the stream even if loading the compiled rules fails
  try (InputStream is =
      DefaultICUTokenizerConfig.class.getResourceAsStream(filename)) {
    return RuleBasedBreakIterator.getInstanceFromCompiledRules(is);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}
 
Developer: europeana, Project: search, Lines of code: 13, Source: DefaultICUTokenizerConfig.java


Example 13: compile

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
static void compile(File srcDir, File destDir) throws Exception {
  File files[] = srcDir.listFiles(new FilenameFilter() {
    public boolean accept(File dir, String name) {
      return name.endsWith("rbbi");
    }});
  if (files == null) throw new IOException("Path does not exist: " + srcDir);
  for (int i = 0; i < files.length; i++) {
    File file = files[i];
    File outputFile = new File(destDir, 
        file.getName().replaceAll("rbbi$", "brk"));
    String rules = getRules(file);
    System.err.print("Compiling " + file.getName() + " to "
        + outputFile.getName() + ": ");
    /*
     * if there is a syntax error, compileRules() may succeed. the way to
     * check is to try to instantiate from the string. additionally if the
     * rules are invalid, you can get a useful syntax error.
     */
    try {
      new RuleBasedBreakIterator(rules);
    } catch (IllegalArgumentException e) {
      /*
       * do this intentionally, so you don't get a massive stack trace
       * instead, get a useful syntax error!
       */
      System.err.println(e.getMessage());
      System.exit(1);
    }
    FileOutputStream os = new FileOutputStream(outputFile);
    RuleBasedBreakIterator.compileRules(rules, os);
    os.close();
    System.err.println(outputFile.length() + " bytes.");
  }
}
 
Developer: europeana, Project: search, Lines of code: 35, Source: RBBIRuleCompiler.java


Example 14: main

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
public static void main(String[] args) {
    if (args.length == 2) {
        String inputFilePath = args[0];
        String outputFilePath = args[1];
        InputStream is = null;
        OutputStream os = null;
        BufferedReader r = null;
        try {
            is = new FileInputStream(inputFilePath);
            os = new FileOutputStream(outputFilePath);
            r = new BufferedReader(new InputStreamReader(is, defaultInputEncoding));
            StringBuffer rules = new StringBuffer();
            String line;
            while ((line = r.readLine()) != null) {
                rules.append(line);
                rules.append('\n');
            }
            RuleBasedBreakIterator.compileRules(rules.toString(), os);
        } catch (IOException e) {
            // I/O errors are ignored here; the output file may be missing or incomplete.
        } finally {
            IOUtil.closeSafely(r);
            IOUtil.closeSafely(os);
            IOUtil.closeSafely(is);
        }
    } else {
        System.err.println("Usage: java -cp ... com.skynav.ttpe.text.LineBreaker [INPUT-FILE-PATH] [OUTPUT-FILE-PATH]");
    }
}
 
Developer: skynav, Project: ttt, Lines of code: 29, Source: LineBreaker.java


Example 15: clone

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
/**
 * Clone method.  Creates another LaoBreakIterator with the same behavior 
 * and current state as this one.
 * @return The clone.
 */
@Override
public LaoBreakIterator clone() {
  LaoBreakIterator other = (LaoBreakIterator) super.clone();
  other.rules = (RuleBasedBreakIterator) rules.clone();
  other.verify = (RuleBasedBreakIterator) verify.clone();
  if (text != null)
    other.text = text.clone();
  if (working != null)
    other.working = working.clone();
  if (verifyText != null)
    other.verifyText = verifyText.clone();
  return other;
}
 
Developer: pkarmstr, Project: NYBC, Lines of code: 19, Source: LaoBreakIterator.java


Example 16: wrap

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
/**
 * If it's a DictionaryBasedBreakIterator, it doesn't return a rule status, so
 * treat it like a generic BreakIterator. If it's any other
 * RuleBasedBreakIterator, the rule status can be used for the token type. If it's
 * any other BreakIterator, the rule status method is not available, so treat
 * it like a generic BreakIterator.
 */
static BreakIteratorWrapper wrap(BreakIterator breakIterator) {
  if (breakIterator instanceof RuleBasedBreakIterator
      && !(breakIterator instanceof DictionaryBasedBreakIterator))
    return new RBBIWrapper((RuleBasedBreakIterator) breakIterator);
  else
    return new BIWrapper(breakIterator);
}
 
Developer: pkarmstr, Project: NYBC, Lines of code: 15, Source: BreakIteratorWrapper.java


Example 17: parseRules

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private BreakIterator parseRules(String filename, ResourceLoader loader) throws IOException {
  StringBuilder rules = new StringBuilder();
  InputStream rulesStream = loader.openResource(filename);
  BufferedReader reader = new BufferedReader
      (IOUtils.getDecodingReader(rulesStream, IOUtils.CHARSET_UTF_8));
  String line = null;
  while ((line = reader.readLine()) != null) {
    if ( ! line.startsWith("#"))
      rules.append(line);
    rules.append('\n');
  }
  reader.close();
  return new RuleBasedBreakIterator(rules.toString());
}
 
Developer: pkarmstr, Project: NYBC, Lines of code: 15, Source: ICUTokenizerFactory.java


Example 18: setUp

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
@Override
public void setUp() throws Exception {
  super.setUp();
  InputStream is = getClass().getResourceAsStream("Lao.brk");
  wordIterator = new LaoBreakIterator(RuleBasedBreakIterator.getInstanceFromCompiledRules(is));
  is.close();
}
 
Developer: pkarmstr, Project: NYBC, Lines of code: 8, Source: TestLaoBreakIterator.java


Example 19: readBreakIterator

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
private static RuleBasedBreakIterator readBreakIterator(ClassLoader classLoader, String resourceName) {
    try (InputStream inputStream = classLoader.getResource(resourceName).openStream()) {
        return RuleBasedBreakIterator.getInstanceFromCompiledRules(inputStream);
    } catch (IOException e) {
        throw new UncheckedIOException("unable to load resource " + resourceName + " " + e.getMessage(), e);
    }
}
 
Developer: jprante, Project: elasticsearch-plugin-bundle, Lines of code: 8, Source: DefaultIcuTokenizerConfig.java


Example 20: compile

import com.ibm.icu.text.RuleBasedBreakIterator; // import the required package/class
public void compile(Path inputPath, Path outputPath) throws IOException {
    String rules = getRules(inputPath);
    try (OutputStream os = Files.newOutputStream(outputPath)) {
        new RuleBasedBreakIterator(rules);
        RuleBasedBreakIterator.compileRules(rules, os);
    } catch (IllegalArgumentException e) {
        logger.error(e.getMessage(), e);
    }
}
 
Developer: jprante, Project: elasticsearch-plugin-bundle, Lines of code: 10, Source: RBBIRuleCompiler.java
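
In Example 20 the output stream is opened before the rules are validated, so a rules file with a syntax error still leaves an empty output file behind. One way to avoid that, sketched below under stated assumptions, is to compile into memory first and write the file only on success. BufferedCompileSketch and the RuleCompiler interface are hypothetical; the interface stands in for the validate-then-compile step so that the sketch runs without ICU4J.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not taken from elasticsearch-plugin-bundle.
final class BufferedCompileSketch {

    // Hypothetical abstraction over a rule compiler that writes binary output.
    interface RuleCompiler {
        void compile(String rules, OutputStream out) throws IOException;
    }

    // Compiles into memory first; outputPath is only written when compilation succeeds.
    static boolean compile(String rules, Path outputPath, RuleCompiler compiler) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            compiler.compile(rules, buffer);
        } catch (IllegalArgumentException e) {
            // Invalid rules: report the message and leave outputPath untouched.
            System.err.println(e.getMessage());
            return false;
        }
        Files.write(outputPath, buffer.toByteArray());
        return true;
    }
}
```

With ICU4J available, the compiler argument could be a lambda that first constructs a RuleBasedBreakIterator from the rules (to surface syntax errors) and then calls RuleBasedBreakIterator.compileRules with the same rules and stream, as the original method does.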



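Note: The com.ibm.icu.text.RuleBasedBreakIterator examples in this article were collected from GitHub, MSDocs, and other source-code and documentation hosting platforms. The snippets were selected from open-source projects contributed by their respective developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.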

