
Java LetterTokenizer Class Code Examples


This article collects typical usage examples of the Java class org.apache.lucene.analysis.core.LetterTokenizer. If you are wondering what LetterTokenizer is for or how to use it, the curated class examples below may help.



The LetterTokenizer class belongs to the org.apache.lucene.analysis.core package. Ten code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your ratings help the system recommend better Java code examples.
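LetterTokenizer splits text into maximal runs of letters, treating every non-letter character as a delimiter. As a minimal, dependency-free sketch of that contract (the helper class `LetterSplit` is hypothetical, not part of Lucene):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of LetterTokenizer's behavior: emit maximal runs of letters,
// iterating by code point so supplementary characters are handled whole.
public class LetterSplit {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Digits and punctuation act as delimiters; Cyrillic letters are kept.
        System.out.println(tokenize("Но24 това, е тест!"));
        // [Но, това, е, тест]
    }
}
```

The real tokenizer additionally passes each code point through a `normalize(int)` hook, which several of the test examples below override.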

Example 1: main

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public static void main(String[] args) throws IOException {
	System.out.println(NumberUtils.isDigits("12345"));
	System.out.println(NumberUtils.isDigits("12345.1"));
	System.out.println(NumberUtils.isDigits("12345,2"));
	
	System.out.println(NumberUtils.isNumber("12345"));
	System.out.println(NumberUtils.isNumber("12345.1"));
	System.out.println(NumberUtils.isNumber("12345,2".replace(",", ".")));
	System.out.println(NumberUtils.isNumber("12345,2"));
	StringReader input = new StringReader(
			"Правя тест на класификатор и после др.Дулитъл, пада.br2n ще се оправя с данните! които,са много зле. Но това е по-добре. Но24"
					.replaceAll("br2n", ""));

	LetterTokenizer tokenizer = new LetterTokenizer();
	tokenizer.setReader(input);

	TokenFilter stopFilter = new StopFilter(tokenizer, BULGARIAN_STOP_WORDS_SET);
	TokenFilter length = new LengthFilter(stopFilter, 3, 1000);
	TokenFilter stemmer = new BulgarianStemFilter(length);
	TokenFilter ngrams = new ShingleFilter(stemmer, 2, 2);

	try (TokenFilter filter = ngrams) {

		Attribute termAtt = filter.addAttribute(CharTermAttribute.class);
		filter.reset();
		while (filter.incrementToken()) {
			String word = termAtt.toString().replaceAll(",", "\\.").replaceAll("\n|\r", "");
			System.out.println(word);
		}
	}
}
 
Author: mhardalov | Project: news-credibility | Lines: 32 | Source: EgdeMain.java
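Example 1 chains a StopFilter, a LengthFilter, a BulgarianStemFilter, and a ShingleFilter configured with a minimum and maximum shingle size of 2, i.e. word bigrams. The bigram-combination step alone can be sketched without Lucene (the helper class `Bigrams` is hypothetical; note the real ShingleFilter by default also emits the single tokens alongside the shingles):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the bigram step of ShingleFilter(stemmer, 2, 2): every pair of
// adjacent tokens joined by a space ("shingles" of size 2).
public class Bigrams {
    public static List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("правя", "тест", "класификатор")));
        // [правя тест, тест класификатор]
    }
}
```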


Example 2: testCrossPlaneNormalization

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(newAttributeFactory(), reader) {
        @Override
        protected int normalize(int c) {
          if (c > 0xffff) {
            return 'δ';
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", s);
    try {
      ts.reset();
      OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
      while (ts.incrementToken()) {
        String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
        for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
          cp = highlightedText.codePointAt(j);
          assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
        }
      }
      ts.end();
    } finally {
      IOUtils.closeWhileHandlingException(ts);
    }
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: europeana | Project: search | Lines: 40 | Source: TestCharTokenizers.java
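Examples 2 through 7 override LetterTokenizer's `normalize(int c)` hook to map code points depending on which Unicode plane they fall in: code points above 0xFFFF lie outside the Basic Multilingual Plane and are stored in a Java String as surrogate pairs. The normalization itself can be sketched standalone (the helper class `PlaneNormalize` is hypothetical):

```java
// Sketch of the per-code-point normalization the tests override: map every
// supplementary code point (above the BMP, i.e. > 0xffff) to 'δ', iterating
// by code point so surrogate pairs are never split in half.
public class PlaneNormalize {
    static int normalize(int c) {
        return c > 0xffff ? 'δ' : c;
    }

    public static String apply(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(normalize(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D538 (𝔸) is outside the BMP, so it becomes δ; 'a' and 'b' pass through.
        System.out.println(apply("a\uD835\uDD38b"));
        // aδb
    }
}
```

The test then asserts that every character of every emitted token satisfies `Character.isLetter`, which is why mapping whole planes to a known letter is a useful stress test.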


Example 3: testCrossPlaneNormalization2

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization2() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(newAttributeFactory(), reader) {
        @Override
        protected int normalize(int c) {
          if (c <= 0xffff) {
            return 0x1043C;
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", s);
    try {
      ts.reset();
      OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
      while (ts.incrementToken()) {
        String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
        for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
          cp = highlightedText.codePointAt(j);
          assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
        }
      }
      ts.end();
    } finally {
      IOUtils.closeWhileHandlingException(ts);
    }
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: europeana | Project: search | Lines: 40 | Source: TestCharTokenizers.java


Example 4: testCrossPlaneNormalization

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(TEST_VERSION_CURRENT, reader) {
        @Override
        protected int normalize(int c) {
          if (c > 0xffff) {
            return 'δ';
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = _TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", new StringReader(s));
    ts.reset();
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    while (ts.incrementToken()) {
      String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
      for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
        cp = highlightedText.codePointAt(j);
        assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
      }
    }
    ts.end();
    ts.close();
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: pkarmstr | Project: NYBC | Lines: 37 | Source: TestCharTokenizers.java


Example 5: testCrossPlaneNormalization2

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization2() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(TEST_VERSION_CURRENT, reader) {
        @Override
        protected int normalize(int c) {
          if (c <= 0xffff) {
            return 0x1043C;
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = _TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", new StringReader(s));
    ts.reset();
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    while (ts.incrementToken()) {
      String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
      for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
        cp = highlightedText.codePointAt(j);
        assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
      }
    }
    ts.end();
    ts.close();
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: pkarmstr | Project: NYBC | Lines: 37 | Source: TestCharTokenizers.java


Example 6: testCrossPlaneNormalization

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(TEST_VERSION_CURRENT, reader) {
        @Override
        protected int normalize(int c) {
          if (c > 0xffff) {
            return 'δ';
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = _TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", s);
    try {
      ts.reset();
      OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
      while (ts.incrementToken()) {
        String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
        for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
          cp = highlightedText.codePointAt(j);
          assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
        }
      }
      ts.end();
    } finally {
      IOUtils.closeWhileHandlingException(ts);
    }
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: jimaguere | Project: Maskana-Gestor-de-Conocimiento | Lines: 40 | Source: TestCharTokenizers.java


Example 7: testCrossPlaneNormalization2

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
public void testCrossPlaneNormalization2() throws IOException {
  Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
      Tokenizer tokenizer = new LetterTokenizer(TEST_VERSION_CURRENT, reader) {
        @Override
        protected int normalize(int c) {
          if (c <= 0xffff) {
            return 0x1043C;
          } else {
            return c;
          }
        }
      };
      return new TokenStreamComponents(tokenizer, tokenizer);
    }
  };
  int num = 1000 * RANDOM_MULTIPLIER;
  for (int i = 0; i < num; i++) {
    String s = _TestUtil.randomUnicodeString(random());
    TokenStream ts = analyzer.tokenStream("foo", s);
    try {
      ts.reset();
      OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
      while (ts.incrementToken()) {
        String highlightedText = s.substring(offsetAtt.startOffset(), offsetAtt.endOffset());
        for (int j = 0, cp = 0; j < highlightedText.length(); j += Character.charCount(cp)) {
          cp = highlightedText.codePointAt(j);
          assertTrue("non-letter:" + Integer.toHexString(cp), Character.isLetter(cp));
        }
      }
      ts.end();
    } finally {
      IOUtils.closeWhileHandlingException(ts);
    }
  }
  // just for fun
  checkRandomData(random(), analyzer, num);
}
 
Author: jimaguere | Project: Maskana-Gestor-de-Conocimiento | Lines: 40 | Source: TestCharTokenizers.java


Example 8: create

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
@Override
public Tokenizer create() {
    return new LetterTokenizer();
}
 
Author: justor | Project: elasticsearch_my | Lines: 5 | Source: LetterTokenizerFactory.java


Example 9: create

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
@Override
public LetterTokenizer create(Reader input) {
  return new LetterTokenizer(luceneMatchVersion, input);
}
 
Author: pkarmstr | Project: NYBC | Lines: 5 | Source: LetterTokenizerFactory.java


Example 10: tokenStream

import org.apache.lucene.analysis.core.LetterTokenizer; // import the required package/class
@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    return new MetaphoneReplacementFilter(new LetterTokenizer(reader));
}
 
Author: xuzhikethinker | Project: t4f-data | Lines: 5 | Source: MetaphoneReplacementAnalyzer.java



Note: the org.apache.lucene.analysis.core.LetterTokenizer examples in this article were collected from open-source projects hosted on GitHub, MSDocs, and similar platforms. Copyright in each snippet remains with its original authors; consult the corresponding project's license before redistributing or reusing the code. Do not reproduce without permission.


