
Java CmdLineUtil Class Code Examples


This article collects typical usage examples of the Java class opennlp.tools.cmdline.CmdLineUtil. If you are wondering how the CmdLineUtil class is used in practice, the selected examples below may help.



The CmdLineUtil class belongs to the opennlp.tools.cmdline package. Eight code examples of the class are shown below, sorted by popularity by default.

Example 1: run

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public void run(String[] args) {
  Params params = validateAndParseParams(args, Params.class);

  File dictInFile = params.getInputFile();

  CmdLineUtil.checkInputFile("dictionary input file", dictInFile);
  Path metadataPath = DictionaryMetadata.getExpectedMetadataLocation(dictInFile.toPath());
  CmdLineUtil.checkInputFile("dictionary metadata (.info) input file", metadataPath.toFile());

  MorfologikDictionayBuilder builder = new MorfologikDictionayBuilder();
  try {
    builder.build(dictInFile.toPath(), params.getOverwrite(),
        params.getValidate(), params.getAcceptBOM(), params.getAcceptCR(),
        params.getIgnoreEmpty());
  } catch (Exception e) {
    throw new TerminateToolException(-1,
        "Error while creating Morfologik POS Dictionay: " + e.getMessage(), e);
  }

}
 
Developer: apache, Project: opennlp-addons, Lines of code: 21, Source: MorfologikDictionaryBuilderTool.java
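
Example 1 calls checkInputFile on both input files before it starts building. The sketch below is a JDK-only stand-in that approximates that kind of pre-flight check so it can be run without OpenNLP on the classpath. InputFileCheckSketch and describeProblem are hypothetical names, and the conditions and messages are assumptions; the real OpenNLP method may check different things and report them differently.

```java
import java.io.File;

// Hypothetical JDK-only stand-in; this is NOT the OpenNLP implementation.
final class InputFileCheckSketch {

  /** Returns null when the file looks usable, otherwise a short reason. */
  static String describeProblem(String name, File inFile) {
    if (inFile.isDirectory()) {
      return "The " + name + " is a directory!";
    }
    if (!inFile.exists()) {
      return "The " + name + " does not exist!";
    }
    if (!inFile.canRead()) {
      return "No permissions to read the " + name + "!";
    }
    return null;
  }

  public static void main(String[] args) {
    // The temp directory exists but is not a regular file, so it is rejected.
    File tmpDir = new File(System.getProperty("java.io.tmpdir"));
    System.out.println(describeProblem("dictionary input file", tmpDir));
  }
}
```

Returning a message instead of throwing keeps the sketch easy to test; a command-line tool would more likely turn a non-null result into an exception that ends the run.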


Example 2: main

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public static void main(String[] args) {
  if (args.length < 2) {
    System.out.println("usage: <input> <output>\n");
    System.exit(0);
  }

  String input = args[0];
  String output = args[1];

  TrainingParameters params = new TrainingParameters();
  params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
  params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
  //params.put(TrainingParameters.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

  AgeClassifyModel model;
  try {
    model = AgeClassifySparkTrainer.createModel("en", input,
        "opennlp.tools.tokenize.SentenceTokenizer", "opennlp.tools.tokenize.BagOfWordsTokenizer", params);
  } catch (IOException e) {
    throw new TerminateToolException(-1,
        "IO error while reading training data or indexing data: " + e.getMessage(), e);
  }
  CmdLineUtil.writeModel("age classifier", new File(output), model);
}
 
Developer: USCDataScience, Project: AgePredictor, Lines of code: 25, Source: AgeClassifySparkTrainer.java


Example 3: serializeEntityGazetteers

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
// tabPattern, dotInsideI and SER_GZ are defined elsewhere in the source file (not shown)
public static void serializeEntityGazetteers(Path dictionaryFile)
    throws IOException {
  Map<String, String> dictionary = new HashMap<String, String>();
  InputStream inputStream = CmdLineUtil.openInFile(dictionaryFile.toFile());
  // try-with-resources closes the reader (and the wrapped stream) even if parsing fails
  try (BufferedReader breader = new BufferedReader(
      new InputStreamReader(inputStream, Charset.forName("UTF-8")))) {
    String line;
    while ((line = breader.readLine()) != null) {
      // a well-formed line holds exactly two tab-separated fields
      String[] lineArray = tabPattern.split(line);
      if (lineArray.length == 2) {
        String normalizedToken = dotInsideI.matcher(lineArray[0])
            .replaceAll("i");
        dictionary.put(normalizedToken.toLowerCase(), lineArray[1].intern());
      } else {
        System.err.println(lineArray[0] + " is not well formed!");
      }
    }
  }
  String outputFile = dictionaryFile.toString() + SER_GZ;
  IOUtils.writeClusterToFile(dictionary, outputFile, IOUtils.TAB_DELIMITER);
}
 
Developer: ragerri, Project: ixa-pipe-convert, Lines of code: 22, Source: SerializeResources.java
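
The parsing loop in Example 3 can be tried without OpenNLP or the project's helper classes. The following is a minimal sketch under stated assumptions: TabDictionarySketch and parse are hypothetical names, the input comes from any Reader rather than CmdLineUtil.openInFile, and the dotInsideI normalization is left out because its pattern is not shown in the snippet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the tab-separated parsing loop.
final class TabDictionarySketch {
  private static final Pattern TAB = Pattern.compile("\t");

  /** Reads "token<TAB>label" lines; malformed lines are reported and skipped. */
  static Map<String, String> parse(Reader source) {
    Map<String, String> dictionary = new HashMap<>();
    // try-with-resources closes the reader even when reading fails
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] fields = TAB.split(line);
        if (fields.length == 2) {
          dictionary.put(fields[0].toLowerCase(Locale.ROOT), fields[1]);
        } else {
          System.err.println(line + " is not well formed!");
        }
      }
    } catch (IOException e) {
      // Assumption: callers of this sketch prefer an unchecked exception.
      throw new UncheckedIOException(e);
    }
    return dictionary;
  }

  public static void main(String[] args) {
    String sample = "Paris\tLOCATION\nbroken line\nIBM\tORGANIZATION\n";
    System.out.println(parse(new StringReader(sample)));
  }
}
```

Lowercasing with Locale.ROOT is a choice made for this sketch so that the result does not depend on the default locale of the machine.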


Example 4: serializeLemmaDictionary

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
// tabPattern, dotInsideI and SER_GZ are defined elsewhere in the source file (not shown)
public static void serializeLemmaDictionary(Path lemmaDict)
    throws IOException {
  Map<List<String>, String> dictMap = new HashMap<List<String>, String>();
  InputStream inputStream = CmdLineUtil.openInFile(lemmaDict.toFile());
  // try-with-resources closes the reader (and the wrapped stream) even if parsing fails
  try (BufferedReader breader = new BufferedReader(
      new InputStreamReader(inputStream, Charset.forName("UTF-8")))) {
    String line;
    while ((line = breader.readLine()) != null) {
      // a well-formed line holds exactly three tab-separated fields
      final String[] elems = tabPattern.split(line);
      if (elems.length == 3) {
        String normalizedToken = dotInsideI.matcher(elems[0]).replaceAll("I");
        dictMap.put(Arrays.asList(normalizedToken, elems[2]), elems[1]);
      } else {
        System.err.println(elems[0] + " is not well formed!");
      }
    }
  }
  String outputFile = lemmaDict.toString() + SER_GZ;
  IOUtils.writeDictionaryLemmatizerToFile(dictMap, outputFile,
      IOUtils.TAB_DELIMITER);
}
 
Developer: ragerri, Project: ixa-pipe-convert, Lines of code: 22, Source: SerializeResources.java


Example 5: train

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
/**
 * Main entry point for training.
 * 
 * @throws IOException
 *           throws an exception if errors in the various file inputs.
 */
public final void train() throws IOException {
  // load training parameters file
  final String paramFile = this.parsedArguments.getString("params");
  final TrainingParameters params = InputOutputUtils
      .loadTrainingParameters(paramFile);
  String outModel = null;
  if (params.getSettings().get("OutputModel") == null
      || params.getSettings().get("OutputModel").length() == 0) {
    outModel = Files.getNameWithoutExtension(paramFile) + ".bin";
    params.put("OutputModel", outModel);
  } else {
    outModel = Flags.getModel(params);
  }
  final Trainer chunkerTrainer = new DefaultTrainer(params);
  final ChunkerModel trainedModel = chunkerTrainer.train(params);
  CmdLineUtil.writeModel("ixa-pipe-chunk", new File(outModel), trainedModel);
}
 
Developer: ixa-ehu, Project: ixa-pipe-chunk, Lines of code: 24, Source: CLI.java
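
Example 5 falls back to a model name derived from the parameters file when no OutputModel setting is given. The sketch below reproduces that fallback with the JDK only. OutputModelNameSketch, defaultModelName and resolve are hypothetical names; the non-empty branch simply returns the configured value, which is an assumption, since the behaviour of Flags.getModel is not shown in the snippet.

```java
import java.io.File;

// Hypothetical JDK-only version of the output-name fallback.
final class OutputModelNameSketch {

  /** Strips the directory and the last extension, then appends ".bin". */
  static String defaultModelName(String paramFile) {
    String fileName = new File(paramFile).getName();
    int dot = fileName.lastIndexOf('.');
    String base = (dot == -1) ? fileName : fileName.substring(0, dot);
    return base + ".bin";
  }

  /** Uses the configured name when present, otherwise the derived default. */
  static String resolve(String configured, String paramFile) {
    if (configured == null || configured.isEmpty()) {
      return defaultModelName(paramFile);
    }
    return configured; // assumption: the configured value is used unchanged
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, "conf/chunk.properties"));
    System.out.println(resolve("custom-model.bin", "conf/chunk.properties"));
  }
}
```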


Example 6: openSampleData

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
static ObjectStream<POSSample> openSampleData(String sampleDataName, File sampleDataFile, Charset encoding) {
    CmdLineUtil.checkInputFile(sampleDataName + " Data", sampleDataFile);
    FileInputStream sampleDataIn = CmdLineUtil.openInFile(sampleDataFile);
    ObjectStream<String> lineStream = new PlainTextByLineStream(sampleDataIn.getChannel(), encoding);
    return new WordTagSampleStream(lineStream);
}
 
Developer: radsimu, Project: UaicNlpToolkit, Lines of code: 7, Source: POStrainer.java


Example 7: train

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
public void train() throws IOException {
    if (languageCode == null) {
        throw new IllegalStateException("languageCode is not provided");
    }
    if (modelOutFile == null) {
        throw new IllegalStateException("model output path is not provided");
    }
    if (trainParams == null) {
        throw new IllegalStateException("training parameters are not set");
    }
    if (sentenceStream == null) {
        throw new IllegalStateException("sentence stream is not configured");
    }
    if (taggerFactory == null) {
        throw new IllegalStateException("tagger factory is not configured");
    }
    Map<String, String> manifestInfoEntries = new HashMap<>();
    BeamSearchContextGenerator<Token> contextGenerator = taggerFactory.getContextGenerator();

    MaxentModel posModel;
    try {
        if (TrainerFactory.TrainerType.EVENT_MODEL_TRAINER.equals(
                TrainerFactory.getTrainerType(trainParams.getSettings()))) {

            ObjectStream<Event> es = new POSTokenEventStream<>(sentenceStream, contextGenerator);
            EventTrainer trainer = TrainerFactory.getEventTrainer(trainParams.getSettings(), manifestInfoEntries);
            posModel = trainer.train(es);
        } else {
            throw new UnsupportedOperationException("Sequence training");
            //POSSampleSequenceStream ss = new POSSampleSequenceStream(samples, contextGenerator);
            // posModel = TrainUtil.train(ss, trainParams.getSettings(), manifestInfoEntries);
        }
    } finally {
        sentenceStream.close();
    }
    POSModel modelAggregate = new POSModel(languageCode,
            posModel, manifestInfoEntries, taggerFactory);
    CmdLineUtil.writeModel("PoS-tagger", modelOutFile, modelAggregate);
}
 
Developer: textocat, Project: textokit-core, Lines of code: 40, Source: OpenNLPPosTaggerTrainer.java


Example 8: brownCleanUpperCase

import opennlp.tools.cmdline.CmdLineUtil; // import the required package/class
/**
 * Keeps only the lines in which at least 90% of the non-space characters
 * are lowercase and writes them to a ".clean" file next to the input.
 * 
 * @param inFile
 *          the file to clean
 * @throws IOException
 *           if the input cannot be read or the output cannot be written
 */
private static void brownCleanUpperCase(Path inFile) throws IOException {
  StringBuilder precleantext = new StringBuilder();
  InputStream inputStream = CmdLineUtil.openInFile(inFile.toFile());
  // try-with-resources closes the reader (and the wrapped stream) even if reading fails
  try (BufferedReader breader = new BufferedReader(
      new InputStreamReader(inputStream, Charset.forName("UTF-8")))) {
    String line;
    while ((line = breader.readLine()) != null) {
      double lowercaseCounter = 0;
      // concatenate the words so that spaces do not count towards the ratio
      StringBuilder sb = new StringBuilder();
      String[] lineArray = line.split(" ");
      for (String word : lineArray) {
        sb.append(word);
      }
      char[] lineCharArray = sb.toString().toCharArray();
      for (char lineArr : lineCharArray) {
        if (Character.isLowerCase(lineArr)) {
          lowercaseCounter++;
        }
      }
      // an empty line yields NaN here and is therefore dropped
      double percent = lowercaseCounter / (double) lineCharArray.length;
      if (percent >= 0.90) {
        precleantext.append(line).append("\n");
      }
    }
  }
  Path outfile = Files.createFile(Paths.get(inFile.toString() + ".clean"));
  Files.write(outfile,
      precleantext.toString().getBytes(StandardCharsets.UTF_8));
  System.err.println(">> Wrote clean document to " + outfile);
}
 
Developer: ragerri, Project: ixa-pipe-convert, Lines of code: 40, Source: Convert.java
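
The filtering rule in Example 8 can be isolated from the file handling. The following is a minimal sketch, assuming the same 90% threshold and the same treatment of spaces; LowercaseRatioSketch, lowercaseRatio and keep are hypothetical names. Unlike the original loop, it returns 0.0 for a line without non-space characters instead of relying on NaN, which leads to the same decision to drop the line.

```java
// Hypothetical stand-alone version of the lowercase-ratio filter.
final class LowercaseRatioSketch {
  private static final double THRESHOLD = 0.90;

  /** Share of lowercase characters among all non-space characters. */
  static double lowercaseRatio(String line) {
    int total = 0;
    int lower = 0;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == ' ') {
        continue; // spaces are ignored, as in the original loop
      }
      total++;
      if (Character.isLowerCase(c)) {
        lower++;
      }
    }
    return total == 0 ? 0.0 : (double) lower / total;
  }

  /** True when the line should be kept in the cleaned output. */
  static boolean keep(String line) {
    return lowercaseRatio(line) >= THRESHOLD;
  }

  public static void main(String[] args) {
    for (String line : new String[] {"the quick brown fox", "BREAKING NEWS TODAY", ""}) {
      System.out.println("\"" + line + "\" kept: " + keep(line));
    }
  }
}
```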



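Note: The opennlp.tools.cmdline.CmdLineUtil examples in this article were collected from source-code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code remains with the original authors; consult each project's License before distributing or using the code. Do not reproduce without permission.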

