Java RegexMatches Class Code Examples



This article collects typical usage examples of the Java class cc.mallet.pipe.tsf.RegexMatches. If you are wondering how the RegexMatches class is used in practice, or are looking for working examples, the curated snippets below may help.

The RegexMatches class belongs to the cc.mallet.pipe.tsf package. Nine code examples of the class are shown below, sorted by popularity by default.
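To make the role of the class concrete before the project snippets, here is a minimal sketch of the idea behind a regex feature pipe, written against the JDK only so that it runs without MALLET on the classpath. It assumes, based on how the examples below write their patterns (for instance `.*Fig.*`), that the feature fires only when the regex matches the whole token text. The class `RegexFeatureSketch` and its method `tag` are hypothetical names and are not part of MALLET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a per-token regex feature; not MALLET code. */
final class RegexFeatureSketch {

    /** Returns one flag per token: true if the whole token matches the regex. */
    static List<Boolean> tag(Pattern regex, List<String> tokens) {
        List<Boolean> flags = new ArrayList<>();
        for (String token : tokens) {
            // matches() requires the entire token to match (assumed semantics).
            flags.add(regex.matcher(token).matches());
        }
        return flags;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        // "4pm" is not flagged because only part of it is digits.
        System.out.println(tag(digits, List.of("at", "4", "pm", "4pm")));
        // prints [false, true, false, false]
    }
}
```

In the real pipelines the flag would be stored as a named binary feature on each token rather than returned as a list.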

Example 1: testMultiTagSerialization

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public static void testMultiTagSerialization () throws IOException, ClassNotFoundException
{
  Pipe origPipe = new SerialPipes (new Pipe[] {
          new SimpleTaggerSentence2TokenSequence (),
          new TokenText (),
          new RegexMatches ("digits", Pattern.compile ("[0-9]+")),
          new RegexMatches ("ampm", Pattern.compile ("[aApP][mM]")),
          new OffsetFeatureConjunction ("time",
                  new String[] { "digits", "ampm" },
                  new int[] { 0, 1 },
                  true),
          new PrintInputAndTarget (),
  });

  Pipe mtPipe = (Pipe) TestSerializable.cloneViaSerialization (origPipe);
  InstanceList mtLst = new InstanceList (mtPipe);
  mtLst.addThruPipe (new ArrayIterator (doc1));
  Instance mtInst = mtLst.get (0);
  TokenSequence mtTs = (TokenSequence) mtInst.getData ();
  assertEquals (6, mtTs.size ());
  assertEquals (1.0, mtTs.get (3).getFeatureValue ("time"), 1e-15);
  assertEquals (1.0, mtTs.get (4).getFeatureValue ("time"), 1e-15);
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 24, Source: TestOffsetFeatureConjunctions.java

Example 2: addFullTextPipes

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
/** Pipes added based on experience with full text */
private static void addFullTextPipes(List<String> usedPipeNames,
        List<Pipe> pipes) {

    // blabla 24 24
    pipes.add(new LongRegexSpaced("digit_then_other_then_digit", Pattern
            .compile("\\d+[^\\d]+\\d+"), 2, 4));

    // 30 mM K SO , 5 mM MgCl 6H O, 10 mM 24 24 22 HEPES
    pipes.add(new LongRegexSpaced(
            "digit_then_other_then_digit_then_other_then_digit", Pattern
                    .compile(".*\\d+[^\\d\\n]+\\d+[^\\d\\n]+\\d+.*"), 4, 9));

    // n 19
    // n 5
    pipes.add(new LongRegexSpaced("n_space_digit", Pattern
            .compile("n \\d+"), 2, 2));
    pipes.add(new LongRegexSpaced("parenthesis_n_space_digit_parenthesis",
            Pattern.compile("\\( n \\d+ \\)"), 3, 4));
    pipes.add(new LongRegexSpaced("n_space_digit_parenthesis", Pattern
            .compile("n \\d+ \\)"), 3, 4));
    pipes.add(new LongRegexSpaced("parenthesis_n_space_digit", Pattern
            .compile("\\( n \\d+"), 3, 4));

    // Fig is never found in any lexicon
    pipes.add(new RegexMatches("Figure", Pattern.compile(".*Fig.*")));
}

Developer ID: BlueBrain, Project: bluima, Lines of code: 28, Source: BrainRegionPipes.java

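Example 3: addPrefixPipes

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
import static java.util.regex.Pattern.CASE_INSENSITIVE;
import static java.util.regex.Pattern.compile;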
public static void addPrefixPipes(List<Pipe> pipes, File file, String name)
        throws IOException {
    for (String line : linesFrom(file.getAbsolutePath())) {
        pipes.add(new RegexMatches(name, compile("(" + line.trim()
                + ".{1,3})", CASE_INSENSITIVE)));
    }
}

Developer ID: BlueBrain, Project: bluima, Lines of code: 8, Source: BrainRegionPipes.java
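The method above builds one pattern per lexicon line by plain string concatenation, so the pattern accepts the prefix followed by one to three further characters. The sketch below shows which tokens such a pattern accepts under whole-token matching. `PrefixPatternSketch` and `prefixPattern` are hypothetical names, and the call to `Pattern.quote` is an addition of this sketch: it assumes the lexicon lines are meant as literal text, which the original does not state.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the prefix pattern shape; not project code. */
final class PrefixPatternSketch {

    /** Same pattern shape as addPrefixPipes, with the prefix quoted as a literal. */
    static Pattern prefixPattern(String prefix) {
        // Pattern.quote keeps characters such as '.' or '(' in a lexicon line
        // from being read as regex syntax (an assumption of this sketch).
        return Pattern.compile("(" + Pattern.quote(prefix.trim()) + ".{1,3})",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = prefixPattern("cortic");
        System.out.println(p.matcher("Cortical").matches());      // true: 2 trailing chars
        System.out.println(p.matcher("cortic").matches());        // false: no trailing chars
        System.out.println(p.matcher("corticospinal").matches()); // false: 7 trailing chars
    }
}
```

If the lexicon lines are in fact intended to be regular expressions, the quoting step would have to be dropped.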

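Example 4: addSubstringRegexPipes

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
import static java.util.regex.Pattern.CASE_INSENSITIVE;
import static java.util.regex.Pattern.compile;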
public static void addSubstringRegexPipes(List<String> usedPipeNames,
        List<Pipe> pipes) throws Exception {
    usedPipeNames.add("Substring regexes");

    // "thalamic" and nuclei are probably in the 1-grams
    for (String substring : new String[] { "cortic", "cerebel" }) {
        pipes.add(new RegexMatches(substring + "Regex", compile(".*"
                + substring + ".*", CASE_INSENSITIVE)));
    }
}

Developer ID: BlueBrain, Project: bluima, Lines of code: 11, Source: BrainRegionPipes.java
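The substrings above are wrapped in `.*` on both sides, which is what a whole-token match needs in order to behave like a substring search. The following JDK-only sketch contrasts the two `Matcher` methods involved; `SubstringMatchSketch` and its helper names are hypothetical, and the assumption that the pipe uses whole-token matching is inferred from the patterns rather than stated in the original.

```java
import java.util.regex.Pattern;

/** Hypothetical comparison of whole-token matching and substring search. */
final class SubstringMatchSketch {

    /** True if the regex matches the entire token. */
    static boolean wholeTokenMatch(String regex, String token) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(token).matches();
    }

    /** True if the regex matches anywhere inside the token. */
    static boolean anywhereMatch(String regex, String token) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(token).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeTokenMatch("cerebel", "Cerebellum"));     // false
        System.out.println(wholeTokenMatch(".*cerebel.*", "Cerebellum")); // true
        System.out.println(anywhereMatch("cerebel", "Cerebellum"));       // true
    }
}
```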

Example 5: testMultiTag

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public static void testMultiTag ()
{
  Pipe mtPipe = new SerialPipes (new Pipe[] {
          new SimpleTaggerSentence2TokenSequence (),
          new TokenText (),
          new RegexMatches ("digits", Pattern.compile ("[0-9]+")),
          new RegexMatches ("ampm", Pattern.compile ("[aApP][mM]")),
          new OffsetFeatureConjunction ("time",
                  new String[] { "digits", "ampm" },
                  new int[] { 0, 1 },
                  true),
          new PrintInputAndTarget (),
  });
  Pipe noMtPipe = new SerialPipes (new Pipe[] {
          new SimpleTaggerSentence2TokenSequence (),
          new TokenText (),
          new RegexMatches ("digits", Pattern.compile ("[0-9]+")),
          new RegexMatches ("ampm", Pattern.compile ("[aApP][mM]")),
          new OffsetFeatureConjunction ("time",
                  new String[] { "digits", "ampm" },
                  new int[] { 0, 1 },
                  false),
          new PrintInputAndTarget (),
  });

  InstanceList mtLst = new InstanceList (mtPipe);
  InstanceList noMtLst = new InstanceList (noMtPipe);

  mtLst.addThruPipe (new ArrayIterator (doc1));
  noMtLst.addThruPipe (new ArrayIterator (doc1));

  Instance mtInst = mtLst.get (0);
  Instance noMtInst = noMtLst.get (0);

  TokenSequence mtTs = (TokenSequence) mtInst.getData ();
  TokenSequence noMtTs = (TokenSequence) noMtInst.getData ();

  assertEquals (6, mtTs.size ());
  assertEquals (6, noMtTs.size ());

  assertEquals (1.0, mtTs.get (3).getFeatureValue ("time"), 1e-15);
  assertEquals (1.0, noMtTs.get (3).getFeatureValue ("time"), 1e-15);
  assertEquals (1.0, mtTs.get (4).getFeatureValue ("time"), 1e-15);
  assertEquals (0.0, noMtTs.get (4).getFeatureValue ("time"), 1e-15);
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 46, Source: TestOffsetFeatureConjunctions.java
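The two pipelines above differ only in the final boolean passed to OffsetFeatureConjunction, and the assertions suggest what it controls: with `true` the "time" feature is set on both the digits token and the following am/pm token, with `false` only on the digits token. The sketch below reproduces that behaviour with the JDK alone. It is inferred from the assertions, not taken from MALLET's source; the class name, the method `conjoin`, and the sample sentence are hypothetical, since the contents of `doc1` are not shown in the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical re-creation of an offset feature conjunction; not MALLET code. */
final class OffsetConjunctionSketch {

    /**
     * Flags position t when patterns[i] matches the token at t + offsets[i]
     * for every i. If tagAll is true, every participating position is flagged.
     */
    static List<Boolean> conjoin(List<String> tokens, Pattern[] patterns,
                                 int[] offsets, boolean tagAll) {
        int n = tokens.size();
        boolean[] fired = new boolean[n];
        for (int t = 0; t < n; t++) {
            boolean all = true;
            for (int i = 0; i < patterns.length && all; i++) {
                int pos = t + offsets[i];
                all = pos >= 0 && pos < n
                        && patterns[i].matcher(tokens.get(pos)).matches();
            }
            if (!all) {
                continue;
            }
            fired[t] = true;
            if (tagAll) {
                for (int offset : offsets) {
                    fired[t + offset] = true; // in range: checked above
                }
            }
        }
        List<Boolean> result = new ArrayList<>();
        for (boolean f : fired) {
            result.add(f);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical six-token sentence standing in for doc1.
        List<String> tokens = List.of("Meet", "me", "at", "4", "pm", "today");
        Pattern[] patterns = {
            Pattern.compile("[0-9]+"), Pattern.compile("[aApP][mM]") };
        int[] offsets = { 0, 1 };
        System.out.println(conjoin(tokens, patterns, offsets, true));
        // prints [false, false, false, true, true, false]
        System.out.println(conjoin(tokens, patterns, offsets, false));
        // prints [false, false, false, true, false, false]
    }
}
```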

Example 6: createDefaultPipes

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public static SerialPipes createDefaultPipes(Alphabet dataAlphabet, Alphabet targetAlphabet) {
	List<Pipe> pipes = new ArrayList<Pipe>();
	pipes.add(new TokenText());
	pipes.add(new TokenTextCharPrefix("PREFIX=", 2));
	pipes.add(new TokenTextCharPrefix("PREFIX=", 3));
	pipes.add(new TokenTextCharSuffix("SUFFIX=", 2));
	pipes.add(new TokenTextCharSuffix("SUFFIX=", 3));
	pipes.add(new TokenTextCharNGrams("NGRAM=", new int[] { 2, 3 }));
	pipes.add(new RegexMatches("ALL_CAPS_REGEX", Pattern.compile(TextUtil.ALL_CAPS_REGEX)));
	pipes.add(new RegexMatches("ALPHA_NUMERIC_REGEX", Pattern.compile(TextUtil.ALPHA_NUMERIC_REGEX)));
	pipes.add(new RegexMatches("CAPS_MIX_REGEX", Pattern.compile(TextUtil.CAPS_MIX_REGEX)));
	pipes.add(new RegexMatches("EMAIL_REGEX", Pattern.compile(TextUtil.EMAIL_REGEX)));
	pipes.add(new RegexMatches("END_DASH_REGEX", Pattern.compile(TextUtil.END_DASH_REGEX)));
	pipes.add(new RegexMatches("EXP_NUMBER_REGEX", Pattern.compile(TextUtil.EXP_NUMBER_REGEX)));
	pipes.add(new RegexMatches("FLOATING_POINT_NUMBER_REGEX", Pattern.compile(TextUtil.FLOATING_POINT_NUMBER_REGEX)));
	pipes.add(new RegexMatches("FOUR_CAPS_REGEX", Pattern.compile(TextUtil.FOUR_CAPS_REGEX)));
	pipes.add(new RegexMatches("FOUR_DIGITS_REGEX", Pattern.compile(TextUtil.FOUR_DIGITS_REGEX)));
	pipes.add(new RegexMatches("HAS_DASH_REGEX", Pattern.compile(TextUtil.HAS_DASH_REGEX)));
	pipes.add(new RegexMatches("HAS_DIGIT_REGEX", Pattern.compile(TextUtil.HAS_DIGIT_REGEX)));
	pipes.add(new RegexMatches("HEX_REGEX", Pattern.compile(TextUtil.HEX_REGEX)));
	pipes.add(new RegexMatches("HTML_REGEX", Pattern.compile(TextUtil.HTML_REGEX)));
	pipes.add(new RegexMatches("IN_PARENTHESES_REGEX", Pattern.compile(TextUtil.IN_PARENTHESES_REGEX)));
	pipes.add(new RegexMatches("INIT_CAPS_ALPHA_REGEX", Pattern.compile(TextUtil.INIT_CAPS_ALPHA_REGEX)));
	pipes.add(new RegexMatches("INIT_CAPS_REGEX", Pattern.compile(TextUtil.INIT_CAPS_REGEX)));
	pipes.add(new RegexMatches("INIT_DASH_REGEX", Pattern.compile(TextUtil.INIT_DASH_REGEX)));
	pipes.add(new RegexMatches("IP_REGEX", Pattern.compile(TextUtil.IP_REGEX)));
	pipes.add(new RegexMatches("NEGATIVE_INTEGER_REGEX", Pattern.compile(TextUtil.NEGATIVE_INTEGER_REGEX)));
	pipes.add(new RegexMatches("ONE_CAP_REGEX", Pattern.compile(TextUtil.ONE_CAP_REGEX)));
	pipes.add(new RegexMatches("ONE_DIGIT_REGEX", Pattern.compile(TextUtil.ONE_DIGIT_REGEX)));
	pipes.add(new RegexMatches("POSITIVE_INTEGER_REGEX", Pattern.compile(TextUtil.POSITIVE_INTEGER_REGEX)));
	pipes.add(new RegexMatches("PUNCTUATION_REGEX", Pattern.compile(TextUtil.PUNCTUATION_REGEX)));
	pipes.add(new RegexMatches("ROMAN_NUMBER_CAPITAL_REGEX", Pattern.compile(TextUtil.ROMAN_NUMBER_CAPITAL_REGEX)));
	pipes.add(new RegexMatches("ROMAN_NUMBER_SMALL_REGEX", Pattern.compile(TextUtil.ROMAN_NUMBER_SMALL_REGEX)));
	pipes.add(new RegexMatches("SINGLE_INITIAL_REGEX", Pattern.compile(TextUtil.SINGLE_INITIAL_REGEX)));
	pipes.add(new RegexMatches("THREE_CAPS_REGEX", Pattern.compile(TextUtil.THREE_CAPS_REGEX)));
	pipes.add(new RegexMatches("THREE_DIGITS_REGEX", Pattern.compile(TextUtil.THREE_DIGITS_REGEX)));
	pipes.add(new RegexMatches("TWO_CAPS_REGEX", Pattern.compile(TextUtil.TWO_CAPS_REGEX)));
	pipes.add(new RegexMatches("TWO_DIGITS_REGEX", Pattern.compile(TextUtil.TWO_DIGITS_REGEX)));
	pipes.add(new RegexMatches("URL_REGEX", Pattern.compile(TextUtil.URL_REGEX)));
	pipes.add(new RegexMatches("YEAR_REGEX", Pattern.compile(TextUtil.YEAR_REGEX)));
	pipes.add(new RegexMatches("OBD_REGEX", Pattern.compile(TextUtil.OBD_REGEX)));
	pipes.add(new RegexMatches("ONE_QUESTION_MARK_REGEX", Pattern.compile(TextUtil.ONE_QUESTION_MARK_REGEX)));
	pipes.add(new RegexMatches("TWO_QUESTION_MARKS_REGEX", Pattern.compile(TextUtil.TWO_QUESTION_MARKS_REGEX)));
	pipes.add(new RegexMatches("THREE_QUESTION_MARKS_REGEX", Pattern.compile(TextUtil.THREE_QUESTION_MARKS_REGEX)));
	pipes.add(new RegexMatches("MULTIPLE_QUESTION_MARKS_REGEX", Pattern
			.compile(TextUtil.MULTIPLE_QUESTION_MARKS_REGEX)));
	pipes.add(new RegexMatches("ONE_EXCLAMATION_MARK_REGEX", Pattern.compile(TextUtil.ONE_EXCLAMATION_MARK_REGEX)));
	pipes.add(new RegexMatches("TWO_EXCLAMATION_MARKS_REGEX", Pattern.compile(TextUtil.TWO_EXCLAMATION_MARKS_REGEX)));
	pipes.add(new RegexMatches("THREE_EXCLAMATION_MARKS_REGEX", Pattern
			.compile(TextUtil.THREE_EXCLAMATION_MARKS_REGEX)));
	pipes.add(new RegexMatches("MULTIPLE_EXCLAMATION_MARKS_REGEX", Pattern
			.compile(TextUtil.MULTIPLE_EXCLAMATION_MARKS_REGEX)));
	pipes.add(new RegexMatches("QUESTION_EXCLAMATION_MARK_REGEX", Pattern
			.compile(TextUtil.QUESTION_EXCLAMATION_MARK_REGEX)));
	pipes.add(new RegexMatches("EXCLAMATION_QUESTION_MARK_REGEX", Pattern
			.compile(TextUtil.EXCLAMATION_QUESTION_MARK_REGEX)));
	pipes.add(new OffsetConjunctions(new int[][] { { -1 }, { 1 } }));
	pipes.add(new TokenSequence2FeatureVectorSequence(targetAlphabet));
	SerialPipes serialPipes = new SerialPipes(pipes);
	serialPipes.setDataAlphabet(dataAlphabet);
	serialPipes.setTargetAlphabet(targetAlphabet);
	serialPipes.setTargetProcessing(true);
	return serialPipes;
}

Developer ID: jdmp, Project: java-data-mining-package, Lines of code: 65, Source: MalletUtil.java

Example 7: NEPipes

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public NEPipes() {
    super(
            new Pipe[] {
                    //new TokenText( "text=" ),

                    new RegexMatches( "SingleLetter", Pattern.compile( "[A-Za-z]" ) ),
                    new RegexMatches( "AllCaps", Pattern.compile( ALLCAPS ) ),
                    new RegexMatches( "AllLower", Pattern.compile( ALLLOWER ) ),
                    new RegexMatches( "InitCaps", Pattern.compile( INITCAPS ) ),
                    new RegexMatches( "MixedCase", Pattern.compile( MIXEDCASE ) ),
                    new RegexMatches( "MixedNum", Pattern.compile( MIXEDNUM ) ),
                    new RegexMatches( "EndSentPunc", Pattern.compile( ENDSENTENCE ) ),
                    new RegexMatches( "Punc", Pattern.compile( PUNCTUATION ) ),
                    new RegexMatches( "Bracket", Pattern.compile( BRACKET ) ),
                    new RegexMatches( "Ordinal", Pattern.compile( ORDINAL, Pattern.CASE_INSENSITIVE ) ),

                    new LongRegexMatches( "Quoted", Pattern.compile( QUOTED ), 1, 4 ),
                    new LongRegexMatches( "Bracketed", Pattern.compile( BRACKETED ), 1, 4 ),
                    new LongRegexMatches( "Initial", Pattern.compile( INITIAL ), 2, 2 ),
                    new LongRegexMatches( "Ellipse", Pattern.compile( DOTS ), 1, 2 ),
                    new LongRegexMatches( "Dashes", Pattern.compile( DASHES ), 2, 2 ),
                    new LongRegexMatches( "Fraction", Pattern.compile( FRACTION ), 1, 3 ),
                    new LongRegexMatches( "DotDecimal", Pattern.compile( DOTDECIMAL ), 1, 3 ),

                    new LongRegexMatches( "Percent", Pattern.compile( "(" + RANGE + "|" + DECIMAL + ")%" ), 2, 4 ),
                    new RegexMatches( "10^3n", Pattern.compile( ILLION, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "Numeric", Pattern.compile( DECIMAL ), 1, 3 ),
                    new LongRegexMatches( "BigNumber", Pattern.compile( COMMA_DECIMAL ), 1, 7 ),
                    new LongRegexMatches( "kmbNumber",
                            Pattern.compile( DECIMAL + ILLION, Pattern.CASE_INSENSITIVE ), 1, 4 ),
                    new RegexMatches( "kmbMixed", Pattern.compile( MIXED_ILLION, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "Dollars", Pattern.compile( "[$](" + RANGE + "|" + DECIMAL + "|"
                            + COMMA_DECIMAL + "|" + DECIMAL + ILLION + "|" + MIXED_ILLION + ")",
                            Pattern.CASE_INSENSITIVE ), 2, 8 ),

                    new RegexMatches( "NumberWord", Pattern.compile( NUMBER_WORD, Pattern.CASE_INSENSITIVE ) ),
                   //FIXME useful beyond this?
                    new RegexMatches( "Currency", Pattern.compile( CURRENCY, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "MoneyWords", Pattern.compile( MONEYWORDS, Pattern.CASE_INSENSITIVE ), 2,
                            4 ),

                    new LongRegexMatches( "AmPm", Pattern.compile( AMPM, Pattern.CASE_INSENSITIVE ), 1, 4 ),
                    new RegexMatches( "MixedAmPm", Pattern.compile( MIXED_AMPM, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "TimeNum", Pattern.compile( TIMENUM ), 3, 5 ),
                    new RegexMatches( "TimeZone", Pattern.compile( TIMEZONES, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "Time", Pattern.compile( TIME, Pattern.CASE_INSENSITIVE ), 1, 9 ),
                    new LongRegexMatches( "TimeRange", Pattern.compile( TIMERANGE, Pattern.CASE_INSENSITIVE ), 3,
                            19 ),

                    new LongRegexMatches( "P10", Pattern.compile( P10 ), 3, 7 ),
                    new LongRegexMatches( "P5", Pattern.compile( P5 ), 3, 3 ),
                    new LongRegexMatches( "Phone", Pattern.compile( P10 + "|" + P5 ), 3, 7 ),

                    new RegexMatches( "UncasedMonthName", Pattern.compile( MONTHNAME, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "UncasedMonthAbbr",
                            Pattern.compile( MONTHABBR, Pattern.CASE_INSENSITIVE ), 1, 2 ),
                    new LongRegexMatches( "CasedMonth", Pattern.compile( MONTH ), 1, 2 ),
                    new LongRegexMatches( "UncasedMonth", Pattern.compile( MONTH, Pattern.CASE_INSENSITIVE ), 1, 2 ),

                    new RegexMatches( "UncasedWeekdayName", Pattern.compile( WEEKDAYNAME, Pattern.CASE_INSENSITIVE ) ),
                    new LongRegexMatches( "UncasedWeekdayAbbr", Pattern.compile( WEEKDAYABBR,
                            Pattern.CASE_INSENSITIVE ), 1, 2 ),
                    new LongRegexMatches( "CasedWeekday", Pattern.compile( WEEKDAY ), 1, 2 ),
                    new LongRegexMatches( "UncasedWeekday", Pattern.compile( WEEKDAY, Pattern.CASE_INSENSITIVE ),
                            1, 2 ),

                    new LongRegexMatches( "MonthDay", Pattern.compile( MONTHDAY, Pattern.CASE_INSENSITIVE ), 2, 3 ),
                    new LongRegexMatches( "DayMonthDay", Pattern.compile( DAYMONTHDAY, Pattern.CASE_INSENSITIVE ),
                            3, 6 ),
                    new LongRegexMatches( "MonthYear", Pattern.compile( MONTHYEAR, Pattern.CASE_INSENSITIVE ), 2, 4 ),
                    new LongRegexMatches( "MonthDayYear",
                            Pattern.compile( MONTHDAYYEAR, Pattern.CASE_INSENSITIVE ), 3, 5 ),
                    new LongRegexMatches( "DayMonthDayYear", Pattern.compile( DAYMONTHDAYYEAR,
                            Pattern.CASE_INSENSITIVE ), 4, 8 ),

                    new LongRegexMatches( "SeparatorDate", Pattern.compile( SEPDATE ), 3, 5 ),
                    new LongRegexMatches( "FullSeparatorDate", Pattern.compile( FULLSEPDATE ), 5, 5 ),
            } );
}

Developer ID: BlueBrain, Project: bluima, Lines of code: 80, Source: NEPipes.java

Example 8: TrainCRF

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public TrainCRF(String trainingFilename, String testingFilename) throws IOException {

        ArrayList<Pipe> pipes = new ArrayList<Pipe>();

        int[][] conjunctions = new int[2][];
        conjunctions[0] = new int[] { -1 };
        conjunctions[1] = new int[] { 1 };

        pipes.add(new SimpleTaggerSentence2TokenSequence());
        pipes.add(new OffsetConjunctions(conjunctions));
        //pipes.add(new FeaturesInWindow("PREV-", -1, 1));
        pipes.add(new TokenTextCharSuffix("C1=", 1));
        pipes.add(new TokenTextCharSuffix("C2=", 2));
        pipes.add(new TokenTextCharSuffix("C3=", 3));
        pipes.add(new RegexMatches("CAPITALIZED", Pattern.compile("^\\p{Lu}.*")));
        pipes.add(new RegexMatches("STARTSNUMBER", Pattern.compile("^[0-9].*")));
        pipes.add(new RegexMatches("HYPHENATED", Pattern.compile(".*\\-.*")));
        pipes.add(new RegexMatches("DOLLARSIGN", Pattern.compile(".*\\$.*")));
        pipes.add(new TokenFirstPosition("FIRSTTOKEN"));
        pipes.add(new TokenSequence2FeatureVectorSequence());

        Pipe pipe = new SerialPipes(pipes);

        InstanceList trainingInstances = new InstanceList(pipe);
        InstanceList testingInstances = new InstanceList(pipe);

        trainingInstances.addThruPipe(new LineGroupIterator(new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(trainingFilename)))), Pattern.compile("^\\s*$"), true));
        testingInstances.addThruPipe(new LineGroupIterator(new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(testingFilename)))), Pattern.compile("^\\s*$"), true));

        CRF crf = new CRF(pipe, null);
        //crf.addStatesForLabelsConnectedAsIn(trainingInstances);
        crf.addStatesForThreeQuarterLabelsConnectedAsIn(trainingInstances);
        crf.addStartState();

        CRFTrainerByLabelLikelihood trainer =
                new CRFTrainerByLabelLikelihood(crf);
        trainer.setGaussianPriorVariance(10.0);

        //CRFTrainerByStochasticGradient trainer =
        //new CRFTrainerByStochasticGradient(crf, 1.0);

        //CRFTrainerByL1LabelLikelihood trainer =
        //	new CRFTrainerByL1LabelLikelihood(crf, 0.75);

        //trainer.addEvaluator(new PerClassAccuracyEvaluator(trainingInstances, "training"));
        trainer.addEvaluator(new PerClassAccuracyEvaluator(testingInstances, "testing"));
        trainer.addEvaluator(new TokenAccuracyEvaluator(testingInstances, "testing"));
        trainer.train(trainingInstances);

    }

Developer ID: karahindiba, Project: WikiInfoboxExtractor, Lines of code: 51, Source: TrainCRF.java
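Both instance lists above are filled through a LineGroupIterator with the boundary pattern `^\\s*$`, which appears to treat blank lines as separators between instances. The JDK-only sketch below shows that grouping step on an in-memory string. `BlankLineGroupSketch` and `groups` are hypothetical names, and dropping the boundary lines themselves is an assumption about what the final `true` argument does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical blank-line grouping, mimicking the iterator's assumed role. */
final class BlankLineGroupSketch {
    // Same boundary regex as in the example: empty or whitespace-only lines.
    private static final Pattern BOUNDARY = Pattern.compile("^\\s*$");

    /** Splits text into groups of consecutive non-blank lines. */
    static List<String> groups(String text) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : text.split("\n", -1)) {
            if (BOUNDARY.matcher(line).matches()) {
                // A boundary closes the current group and is itself discarded.
                if (current.length() > 0) {
                    groups.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) {
            groups.add(current.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two hypothetical "token label" sentences separated by blank lines.
        String data = "Paris B-LOC\nis O\n\n \nBerlin B-LOC\n";
        System.out.println(groups(data).size()); // prints 2
    }
}
```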

Example 9: TrainWikiCRF

import cc.mallet.pipe.tsf.RegexMatches; // import the required package/class
public TrainWikiCRF(String trainingFilename, String testingFilename) throws IOException {
	
	ArrayList<Pipe> pipes = new ArrayList<Pipe>();

	int[][] conjunctions = new int[2][];
	conjunctions[0] = new int[] { -1 };
	conjunctions[1] = new int[] { 1 };

	pipes.add(new SimpleTaggerSentence2TokenSequence());
	pipes.add(new OffsetConjunctions(conjunctions));
	//pipes.add(new FeaturesInWindow("PREV-", -1, 1));
	pipes.add(new TokenTextCharSuffix("C1=", 1));
	pipes.add(new TokenTextCharSuffix("C2=", 2));
	pipes.add(new TokenTextCharSuffix("C3=", 3));
	pipes.add(new RegexMatches("CAPITALIZED", Pattern.compile("^\\p{Lu}.*")));
	pipes.add(new RegexMatches("STARTSNUMBER", Pattern.compile("^[0-9].*")));
	pipes.add(new RegexMatches("HYPHENATED", Pattern.compile(".*\\-.*")));
	pipes.add(new RegexMatches("DOLLARSIGN", Pattern.compile(".*\\$.*")));
	pipes.add(new TokenFirstPosition("FIRSTTOKEN"));
	pipes.add(new TokenSequence2FeatureVectorSequence());

	Pipe pipe = new SerialPipes(pipes);

	InstanceList trainingInstances = new InstanceList(pipe);
	InstanceList testingInstances = new InstanceList(pipe);

	trainingInstances.addThruPipe(new LineGroupIterator(new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(trainingFilename)))), Pattern.compile("^\\s*$"), true));
	testingInstances.addThruPipe(new LineGroupIterator(new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(testingFilename)))), Pattern.compile("^\\s*$"), true));
	
	CRF crf = new CRF(pipe, null);
	//crf.addStatesForLabelsConnectedAsIn(trainingInstances);
	crf.addStatesForThreeQuarterLabelsConnectedAsIn(trainingInstances);
	crf.addStartState();

	CRFTrainerByLabelLikelihood trainer = 
		new CRFTrainerByLabelLikelihood(crf);
	trainer.setGaussianPriorVariance(10.0);

	//CRFTrainerByStochasticGradient trainer = 
	//new CRFTrainerByStochasticGradient(crf, 1.0);

	//CRFTrainerByL1LabelLikelihood trainer = 
	//	new CRFTrainerByL1LabelLikelihood(crf, 0.75);

	//trainer.addEvaluator(new PerClassAccuracyEvaluator(trainingInstances, "training"));
	trainer.addEvaluator(new PerClassAccuracyEvaluator(testingInstances, "testing"));
	trainer.addEvaluator(new TokenAccuracyEvaluator(testingInstances, "testing"));
	trainer.train(trainingInstances);
	
}

Developer ID: karahindiba, Project: WikiInfoboxExtractor, Lines of code: 51, Source: TrainWikiCRF.java
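The four RegexMatches features in the constructor above describe the surface shape of a token. As a rough illustration, the sketch below applies the same four regexes to individual strings using only the JDK. `TokenShapeFeaturesSketch` and `firedFeatures` are hypothetical names, and whole-token matching is again an assumption based on the leading and trailing `.*` in the patterns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical lookup of which shape features fire for a token. */
final class TokenShapeFeaturesSketch {
    // Feature names and regexes copied from the example; insertion order kept.
    private static final Map<String, Pattern> FEATURES = new LinkedHashMap<>();
    static {
        FEATURES.put("CAPITALIZED", Pattern.compile("^\\p{Lu}.*"));
        FEATURES.put("STARTSNUMBER", Pattern.compile("^[0-9].*"));
        FEATURES.put("HYPHENATED", Pattern.compile(".*\\-.*"));
        FEATURES.put("DOLLARSIGN", Pattern.compile(".*\\$.*"));
    }

    /** Returns the names of all features whose regex matches the whole token. */
    static List<String> firedFeatures(String token) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Pattern> feature : FEATURES.entrySet()) {
            if (feature.getValue().matcher(token).matches()) {
                fired.add(feature.getKey());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(firedFeatures("Jean-Luc")); // prints [CAPITALIZED, HYPHENATED]
        System.out.println(firedFeatures("3rd"));      // prints [STARTSNUMBER]
        System.out.println(firedFeatures("$5"));       // prints [DOLLARSIGN]
        System.out.println(firedFeatures("the"));      // prints []
    }
}
```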

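Note: The cc.mallet.pipe.tsf.RegexMatches examples in this article were collected from source-code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.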
