Java IndexingException Class Code Examples

This article collects typical usage examples of the Java class org.apache.nutch.indexer.IndexingException. If you are wondering how the IndexingException class is used in practice, or are looking for working examples, the curated code samples below may help.

The IndexingException class belongs to the org.apache.nutch.indexer package. Twenty code examples of the class are shown below, sorted by popularity by default.

Example 1: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
/**
 * This takes the metatags listed in your "urlmeta.tags" property and looks
 * for them inside the CrawlDatum object. If they exist, each one is added
 * as an attribute inside the NutchDocument.
 * 
 * @see IndexingFilter#filter
 */
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
  if (conf != null)
    this.setConf(conf);

  if (urlMetaTags == null || doc == null)
    return doc;

  for (String metatag : urlMetaTags) {
    Text metadata = (Text) datum.getMetaData().get(new Text(metatag));

    if (metadata != null)
      doc.add(metatag, metadata.toString());
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 25, Source: URLMetaIndexingFilter.java

Example 2: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  // check if LANGUAGE found, possibly put there by HTMLLanguageParser
  String lang = parse.getData().getParseMeta().get(Metadata.LANGUAGE);

  // check if HTTP-header tells us the language
  if (lang == null) {
    lang = parse.getData().getContentMeta().get(Response.CONTENT_LANGUAGE);
  }

  if (lang == null || lang.length() == 0) {
    lang = "unknown";
  }

  doc.add("lang", lang);

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 20, Source: LanguageIndexingFilter.java

Example 3: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
/**
 * {@inheritDoc}
 */
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  if (doc != null) {
    if (FIELDREPLACERS_BY_HOST.size() > 0) {
      this.doReplace(doc, "host", FIELDREPLACERS_BY_HOST);
    }

    if (FIELDREPLACERS_BY_URL.size() > 0) {
      this.doReplace(doc, "url", FIELDREPLACERS_BY_URL);
    }
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 19, Source: ReplaceIndexer.java

Example 4: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Override
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
		throws IndexingException {
	ParseData dataP = parse.getData();
	Metadata meta = dataP.getParseMeta();
	boolean index = false;
	
	for (String key : meta.names()) {
		if(key.equals("ogc_service"))
			index = true;
		String value = meta.get(key);
		LOG.info("Adding " + url + " to NutchDocument");
		doc.add(key, value);
	}
	/* Return the document if it is an ogc service, otherwise return null */
	return index ? doc : null;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 18, Source: OgcIndexingFilter.java
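
Example 4 returns null for documents that are not OGC services. One plausible reading is that a null result tells the caller to drop the document from indexing. The sketch below illustrates that short-circuit idea with stand-in types only: `FilterChainSketch`, `DocFilter`, and `runChain` are hypothetical names, a document is modeled as a plain field map, and none of this is Nutch's actual API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {

  /** Hypothetical stand-in for an indexing filter; a document is just a field map here. */
  interface DocFilter {
    Map<String, String> filter(Map<String, String> doc);
  }

  /** Runs the filters in order and stops as soon as one of them returns null. */
  static Map<String, String> runChain(List<DocFilter> filters, Map<String, String> doc) {
    for (DocFilter f : filters) {
      doc = f.filter(doc);
      if (doc == null) {
        return null; // a filter discarded the document, so later filters are skipped
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    // Keeps only documents that carry an "ogc_service" field (assumed field name from Example 4)
    DocFilter keepOgcOnly = d -> d.containsKey("ogc_service") ? d : null;
    // Adds a placeholder language field to whatever reaches it
    DocFilter addLang = d -> {
      d.put("lang", "unknown");
      return d;
    };
    List<DocFilter> chain = Arrays.asList(keepOgcOnly, addLang);

    Map<String, String> service = new HashMap<>();
    service.put("ogc_service", "WMS");
    System.out.println(runChain(chain, service));

    Map<String, String> plainPage = new HashMap<>();
    System.out.println(runChain(chain, plainPage)); // prints null
  }
}
```

Because the chain stops at the first null, a filter that rejects documents is cheapest when it runs early.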

Example 5: testOgcIndexingFilter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Test
public void testOgcIndexingFilter() throws FileNotFoundException, URISyntaxException, IndexingException {
	File f = new File(getClass().getResource("testWMS.xml").toURI());
	@SuppressWarnings("resource")
	String contentValue = new Scanner(f).useDelimiter("\\Z").next();
	ParseResult testParseResult = Utils.createParseResultWithMetadata(new Metadata(), url);
	Content testContent = Utils.createContent(url, contentValue);

	OgcIndexingFilter indexingFilter = new OgcIndexingFilter();
	OgcParseFilter parseFilter = new OgcParseFilter();

	ParseResult res = parseFilter.filter(testContent, testParseResult, null, null);
	parse = res.get(url);

	NutchDocument doc = indexingFilter.filter(nutchDocument, parse, urlText, datum, inlinks);

	assertTrue("Check that the ogc_version field is indexed",
			doc.getFieldNames().contains("ogc_version"));
	assertTrue("Check that the ogc_service field is indexed",
			doc.getFieldNames().contains("ogc_service"));
	assertTrue("Check that the raw_content field is indexed",
			doc.getFieldNames().contains("raw_content"));
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 24, Source: OgcIndexingFilterTest.java

Example 6: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
/**
 * This takes the metatags listed in your "urlmeta.tags" property and looks
 * for them inside the CrawlDatum object. If they exist, each one is added
 * as an attribute inside the NutchDocument.
 * 
 * @see IndexingFilter#filter
 */
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
		CrawlDatum datum, Inlinks inlinks) throws IndexingException {
	if (conf != null)
		this.setConf(conf);

	if (urlMetaTags == null || doc == null)
		return doc;

	for (String metatag : urlMetaTags) {
		Text metadata = (Text) datum.getMetaData().get(new Text(metatag));

		if (metadata != null)
			doc.add(metatag, metadata.toString());
	}

	return doc;
}

Developer ID: yahoo, Project: anthelion, Lines of code: 25, Source: URLMetaIndexingFilter.java

Example 7: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
  throws IndexingException {

  // check if LANGUAGE found, possibly put there by HTMLLanguageParser
  String lang = parse.getData().getParseMeta().get(Metadata.LANGUAGE);

  // check if HTTP-header tells us the language
  if (lang == null) {
      lang = parse.getData().getContentMeta().get(Response.CONTENT_LANGUAGE);
  }

  if (lang == null || lang.length() == 0) {
    lang = "unknown";
  }

  doc.add("lang", lang);

  return doc;
}

Developer ID: yahoo, Project: anthelion, Lines of code: 20, Source: LanguageIndexingFilter.java

Example 8: getSiteHashFromJsonStream

import org.apache.nutch.indexer.IndexingException; // import the required package/class
/**
 * gets the siteHash from the received Json data in form of InputStream
 *
 * @param stream
 *            with Json data
 * @return siteHash
 * @throws IndexingException
 */
protected String getSiteHashFromJsonStream(InputStream stream)
		throws IndexingException {
	try {
		JsonNode rootNode = jsonMapper.readValue(stream, JsonNode.class);
		String siteHash = rootNode.get("sitehash").getTextValue();

		LOG.info("TYPO3 Solr siteHash retrieved: " + siteHash);

		return siteHash;
	} catch (Exception e) {
		LOG.error("ERROR! could not receive correct siteHash data from the Solr TYPO3 Api");

		throw (new IndexingException(e));
	} finally {
		if (stream != null) {
			try {
				stream.close();
			} catch (IOException streamException) {
				LOG.error(streamException.getMessage());
			}
		}
	}
}

Developer ID: dkd, Project: nutch-typo3-cms, Lines of code: 32, Source: SiteHashIndexingFilter.java
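
Example 8 is the only listing here that actually constructs an IndexingException: it wraps whatever went wrong while reading the stream and closes the stream in a finally block. The sketch below shows the same wrap-and-rethrow idea with try-with-resources. It is an illustration under assumptions, not the project's code: `WrapExceptionSketch` and `readAll` are hypothetical, and the nested `IndexingException` is a local stand-in so the example compiles without Nutch on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class WrapExceptionSketch {

  /** Hypothetical stand-in for org.apache.nutch.indexer.IndexingException. */
  static class IndexingException extends Exception {
    IndexingException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Reads the whole stream as UTF-8 text; any failure is rethrown as IndexingException. */
  static String readAll(InputStream stream) throws IndexingException {
    // try-with-resources closes a non-null stream on both the success and the failure path
    try (InputStream in = stream) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException | RuntimeException e) {
      // Keep the original exception as the cause so the stack trace is not lost
      throw new IndexingException("could not read stream", e);
    }
  }

  public static void main(String[] args) {
    try {
      InputStream ok = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
      System.out.println(readAll(ok)); // prints abc
      readAll(null); // a null stream triggers a NullPointerException, which gets wrapped
    } catch (IndexingException e) {
      System.out.println("wrapped cause: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```

Passing the cause to the constructor lets callers handle a single checked exception type while still seeing the underlying failure in logs.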

Example 9: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Override
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
		CrawlDatum datum, Inlinks inlinks) throws IndexingException {

	// convert ISO date to time stamp
	String isoDate = conf.get(CONF_ENDTIME_PROPERTY, "1970-01-01T00:00:00Z");
	long epoch = 0;
	try {
		epoch = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ").parse(isoDate).getTime();
	} catch (ParseException e) {
		LOG.error("ERROR! Cannot parse date, must fit pattern yyyy-MM-dd'T'HH:mm:ssZ : " + isoDate);
	}

	// Index the endtime
	doc.add(INDEXING_FIELD, new Date(epoch));

	return doc;
}

Developer ID: dkd, Project: nutch-typo3-cms, Lines of code: 19, Source: EndtimeIndexingFilter.java
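
One detail in Example 9 is worth checking before reusing it. The default value ends in a literal `Z`, while the pattern uses the `Z` pattern letter, which in java.text.SimpleDateFormat denotes an RFC 822 offset such as `+0000`. The `X` pattern letter (Java 7 and later) is the one documented to accept an ISO 8601 `Z`. If the parse fails in Example 9, `epoch` simply stays 0, which happens to match the default date, so the mismatch would only show up for other configured values. The sketch below compares the patterns; `IsoDatePatternSketch` and `parseMillis` are hypothetical names used only for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class IsoDatePatternSketch {

  /** Parses text with the given pattern; returns epoch millis, or null if unparseable. */
  static Long parseMillis(String pattern, String text) {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    try {
      return format.parse(text).getTime();
    } catch (ParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    String iso = "1970-01-01T00:00:00Z";

    // 'X' is the ISO 8601 zone letter and accepts a literal Z as UTC
    System.out.println(parseMillis("yyyy-MM-dd'T'HH:mm:ssX", iso));

    // 'Z' is the RFC 822 zone letter and accepts a numeric offset
    System.out.println(parseMillis("yyyy-MM-dd'T'HH:mm:ssZ", "1970-01-01T00:00:00+0000"));

    // 'Z' applied to a literal Z is expected to be rejected (null here)
    System.out.println(parseMillis("yyyy-MM-dd'T'HH:mm:ssZ", iso));
  }
}
```

On newer Java versions, `java.time.Instant.parse` handles this ISO 8601 form directly and may be a simpler choice.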

Example 10: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Override
public NutchDocument filter(NutchDocument document, String s, WebPage webPage) throws IndexingException {
    if (storageField != null) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        try {
            String strippedContent = decoder.decode(webPage.getMetadata().get(new Utf8(storageField))).toString();
            if (strippedContent != null) {
                document.add(storageField, strippedContent);
            }
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        }
    }

    return document;
}

Developer ID: kaqqao, Project: nutch-element-selector, Lines of code: 17, Source: HtmlElementSelectorIndexer.java

Example 11: testOgcIndexingFilter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Test
public void testOgcIndexingFilter() throws FileNotFoundException, URISyntaxException, IndexingException {
	int results = th.execQuery("agua");
	assertEquals(1, results);
	results = th.execQuery("вода");
	assertEquals(1, results);
	results = th.execQuery("Mar");
	assertEquals(30, results);
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 11, Source: ThesaurusTest.java

Example 12: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  // Check if some Rel-Tags found, possibly put there by RelTagParser
  String[] tags = parse.getData().getParseMeta()
      .getValues(RelTagParser.REL_TAG);
  if (tags != null) {
    for (int i = 0; i < tags.length; i++) {
      doc.add("tag", tags[i]);
    }
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 15, Source: RelTagIndexingFilter.java

Example 13: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text urlText,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  try {
    URL url = new URL(urlText.toString());
    DomainSuffix d = URLUtil.getDomainSuffix(url);

    doc.add("tld", d.getDomain());

  } catch (Exception ex) {
    LOG.warn(ex.toString());
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 16, Source: TLDIndexingFilter.java

Example 14: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  String url_s = url.toString();

  addTime(doc, parse.getData(), url_s, datum);
  addLength(doc, parse.getData(), url_s);
  addType(doc, parse.getData(), url_s, datum);
  resetTitle(doc, parse.getData(), url_s);

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 13, Source: MoreIndexingFilter.java

Example 15: assertContentType

import org.apache.nutch.indexer.IndexingException; // import the required package/class
private void assertContentType(Configuration conf, String source,
    String expected) throws IndexingException {
  Metadata metadata = new Metadata();
  metadata.add(Response.CONTENT_TYPE, source);
  MoreIndexingFilter filter = new MoreIndexingFilter();
  filter.setConf(conf);
  NutchDocument doc = filter.filter(new NutchDocument(), new ParseImpl(
      "text", new ParseData(new ParseStatus(), "title", new Outlink[0],
          metadata)), new Text("http://www.example.com/"), new CrawlDatum(),
      new Inlinks());
  Assert.assertEquals("mime type not detected", expected,
      doc.getFieldValue("type"));
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 14, Source: TestMoreIndexingFilter.java

Example 16: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
/**
 * The {@link AnchorIndexingFilter} filter object which supports boolean
 * configuration settings for the deduplication of anchors. See
 * {@code anchorIndexingFilter.deduplicate} in nutch-default.xml.
 * 
 * @param doc
 *          The {@link NutchDocument} object
 * @param parse
 *          The relevant {@link Parse} object passing through the filter
 * @param url
 *          URL to be filtered for anchor text
 * @param datum
 *          The {@link CrawlDatum} entry
 * @param inlinks
 *          The {@link Inlinks} containing anchor text
 * @return filtered NutchDocument
 */
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  String[] anchors = (inlinks != null ? inlinks.getAnchors() : new String[0]);

  HashSet<String> set = null;

  for (int i = 0; i < anchors.length; i++) {
    if (deduplicate) {
      if (set == null)
        set = new HashSet<String>();
      String lcAnchor = anchors[i].toLowerCase();

      // Check if already processed the current anchor
      if (!set.contains(lcAnchor)) {
        doc.add("anchor", anchors[i]);

        // Add to set
        set.add(lcAnchor);
      }
    } else {
      doc.add("anchor", anchors[i]);
    }
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 45, Source: AnchorIndexingFilter.java
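
The deduplication step in Example 16 can be tried in isolation. The sketch below is a minimal version of that idea, assuming plain strings in place of Nutch's Inlinks: it keeps the first spelling of each anchor and compares case-insensitively. `AnchorDedupSketch` and `dedupe` are hypothetical names, and unlike the listing it lower-cases with `Locale.ROOT` so the result does not depend on the default locale.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnchorDedupSketch {

  /** Keeps the first occurrence of each anchor, comparing case-insensitively. */
  static List<String> dedupe(String[] anchors) {
    Set<String> seen = new HashSet<>();
    List<String> kept = new ArrayList<>();
    for (String anchor : anchors) {
      // Set.add returns false when the lower-cased form was already present
      if (seen.add(anchor.toLowerCase(Locale.ROOT))) {
        kept.add(anchor);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    String[] anchors = {"Home", "home", "About", "HOME"};
    System.out.println(dedupe(anchors)); // prints [Home, About]
  }
}
```

Using the return value of `Set.add` folds the contains-then-add pair from the listing into a single call.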

Example 17: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  Metadata metadata = parse.getData().getParseMeta();
  // index the license
  String licenseUrl = metadata.get(CreativeCommons.LICENSE_URL);
  if (licenseUrl != null) {
    if (LOG.isInfoEnabled()) {
      LOG.info("CC: indexing " + licenseUrl + " for: " + url.toString());
    }

    // add the entire license as cc:license=xxx
    addFeature(doc, "license=" + licenseUrl);

    // index license attributes extracted of the license url
    addUrlFeatures(doc, licenseUrl);
  }

  // index the license location as cc:meta=xxx
  String licenseLocation = metadata.get(CreativeCommons.LICENSE_LOCATION);
  if (licenseLocation != null) {
    addFeature(doc, "meta=" + licenseLocation);
  }

  // index the work type cc:type=xxx
  String workType = metadata.get(CreativeCommons.WORK_TYPE);
  if (workType != null) {
    addFeature(doc, workType);
  }

  return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 33, Source: CCIndexingFilter.java

Example 18: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
@Override
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
		throws IndexingException {
	String text = parse.getText();
	doc.add("length", text.length());		
	return doc;
}

Developer ID: jorcox, Project: GeoCrawler, Lines of code: 8, Source: LengthIndexingFilter.java

Example 19: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
  throws IndexingException {

  // Check if some Rel-Tags found, possibly put there by RelTagParser
  String[] tags = parse.getData().getParseMeta().getValues(RelTagParser.REL_TAG);
  if (tags != null) {
    for (int i=0; i<tags.length; i++) {
      doc.add("tag", tags[i]);
    }
  }

  return doc;
}

Developer ID: yahoo, Project: anthelion, Lines of code: 14, Source: RelTagIndexingFilter.java

Example 20: filter

import org.apache.nutch.indexer.IndexingException; // import the required package/class
public NutchDocument filter(NutchDocument doc, Parse parse, Text urlText, CrawlDatum datum, Inlinks inlinks)
  throws IndexingException {

  try {
    URL url = new URL(urlText.toString());
    DomainSuffix d = URLUtil.getDomainSuffix(url);

    doc.add("tld", d.getDomain());

  } catch (Exception ex) {
    LOG.warn(ex.toString());
  }

  return doc;
}

Developer ID: yahoo, Project: anthelion, Lines of code: 16, Source: TLDIndexingFilter.java

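Note: The org.apache.nutch.indexer.IndexingException examples in this article were collected from GitHub, MSDocs, and other source-code and documentation platforms. The snippets were selected from open-source projects contributed by their respective developers, and copyright remains with the original authors. For distribution and use, refer to each project's license. Do not republish without permission.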
