Java Document Class Code Examples

This article collects typical usage examples of the Java class org.carrot2.core.Document. If you are wondering how the Document class is used in practice, or are looking for working examples, the selected snippets below may help.

The Document class belongs to the org.carrot2.core package. Twelve code examples are shown below, sorted by popularity by default.

Example 1: process

import org.carrot2.core.Document; // import the required package/class
@Override
public void process() throws ProcessingException {
  clusters = Lists.newArrayListWithCapacity(documents.size());
  
  for (Document document : documents) {
    final Cluster cluster = new Cluster();
    cluster.addPhrases(document.getTitle(), document.getSummary());
    if (document.getLanguage() != null) {
      cluster.addPhrases(document.getLanguage().name());
    }
    for (String field : customFields.split(",")) {
      Object value = document.getField(field);
      if (value != null) {
        cluster.addPhrases(value.toString());
      }
    }
    cluster.addDocuments(document);
    clusters.add(cluster);
  }
}

Developer: europeana, Project: search, Lines of code: 21, Source: EchoClusteringAlgorithm.java

Example 2: displayResults

import org.carrot2.core.Document; // import the required package/class
/**
 * Prints the full contents of a processingResult to the console.
 * @author GS
 * @param processingResult
 */
public static void displayResults(ProcessingResult processingResult)
{
    final Collection<Document> documents = processingResult.getDocuments(); // all documents
    final Collection<Cluster> clusters = processingResult.getClusters(); // all clusters
    final Map<String, Object> attributes = processingResult.getAttributes(); // attributes

    // Show documents
    if (documents != null)
    {
        displayDocuments(documents); // print all documents
    }

    // Show clusters
    if (clusters != null)
    {
        displayClusters(clusters); // print all clusters
    }

    // Show other attributes
    displayAttributes(attributes); // print the attributes
}

Developer: gsh199449, Project: DistributedCrawler, Lines of code: 27, Source: ConsoleFormatter.java

Example 3: cluster

import org.carrot2.core.Document; // import the required package/class
/**
 * Clusters all PagePOJO entries.
 * 
 * @author GS
 * @param docPath path of the file the hits are read from
 * @return the clustering result
 * @throws IOException
 * @throws Exception
 */
public ProcessingResult cluster(String docPath) throws IOException,
		Exception {
	@SuppressWarnings("unchecked")
	final Controller controller = ControllerFactory
			.createCachingPooling(IDocumentSource.class);
	final List<Document> documents = Lists.newArrayList();
	JsonReader jr = new JsonReader(new File(docPath));
	try {
		while (jr.hasNext()) {
			Hit h = jr.next();
			documents.add(new Document(h.getPagePOJO().getTitle(), h
					.getPagePOJO().getContent()));
		}
	} finally {
		jr.close(); // release the reader even if reading fails
	}
	final Map<String, Object> attributes = Maps.newHashMap();
	CommonAttributesDescriptor.attributeBuilder(attributes).documents(
			documents);
	final ProcessingResult englishResult = controller.process(attributes,
			LingoClusteringAlgorithm.class);
	ConsoleFormatter.displayResults(englishResult);// print the result
	return englishResult;
}

Developer: gsh199449, Project: DistributedCrawler, Lines of code: 30, Source: Cluster.java

Example 4: adapt

import org.carrot2.core.Document; // import the required package/class
private DocumentGroup adapt(Cluster cluster) {
    DocumentGroup group = new DocumentGroup();
    group.setId(cluster.getId());
    List<String> phrases = cluster.getPhrases();
    group.setPhrases(phrases.toArray(new String[phrases.size()]));
    group.setLabel(cluster.getLabel());
    group.setScore(cluster.getScore());
    group.setOtherTopics(cluster.isOtherTopics());

    List<Document> documents = cluster.getDocuments();
    String[] documentReferences = new String[documents.size()];
    for (int i = 0; i < documentReferences.length; i++) {
        documentReferences[i] = documents.get(i).getStringId();
    }
    group.setDocumentReferences(documentReferences);

    List<Cluster> subclusters = cluster.getSubclusters();
    subclusters = (subclusters == null ? Collections.emptyList() : subclusters);
    group.setSubgroups(adapt(subclusters));

    return group;
}

Developer: carrot2, Project: elasticsearch-carrot2, Lines of code: 23, Source: ClusteringAction.java

Example 5: cluster

import org.carrot2.core.Document; // import the required package/class
@Override
public Object cluster(Query query, SolrDocumentList solrDocList,
    Map<SolrDocument, Integer> docIds, SolrQueryRequest sreq) {
  try {
    // Prepare attributes for Carrot2 clustering call
    Map<String, Object> attributes = new HashMap<>();
    List<Document> documents = getDocuments(solrDocList, docIds, query, sreq);
    attributes.put(AttributeNames.DOCUMENTS, documents);
    attributes.put(AttributeNames.QUERY, query.toString());

    // Pass the fields on which clustering runs.
    attributes.put("solrFieldNames", getFieldsForClustering(sreq));

    // Pass extra overriding attributes from the request, if any
    extractCarrotAttributes(sreq.getParams(), attributes);

    // Perform clustering and convert to an output structure of clusters.
    //
    // Carrot2 uses current thread's context class loader to get
    // certain classes (e.g. custom tokenizer/stemmer) at runtime.
    // To make sure classes from contrib JARs are available,
    // we swap the context class loader for the time of clustering.
    Thread ct = Thread.currentThread();
    ClassLoader prev = ct.getContextClassLoader();
    try {
      ct.setContextClassLoader(core.getResourceLoader().getClassLoader());
      return clustersToNamedList(controller.process(attributes,
              clusteringAlgorithmClass).getClusters(), sreq.getParams());
    } finally {
      ct.setContextClassLoader(prev);
    }
  } catch (Exception e) {
    log.error("Carrot2 clustering failed", e);
    throw new SolrException(ErrorCode.SERVER_ERROR, "Carrot2 clustering failed", e);
  }
}

Developer: europeana, Project: search, Lines of code: 37, Source: CarrotClusteringEngine.java
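
The comments in Example 5 describe swapping the current thread's context class loader for the duration of the clustering call and restoring it afterwards. The same pattern can be sketched without Solr or Carrot2 as below. `ContextClassLoaderSwap`, `callWith`, and the empty demo loader are hypothetical names introduced for illustration only; they are not part of either library.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /** Runs the task with the given context class loader, then restores the previous one. */
    static <T> T callWith(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            // Restore even if the task throws, so later code on this thread is unaffected.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // An empty URLClassLoader stands in for a loader that would expose extra JARs.
        try (URLClassLoader demoLoader = new URLClassLoader(new URL[0], before)) {
            Boolean swapped = callWith(demoLoader,
                () -> Thread.currentThread().getContextClassLoader() == demoLoader);
            System.out.println("swapped during call: " + swapped);
        }
        System.out.println("restored afterwards: "
            + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

Putting the restore in a `finally` block is the important part: a failed call must not leave the thread with the temporary loader installed.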

Example 6: cluster

import org.carrot2.core.Document; // import the required package/class
@Override
public Object cluster(Query query, SolrDocumentList solrDocList,
    Map<SolrDocument, Integer> docIds, SolrQueryRequest sreq) {
  try {
    // Prepare attributes for Carrot2 clustering call
    Map<String, Object> attributes = new HashMap<String, Object>();
    List<Document> documents = getDocuments(solrDocList, docIds, query, sreq);
    attributes.put(AttributeNames.DOCUMENTS, documents);
    attributes.put(AttributeNames.QUERY, query.toString());

    // Pass the fields on which clustering runs to the
    // SolrStopwordsCarrot2LexicalDataFactory
    attributes.put("solrFieldNames", getFieldsForClustering(sreq));

    // Pass extra overriding attributes from the request, if any
    extractCarrotAttributes(sreq.getParams(), attributes);

    // Perform clustering and convert to named list
    // Carrot2 uses current thread's context class loader to get
    // certain classes (e.g. custom tokenizer/stemmer) at runtime.
    // To make sure classes from contrib JARs are available,
    // we swap the context class loader for the time of clustering.
    Thread ct = Thread.currentThread();
    ClassLoader prev = ct.getContextClassLoader();
    try {
      ct.setContextClassLoader(core.getResourceLoader().getClassLoader());
      return clustersToNamedList(controller.process(attributes,
              clusteringAlgorithmClass).getClusters(), sreq.getParams());
    } finally {
      ct.setContextClassLoader(prev);
    }
  } catch (Exception e) {
    log.error("Carrot2 clustering failed", e);
    throw new SolrException(ErrorCode.SERVER_ERROR, "Carrot2 clustering failed", e);
  }
}

Developer: pkarmstr, Project: NYBC, Lines of code: 37, Source: CarrotClusteringEngine.java

Example 7: cluster

import org.carrot2.core.Document; // import the required package/class
@Override
public Object cluster(Query query, SolrDocumentList solrDocList,
    Map<SolrDocument, Integer> docIds, SolrQueryRequest sreq) {
  try {
    // Prepare attributes for Carrot2 clustering call
    Map<String, Object> attributes = new HashMap<String, Object>();
    List<Document> documents = getDocuments(solrDocList, docIds, query, sreq);
    attributes.put(AttributeNames.DOCUMENTS, documents);
    attributes.put(AttributeNames.QUERY, query.toString());

    // Pass the fields on which clustering runs.
    attributes.put("solrFieldNames", getFieldsForClustering(sreq));

    // Pass extra overriding attributes from the request, if any
    extractCarrotAttributes(sreq.getParams(), attributes);

    // Perform clustering and convert to an output structure of clusters.
    //
    // Carrot2 uses current thread's context class loader to get
    // certain classes (e.g. custom tokenizer/stemmer) at runtime.
    // To make sure classes from contrib JARs are available,
    // we swap the context class loader for the time of clustering.
    Thread ct = Thread.currentThread();
    ClassLoader prev = ct.getContextClassLoader();
    try {
      ct.setContextClassLoader(core.getResourceLoader().getClassLoader());
      return clustersToNamedList(controller.process(attributes,
              clusteringAlgorithmClass).getClusters(), sreq.getParams());
    } finally {
      ct.setContextClassLoader(prev);
    }
  } catch (Exception e) {
    log.error("Carrot2 clustering failed", e);
    throw new SolrException(ErrorCode.SERVER_ERROR, "Carrot2 clustering failed", e);
  }
}

Developer: yintaoxue, Project: read-open-source-code, Lines of code: 37, Source: CarrotClusteringEngine.java

Example 8: displayDocuments

import org.carrot2.core.Document; // import the required package/class
/**
 * Displays every document in the collection, showing its title and URL.
 * @author GS
 * @param documents
 */
public static void displayDocuments(final Collection<Document> documents)
{
    System.out.println("Collected " + documents.size() + " documents\n"); // total number of documents
    for (final Document document : documents)
    {
        displayDocument(0, document); // display one document: its title and URL
    }
}

Developer: gsh199449, Project: DistributedCrawler, Lines of code: 14, Source: ConsoleFormatter.java

Example 9: displayDocument

import org.carrot2.core.Document; // import the required package/class
/**
 * Displays a single document.
 * @author GS
 * @param level
 * @param document
 */
private static void displayDocument(final int level, Document document) // display a single document
{
    final String indent = getIndent(level);

    System.out.printf(indent + "[%2s] ", document.getStringId()); // print the document ID
    System.out.println(document.getField(Document.TITLE)); // print the title
    final String url = document.getField(Document.CONTENT_URL); // URL of the content
    if (StringUtils.isNotBlank(url)) // print the URL only if the document has one
    {
        System.out.println(indent + "     " + url);
    }
    System.out.println();
}

Developer: gsh199449, Project: DistributedCrawler, Lines of code: 20, Source: ConsoleFormatter.java

Example 10: displayCluster

import org.carrot2.core.Document; // import the required package/class
/**
 * Displays a single cluster.
 * @author GS
 * @param level
 * @param tag
 * @param cluster
 * @param maxNumberOfDocumentsToShow
 * @param clusterDetailsFormatter
 */
private static void displayCluster(final int level, String tag, Cluster cluster,
    int maxNumberOfDocumentsToShow, ClusterDetailsFormatter clusterDetailsFormatter)
{
    final String label = cluster.getLabel(); // label of the current cluster

    // indent up to level and display this cluster's description phrase
    for (int i = 0; i < level; i++)
    {
        System.out.print("  ");
    }
    System.out.println(label + "  "
        + clusterDetailsFormatter.formatClusterDetails(cluster));

    // if this cluster has documents, display up to maxNumberOfDocumentsToShow of them.
    int documentsShown = 0;
    for (final Document document : cluster.getDocuments())
    {
        if (documentsShown >= maxNumberOfDocumentsToShow) // stop once the display limit is reached
        {
            break;
        }
        displayDocument(level + 1, document); // level sets the indentation depth
        documentsShown++; // number of documents shown so far for this cluster
    }
    if (maxNumberOfDocumentsToShow > 0
        && (cluster.getDocuments().size() > documentsShown))
    {
        System.out.println(getIndent(level + 1) + "... and "
            + (cluster.getDocuments().size() - documentsShown) + " more\n");
    }

    // finally, if this cluster has subclusters, descend into recursion.
    int num = 1;
    for (final Cluster subcluster : cluster.getSubclusters())
    {
        displayCluster(level + 1, tag + "." + num, subcluster,
            maxNumberOfDocumentsToShow, clusterDetailsFormatter);
        num++; // advance so each subcluster gets a distinct tag
    }
}

Developer: gsh199449, Project: DistributedCrawler, Lines of code: 49, Source: ConsoleFormatter.java
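
Example 10 walks a cluster tree recursively, indenting by `level` and building a dotted `tag` for each subcluster. A minimal sketch of that traversal, independent of Carrot2, might look like the following. `ClusterTreeSketch`, `Node`, and `render` are hypothetical stand-ins, not Carrot2 types, and the sketch collects lines into a list instead of printing them so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ClusterTreeSketch {
    /** Hypothetical stand-in for a cluster: a label plus nested subclusters. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Adds one line per node: two spaces of indent per level, then the tag and label. */
    static void render(int level, String tag, Node node, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < level; i++) {
            line.append("  ");
        }
        out.add(line.append(tag).append(' ').append(node.label).toString());

        // Number siblings 1, 2, 3, ... and append the number to the parent's tag.
        int num = 1;
        for (Node child : node.children) {
            render(level + 1, tag + "." + num, child, out);
            num++;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("Search",
            new Node("Solr"),
            new Node("Lucene", new Node("Analysis")));
        List<String> lines = new ArrayList<>();
        render(0, "1", root, lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Each recursive call increases `level` by one, so the indentation mirrors the depth of the node in the tree.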

Example 11: clustersToNamedList

import org.carrot2.core.Document; // import the required package/class
private void clustersToNamedList(List<Cluster> outputClusters,
                                 List<NamedList<Object>> parent, boolean outputSubClusters, int maxLabels) {
  for (Cluster outCluster : outputClusters) {
    NamedList<Object> cluster = new SimpleOrderedMap<>();
    parent.add(cluster);

    // Add labels
    List<String> labels = outCluster.getPhrases();
    if (labels.size() > maxLabels) {
      labels = labels.subList(0, maxLabels);
    }
    cluster.add("labels", labels);

    // Add cluster score
    final Double score = outCluster.getScore();
    if (score != null) {
      cluster.add("score", score);
    }

    // Add other topics marker
    if (outCluster.isOtherTopics()) {
      cluster.add("other-topics", outCluster.isOtherTopics());
    }

    // Add documents
    List<Document> docs = outputSubClusters ? outCluster.getDocuments() : outCluster.getAllDocuments();
    List<Object> docList = Lists.newArrayList();
    cluster.add("docs", docList);
    for (Document doc : docs) {
      docList.add(doc.getField(SOLR_DOCUMENT_ID));
    }

    // Add subclusters
    if (outputSubClusters && !outCluster.getSubclusters().isEmpty()) {
      List<NamedList<Object>> subclusters = Lists.newArrayList();
      cluster.add("clusters", subclusters);
      clustersToNamedList(outCluster.getSubclusters(), subclusters,
              outputSubClusters, maxLabels);
    }
  }
}

Developer: europeana, Project: search, Lines of code: 42, Source: CarrotClusteringEngine.java
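
Example 11 turns a cluster tree into a nested output structure: labels capped at `maxLabels`, a list of document IDs, and, when requested, subclusters. The sketch below reproduces that shape with plain `java.util` maps in place of Solr's `NamedList`. `ClusterOutputSketch`, `SimpleCluster`, `toMaps`, and `flattenedIds` are hypothetical names; in particular, `flattenedIds` assumes that `getAllDocuments()` means "this cluster's documents plus those of all its subclusters", which should be checked against the Carrot2 documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClusterOutputSketch {
    /** Hypothetical stand-in for a cluster; not a Carrot2 or Solr type. */
    static final class SimpleCluster {
        final List<String> phrases;
        final List<String> docIds;
        final List<SimpleCluster> subclusters;

        SimpleCluster(List<String> phrases, List<String> docIds, SimpleCluster... subclusters) {
            this.phrases = phrases;
            this.docIds = docIds;
            this.subclusters = Arrays.asList(subclusters);
        }
    }

    /** Collects this cluster's document IDs followed by those of all descendants. */
    static List<String> flattenedIds(SimpleCluster c) {
        List<String> ids = new ArrayList<>(c.docIds);
        for (SimpleCluster sub : c.subclusters) {
            ids.addAll(flattenedIds(sub));
        }
        return ids;
    }

    static List<Map<String, Object>> toMaps(List<SimpleCluster> clusters,
                                            boolean outputSubClusters, int maxLabels) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (SimpleCluster c : clusters) {
            Map<String, Object> out = new LinkedHashMap<>();

            // Keep at most maxLabels labels.
            List<String> labels = c.phrases.size() > maxLabels
                ? c.phrases.subList(0, maxLabels) : c.phrases;
            out.put("labels", new ArrayList<>(labels));

            // With nested output, list only direct documents; otherwise flatten the subtree.
            List<String> docs = outputSubClusters ? new ArrayList<>(c.docIds) : flattenedIds(c);
            out.put("docs", docs);

            if (outputSubClusters && !c.subclusters.isEmpty()) {
                out.put("clusters", toMaps(c.subclusters, outputSubClusters, maxLabels));
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleCluster child = new SimpleCluster(Arrays.asList("Stemming"), Arrays.asList("d3"));
        SimpleCluster parent = new SimpleCluster(
            Arrays.asList("Analysis", "Tokenizers", "Filters"), Arrays.asList("d1", "d2"), child);
        System.out.println(toMaps(Arrays.asList(parent), true, 2));
        System.out.println(toMaps(Arrays.asList(parent), false, 2));
    }
}
```

The flag decides where a nested document is reported: under its own subcluster when subclusters are emitted, or under the top-level cluster when they are not.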

Example 12: clustersToNamedList

import org.carrot2.core.Document; // import the required package/class
private void clustersToNamedList(List<Cluster> outputClusters,
                                 List<NamedList<Object>> parent, boolean outputSubClusters, int maxLabels) {
  for (Cluster outCluster : outputClusters) {
    NamedList<Object> cluster = new SimpleOrderedMap<Object>();
    parent.add(cluster);

    // Add labels
    List<String> labels = outCluster.getPhrases();
    if (labels.size() > maxLabels) {
      labels = labels.subList(0, maxLabels);
    }
    cluster.add("labels", labels);

    // Add cluster score
    final Double score = outCluster.getScore();
    if (score != null) {
      cluster.add("score", score);
    }

    // Add other topics marker
    if (outCluster.isOtherTopics()) {
      cluster.add("other-topics", outCluster.isOtherTopics());
    }

    // Add documents
    List<Document> docs = outputSubClusters ? outCluster.getDocuments() : outCluster.getAllDocuments();
    List<Object> docList = Lists.newArrayList();
    cluster.add("docs", docList);
    for (Document doc : docs) {
      docList.add(doc.getField(SOLR_DOCUMENT_ID));
    }

    // Add subclusters
    if (outputSubClusters && !outCluster.getSubclusters().isEmpty()) {
      List<NamedList<Object>> subclusters = Lists.newArrayList();
      cluster.add("clusters", subclusters);
      clustersToNamedList(outCluster.getSubclusters(), subclusters,
              outputSubClusters, maxLabels);
    }
  }
}

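Developer: pkarmstr, Project: NYBC, Lines of code: 42, Source: CarrotClusteringEngine.java

Note: The org.carrot2.core.Document examples in this article were collected from GitHub, MSDocs, and similar source-code and documentation platforms. The snippets were selected from open-source projects; copyright remains with the original authors, and distribution and use are subject to each project's license. Do not reproduce without permission.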
