Java CoreStopWordDictionary Class Code Examples


This article collects typical usage examples of the Java class com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary. If you are wondering what the CoreStopWordDictionary class does, how to use it, or are looking for working examples, the curated code samples below should help.



The CoreStopWordDictionary class belongs to the com.hankcs.hanlp.dictionary.stopword package. The 16 code examples below show how it is used in real projects, sorted by popularity by default.
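Before the individual examples, here is a minimal orientation sketch exercising the three calls the examples below revolve around: contains, apply, and shouldInclude. It assumes HanLP and its data files are already configured; the wrapper class name CoreStopWordDictionaryOverview is only illustrative and does not come from any of the projects listed here.

import java.util.List;

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary;
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.tokenizer.StandardTokenizer;

// Illustrative wrapper class, not taken from any of the projects below
public class CoreStopWordDictionaryOverview
{
    public static void main(String[] args)
    {
        String text = "小区居民有的反对喂养流浪猫";

        // Look up whether a single word is registered in the core stop word dictionary
        System.out.println(CoreStopWordDictionary.contains("可以"));

        // shouldInclude returns true for terms worth keeping (i.e. not stop words)
        List<Term> termList = StandardTokenizer.segment(text);
        for (Term term : termList)
        {
            System.out.println(term.word + " -> " + CoreStopWordDictionary.shouldInclude(term));
        }

        // apply removes stop words from a segmentation result in place
        CoreStopWordDictionary.apply(termList);
        System.out.println(termList);
    }
}

The examples that follow use exactly these calls inside tokenizers, summarizers, and keyword extractors.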

Example 1: convertSentenceListToDocument

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Convert a list of sentences into a document
 *
 * @param sentenceList the sentences to convert
 * @return the document as a list of word lists, one per sentence, with stop words filtered out
 */
private static List<List<String>> convertSentenceListToDocument(List<String> sentenceList)
{
    List<List<String>> docs = new ArrayList<List<String>>(sentenceList.size());
    for (String sentence : sentenceList)
    {
        List<Term> termList = StandardTokenizer.segment(sentence.toCharArray());
        List<String> wordList = new LinkedList<String>();
        for (Term term : termList)
        {
            if (CoreStopWordDictionary.shouldInclude(term))
            {
                wordList.add(term.word);
            }
        }
        docs.add(wordList);
    }
    return docs;
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 25, Source file: TextRankSentence.java


Example 2: seg2sentence

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Segment text into sentences
 *
 * @param text the text to segment
 * @return a list of sentences, each represented as a list of terms with stop words removed
 */
public static List<List<Term>> seg2sentence(String text)
{
    List<List<Term>> sentenceList = SEGMENT.seg2sentence(text);
    for (List<Term> sentence : sentenceList)
    {
        ListIterator<Term> listIterator = sentence.listIterator();
        while (listIterator.hasNext())
        {
            if (!CoreStopWordDictionary.shouldInclude(listIterator.next()))
            {
                listIterator.remove();
            }
        }
    }

    return sentenceList;
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 24, Source file: NotionalTokenizer.java


Example 3: testSegmentCorpus

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
public void testSegmentCorpus() throws Exception
{
    File root = new File("D:\\Doc\\语料库\\搜狗文本分类语料库精简版");
    for (File folder : root.listFiles())
    {
        if (folder.isDirectory())
        {
            for (File file : folder.listFiles())
            {
                System.out.println(file.getAbsolutePath());
                List<Term> termList = HanLP.segment(IOUtil.readTxt(file.getAbsolutePath()));
                StringBuilder sbOut = new StringBuilder();
                for (Term term : termList)
                {
                    if (CoreStopWordDictionary.shouldInclude(term))
                    {
                        sbOut.append(term.word).append(" ");
                    }
                }
                IOUtil.saveTxt("D:\\Doc\\语料库\\segmented\\" + folder.getName() + "_" + file.getName(), sbOut.toString());
            }
        }
    }
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 25, Source file: TestLDA.java


Example 4: StandardSegment

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Standard segmentation
 * @param content the text to segment
 * @param filterStopWord whether to filter out stop words
 * @return the list of terms
 */
public static List<Term> StandardSegment(String content, boolean filterStopWord) {
    List<Term> result = StandardTokenizer.segment(content);
    if (filterStopWord)
        CoreStopWordDictionary.apply(result);
    return result;
}
 
Developer ID: jsksxs360, Project: AHANLP, Lines of code: 13, Source file: Segment.java


Example 5: segment

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Segmentation
 *
 * @param text the text to segment
 * @return the segmentation result
 */
public static List<Term> segment(char[] text)
{
    List<Term> resultList = SEGMENT.seg(text);
    ListIterator<Term> listIterator = resultList.listIterator();
    while (listIterator.hasNext())
    {
        if (!CoreStopWordDictionary.shouldInclude(listIterator.next()))
        {
            listIterator.remove();
        }
    }

    return resultList;
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 21, Source file: NotionalTokenizer.java


Example 6: main

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
public static void main(String[] args)
{
    String text = "小区居民有的反对喂养流浪猫,而有的居民却赞成喂养这些小宝贝";
    // The stop word dictionary can be modified dynamically
    CoreStopWordDictionary.add("居民");
    System.out.println(NotionalTokenizer.segment(text));
    CoreStopWordDictionary.remove("居民");
    System.out.println(NotionalTokenizer.segment(text));
    // The output of any tokenizer can be filtered
    List<Term> termList = BasicTokenizer.segment(text);
    System.out.println(termList);
    CoreStopWordDictionary.apply(termList);
    System.out.println(termList);
    // Custom filtering logic is also supported
    CoreStopWordDictionary.FILTER = new Filter()
    {
        @Override
        public boolean shouldInclude(Term term)
        {
            switch (term.nature)
            {
                case nz:
                return !CoreStopWordDictionary.contains(term.word);
            }
            return false;
        }
    };
    System.out.println(NotionalTokenizer.segment(text));
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 30, Source file: DemoStopWord.java


Example 7: getTopSentenceList

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
    /**
     * One-call convenience interface
     * @param document the target document
     * @param size the number of key sentences required
     * @return the list of key sentences
     */
    public static List<String> getTopSentenceList(String document, int size)
    {
        List<String> sentenceList = spiltSentence(document);
        List<List<String>> docs = new ArrayList<List<String>>();
        for (String sentence : sentenceList)
        {
            List<Term> termList = StandardTokenizer.segment(sentence.toCharArray());
            List<String> wordList = new LinkedList<String>();
            for (Term term : termList)
            {
                if (CoreStopWordDictionary.shouldInclude(term))
                {
                    wordList.add(term.word);
                }
            }
            docs.add(wordList);
//            System.out.println(wordList);
        }
        TextRankSentence textRank = new TextRankSentence(docs);
        int[] topSentence = textRank.getTopSentence(size);
        List<String> resultList = new LinkedList<String>();
        for (int i : topSentence)
        {
            resultList.add(sentenceList.get(i));
        }
        return resultList;
    }
 
Developer ID: ml-distribution, Project: HanLP, Lines of code: 34, Source file: TextRankSentence.java


Example 8: main

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
public static void main(String[] args)
{
    String text = "小区居民有的反对喂养流浪猫,而有的居民却赞成喂养这些小宝贝";
    // The stop word dictionary can be modified dynamically
    CoreStopWordDictionary.add("居民");
    System.out.println(NotionalTokenizer.segment(text));
    CoreStopWordDictionary.remove("居民");
    System.out.println(NotionalTokenizer.segment(text));
    // The output of any tokenizer can be filtered
    List<Term> termList = BasicTokenizer.segment(text);
    System.out.println(termList);
    CoreStopWordDictionary.apply(termList);
    System.out.println(termList);
}
 
Developer ID: ml-distribution, Project: HanLP, Lines of code: 15, Source file: DemoStopWord.java


Example 9: testContains

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
public void testContains() throws Exception
{
    HanLP.Config.enableDebug();
    System.out.println(CoreStopWordDictionary.contains("这就是说"));
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 6, Source file: TestStopWordDictionary.java


Example 10: testContainsSomeWords

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
public void testContainsSomeWords() throws Exception
{
    assertEquals(true, CoreStopWordDictionary.contains("可以"));
}
 
Developer ID: priester, Project: hanlpStudy, Lines of code: 5, Source file: TestStopWordDictionary.java


Example 11: getSummary

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
    /**
     * One-call convenience interface
     * @param document the target document
     * @param max_length the maximum length of the summary
     * @return the summary text
     */
    public static String getSummary(String document, int max_length)
    {
        if(!validate_document(document, max_length)){
            return "";
        }
        List<String> sentenceList = spiltSentence(document);

        int sentence_count = sentenceList.size();
        List<List<String>> docs = new ArrayList<List<String>>();
        for (String sentence : sentenceList)
        {
            List<Term> termList = StandardTokenizer.segment(sentence.toCharArray());
            List<String> wordList = new LinkedList<String>();
            for (Term term : termList)
            {
                if (CoreStopWordDictionary.shouldInclude(term))
                {
                    wordList.add(term.word);
                }
            }
            docs.add(wordList);
//            System.out.println(wordList);
        }

        TextRankSentence textRank = new TextRankSentence(docs);
        int[] topSentence = textRank.getTopSentence(sentence_count);
        List<String> resultList = new LinkedList<String>();
        for (int i : topSentence)
        {
            resultList.add(sentenceList.get(i));
        }

        resultList = permutation(resultList, sentenceList);
        resultList = pick_sentences(resultList, max_length);

        String summary = "";
        for(String temp : resultList)
        {
        	summary += temp;
        }

        if (summary.length() < 15){
            summary = "";
        }
        return summary;
    }
 
Developer ID: furaoing, Project: HanLP-1.2.4-Taikor, Lines of code: 53, Source file: TextRankSentence.java


Example 12: NLPSegment

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * NLP segmentation
 * Performs full named entity recognition and part-of-speech tagging
 * @param content the text to segment
 * @param filterStopWord whether to filter out stop words
 * @return the list of terms
 */
public static List<Term> NLPSegment(String content, boolean filterStopWord) {
    List<Term> result = NLPTokenizer.segment(content);
    if (filterStopWord)
    	CoreStopWordDictionary.apply(result);
    return result;
}
 
Developer ID: jsksxs360, Project: AHANLP, Lines of code: 14, Source file: Segment.java


Example 13: shouldInclude

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Judge whether a word is a stop word
 * @param term the word to be judged
 * @return false if the word is a stop word; true otherwise
 */
public static boolean shouldInclude(Term term)
{
    return CoreStopWordDictionary.shouldInclude(term);
}
 
Developer ID: WuLC, Project: KeywordExtraction, Lines of code: 10, Source file: TFIDF.java


Example 14: shouldInclude

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Judge whether a word is a stop word
 * @param term the word to be judged
 * @return false if the word is a stop word; true otherwise
 */
public static boolean shouldInclude(Term term)
{
    return CoreStopWordDictionary.shouldInclude(term);
}
 
Developer ID: WuLC, Project: KeywordExtraction, Lines of code: 10, Source file: TextRank.java


Example 15: shouldInclude

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Whether this term should be included in the computation; its part of speech must be a noun, verb, adverb or adjective
 * @param term the term to test
 * @return whether it should be included
 */
public boolean shouldInclude(Term term)
{
    return CoreStopWordDictionary.shouldInclude(term);
}
 
Developer ID: hankcs, Project: TextRank, Lines of code: 10, Source file: TextRankKeyword.java


Example 16: shouldInclude

import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary; // import the required package/class
/**
 * Whether this term should be included in the computation; its part of speech must be a noun, verb, adverb or adjective
 * @param term the term to test
 * @return whether it should be included
 */
public static boolean shouldInclude(Term term)
{
    return CoreStopWordDictionary.shouldInclude(term);
}
 
Developer ID: hankcs, Project: TextRank, Lines of code: 10, Source file: TextRankSummary.java



Note: The com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary examples in this article were compiled from source code and documentation hosted on GitHub, MSDocs, and similar platforms. The snippets were selected from open-source projects contributed by various developers; copyright of the source code remains with the original authors. Please consult the corresponding project's license before using or redistributing the code, and do not republish without permission.

