Python api.CorpusReader类代码示例

OGeek|极客世界-中国程序员成长平台 › 门户 › 编程› Python›Python编程经验

原作者: [db:作者] 来自: [db:来源] 收藏邀请

本文整理汇总了Python中nltk.corpus.reader.api.CorpusReader类的典型用法代码示例。如果您正苦于以下问题：Python CorpusReader类的具体用法？Python CorpusReader怎么用？Python CorpusReader使用的例子？那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。

在下文中一共展示了CorpusReader类的15个代码示例，这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞，您的评价将有助于我们的系统推荐出更棒的Python代码示例。

示例1: init

    def __init__(
        self,
        root,
        fileids,
        sep="/",
        word_tokenizer=WhitespaceTokenizer(),
        sent_tokenizer=RegexpTokenizer("\n", gaps=True),
        alignedsent_block_reader=read_alignedsent_block,
        encoding="latin1",
    ):
        """
        Construct a new Aligned Corpus reader for a set of documents
        located at the given root directory.  Example usage:

            >>> root = '/...path to corpus.../'
            >>> reader = AlignedCorpusReader(root, '.*', '.txt') # doctest: +SKIP

        :param root: The root directory for this corpus.
        :param fileids: A list or regexp specifying the fileids in this corpus.
        """
        CorpusReader.__init__(self, root, fileids, encoding)
        self._sep = sep
        self._word_tokenizer = word_tokenizer
        self._sent_tokenizer = sent_tokenizer
        self._alignedsent_block_reader = alignedsent_block_reader

开发者ID:Reagankm，项目名称:KnockKnock，代码行数:25，代码来源:aligned.py

示例2: init

    def __init__(self, root, fileids=None, encoding='utf8', skip_keywords=None,
                 target_language=None, paragraph_separator='\n\n', **kwargs):
        """
        :param root: The file root of the corpus directory
        :param fileids: the list of file ids to consider, or wildcard expression
        :param skip_keywords: a list of words which indicate whole paragraphs that should
        be skipped by the paras and words methods()
        :param target_language: which files to select; sometimes a corpus contains English
         translations, we expect these files to be named ...english.json -- if not, pass in fileids
        :param paragraph_separator: character sequence demarcating paragraph separation
        :param encoding: utf8
        :param kwargs: Any values to be passed to NLTK super classes, such as sent_tokenizer,
        word_tokenizer.
        """

        if not target_language:
            target_language = ''
        if not fileids:
            fileids = r'.*{}\.json'.format(target_language)

        # Initialize the NLTK corpus reader objects
        CorpusReader.__init__(self, root, fileids, encoding)
        if 'sent_tokenizer' in kwargs:
            self._sent_tokenizer = kwargs['sent_tokenizer']
        if 'word_tokenizer' in kwargs:
            self._word_tokenizer = kwargs['word_tokenizer']
        self.skip_keywords = skip_keywords
        self.paragraph_separator = paragraph_separator

开发者ID:diyclassics，项目名称:cltk，代码行数:28，代码来源:readers.py

示例3: init

 def __init__(self, root, fileids, encoding='utf8', morphs2str=_morphs2str_default):
     """
     Initialize KNBCorpusReader
     morphs2str is a function to convert morphlist to str for tree representation
     for _parse()
     """
     CorpusReader.__init__(self, root, fileids, encoding)
     self.morphs2str = morphs2str

开发者ID:DrDub，项目名称:nltk，代码行数:8，代码来源:knbc.py

示例4: init

 def __init__(self, root, fileids=DOC_PATTERN, **kwargs):
     """
     Initialize the corpus reader.  Categorization arguments
     (``cat_pattern``, ``cat_map``, and ``cat_file``) are passed to
     the ``CategorizedCorpusReader`` constructor.  The remaining
     arguments are passed to the ``CorpusReader`` constructor.
     """
     CorpusReader.__init__(self, root, fileids)

开发者ID:yokeyong，项目名称:atap，代码行数:8，代码来源:am_reader.py

示例5: init

 def __init__(self, root, fileids, tone, tag, wrap_etree=False):
     self.fileids = fileids
     self._wrap_etree = wrap_etree
     CorpusReader.__init__(self, root, fileids)
     self.tagged_sents = []
     self.sents = []
     self.words = []
     self.tagged_words = []
     self.option_tone = tone
     self.option_tag = tag

开发者ID:Batene，项目名称:Bamanankan，代码行数:10，代码来源:htmlreaderALL.py

示例6: init

 def __init__(self, root, fileids,
              syntax_parser=CaboChaParser(),
              word_tokenizer=MeCabTokenizer(),
              sent_tokenizer=jp_sent_tokenizer,
              case_parser=KNPParser(),
              encoding='utf-8'):
   CorpusReader.__init__(self, root, fileids, encoding)
   self._syntax_parser = syntax_parser
   self._word_tokenizer = word_tokenizer
   self._sent_tokenizer = sent_tokenizer
   self._case_parser = case_parser

开发者ID:miyamofigo，项目名称:Japanese-corpus-and-utility，代码行数:11，代码来源:corpus.py

示例7: init

 def __init__(self, root, zipfile, fileids):
     if isinstance(root, basestring):
         root = FileSystemPathPointer(root)
     elif not isinstance(root, PathPointer): 
         raise TypeError('CorpusReader: expected a string or a PathPointer')
     
     # convert to a ZipFilePathPointer
     root = ZipFilePathPointer(root.join(zipfile))
     
     CorpusReader.__init__(self, root, fileids)
     
     self._parse_char_replacements()

开发者ID:IMAmuseum，项目名称:getty-vocab-reconciliation，代码行数:12，代码来源:getty.py

示例8: init

    def __init__(self, root, fileids=PKL_PATTERN, **kwargs):
        """
        Initialize the corpus reader.  Categorization arguments
        (``cat_pattern``, ``cat_map``, and ``cat_file``) are passed to
        the ``CategorizedCorpusReader`` constructor.  The remaining arguments
        are passed to the ``CorpusReader`` constructor.
        """
        # Add the default category pattern if not passed into the class.
        if not any(key.startswith('cat_') for key in kwargs.keys()):
            kwargs['cat_pattern'] = CAT_PATTERN

        CategorizedCorpusReader.__init__(self, kwargs)
        CorpusReader.__init__(self, root, fileids)

开发者ID:yokeyong，项目名称:atap，代码行数:13，代码来源:reader.py

示例9: init

 def __init__(self, root, fileids, 
              sep='/', word_tokenizer=WhitespaceTokenizer(),
              sent_tokenizer=RegexpTokenizer('\n', gaps=True),
              encoding=None):
     """
     @param root: The root directory for this corpus.
     @param fileids: A list or regexp specifying the fileids in this corpus.
     """
     CorpusReader.__init__(self, root, fileids, encoding)
     self._sep = sep
     self._word_tokenizer = word_tokenizer
     self._sent_tokenizer = sent_tokenizer
     self._alignedsent_block_reader=None,
     self._alignedsent_block_reader = self._alignedsent_block_reader
     self._alignedsent_corpus_view = None

开发者ID:yochananmkp，项目名称:clir，代码行数:15，代码来源:aligned_corpus_reader.py

示例10: assemble_corpus

def assemble_corpus(corpus_reader: CorpusReader,
                    types_requested: List[str],
                    type_dirs: Dict[str, List[str]] = None,
                    type_files: Dict[str, List[str]] = None) -> CorpusReader:
    """
    Create a filtered corpus.
    :param corpus_reader: This get mutated
    :param types_requested: a list of string types, which are to be found in the type_dirs and
    type_files mappings
    :param type_dirs: a dict of corpus types to directories
    :param type_files: a dict of corpus types to files
    :return: a CorpusReader object containing only the mappings desired
    """
    fileid_names = []  # type: List[str]
    try:
        all_file_ids = list(corpus_reader.fileids())
        clean_ids_types = []  # type: List[Tuple[str, str]]
        if type_files:
            for key, valuelist in type_files.items():
                if key in types_requested:
                    for value in valuelist:
                        if value in all_file_ids:
                            if key:
                                clean_ids_types.append((value, key))
        if type_dirs:
            for key, valuelist in type_dirs.items():
                if key in types_requested:
                    for value in valuelist:
                        corrected_dir = value.replace('./', '')
                        corrected_dir = '{}/'.format(corrected_dir)
                        for name in all_file_ids:
                            if name and name.startswith(corrected_dir):
                                clean_ids_types.append((name, key))
        clean_ids_types.sort(key=lambda x: x[0])
        fileid_names, categories = zip(*clean_ids_types)  # type: ignore
        corpus_reader._fileids = fileid_names
        return corpus_reader
    except Exception:
        LOG.exception('failure in corpus building')

开发者ID:diyclassics，项目名称:cltk，代码行数:39，代码来源:readers.py

示例11: init

    def __init__(self, root, fileids=None,
                 word_tokenizer=TweetTokenizer(),
                 encoding='utf8'):
        """

        :param root: The root directory for this corpus.

        :param fileids: A list or regexp specifying the fileids in this corpus.

        :param word_tokenizer: Tokenizer for breaking the text of Tweets into
        smaller units, including but not limited to words.

        """
        CorpusReader.__init__(self, root, fileids, encoding)

        for path in self.abspaths(self._fileids):
            if isinstance(path, ZipFilePathPointer):
                pass
            elif os.path.getsize(path) == 0:
                raise ValueError("File {} is empty".format(path))
        """Check that all user-created corpus files are non-empty."""

        self._word_tokenizer = word_tokenizer

开发者ID:Weiming-Hu，项目名称:text-based-six-degree，代码行数:23，代码来源:twitter.py

示例12: init

    def __init__(self,
                 root,
                 fileids,
                 column_types=None,
                 top_node='S',
                 beginning_of_sentence=r'#BOS.+$',
                 end_of_sentence=r'#EOS.+$',
                 encoding=None):
        """ Construct a new corpus reader for reading NEGRA corpus files.
        @param root: The root directory of the corpus files.
        @param fileids: A list of or regex specifying the files to read from.
        @param column_types: An optional C{list} of columns in the corpus.
        @param top_node: The top node of parsed sentence trees.
        @param beginning_of_sentence: A regex specifying the start of a sentence
        @param end_of_sentence: A regex specifying the end of a sentence
        @param encoding: The default corpus file encoding.
        """

        # Make sure there are no invalid column type
        if isinstance(column_types, list):
            for column_type in column_types:
                if column_type not in self.COLUMN_TYPES:
                    raise ValueError("Column %r is not supported." % columntype)
        else:
            column_types = self.COLUMN_TYPES

        # Define stuff
        self._top_node = top_node
        self._column_types = column_types
        self._fileids = fileids
        self._bos = beginning_of_sentence
        self._eos = end_of_sentence
        self._colmap = dict((c,i) for (i,c) in enumerate(column_types))

        # Finish constructing by calling the extended class' constructor
        CorpusReader.__init__(self, root, fileids, encoding)

开发者ID:wroberts，项目名称:NLTK-Contributions，代码行数:36，代码来源:NegraCorpusReader.py

示例13: fileids

 def fileids(self, channels=None, domains=None, categories=None):
     if channels is not None and domains is not None and \
             categories is not None:
         raise ValueError('You can specify only one of channels, domains '
                          'and categories parameter at once')
     if channels is None and domains is None and \
             categories is None:
         return CorpusReader.fileids(self)
     if isinstance(channels, basestring):
         channels = [channels]
     if isinstance(domains, basestring):
         domains = [domains]
     if isinstance(categories, basestring):
         categories = [categories]
     if channels:
         return self._list_morph_files_by('channel', channels)
     elif domains:
         return self._list_morph_files_by('domain', domains)
     else:
         return self._list_morph_files_by('keyTerm', categories,
                 map=self._map_category)

开发者ID:B-Rich，项目名称:Fem-Coding-Challenge，代码行数:21，代码来源:ipipan.py

示例14: init

 def __init__(self, root, fileids):
     CorpusReader.__init__(self, root, fileids, None, None)

开发者ID:B-Rich，项目名称:Fem-Coding-Challenge，代码行数:2，代码来源:ipipan.py

示例15: init

 def __init__(self, root, fileids, wrap_etree=False):
     self._wrap_etree = wrap_etree
     CorpusReader.__init__(self, root, fileids)

开发者ID:NavinManaswi，项目名称:nltk，代码行数:3，代码来源:xmldocs.py

注：本文中的nltk.corpus.reader.api.CorpusReader类示例由纯净天空整理自Github/MSDocs等源码及文档管理平台，相关代码片段筛选自各路编程大神贡献的开源项目，源码版权归原作者所有，传播和使用请参考对应项目的License；未经允许，请勿转载。

鲜花

握手

雷人

路过

鸡蛋

该文章已有0人参与评论

请发表评论

全部评论

专题导读

More+

10-27 六六分期app的软件客服如何联系？(六六分期

11-06 可心卡盟:win10系统火狐flash插件崩溃怎么

11-06 亲亲特价:怎么删除回收站图标

11-06 济南大学虚拟社区:鲁大师节能降温的具体办

11-06 xlueops.exe:无线网络安装向导

11-06 女斗合众国:win7系统cf与主机连接不稳定怎

11-06 0xc000022-[cf烟雾头]cf怎么调烟雾头

11-06 qizideyouhuo:应用程序无法正常启动0xc0000

11-06 ipz-185:win7系统vcf文件怎么打开

11-06 傻哥蹦迪:win10系统s4怎么打开usb调试

11-06 八神浩树gtaste:回收站清空了怎么恢复

11-06 妖尾之黑色守护:win10系统电脑没有1440x900

11-06 校园至尊魔王小说:win7系统浏览网页时字体

11-06 女斗合众国:win10系统访问共享文件夹提示请

11-06 tokyo hot n0654:恢复win7系统默认字体一招

11-06 雨酷仙境:设置win7系统转移临时文件夹腾出

11-06 阿穆纳伊之杖:win7系统开始菜单在右边还原

11-06 tunespotting:win10系统火狐flash插件总是

11-06 甘尔葛分析师：计谋网站seo关键词暴涨有什

11-06 蔡贵霖: 计谋网站seo关键词暴涨有什么秘密

11-06 博益网首页:ao3网页版进入不了解决方法

11-06 漏斗子专栏: 网站数据分析小白易懂精华篇

11-06 见证双虹怎么做:win7系统开启telnet命令的

11-06 颾狐蝶蜋:系统资源不足无法完成请求的服务

11-06 国光中学校歌:提交网站到alexa查询详细步骤

11-06 西安有情天:静态网页和动态网页的区别

11-06 红木雅尚斋:外部链接构造对网站的好处

11-06 前官礼遇：防止域名劫持–增强域安全性的10

11-06 密传二转答案: 中文分词算法有哪些

11-06 金泉家园邮编:百度快照劫持的表现及应对方

Python util.concat函数代码示例发布时间：2022-05-27

Python reader.ChunkedCorpusReader类代码示例发布时间：2022-05-27

Python util.grid_equal函数代码示例

1 Python 入门教程

Python入门教程 Python 是一种解释型、面向对象、动态数据类型的高级程序设计语言。 P

阅读：13806|2022-01-22

2 Python wikiutil.getFrontPage函数代码示例

Python wikiutil.getFrontPage函数代码示例

阅读：10193|2022-05-24

3 Python 简介

Python 简介 Python 是一个高层次的结合了解释性、编译性、互动性和面向对象的脚本

阅读：4090|2022-01-22

4 Python tests.group函数代码示例

Python tests.group函数代码示例

阅读：4043|2022-05-27

5 Python util.check_if_user_has_permission

Python util.check_if_user_has_permission函数代码示例

阅读：3845|2022-05-27

6 Python 操练实例98

Python 练习实例98 Python 100例题目：从键盘输入一个字符串，将小写字母全部转换成大

阅读：3510|2022-01-22

7 Python 环境搭建

Python 环境搭建本章节我们将向大家介绍如何在本地搭建 Python 开发环境。 Py

阅读：3030|2022-01-22

8 Python output.darkgreen函数代码示例

Python output.darkgreen函数代码示例

阅读：2653|2022-05-25

9 Python 基础语法

Python 基础语法 Python 语言与 Perl，C 和 Java 等语言有许多相似之处。但是，也

阅读：2649|2022-01-22

10 Python 中文编码

Python 中文编码前面章节中我们已经学会了如何用 Python 输出 Hello, World!，英文没

阅读：2302|2022-01-22

客服电话

电子邮件

Python api.CorpusReader类代码示例

示例1: init

示例2: init

示例3: init

示例4: init

示例5: init

示例6: init

示例7: init

示例8: init

示例9: init

示例10: assemble_corpus

示例11: init

示例12: init

示例13: fileids

示例14: init

示例15: init

请发表评论

全部评论

上一篇：

下一篇：

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.get_stdout函数代码示例

关于我们

产品与服务

解决方案

139-2527-9053

客服电话

电子邮件

Python api.CorpusReader类代码示例

示例1: __init__

示例2: __init__

示例3: __init__

示例4: __init__

示例5: __init__

示例6: __init__

示例7: __init__

示例8: __init__

示例9: __init__

示例10: assemble_corpus

示例11: __init__

示例12: __init__

示例13: fileids

示例14: __init__

示例15: __init__

请发表评论

全部评论

上一篇：

下一篇：

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.get_stdout函数代码示例

关于我们

产品与服务

解决方案

139-2527-9053

示例1: init

示例2: init

示例3: init

示例4: init

示例5: init

示例6: init

示例7: init

示例8: init

示例9: init

示例11: init

示例12: init

示例14: init

示例15: init