本文整理汇总了Python中regex.finditer函数的典型用法代码示例。如果您正苦于以下问题:Python finditer函数的具体用法?Python finditer怎么用?Python finditer使用的例子?那么恭喜您, 这里精选的函数代码示例或许可以为您提供帮助。
在下文中一共展示了finditer函数的20个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Python代码示例。
示例1: _match_by_edit_distance
def _match_by_edit_distance(full_text, text_to_match):
    """Find the substring of ``full_text`` that best matches ``text_to_match``.

    Candidate substrings are gathered by locating occurrences of the first
    word (or first character, as a fallback) of ``text_to_match`` inside
    ``full_text``; the candidate with the lowest edit distance wins, and its
    trailing edge is then adjusted so it ends on the same character as
    ``text_to_match``.  Returns the matched substring, or ``text_to_match``
    itself when nothing at all could be located.

    NOTE(review): ``edit_distance`` is an external helper (presumably
    nltk.edit_distance or similar) — confirm against the enclosing module.
    """
    # Undo PTB-style bracket escaping so the snippet matches the raw text.
    text_to_match = text_to_match.replace("-LRB-", "(").replace("-RRB-", ")")
    text_to_match = text_to_match.replace("-LCB-", "{").replace("-RCB-", "}")
    text_to_match = re.sub(r'\[\\\]\\\)\]$', ')', text_to_match)

    try:
        # Search on the first word only (up to the first space); candidates
        # are slices one character longer than the target, to leave room for
        # the trailing-edge fix-up below.
        end_point = (text_to_match.index(" ") if " " in text_to_match else len(text_to_match))
        potential_matches = [full_text[m.start():(m.start() + len(text_to_match) + 1)] for m in
                             re.finditer(re.escape(text_to_match[0:end_point]), full_text, re.U | re.I)]
    # NOTE(review): bare except that prints and exits the process — this
    # swallows the real exception type; consider narrowing and re-raising.
    except:
        import sys
        print(full_text)
        print()
        print(text_to_match)
        sys.exit(1)
    if len(potential_matches) == 0:
        # Fallback 1: anchor on just the first character.
        potential_matches = [full_text[m.start():(m.start() + len(text_to_match) + 1)] for m in
                             re.finditer(re.escape(text_to_match[0]), full_text, re.U)]
    if len(potential_matches) == 0:
        # Fallback 2: "(" may have been transcribed as "[" — retry after
        # substituting, then trim each candidate back to the last occurrence
        # of the (new) final character.
        text_to_match = text_to_match.replace("(", "[")
        potential_matches = [full_text[m.start():(m.start() + len(text_to_match) + 1)] for m in
                             re.finditer(re.escape(text_to_match[0]), full_text, re.U)]
        potential_matches = [(p[0:p.rindex(text_to_match[-1])+1]
                              if text_to_match[-1] in p and len(p) > len(text_to_match)
                              else p)
                             for p in potential_matches]
    if len(potential_matches) == 0:
        # No idea why this would ever happen, but it does
        return text_to_match

    # Pick the candidate with the lowest edit distance; <= means a later
    # candidate wins ties.
    match_with_lowest_edit_distance = ""
    lowest_edit_distance = -1
    for match in potential_matches:
        e_d = edit_distance(match, text_to_match)
        if lowest_edit_distance == -1 or e_d <= lowest_edit_distance:
            lowest_edit_distance = e_d
            match_with_lowest_edit_distance = match

    # Trailing-edge fix-up: shrink (or grow from full_text) until the result
    # ends on the same character as text_to_match.
    result = match_with_lowest_edit_distance.strip()
    if text_to_match[-1] in result:
        while result[-1] != text_to_match[-1]:
            result = result[0:-1]
    elif text_to_match[-1] == '"' and re.search(r'["”\u201d]', result):
        # Accept straight or curly closing quotes interchangeably.
        while result[-1] not in ['"', '”', "\u201d"]:
            result = result[0:-1]
    elif text_to_match[-1] not in [']', '}', ')'] and text_to_match[-2:] != "..":
        # Extend one character at a time from full_text until the endings line up.
        while result[-1] != text_to_match[-1]:
            result += full_text[full_text.index(result) + len(result)][-1]
    return result
开发者ID:EducationalTestingService,项目名称:match,代码行数:54,代码来源:Match.py
示例2: _sustituirReg
def _sustituirReg(self, aux, regOrig, regDest):
    """Rewrite *aux* according to the rule regOrig -> regDest.

    The third-party ``regex`` module is required because the pattern is
    scanned in both directions: a normal forward pass first, then a
    reversed pass via the ``(?r)`` prefix.  The first overlapped match
    whose named groups pass ``parentesisCoherentes1`` (balanced
    parentheses) is formatted into *regDest* and returned with a trailing
    newline; if no direction yields a coherent match, *aux* is returned
    unchanged.
    """
    # Forward scan first, then reversed ("(?r)") scan.
    for direction in ("^", "(?r)^"):
        anchored = direction + regOrig + "$"
        for m in regex.finditer(anchored, aux, overlapped=True):
            groups = m.groupdict()
            if self.parentesisCoherentes1(groups):
                return regDest.format(**agregaCTX(groups)) + "\n"
    return aux
开发者ID:yeboyebo,项目名称:AQNext,代码行数:13,代码来源:Flujo.py
示例3: search_strand
def search_strand(pattern, sequence_to_scan, strand=1):
    '''
    take a sequence pattern (element) and find occurrences of that on the
    provided, larger 5'-->3' sequence.
    Assumes strand is first unless provided.
    Tracks the start and end points of each occurrence, returning a list of
    that information where each element is a tuple of the start and end points
    along with the strand.
    Works with overlapped sequences because now
    "regex.findall and regex.finditer support an ‘overlapped’ flag which
    permits overlapped matches."
    , see https://pypi.python.org/pypi/regex/2018.02.21
    based on https://www.biostars.org/p/209383/ (specifically steve's answer)
    '''
    occurrences = []
    # Case-fold both pattern and sequence; scan with overlaps allowed.
    for hit in regex.finditer(
            pattern.upper(), str(sequence_to_scan.upper()), overlapped=True):
        if strand == 1:
            # 1-based coordinates on the given strand.
            coords = (hit.start() + 1, hit.end() + 1, strand)
        else:
            # Mirror the coordinates for the complementary strand.
            coords = ((len(sequence_to_scan) - hit.start()) + 1,
                      (len(sequence_to_scan) - hit.end()) + 1,
                      strand)
        occurrences.append(coords)
    return occurrences
开发者ID:fomightez,项目名称:sequencework,代码行数:29,代码来源:find_sequence_element_occurrences_in_sequence.py
示例4: plx_wrapper
def plx_wrapper(text):
    """Apply the module-level ``proc`` transform to each PanLex segment of *text*.

    *text* is split on the PanLex delimiters ``⫷ex…⫸`` / ``⫷df…⫸``; each
    segment is processed individually and the surviving segments are
    re-joined.  When no delimiters are present at all, the whole string is
    handed to ``process_synonyms(proc)`` instead.

    NOTE(review): ``proc``, ``delimToPanlex``, ``process_synonyms``,
    ``get_plx_fields`` and ``filter_unique_meanings`` are defined elsewhere
    in this module — their exact contracts are assumed here.
    """
    before = text  # NOTE(review): unused — kept, presumably for debugging
    text = delimToPanlex(text)
    # Start offsets of every ⫷ex⫸ / ⫷df⫸ delimiter (optionally with a
    # ":lang-nnn" qualifier).
    idx_list = [ex_match.start() for ex_match in re.finditer('⫷(?:ex|df)(?::\w{1,4}-\d{1,3})?⫸', text)]
    if len(idx_list) == 0:
        # No tagged segments: treat the whole string as one synonym list.
        return process_synonyms(proc)(text)
    idx_list.append( len(text))
    # If there is untagged leading text, treat it as a segment starting at 0.
    if len(text[ 0:idx_list[0] ].strip()) > 0:
        idx_list.insert(0,0)
    final_exp = []
    for idx in range(len(idx_list) - 1):
        # One delimited segment: [tag][text][attributes].
        ex = text[ idx_list[idx] : idx_list[idx+1]]
        tag,ex_text,attributes = get_plx_fields(ex)
        result = proc(ex_text)
        result_match = re.search('(⫷(?:ex|df)(?::\w{1,4}-\d{1,3})?⫸)(.+)', result)
        if result_match:
            # proc returned a tagged result: keep it (with attributes) if non-empty.
            # NOTE(review): group 1 is the tag itself, so this length check is
            # always true when the regex matched — possibly group 2 (the
            # content) was intended; confirm against the project history.
            if len(result_match[1].strip()) > 0:
                final_exp.append('%s%s' % (result,attributes))
        else:
            # Untagged result: re-attach the original tag and attributes.
            if len(result.strip()) > 0:
                final_exp.append('%s%s%s' % (tag,result,attributes))
    final_exp = filter_unique_meanings(final_exp)
    text = ''.join(final_exp)
    return text
开发者ID:longnow,项目名称:panlex-tools,代码行数:29,代码来源:source.py
示例5: __init__
def __init__(self, names, features=None, ftstr='', weights=None):
    """Construct a `Segment` object.

    Args:
        names (list): ordered list of feature names
        features (dict): name-value pairs for specified features; any
            feature not listed defaults to 0
        ftstr (unicode): a string, each /(+|0|-)\w+/ sequence of which is
            interpreted as a feature specification
        weights (list): ordered list of feature weights/saliences; defaults
            to 1 for every feature
    """
    # Fix: the original used a mutable default argument (features={}),
    # which is shared across all calls — build a fresh dict per call.
    if features is None:
        features = {}
    self.n2s = {-1: '-', 0: '0', 1: '+'}
    self.s2n = {k: v for (v, k) in self.n2s.items()}
    self.names = names
    """Set a feature specification"""
    # Every named feature starts at 0 unless overridden by `features`.
    self.data = {}
    for name in names:
        if name in features:
            self.data[name] = features[name]
        else:
            self.data[name] = 0
    # Specifications embedded in `ftstr` (e.g. "+voi0cont") are applied on
    # top of the values taken from `features`.
    for m in re.finditer(r'(\+|0|-)(\w+)', ftstr):
        v, k = m.groups()
        self.data[k] = self.s2n[v]
    if weights:
        self.weights = weights
    else:
        self.weights = [1 for _ in names]
开发者ID:dmort27,项目名称:panphon,代码行数:27,代码来源:segment.py
示例6: tokenize
def tokenize(self, value):
    """
    Perform the tokenizing.

    Required Argument
    value -- The unicode string to tokenize.

    Yields the (reused) Token instance updated for each token: either the
    regexp matches themselves, or — when gaps=True — the text between them.
    """
    t = Token()  # The token instance we will reuse
    if not self._gaps:
        # The default: expression matches are used as tokens
        for pos, match in enumerate(self._regexp.finditer(value)):
            yield t.update(match.group(0), index=(match.start(), match.end(),), position=pos)
    else:
        # When gaps=True, iterate through the matches and
        # yield the text between them.
        left = 0
        last_pos = 0
        for pos, match in enumerate(regex.finditer(self._regexp, value)):
            # gap_end = where the current gap stops (match start);
            # next_left = where the next gap starts (match end).
            # Fix: renamed from `right, next` — `next` shadowed the builtin.
            gap_end, next_left = match.span()
            if gap_end != 0:
                yield t.update(value[left:gap_end], position=pos, index=(left, gap_end,))
            left = next_left
            last_pos = pos
        # Trailing text after the final match is its own token.
        if left != len(value):
            yield t.update(value[left:], position=last_pos+1, index=(left, len(value),))
开发者ID:Kapiche,项目名称:caterpillar,代码行数:26,代码来源:tokenize.py
示例7: determine_match
def determine_match(commentary_name, commentary_regex):
    """Cross-check commentary citations against Shulchan Arukh, Even HaEzer.

    For every seif, find citation markers matching ``commentary_regex`` and
    fuzzy-compare the quoted opening word (dibur hamatchil) against the first
    word of the cited commentary text; mismatches (ratio < 75) are printed
    and tallied.  Returns a Counter keyed by (dh_text, c_text) pairs.

    NOTE(review): Python 2 code (print statements, `unicode`).  The 'ref'
    group consumed below must be defined inside ``commentary_regex`` by the
    caller; the '(?>…)' atomic group requires the third-party regex module.
    """
    issues = 0
    # Append a dh capture after the caller's pattern, skipping markup tokens.
    full_pattern = u'{} (?>[@!/*][^ ]* )*(?P<dh>[^ ]+)'.format(commentary_regex)
    full_mechaber = Root('../../Even_HaEzer.xml').get_base_text()
    error_counter = Counter()
    for siman_num, siman in enumerate(full_mechaber.get_simanim()):
        for seif_num, seif in enumerate(siman.get_child()):
            matches = regex.finditer(full_pattern, unicode(seif))
            for regex_match in matches:
                # Build the commentary Ref from the gematria of the cited seif.
                c_ref = Ref(u'{} {}:{}'.format(commentary_name, siman_num+1, getGematria(regex_match.group('ref'))))
                try:
                    c_text = c_ref.text('he').text.split()[0]
                except IndexError:
                    # Cited segment has no text; nothing to compare.
                    continue
                # Strip everything but Hebrew letters before comparing.
                c_text = re.sub(u'[^\u05d0-\u05ea]', u'', c_text)
                dh_text = re.sub(u'[^\u05d0-\u05ea]', u'',regex_match.group('dh'))
                ratio = fuzz.ratio(dh_text, c_text)
                if ratio < 75.0:
                    issues += 1
                    print u"Potential mismatch:"
                    print u"Shulchan Arukh, Even HaEzer {}:{} {}".format(siman_num+1, seif_num+1, dh_text)
                    print u"{} {}".format(c_ref.normal(), c_text)
                    print u"Score: {}".format(ratio)
                    error_counter[(dh_text, c_text)] += 1
    print u"Total issues: {}".format(issues)
    return error_counter
开发者ID:JonMosenkis,项目名称:Sefaria-Data,代码行数:30,代码来源:validate_existing_commentaries.py
示例8: tokenize
def tokenize(text):
    """
    Yield tokens.

    Args:
        text (str): The original text.

    Yields:
        dict: The next token.
    """
    # Norwegian stemmer; \p{L}+ (regex module) matches any run of letters.
    stemmer = SnowballStemmer('norwegian').stem
    for position, word_match in enumerate(regex.finditer('\p{L}+', text.lower())):
        raw = word_match.group(0)
        # Emit the token.
        yield {
            'stemmed': stemmer(raw),
            'unstemmed': raw,
            'offset': position
        }
开发者ID:kmelve,项目名称:textplot,代码行数:26,代码来源:utils.py
示例9: _match
def _match(self, text):
    """Run every configured pattern over *text* with overlapped matching.

    Returns a list of dicts, one per match, each carrying 'start', 'end',
    'regex' (the pattern that fired) and 'groups' (the match's groupdict).
    When ``self.mapping`` is set, patterns come from ``self.map`` and the
    mapped metadata is merged in without overwriting match fields.
    """
    matches = []
    # Patterns: keys of self.map in mapping mode, else the plain sequence.
    if self.mapping:
        seq = self.map.keys()
    else:
        seq = self.regex_sequence
    for r in seq:
        for matchobj in re.finditer(r, text, overlapped=True):
            result = {
                'start': matchobj.start(),
                'end': matchobj.end(),
                'regex': r,
                'groups': matchobj.groupdict()
            }
            if self.mapping:
                for k, v in self.map[r].items():
                    # Fix idiom: membership test directly on the dict
                    # (was `k not in result.keys()`).
                    if k not in result:
                        result[k] = v
            matches.append(result)
    return matches
开发者ID:estnltk,项目名称:estnltk,代码行数:27,代码来源:event_tagger.py
示例10: parse_ind_vars
def parse_ind_vars(self):
    """Define values of independent variables by parsing first example of form:
    [var name] [value 0]
    [value 1]
    ...
    [value n]
    [blank]
    *or*
    [text]
    :return:
    """
    # Fix: removed the unused `selem` structuring-element array left over
    # from an earlier approach.
    # some weird bug with np.pad and string dtype, so pad manually: allocate
    # a zero ('') array one row/column larger and copy the sheet in.
    s_type = 'S%d' % (max([len(x) for y in self.sheet for x in y]) + 10)
    # NOTE(review): 'S' is a bytes dtype under Python 3; the `!= ''`
    # comparison below assumes Python 2 str semantics — confirm.
    xs_values = np.zeros(np.array(self.sheet.shape) + 1, dtype=s_type)
    xs_values[:-1, :-1] = self.sheet
    # Encode emptiness of the first two columns per row as one letter:
    # A = both empty, B = only col 0 filled, C = only col 1 filled, D = both.
    mask = (xs_values[:, :2] != '').astype(int)
    mask[:, 1] *= 2
    mask_string = ''.join(['ABCD'[i] for i in mask.sum(axis=1)])
    # A variable block is a 'D' row (name + first value) followed by 'C'
    # rows (additional values), terminated by any non-'C' row.
    ind_vars = {}
    for x in re.finditer('(DC+)[ABD]', mask_string):
        name = xs_values[x.span()[0], 0]
        values = xs_values[x.span()[0]:x.span()[1] - 1, 1]
        ind_vars[name] = list(values)
    self.ind_vars.update(ind_vars)
开发者ID:feldman4,项目名称:lasagna,代码行数:29,代码来源:conditions_.py
示例11: parse
def parse(self, data, regex = None, encoding = "utf-8"):
    """Split *data* into a flat list of nodes using a master regex.

    Each match is dispatched to a ``parse_<lastgroup>`` method on *self*
    (the name of the last matched group selects the handler), which builds
    a node from the match's groupdict.  Plain text between matches is kept
    as strings with carriage returns dropped and newlines collapsed to
    spaces.  Returns the list of nodes (strings and handler results).

    NOTE(review): the `regex` parameter shadows the regex module name in
    this scope; it is expected to be a compiled pattern (defaults to
    ``self.master``).
    """
    regex = regex or self.master
    # Decode byte input up front so all slicing below is on text.
    is_unicode = appier.legacy.is_unicode(data)
    if not is_unicode: data = data.decode(encoding)
    nodes = []
    matches = regex.finditer(data)
    current = 0
    for match in matches:
        name = match.lastgroup
        parts = match.groupdict()
        start, end = match.span()
        if start > current:
            # Literal text between the previous match and this one.
            value = data[current:start]
            value = value.replace("\r", "")
            value = value.replace("\n", " ")
            if value: nodes.append(value)
        # Dispatch to the handler named after the matched group.
        method = getattr(self, "parse_" + name)
        node = method(parts)
        nodes.append(node)
        current = end
    # Trailing literal text after the final match.
    remaining = data[current:]
    remaining = remaining.replace("\r", "")
    remaining = remaining.replace("\n", " ")
    if remaining: nodes.append(remaining)
    return nodes
开发者ID:hivesolutions,项目名称:appier_extras,代码行数:34,代码来源:markdown.py
示例12: findall_p_in_s
def findall_p_in_s(p, s):
    """Return a list of Match objects for every occurrence of pattern *p* in *s*.

    Fixes over the original: (1) single pass — the original scanned *s*
    twice, once with findall and once with finditer; (2) when *p* contains
    capture groups, ``regex.findall`` returns the group texts rather than
    the full match, so the matched text was mispaired with the full-match
    spans — ``m.group(0)`` always gives the text for the span; (3) repaired
    the malformed quadruple-quoted docstring.
    """
    return [Match(m.group(0), m.start(), m.end()) for m in regex.finditer(p, s)]
开发者ID:apathinwalking,项目名称:tidyaddr,代码行数:7,代码来源:match.py
示例13: _find
def _find(self, *args):
    """Print every line of the shared index matching args[0].

    Index lines have the shape "file:lineno:content"; the search is
    case-insensitive, multi-line, and runs concurrently via the regex
    module's V1 engine.

    NOTE(review): Python 2 syntax (`except ..., e`).  A malformed pattern
    just prints a blank line; `e` is never used — consider reporting it.
    """
    global index
    try:
        # args[0] is interpolated raw into the pattern, so regex
        # metacharacters in the query act as such (intentional, apparently).
        for match in regex.finditer(u'^.+?:\d+?:.*%s.*$' % args[0], index.data, regex.MULTILINE | regex.IGNORECASE | regex.V1, concurrent=True):
            self._print(match.group(0))
    except sre_constants.error, e:
        print
开发者ID:taky,项目名称:textile,代码行数:7,代码来源:textile.py
示例14: prune_by_precision
def prune_by_precision(self, min_precision, text_data_pairs):
    """
    Removes patterns from the model that don't reach a minimum precision.

    Each pattern's predictions (its match spans over each text) are scored
    against the reference annotations of the pattern's predicted type, and
    any pattern whose aggregate precision falls below the threshold is
    deleted from ``self.regex_type_attributes_map``.

    :param float min_precision: the minimum precision required of a pattern when applied to the given data
    :param collections.Iterable text_data_pairs: an iterable of `(text, data)` pairs where `text` is a string and
        `data` is an anafora.AnaforaData object
    """
    # Aggregate an anafora Scores object per pattern across all texts.
    pattern_scores = collections.defaultdict(lambda: anafora.evaluate.Scores())
    for text, data in text_data_pairs:
        # collect the spans of each type of reference annotation
        reference_type_spans_map = collections.defaultdict(lambda: set())
        for annotation in data.annotations:
            reference_type_spans_map[annotation.type].add(annotation.spans)
        # make predictions with each pattern in the model
        for pattern in self.regex_type_attributes_map:
            # Spans are wrapped in an extra tuple to mirror the shape of
            # annotation.spans (a tuple of (start, end) pairs).
            predicted_spans = {((m.start(), m.end()),) for m in regex.finditer(pattern, text)}
            if predicted_spans:
                predicted_type, _ = self.regex_type_attributes_map[pattern]
                # update the scores for this pattern
                pattern_scores[pattern].add(reference_type_spans_map[predicted_type], predicted_spans)
    # delete any pattern with a precision lower than the minimum requested
    # (safe: we iterate pattern_scores while deleting from the other dict)
    for pattern, scores in pattern_scores.items():
        if scores.precision() < min_precision:
            del self.regex_type_attributes_map[pattern]
开发者ID:bethard,项目名称:anaforatools,代码行数:29,代码来源:regex.py
示例15: process_verses
def process_verses(chap_string, expression):
    """
    Take an entire chapter as a string and break up into verses. The new chapter index (number followed by
    a space) must be stripped out.
    :param chap_string: All verses in a chapter combined as one string.
    :param expression: A compiled regular expression with which to find new verses.
    :return: A list of strings (jagged array), with each verse as a separate string.
    """
    # find all new verses with the regular expression
    matches = expression.finditer(chap_string)
    # save start position of first verse and initiate list of verses
    try:
        start = next(matches)
    except StopIteration:
        # No verse markers at all: the whole chapter is one verse.
        return [chap_string]
    verses = []
    # loop through matches until StopIteration is raised at the last verse
    while True:
        try:
            end = next(matches)
            # Each verse runs from just before this marker's end to the
            # start of the next marker.
            verses.append(chap_string[start.end()-1:end.start()])
            start = end
        except StopIteration:
            verses.append(chap_string[start.end()-1:])
            break
    # error correction - look for numbers in each verse and compare to verse number
    # This will differentiate between incorrectly formatted verses numbers and other numbers in the text.
    corrected_verses = []
    for index, verse in enumerate(verses):
        nums = re.finditer(u'\d{1,3} ', verse)
        good = True
        for num in nums:
            # A number exactly two ahead of the 0-based index is the next
            # verse's marker that failed to split — split it out manually.
            if int(num.group()) - index == 2:
                # add first verse
                corrected_verses.append(verse[:num.start()])
                # edit second verse
                second = verse[num.start():]
                # NOTE(review): num.group()[:len(num.group())] is the whole
                # string, so this replace is a no-op — probably meant to
                # trim the trailing space; confirm intent before changing.
                second = second.replace(num.group(), num.group()[:len(num.group())])
                corrected_verses.append(second)
                good = False
                break
        if good:
            corrected_verses.append(verse)
    # strip out the * marker used to help differentiate numbers and verses
    for index, verse in enumerate(corrected_verses):
        corrected_verses[index] = verse.replace(u'*', u'')
    return corrected_verses
开发者ID:JonMosenkis,项目名称:Sefaria-Data,代码行数:59,代码来源:JPS1985.py
示例16: finditer
def finditer(
    pattern, string, flags=0, pos=None, endpos=None, overlapped=False,
    partial=False, concurrent=None, **kwargs
):
    """Drop-in wrapper around ``regex.finditer``.

    Identical signature and behavior, except that search-side
    back-references in *pattern* are expanded first via
    ``_apply_search_backrefs`` (defined elsewhere in this module) before
    the pattern is handed to the regex engine.
    """
    return regex.finditer(
        _apply_search_backrefs(pattern, flags), string,
        flags, pos, endpos, overlapped, partial, concurrent, **kwargs
    )
开发者ID:lupescu,项目名称:subtxt_pkg,代码行数:8,代码来源:bregex.py
示例17: find_mofit
def find_mofit(s, t):
    """
    :return: All locations of t as a substring of s.
    :param: s, t Two DNA strings
    """
    # Scan with overlaps allowed; report 1-based start positions,
    # space-separated.
    positions = []
    for hit in finditer(t, s, overlapped=True):
        positions.append(str(hit.start() + 1))
    return " ".join(positions)
开发者ID:jd901215,项目名称:Rosalind,代码行数:8,代码来源:find_mofit.py
示例18: MultipleApproxPatternMatch
def MultipleApproxPatternMatch(inputset):
    """Return start offsets of approximate occurrences of several patterns.

    :param inputset: a ``(patterns, d)`` pair — an iterable of pattern
        strings and a mismatch budget ``d``.
    :return: list of 0-based start indices (with overlaps), in pattern order.

    NOTE(review): reads the module-level global ``text`` as the string to
    search; ``GetRE`` presumably builds a regex allowing up to ``d``
    mismatches — confirm against its definition.
    """
    patterns, d = inputset
    all_matched_index = []
    for eachp in patterns:
        reg = GetRE(eachp, d)
        for m in regex.finditer(reg, text, overlapped = True):
            all_matched_index.append(m.start())
    return all_matched_index
开发者ID:ajing,项目名称:BioinfoAlgorithm,代码行数:8,代码来源:MultipleApproxPatternMatch.py
示例19: glycocheck
def glycocheck(protein):
    """Return the 1-based start positions of every (possibly overlapping)
    N-glycosylation motif N{P}[ST]{P} in *protein*."""
    import regex as re
    # overlapped=True so motifs sharing residues are all reported.
    return [hit.start() + 1
            for hit in re.finditer("N[^P][ST][^P]", protein, overlapped=True)]
开发者ID:eignatenkov,项目名称:rosalind,代码行数:8,代码来源:mprt.py
示例20: get_word_groups_from_line
def get_word_groups_from_line(self, song_line):
    """Split *song_line* into Group objects, one per whitespace-delimited
    run of characters containing no digits, hyphens, or semicolons."""
    # Earlier, stricter pattern kept for reference:
    #   r"(?:^|\s)([A-Za-z,'! &?.]+)(?:\s|$)"
    pattern = r"(?:^|\s)([^-0-9;]+)(?:\s|$)"
    return [Group(m.start(), m.group(0).strip())
            for m in re.finditer(pattern, song_line)]
开发者ID:JonAWhite,项目名称:choir_tracker,代码行数:9,代码来源:song_list_extractor.py
注:本文中的regex.finditer函数示例由纯净天空整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。 |
请发表评论