Python sigil_bs4.BeautifulSoup类代码示例

OGeek|极客世界-中国程序员成长平台 › 门户 › 编程› Python›Python编程经验

原作者: [db:作者] 来自: [db:来源] 收藏邀请

本文整理汇总了Python中sigil_bs4.BeautifulSoup类的典型用法代码示例。如果您正苦于以下问题：Python BeautifulSoup类的具体用法？Python BeautifulSoup怎么用？Python BeautifulSoup使用的例子？那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。

在下文中一共展示了BeautifulSoup类的8个代码示例，这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞，您的评价将有助于我们的系统推荐出更棒的Python代码示例。

示例1: performPageMapUpdates

def performPageMapUpdates(data, currentdir, keylist, valuelist):
    data = _remove_xml_header(data)
    # lxml on a Mac does not seem to handle full unicode properly, so encode as utf-8
    data = data.encode('utf-8')
    # rebuild serialized lookup dictionary of xml_updates properly adjusted
    updates = {}
    for i in range(0, len(keylist)):
        updates[ keylist[i] ] = "../" + valuelist[i]
    xml_empty_tags = ["page"]
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=xml_empty_tags)
    soup = BeautifulSoup(data, features=None, from_encoding="utf-8", builder=xmlbuilder)
    for tag in soup.find_all(["page"]):
        for att in ["href"]:
            if att in tag.attrs :
                ref = tag[att]
                if ref.find(":") == -1 :
                    parts = ref.split('#')
                    url = parts[0]
                    fragment = ""
                    if len(parts) > 1:
                        fragment = parts[1]
                    bookrelpath = os.path.join(currentdir, unquoteurl(url))
                    bookrelpath = os.path.normpath(bookrelpath)
                    bookrelpath = bookrelpath.replace(os.sep, "/")
                    if bookrelpath in updates:
                        attribute_value = updates[bookrelpath]
                        if fragment != "":
                            attribute_value = attribute_value + "#" + fragment
                        attribute_value = quoteurl(attribute_value)
                        tag[att] = attribute_value
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars="  ")
    return newdata

开发者ID:hobo71，项目名称:Sigil，代码行数:32，代码来源:xmlprocessor.py

示例2: performNCXSourceUpdates

def performNCXSourceUpdates(data, currentdir, keylist, valuelist):
    # rebuild serialized lookup dictionary
    updates = {}
    for i in range(0, len(keylist)):
        updates[ keylist[i] ] = valuelist[i]
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=ebook_xml_empty_tags)
    soup = BeautifulSoup(data, features=None, builder=xmlbuilder)
    for tag in soup.find_all("content"):
        if "src" in tag.attrs:
            src = tag["src"]
            if src.find(":") == -1:
                parts = src.split('#')
                url = parts[0]
                fragment = ""
                if len(parts) > 1:
                    fragment = parts[1]
                bookrelpath = os.path.join(currentdir, unquoteurl(url))
                bookrelpath = os.path.normpath(bookrelpath)
                bookrelpath = bookrelpath.replace(os.sep, "/")
                if bookrelpath in updates:
                    attribute_value = updates[bookrelpath]
                    if fragment != "":
                        attribute_value = attribute_value + "#" + fragment
                    attribute_value = quoteurl(attribute_value)
                    tag["src"] = attribute_value
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars="  ")
    return newdata

开发者ID:uchuugaka，项目名称:Sigil，代码行数:27，代码来源:xmlprocessor.py

示例3: diagnose

def diagnose(data):
    """Diagnostic suite for isolating common problems."""
    print("Diagnostic running on Beautiful Soup %s" % __version__)
    print("Python version %s" % sys.version)

    basic_parsers = ["html.parser", "html5lib", "lxml"]
    for name in basic_parsers:
        for builder in builder_registry.builders:
            if name in builder.features:
                break
        else:
            basic_parsers.remove(name)
            print((
                "I noticed that %s is not installed. Installing it may help." %
                name))

    if 'lxml' in basic_parsers:
        basic_parsers.append(["lxml", "xml"])
        try:
            from lxml import etree
            print("Found lxml version %s" % ".".join(map(str,etree.LXML_VERSION)))
        except ImportError as e:
            print (
                "lxml is not installed or couldn't be imported.")


    if 'html5lib' in basic_parsers:
        try:
            import html5lib
            print("Found html5lib version %s" % html5lib.__version__)
        except ImportError as e:
            print (
                "html5lib is not installed or couldn't be imported.")

    if hasattr(data, 'read'):
        data = data.read()
    elif os.path.exists(data):
        print('"%s" looks like a filename. Reading data from the file.' % data)
        data = open(data).read()
    elif data.startswith("http:") or data.startswith("https:"):
        print('"%s" looks like a URL. Beautiful Soup is not an HTTP client.' % data)
        print("You need to use some other library to get the document behind the URL, and feed that document to Beautiful Soup.")
        return
    print()

    for parser in basic_parsers:
        print("Trying to parse your markup with %s" % parser)
        success = False
        try:
            soup = BeautifulSoup(data, parser)
            success = True
        except Exception as e:
            print("%s could not parse the markup." % parser)
            traceback.print_exc()
        if success:
            print("Here's what %s did with the markup:" % parser)
            print(soup.prettify())

        print("-" * 80)

开发者ID:Sigil-Ebook，项目名称:Sigil，代码行数:59，代码来源:diagnose.py

示例4: repairXML

def repairXML(data, self_closing_tags=ebook_xml_empty_tags, indent_chars="  "):
    data = _remove_xml_header(data)
    # lxml on a Mac does not seem to handle full unicode properly, so encode as utf-8
    data = data.encode('utf-8')
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=self_closing_tags)
    soup = BeautifulSoup(data, features=None, from_encoding="utf-8", builder=xmlbuilder)
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars=indent_chars)
    return newdata

开发者ID:EKDG-SAM，项目名称:Sigil，代码行数:8，代码来源:xmlprocessor.py

示例5: repairXML

def repairXML(data, mtype="", indent_chars="  "):
    data = _remove_xml_header(data)
    data = _make_it_sane(data)
    voidtags = get_void_tags(mtype)
    # lxml on a Mac does not seem to handle full unicode properly, so encode as utf-8
    data = data.encode('utf-8')
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=voidtags)
    soup = BeautifulSoup(data, features=None, from_encoding="utf-8", builder=xmlbuilder)
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars=indent_chars)
    return newdata

开发者ID:JksnFst，项目名称:Sigil，代码行数:10，代码来源:xmlprocessor.py

示例6: repairXML

def repairXML(data, mtype="", indent_chars="  "):
    newdata = _remove_xml_header(data)
    # if well-formed - don't mess with it
    if _well_formed(newdata):
        return data
    newdata = _make_it_sane(newdata)
    if not _well_formed(newdata):
        newdata = _reformat(newdata)
        if mtype == "application/oebps-package+xml":
            newdata = newdata.decode('utf-8')
            newdata = Opf_Parser(newdata).rebuild_opfxml()
    # lxml requires utf-8 on Mac, won't work with unicode
    if isinstance(newdata, str):
        newdata = newdata.encode('utf-8')
    voidtags = get_void_tags(mtype)
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=voidtags)
    soup = BeautifulSoup(newdata, features=None, from_encoding="utf-8", builder=xmlbuilder)
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars=indent_chars)
    return newdata

开发者ID:Sigil-Ebook，项目名称:Sigil，代码行数:19，代码来源:xmlprocessor.py

示例7: anchorNCXUpdates

def anchorNCXUpdates(data, originating_filename, keylist, valuelist):
    # rebuild serialized lookup dictionary
    id_dict = {}
    for i in range(0, len(keylist)):
        id_dict[ keylist[i] ] = valuelist[i]
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=ebook_xml_empty_tags)
    soup = BeautifulSoup(data, features=None, builder=xmlbuilder)
    original_filename_with_relative_path = TEXT_FOLDER_NAME  + "/" + originating_filename
    for tag in soup.find_all("content"):
        if "src" in tag.attrs:
            src = tag["src"]
            if src.find(":") == -1:
                parts = src.split('#')
                if (parts is not None) and (len(parts) > 1) and (parts[0] == original_filename_with_relative_path) and (parts[1] != ""):
                    fragment_id = parts[1]
                    if fragment_id in id_dict:
                        attribute_value = TEXT_FOLDER_NAME + "/" + quoteurl(id_dict[fragment_id]) + "#" + fragment_id
                        tag["src"] = attribute_value
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars="  ")
    return newdata

开发者ID:uchuugaka，项目名称:Sigil，代码行数:20，代码来源:xmlprocessor.py

示例8: repairXML

def repairXML(data, self_closing_tags=ebook_xml_empty_tags, indent_chars="  "):
    xmlbuilder = LXMLTreeBuilderForXML(parser=None, empty_element_tags=self_closing_tags)
    soup = BeautifulSoup(data, features=None, builder=xmlbuilder)
    newdata = soup.decodexml(indent_level=0, formatter='minimal', indent_chars=indent_chars)
    return newdata

开发者ID:uchuugaka，项目名称:Sigil，代码行数:5，代码来源:xmlprocessor.py

注：本文中的sigil_bs4.BeautifulSoup类示例由纯净天空整理自Github/MSDocs等源码及文档管理平台，相关代码片段筛选自各路编程大神贡献的开源项目，源码版权归原作者所有，传播和使用请参考对应项目的License；未经允许，请勿转载。

鲜花

握手

雷人

路过

鸡蛋

该文章已有0人参与评论

请发表评论

全部评论

专题导读

More+

10-27 六六分期app的软件客服如何联系？(六六分期

11-06 可心卡盟:win10系统火狐flash插件崩溃怎么

11-06 亲亲特价:怎么删除回收站图标

11-06 济南大学虚拟社区:鲁大师节能降温的具体办

11-06 xlueops.exe:无线网络安装向导

11-06 女斗合众国:win7系统cf与主机连接不稳定怎

11-06 0xc000022-[cf烟雾头]cf怎么调烟雾头

11-06 qizideyouhuo:应用程序无法正常启动0xc0000

11-06 ipz-185:win7系统vcf文件怎么打开

11-06 傻哥蹦迪:win10系统s4怎么打开usb调试

11-06 八神浩树gtaste:回收站清空了怎么恢复

11-06 妖尾之黑色守护:win10系统电脑没有1440x900

11-06 校园至尊魔王小说:win7系统浏览网页时字体

11-06 女斗合众国:win10系统访问共享文件夹提示请

11-06 tokyo hot n0654:恢复win7系统默认字体一招

11-06 雨酷仙境:设置win7系统转移临时文件夹腾出

11-06 阿穆纳伊之杖:win7系统开始菜单在右边还原

11-06 tunespotting:win10系统火狐flash插件总是

11-06 甘尔葛分析师：计谋网站seo关键词暴涨有什

11-06 蔡贵霖: 计谋网站seo关键词暴涨有什么秘密

11-06 博益网首页:ao3网页版进入不了解决方法

11-06 漏斗子专栏: 网站数据分析小白易懂精华篇

11-06 见证双虹怎么做:win7系统开启telnet命令的

11-06 颾狐蝶蜋:系统资源不足无法完成请求的服务

11-06 国光中学校歌:提交网站到alexa查询详细步骤

11-06 西安有情天:静态网页和动态网页的区别

11-06 红木雅尚斋:外部链接构造对网站的好处

11-06 前官礼遇：防止域名劫持–增强域安全性的10

11-06 密传二转答案: 中文分词算法有哪些

11-06 金泉家园邮编:百度快照劫持的表现及应对方

Python sigmoid.sigmoid函数代码示例发布时间：2022-05-27

Python settings.create_settings函数代码示例发布时间：2022-05-27

Python util.grid_equal函数代码示例

1 Python 入门教程

Python入门教程 Python 是一种解释型、面向对象、动态数据类型的高级程序设计语言。 P

阅读：13811|2022-01-22

2 Python wikiutil.getFrontPage函数代码示例

Python wikiutil.getFrontPage函数代码示例

阅读：10203|2022-05-24

3 Python 简介

Python 简介 Python 是一个高层次的结合了解释性、编译性、互动性和面向对象的脚本

阅读：4092|2022-01-22

4 Python tests.group函数代码示例

Python tests.group函数代码示例

阅读：4044|2022-05-27

5 Python util.check_if_user_has_permission

Python util.check_if_user_has_permission函数代码示例

阅读：3845|2022-05-27

6 Python 操练实例98

Python 练习实例98 Python 100例题目：从键盘输入一个字符串，将小写字母全部转换成大

阅读：3515|2022-01-22

7 Python 环境搭建

Python 环境搭建本章节我们将向大家介绍如何在本地搭建 Python 开发环境。 Py

阅读：3032|2022-01-22

8 Python output.darkgreen函数代码示例

Python output.darkgreen函数代码示例

阅读：2655|2022-05-25

9 Python 基础语法

Python 基础语法 Python 语言与 Perl，C 和 Java 等语言有许多相似之处。但是，也

阅读：2651|2022-01-22

10 Python 中文编码

Python 中文编码前面章节中我们已经学会了如何用 Python 输出 Hello, World!，英文没

阅读：2303|2022-01-22

客服电话

电子邮件

Python sigil_bs4.BeautifulSoup类代码示例

示例1: performPageMapUpdates

示例2: performNCXSourceUpdates

示例3: diagnose

示例4: repairXML

示例5: repairXML

示例6: repairXML

示例7: anchorNCXUpdates

示例8: repairXML

请发表评论

全部评论

上一篇：

下一篇：

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.get_stdout函数代码示例

关于我们

产品与服务

解决方案

139-2527-9053