Project name: html5lib/html5lib-python
Project URL: https://github.com/html5lib/html5lib-python
Language: Python 68.9%
Introduction:

html5lib

html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers.

Usage

Simple usage follows this pattern:

```python
import html5lib

with open("mydocument.html", "rb") as f:
    document = html5lib.parse(f)
```

or:

```python
import html5lib

document = html5lib.parse("<p>Hello World!")
```

By default, the `document` will be an `xml.etree` element instance. Two other tree types are supported: `xml.dom.minidom` and `lxml.etree`. To use an alternative format, specify the name of a treebuilder:

```python
import html5lib

with open("mydocument.html", "rb") as f:
    lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
```

When using with `urllib2` (Python 2), pass the charset from the HTTP headers into html5lib as follows:

```python
from contextlib import closing
from urllib2 import urlopen
import html5lib

# urllib2 responses are not context managers, so wrap them in closing()
with closing(urlopen("http://example.com/")) as f:
    document = html5lib.parse(f, transport_encoding=f.info().getparam("charset"))
```

When using with `urllib.request` (Python 3), pass the charset from the HTTP headers into html5lib as follows:

```python
from urllib.request import urlopen
import html5lib

# get_content_charset() returns None when the server sends no charset
with urlopen("http://example.com/") as f:
    document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())
```

To have more control over the parser, create a parser object explicitly. For instance, to make the parser raise exceptions on parse errors, use:

```python
import html5lib

with open("mydocument.html", "rb") as f:
    # strict=True raises an exception on the first parse error
    parser = html5lib.HTMLParser(strict=True)
    document = parser.parse(f)
```

When you're instantiating parser objects explicitly, pass a treebuilder class as the `tree` keyword argument to use an alternative document format:

```python
import html5lib

parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
minidom_document = parser.parse("<p>Hello World!")
```

More documentation is available at https://html5lib.readthedocs.io/.

Installation

html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:

```shell
$ pip install html5lib
```

The goal is to support a (non-strict) superset of the versions that pip supports.

Optional Dependencies

The following third-party libraries may be used for additional functionality:
Bugs

Please report any bugs on the issue tracker.

Tests

Unit tests require the

Test data are contained in a separate html5lib-tests repository and included as a submodule, so for git checkouts they must be initialized:

```shell
$ git submodule init
$ git submodule update
```

If you have all compatible Python implementations available on your system, you can run tests on all of them using the

Questions?

There's a mailing list available for support on Google Groups, html5lib-discuss, though you may get a quicker response asking on IRC in #whatwg on irc.freenode.net.
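The `urllib.request` example under Usage forwards the HTTP `charset` parameter to html5lib as `transport_encoding`. The sketch below shows what that value looks like without making a network request. It is a minimal, stdlib-only sketch and rests on two assumptions:

- It targets Python 3, where a response's `f.info()` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`.
- The helper name `charset_from_content_type` is hypothetical and is not part of html5lib.

The helper builds a bare `Message`, sets a `Content-Type` header, and calls the same `get_content_charset()` method the README example uses:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Hypothetical helper (not part of html5lib).

    Returns the charset parameter of a Content-Type header value, lowercased,
    or None if the header has no charset parameter. This is the same value
    f.info().get_content_charset() yields for a urllib.request response.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() unquotes the parameter and lowercases it
    return headers.get_content_charset()


if __name__ == "__main__":
    for value in ("text/html; charset=UTF-8", 'text/html; charset="ISO-8859-1"', "text/html"):
        print(repr(value), "->", charset_from_content_type(value))
```

A `None` result, as for a header with no `charset` parameter, can be passed straight through as `transport_encoding`, exactly as the Python 3 example does. html5lib then falls back to its own encoding detection.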