本文整理汇总了Python中requests.utils.get_encodings_from_content函数的典型用法代码示例。如果您正苦于以下问题:Python get_encodings_from_content函数的具体用法?Python get_encodings_from_content怎么用?Python get_encodings_from_content使用的例子?那么恭喜您, 这里精选的函数代码示例或许可以为您提供帮助。
在下文中一共展示了get_encodings_from_content函数的13个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Python代码示例。
示例1: test_precedence
def test_precedence(self):
    """Encodings are reported in priority order: HTML5 meta charset first,
    then the HTML4 http-equiv declaration, then the XML prolog."""
    markup = '''
<?xml version="1.0" encoding="XML"?>
<meta charset="HTML5">
<meta http-equiv="Content-type" content="text/html;charset=HTML4" />
'''.strip()
    expected = ['HTML5', 'HTML4', 'XML']
    assert get_encodings_from_content(markup) == expected
开发者ID:PoNote,项目名称:requests,代码行数:7,代码来源:test_utils.py
示例2: find_encoding
def find_encoding(content, headers=None):
    """Guess the character encoding of *content*.

    Tries, in order: the Content-Type header (if *headers* is given),
    charset declarations inside the document, then chardet detection.
    Returns 'unicode' for already-decoded text and 'latin_1' as the
    last-resort fallback.
    """
    # Already-decoded text needs no detection.
    if isinstance(content, unicode):
        return 'unicode'
    guess = None
    # 1) Content-Type header, ignoring requests' ISO-8859-1 default.
    if headers:
        guess = get_encoding_from_headers(headers)
        if guess == 'ISO-8859-1':
            guess = None
    # 2) <meta>/<?xml?> declarations inside the document body.
    if not guess:
        candidates = get_encodings_from_content(content)
        guess = candidates[0] if candidates else None
    # 3) Statistical detection, when chardet is importable.
    if not guess and chardet is not None:
        guess = chardet.detect(content)['encoding']
    # gb18030 is a superset of gb2312; mislabelled pages decode more reliably.
    if guess and guess.lower() == 'gb2312':
        guess = 'gb18030'
    return guess or 'latin_1'
开发者ID:Crooky,项目名称:qiandao,代码行数:26,代码来源:utils.py
示例3: guess_response_encoding
def guess_response_encoding(resp):
    '''
    Guess the content encoding of a requests response.

    Tries charsets declared inside the body first, then the Content-Type
    header, and finally falls back to statistical detection.
    Note: there's a performance issue due to chardet.
    '''
    # First try the encodings declared in the response content itself.
    encs = get_encodings_from_content(resp.content) or []
    for enc in encs:
        try:
            resp.content.decode(enc)
            # Lazy %-args: the message is only formatted if the level is enabled.
            LOG.info('Detected encoding %s from response content.', enc)
            return enc
        except UnicodeDecodeError:
            LOG.debug('Encoding from response content doesn\'t work.')
    # Then the charset from the Content-Type response header.
    enc = get_encoding_from_headers(resp.headers)
    if enc:
        try:
            resp.content.decode(enc)
            LOG.info('Detected encoding %s from response header.', enc)
            return enc
        except UnicodeDecodeError:
            LOG.debug('Encoding from response header doesn\'t work.')
    # Neither encoding works, we have to go the hard way (chardet is slow).
    start = clock()
    g = detect(resp.content)
    # Fixed typo ("cofidence" -> "confidence") and switched to lazy logging args.
    LOG.info('Detected encoding %s with confidence of %g in %gs.',
             g['encoding'], g['confidence'], clock() - start)
    return g['encoding']
开发者ID:dirtysalt,项目名称:dirtysalt.github.io,代码行数:30,代码来源:utils.py
示例4: encoding
def encoding(self):
    """Guessed encoding of ``self.content``, cached on first access.

    Resolution order: Content-Type header, in-document declarations,
    chardet; defaults to 'utf-8'. Returns 'unicode' for decoded text.
    """
    # Serve the memoized answer when available.
    if hasattr(self, '_encoding'):
        return self._encoding
    # Already-decoded text needs no charset.
    if isinstance(self.content, unicode):
        return 'unicode'
    # Try charset from the Content-Type header; requests defaults to
    # ISO-8859-1, which we treat as "unknown".
    guess = get_encoding_from_headers(self.headers)
    if guess == 'ISO-8859-1':
        guess = None
    # Try charset declarations inside the document.
    if not guess:
        declared = get_encodings_from_content(self.content)
        guess = declared[0] if declared else None
    # Fall back to statistical detection.
    if not guess and chardet is not None:
        guess = chardet.detect(self.content)['encoding']
    # gb18030 supersedes gb2312 and decodes mislabelled pages too.
    if guess and guess.lower() == 'gb2312':
        guess = 'gb18030'
    self._encoding = guess or 'utf-8'
    return self._encoding
开发者ID:5aket,项目名称:pyspider,代码行数:27,代码来源:response.py
示例5: encoding
def encoding(rsp):
    """
    encoding of Response.content.
    if Response.encoding is None, encoding will be guessed
    by header or content or chardet if available.
    """
    # Decoded text carries no byte encoding.
    if isinstance(rsp.content, six.text_type):
        return 'unicode'
    # Charset from the Content-Type header; ISO-8859-1 is requests'
    # placeholder default, so treat it as unknown.
    guess = get_encoding_from_headers(rsp.headers)
    if guess == 'ISO-8859-1':
        guess = None
    # Charset declared inside the document, when the helper is importable.
    if not guess and get_encodings_from_content:
        declared = get_encodings_from_content(rsp.content)
        guess = declared[0] if declared else None
    # Last resort: statistical detection via chardet.
    if not guess and chardet is not None:
        guess = chardet.detect(rsp.content)['encoding']
    # Map gb2312 to its superset gb18030 for robustness.
    if guess and guess.lower() == 'gb2312':
        guess = 'gb18030'
    return guess or 'utf-8'
开发者ID:zymtech,项目名称:parse_newspage,代码行数:30,代码来源:parserstandalone.py
示例6: _fetchContent
def _fetchContent(self):
    """Download ``self.url`` and return its body decoded with the detected
    charset; also stores the charset on ``self.encoding``.
    """
    r = requests.get(self.url)
    # Compute the declared encodings once instead of calling the
    # regex-based helper twice on the same content.
    declared = get_encodings_from_content(r.content)
    if declared:
        self.encoding = declared[0]
    else:
        # No in-document declaration: fall back to the charset reported
        # by the server via a second plain-urllib2 request.
        from contextlib import closing
        from urllib2 import urlopen
        with closing(urlopen(self.url)) as f:
            self.encoding = f.info().getparam("charset")
    # HACK: reload(sys)/setdefaultencoding mutates interpreter-wide state
    # and is strongly discouraged; kept for backward compatibility with
    # callers that rely on the changed default codec.
    reload(sys)
    sys.setdefaultencoding(self.encoding)
    content = r.content.decode(self.encoding)
    return content
开发者ID:Lab-317,项目名称:NewsParser,代码行数:18,代码来源:NewsParser.py
示例7: encoding
def encoding(self):
    """
    encoding of Response.content.
    if Response.encoding is None, encoding will be guessed
    by header or content or chardet if available.
    """
    # Memoized result from an earlier call.
    if hasattr(self, '_encoding'):
        return self._encoding
    # Already-decoded text has no byte encoding.
    if isinstance(self.content, six.text_type):
        return 'unicode'
    # Charset from the Content-Type header; requests' ISO-8859-1
    # placeholder means "unknown" here.
    detected = get_encoding_from_headers(self.headers)
    if detected == 'ISO-8859-1':
        detected = None
    # Charset declared inside the document (only the first 100 bytes are
    # inspected; on Python 3 they must be unicode for the regexes).
    if not detected and get_encodings_from_content:
        if six.PY3:
            sample = utils.pretty_unicode(self.content[:100])
        else:
            sample = self.content
        declared = get_encodings_from_content(sample)
        detected = declared[0] if declared else None
    # Statistical fallback on a short prefix to bound chardet's cost.
    if not detected and chardet is not None:
        detected = chardet.detect(self.content[:600])['encoding']
    # gb18030 is a superset of gb2312.
    if detected and detected.lower() == 'gb2312':
        detected = 'gb18030'
    self._encoding = detected or 'utf-8'
    return self._encoding
开发者ID:01jiagnwei01,项目名称:pyspider,代码行数:36,代码来源:response.py
示例8: procdata_getencoding
def procdata_getencoding(seed,headers,content):
code = utils.get_encoding_from_headers(headers)
if code:
if code.lower() == 'gbk' or code.lower() == 'gb2312':
code = 'gbk'
elif code.lower() == 'utf-8':
code = 'utf-8'
else:
code = None
if code == None:
code = utils.get_encodings_from_content(content)
print "content",seed,code
if code:
code = code[0]
if code.lower() == 'gbk' or code.lower() == 'gb2312':
code = 'gbk'
return code
开发者ID:salmonx,项目名称:fengbei,代码行数:20,代码来源:daemon.py
示例9: guess_content_encoding
def guess_content_encoding(content):
    '''
    Guess the encoding for plain content.

    Tries charsets declared inside the content first, then falls back
    to statistical detection.
    Note: there's a performance issue due to chardet.
    '''
    # First try the encodings declared by the content itself.
    encs = get_encodings_from_content(content) or []
    for enc in encs:
        try:
            content.decode(enc)
            # Lazy %-args: formatted only when the log level is enabled.
            LOG.info('Detected encoding %s from content.', enc)
            return enc
        except UnicodeDecodeError:
            LOG.debug('Encoding from content doesn\'t work.')
    # Neither encoding works, we have to go the hard way (chardet is slow).
    start = clock()
    g = detect(content)
    # Fixed typo ("cofidence" -> "confidence") and switched to lazy logging args.
    LOG.info('Detected encoding %s with confidence of %g in %gs.',
             g['encoding'], g['confidence'], clock() - start)
    return g['encoding']
开发者ID:dirtysalt,项目名称:dirtysalt.github.io,代码行数:21,代码来源:utils.py
示例10: filter_encoding
def filter_encoding(self,seed, headers,content):
code = utils.get_encoding_from_headers(headers)
if code:
if code.lower() == 'gbk' or code.lower() == 'gb2312':
code = 'gbk'
return True
elif code.lower() == 'utf-8' or code.lower() == 'utf8':
code = 'utf8'
# as for utf8, we should check the content
else: # 'ISO-8859-1' and so on,
code = None
# chinese website may also miss the content-encoding header, so detect the content
if code == None:
codes = utils.get_encodings_from_content(content)
if codes:
for code in codes:
if code.lower() in [ 'gbk','gb2312']:
return True
elif code.lower() == 'utf8' or code.lower() == 'utf-8':
code = 'utf8'
break
if code != 'utf8':
return False
# here handle utf8
# to detect any chinese char win
try:
ucon = content.decode('utf8')
for uchar in ucon:
i = ord(uchar)
if i >= 0x4e00 and i <= 0x9fa5:
return True
except Exception, e:
print url, e
pass
开发者ID:salmonx,项目名称:fengbei,代码行数:38,代码来源:worker_filter.py
示例11: test_pragmas
def test_pragmas(self, content):
    """Every pragma fixture must declare exactly one encoding: 'UTF-8'."""
    found = get_encodings_from_content(content)
    assert found == ['UTF-8']
开发者ID:PoNote,项目名称:requests,代码行数:4,代码来源:test_utils.py
示例12: test_none
def test_none(self):
    """An empty document declares no encodings at all."""
    assert not get_encodings_from_content('')
开发者ID:PoNote,项目名称:requests,代码行数:3,代码来源:test_utils.py
示例13: on_incoming
def on_incoming(self, msg):
    """Announce link info for every URL in a channel message.

    For HTML pages, posts the <title> (or first <h1>); for other
    resources, posts filename, content type and size. Silently swallows
    all errors by design (see comment below).
    """
    # Only react to channel messages, not private ones.
    if not msg.type == msg.CHANNEL:
        return
    # Catching all exceptions without alerting, as there is just so much crap that can go wrong with web stuff. Also, I'm lazy.
    try:
        urls = self.url_re.findall(msg.body)
        for url in urls:
            # Catch edge case where url is in brackets
            while url.startswith('(') and url.endswith(')'):
                url = url[1:-1]
            # HEAD first: cheap way to get headers and follow redirects.
            head = requests.head(url, allow_redirects=True)
            # work on the URL we were redirected to, if any
            url = head.url
            message = ""
            content_type = head.headers['content-type']
            # HTML websites
            if 'text/html' in content_type:
                # Set up any required request headers
                req_headers = {}
                # TODO: Accept-Language header from config
                req = requests.get(url, headers=req_headers, timeout=5)
                if 'charset' not in content_type:
                    # requests only looks at headers to detect the encoding, we must find the charset ourselves
                    # we can't use req.content because regex doesn't work on bytestrings apparently
                    encodings = get_encodings_from_content(req.text)
                    if encodings:
                        # Re-decode the body with the declared charset.
                        req.encoding = encodings[0]
                soup = BeautifulSoup(req.text)
                # Look for the <title> tag or an <h1>, whichever is first
                title = soup.find(['title', 'h1'])
                if title is None:
                    # NOTE(review): returns (abandoning any remaining URLs
                    # in the message) rather than continuing the loop.
                    return
                title = self.utils.tag_to_string(title)
                # Collapse all runs of whitespace to single spaces.
                title = ' '.join(title.split())
                message = "Title: " + title
            # Other resources
            else:
                content_length = head.headers.get('content-length', '')
                if content_length.isdigit():
                    size = self.sizeof_fmt(int(content_length))
                else:
                    size = "Unknown size"
                # Searches for the last segment of the URL (the filename)
                filename = re.search(r'/([^/]+)/?$', url).groups(1)[0]
                message = "{}: {} ({})".format(filename, content_type, size)
            self.bot.privmsg(msg.channel, message)
    except Exception as exception:
        print("Link Info Exception!")
        print(type(exception), exception)
开发者ID:ackwell,项目名称:ninjabot,代码行数:62,代码来源:linkinfo.py
注:本文中的requests.utils.get_encodings_from_content函数示例由纯净天空整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。
请发表评论