Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Why doesn't Python recognize my utf-8 encoded source file?

Here is a little tmp.py with a non-ASCII character:

if __name__ == "__main__":
    s = 'ß'
    print(s)

Running it I get the following error:

Traceback (most recent call last):
  File ".\tmp.py", line 3, in <module>
    print(s)
  File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

The Python docs say:

By default, Python source files are treated as encoded in UTF-8...

My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK, that is, it looks the way it does above in this question (with the ß symbol).

If I put:

# -*- encoding: utf-8 -*-

as the first line in tmp.py, it does not change anything; the error persists.

Could someone help me figure out what I am doing wrong?


1 Reply


The encoding your terminal is using doesn't support that character:

>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

Python is reading your source file just fine; it's your output encoding (cp866, the console's code page) that cannot represent the character.
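To see that the failure comes from the output codec rather than from the source file, you can test the character against a codec directly. This is only a minimal sketch; the helper name can_encode is made up for illustration and is not part of the standard library.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding.

    Hypothetical helper for illustration; not a standard-library function.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # sys.stdout.encoding is the codec that print() uses for the console.
    print("stdout encoding:", sys.stdout.encoding)
    print("cp866 can encode U+00DF:", can_encode("\xdf", "cp866"))
    print("utf-8 can encode U+00DF:", can_encode("\xdf", "utf-8"))
```

The string is written as "\xdf" so that the sketch itself contains only ASCII and does not depend on how the file was saved.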

You can try running chcp 65001 in the Windows console to switch its code page to UTF-8; chcp is the Windows command-line command for changing code pages.
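If switching the code page is not an option, another approach (my assumption, not something the question asked for) is to let Python substitute characters the console cannot show instead of raising. A rough sketch, where encode_for_console and safe_print are hypothetical names:

```python
import sys


def encode_for_console(text, encoding, errors="backslashreplace"):
    """Round-trip text through encoding, substituting unencodable characters.

    Hypothetical helper for illustration. With errors="backslashreplace" an
    unencodable character becomes an escape such as \\xdf; with
    errors="replace" it becomes "?".
    """
    return text.encode(encoding, errors=errors).decode(encoding)


def safe_print(text):
    # Fall back to UTF-8 if stdout has no encoding (e.g. it was replaced).
    encoding = sys.stdout.encoding or "utf-8"
    print(encode_for_console(text, encoding))


if __name__ == "__main__":
    safe_print("\xdf")
```

On a cp866 console this prints the escape \xdf rather than failing; on a UTF-8 terminal the character is printed unchanged.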

My terminal, on OS X (using UTF-8), can handle it just fine:

>>> print('\xdf')
ß
