Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
632 views
in Technique[技术] by (71.8m points)

mysql - Fixing encodings

I have ended up with messed up character encodings in one of our mysql columns.

Typically I have

√? instead of é
√? instead of ?
√≠ instead of í

and so on...

Fairly certain that someone here would know what happened and how to fix.

UPDATE: Based on bobince's answer and since I had this data in a file I did the following

#!/user/bin/env python
import codecs
f = codecs.open('./file.csv', 'r', 'utf-8')
f2 = codecs.open('./file-fixed.csv', 'w', 'utf-8')
for line in f:
    f2.write(line.encode('macroman').decode('utf-8')),

after which

load data infile 'file-fixed.csv' 
into table list1 
fields terminated by ',' 
optionally enclosed by '"' 
ignore 1 lines;

properly imported the data.

UPDATE2: Hammerite, just for completeness here are the requested details...

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

The SHOW CREATE TABLE for the table I am importing to has DEFAULT CHARSET=utf8

EDIT3:

Actually with the above settings the load didn't do the right thing (I could not compare to existing utf8 fields and my loaded data only looked as if it was loaded correctly; I assume because of the wrong, but matching client, connection and results charsets), so I updated the settings to:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

uploaded data again and then finally I got the data loaded correctly (comparable with existing data).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your text has been encoded to UTF-8 and then re-decoded, erroneously, as Mac Roman.

You won't be able to fix this inside the database, as MySQL doesn't know the Mac Roman encoding. You might write a script to walk each row of each affected table and fix up the text by reversing the encode/decode cycle. Python is a possibility, having a good range of encodings:

>>> print u'√?'.encode('macroman').decode('utf-8')
é
>>> print u'√?'.encode('macroman').decode('utf-8')
?
>>> print u'í'.encode('macroman').decode('utf-8')
í

Alternatively, if there is no non-ASCII content that is unaffected by this problem, you could try using mysqladmin to export an SQL dump, then convert it all at once, either using a script as above or, if you have a text editor (presumably on the Mac) that can do it, loading the script as UTF-8 then saving it as Mac Roman. Finally re-import the dump using mysql < dump.sql.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...