Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - Strange character encoding of stored data: the old script shows it fine, the new one doesn't

I'm trying to rewrite an old website.
It's in Persian, which uses Perso-Arabic characters.

CREATE DATABASE `db` DEFAULT CHARACTER SET utf8 COLLATE utf8_persian_ci;
USE `db`;

Almost all of my table and column collations are set to utf8_persian_ci.

I'm using CodeIgniter for my new script, and I have

'char_set' => 'utf8',
'dbcollat' => 'utf8_persian_ci',

in the database settings, so there is no problem there.

So here is the strange part:

The old script uses some sort of database engine called TUBADBENGINE or TUBA DB ENGINE... nothing special.

When I enter some data (in Persian) through the old script and then look into the database, the characters are stored like ?1ù…?±?§ù? .

The old script fetches and shows that data fine, but the new script shows it with the same weird characters as in the database.

So when I enter ???? , the stored data looks like ?1ù…?±?§ù ; when I fetch it in the new script I see ?1ù…?±?§ù , but in the old script I see ????

CREATE TABLE IF NOT EXISTS `tnewsgroups` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `fName` varchar(200) COLLATE utf8_persian_ci DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci AUTO_INCREMENT=11 ;

--
-- Dumping data for table `tnewsgroups`
--

INSERT INTO `tnewsgroups` (`ID`, `fName`) VALUES
(1, '?1ù…?±?§ù?'),
(2, 'ù…?1ù…?§?±??'),
(3, '?¨?±ù?'),
(4, 'ù…ú??§ù???ú?'),
(5, 'test'),
(6, 'test2');

On the other hand, when I enter ????? directly in the database,

of course I have the same ???? stored in the database.

The new script shows it fine,

but in the old script I get ????

Can anyone make any sense of this?

Here is the TUBA engine:

https://github.com/maxxxir/mz-codeigniter-crud/blob/master/tuba.php

Usage example from the old script:

define("database_type" , "MYSQL");
define("database_ip" , "localhost");
define("database_un" , "root");
define("database_pw" , "");
define("database_name" , "nezam2");
define("database_connectionstring" , "");
$db = new TUBADBENGINE(database_type , database_ip , database_un , database_pw , database_name , database_connectionstring);
$db->Select("SELECT * FROM tnews limit 3");
if ($db->Lasterror() != "") { echo "<B><Font color=red>??? ! á?Y? ?ì???? êá?? ??í?";  exit(); }
for ($i = 0 ; $i < $db->Count() ; $i++) {
    $row = $db->Next();
    var_dump($row);
}


1 Reply


In short, because this has been discussed a thousand times before:

  1. PHP holds a string, say "漢字", encoded in UTF-8. The bytes for this are E6 BC A2 E5 AD 97.
  2. It sends this string over a database connection which is set to latin1.
  3. The database receives the bytes E6 BC A2 E5 AD 97, thinking those represent latin1 characters.
  4. The database stores the characters æ¼¢å­— (the characters that E6 BC A2 E5 AD 97 maps to in latin1).
  5. The same process reversed makes PHP receive the same bytes, which it then treats as UTF-8. The roundtrip works fine for PHP, even though the database doesn't treat the characters as it should.
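
The five steps above can be reproduced without a database. Here is a minimal sketch in Python (not the asker's PHP stack), assuming the "latin1" connection behaves like Windows-1252 (cp1252); the variable names are purely illustrative:

```python
# Illustrative sketch of the five steps; not the asker's PHP code.
original = "\u6f22\u5b57"  # the two characters from step 1

# Step 1: the application holds the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # E6 BC A2 E5 AD 97

# Steps 2-4: the connection is declared latin1, so the database reads
# each byte as one character (cp1252 is assumed for "latin1" here).
stored = utf8_bytes.decode("cp1252")
print(len(original), len(stored))  # 2 6

# Step 5: reading back over the same connection reverses the mapping,
# and the application interprets the bytes as UTF-8 again.
round_trip = stored.encode("cp1252").decode("utf-8")
print(round_trip == original)  # True
```

A client that connects with the correct utf8 setting would presumably see the six characters in `stored` rather than the original two, which matches what the new script displays.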

So the problem here was that the database connection was set incorrectly when the data was entered into the database. You'll have to convert the data in the database to the correct characters. Try this:

SELECT CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) FROM table_name

Maybe utf8 isn't what you need here; experiment with other character sets. If that works, change this into an UPDATE statement to update the data permanently.
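
The "experiment" part can also be tried offline on a copied sample value before touching the table. The following is only a sketch in Python: `try_repairs`, the candidate encodings, and the sample word are all hypothetical, and it mirrors the nested CONVERT conceptually rather than replacing it.

```python
def try_repairs(garbled, stored_as=("cp1252", "latin-1"), real=("utf-8", "cp1256")):
    """Return {(stored_as, real): text} for every pair that converts cleanly.

    stored_as: what the database believed the bytes were (assumed candidates).
    real:      what the bytes actually were (assumed candidates).
    """
    results = {}
    for wrong in stored_as:
        for right in real:
            try:
                # Characters -> original bytes -> reinterpret with the real charset.
                results[(wrong, right)] = garbled.encode(wrong).decode(right)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue  # this pair cannot explain the stored value
    return results


# Hypothetical sample: a Persian word ("salam") garbled the same way as in the steps above.
sample = "\u0633\u0644\u0627\u0645"
garbled = sample.encode("utf-8").decode("cp1252")

for pair, text in try_repairs(garbled).items():
    print(pair, text == sample)
```

Pairs that raise an error are ruled out automatically; the remaining candidates still need a human to check which one reads as real text.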

