Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

htmlentities in PHP but preserving html tags

I want to convert all the text in a string into HTML entities while preserving the HTML tags. For example, this:

<p><font style="color:#FF0000">Camión español</font></p>

should be translated into this:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

Any ideas?

1 Reply

You can get the list of character => entity correspondences used by htmlentities with the function get_html_translation_table; consider this code:

$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);

(You might want to check the second parameter of that function in the manual; depending on your needs, you may have to set it to a value other than the default.)

It will give you something like this:

array
  ' ' => string '&nbsp;' (length=6)
  '¡' => string '&iexcl;' (length=7)
  '¢' => string '&cent;' (length=6)
  '£' => string '&pound;' (length=7)
  '¤' => string '&curren;' (length=8)
  ....
  ....
  ....
  'ÿ' => string '&yuml;' (length=6)
  '"' => string '&quot;' (length=6)
  '<' => string '&lt;' (length=4)
  '>' => string '&gt;' (length=4)
  '&' => string '&amp;' (length=5)
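Note that the defaults of get_html_translation_table have changed since this was written: since PHP 5.4 the default encoding is UTF-8, and since PHP 8.1 the default flags are ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401. As a minimal sketch (assuming PHP 5.4 or later and a source file saved as UTF-8), you can pass all three arguments explicitly so the table no longer depends on the PHP version; the choice of ENT_QUOTES | ENT_HTML401 here is an assumption, not a requirement:

```php
<?php
// Sketch, assuming PHP >= 5.4 and a source file saved as UTF-8.
// All three arguments are passed explicitly, so the table does not depend
// on the defaults of the PHP version; the flag choice is an assumption.
$list = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// With 'UTF-8', each key is a UTF-8 string ("ó" is the two bytes C3 B3),
// so the keys can be matched directly against UTF-8 input.
echo $list['ó'], "\n"; // &oacute;
echo $list['ñ'], "\n"; // &ntilde;
echo $list['<'], "\n"; // &lt;
echo bin2hex(array_search('&oacute;', $list)), "\n"; // c3b3
```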

Now, remove the correspondences you don't want:

unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

Your list now contains all the character => entity correspondences used by htmlentities, except for the few characters you don't want to encode.
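If your attributes may also use single quotes, keep in mind that with ENT_QUOTES (the default flag since PHP 8.1) the table also contains the single quote. A minimal sketch, assuming PHP 5.4 or later and a UTF-8 source file, that drops it together with the other markup characters in one call, using array_diff_key instead of a series of unset calls (a choice of this sketch):

```php
<?php
// Sketch, assuming PHP >= 5.4 and a source file saved as UTF-8.
$list = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Characters that belong to the markup itself: tag delimiters, attribute
// quotes (double and single) and the ampersand of existing entities.
$markup = array('"', "'", '<', '>', '&');

// array_flip() turns the list into keys, so array_diff_key() removes
// exactly those entries and keeps every other character => entity pair.
$list = array_diff_key($list, array_flip($markup));

var_dump(isset($list['<']), isset($list["'"]), $list['é']);
```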

Next, you just have to extract the lists of keys and values:

$search = array_keys($list);
$values = array_values($list);

And finally, you can use str_replace to do the replacement:

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);

And you get:

string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)

Which looks like what you wanted ;-)


Edit: well, except for the encoding problem (damn UTF-8, I suppose; I'm trying to find a solution for that and will edit again).

Second edit, a couple of minutes later: it seems the keys of the translation table are ISO-8859-1 characters while your string is UTF-8, so you'll have to use utf8_encode on the $search list before calling str_replace :-(

Which means using something like this:

$search = array_map('utf8_encode', $search);

This line goes between the call to array_keys and the call to str_replace.
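To see why that conversion is needed: with the defaults of the PHP version used here, the table's keys are single ISO-8859-1 bytes ("ó" is F3), while the input string is UTF-8 ("ó" is C3 B3). Without conversion, the bytes C3 and B3 of the input match the ISO-8859-1 keys for "Ã" and "³" instead, which explains the &Atilde;&sup3; in the first output. The sketch below uses a hypothetical helper, latin1_to_utf8, that reproduces what utf8_encode does, for illustration only; utf8_encode is deprecated as of PHP 8.2, and mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') performs the same conversion when the mbstring extension is available:

```php
<?php
// Hypothetical helper for illustration: convert an ISO-8859-1 string to UTF-8,
// which is what utf8_encode does. Each ISO-8859-1 byte is the code point of
// the same value (U+0000..U+00FF), so bytes >= 0x80 become two UTF-8 bytes.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                    // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // leading byte: 110000xx
                  . chr(0x80 | ($b & 0x3F));   // continuation byte: 10xxxxxx
        }
    }
    return $out;
}

// In an ISO-8859-1 table, "ó" is the single byte F3; in a UTF-8 string it is C3 B3.
$latin1Key = "\xF3";
echo bin2hex($latin1Key), "\n";                 // f3
echo bin2hex(latin1_to_utf8($latin1Key)), "\n"; // c3b3
```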

And this time, you should really get what you wanted:

string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)


And here is the full code:

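// Translation table used by htmlentities (ISO-8859-1 keys with the defaults of the PHP version used here)
$list = get_html_translation_table(HTML_ENTITIES);
// Don't encode the characters that make up the HTML markup itself
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$search = array_keys($list);
$values = array_values($list);
// Convert the ISO-8859-1 keys to UTF-8 so they match the UTF-8 input string
$search = array_map('utf8_encode', $search);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);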

And the full output:

string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

This time, it should be OK ^^
It doesn't fit in one line, and it might not be the most optimized solution, but it should work fine, and it has the advantage of letting you add or remove any character => entity correspondence as needed.
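On PHP 5.4 or later, where the default encoding is already UTF-8, the utf8_encode step should be dropped. A minimal sketch of the whole thing for that case, assuming UTF-8 input and a UTF-8 source file; it also removes the single quote (added to the table by ENT_QUOTES) and uses strtr instead of str_replace, both choices of this sketch rather than of the code above:

```php
<?php
// Sketch for PHP >= 5.4, assuming UTF-8 input and a source file saved as UTF-8.
// The table is requested in UTF-8, so there is no utf8_encode() step:
// applied to UTF-8 keys it would double-encode them and nothing would match.
$list = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Leave the markup alone: tag delimiters, attribute quotes and existing entities.
foreach (array('"', "'", '<', '>', '&') as $char) {
    unset($list[$char]);
}

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';

// strtr() with an array replaces every key in a single pass and never
// rescans text it has already replaced.
$str_out = strtr($str_in, $list);

echo $str_out, "\n";
// <p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>
```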

Have fun!


...