Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
224 views
in Technique[技术] by (71.8m points)

unicode - JSON and escaping characters

I have a string which gets serialized to JSON in Javascript, and then deserialized to Java.

It looks like if the string contains a degree symbol, then I get a problem.

I could use some help in figuring out who to blame:

  • is it the Spidermonkey 1.8 implementation? (this has a JSON implementation built-in)
  • is it Google gson?
  • is it me for not doing something properly?

Here's what happens in JSDB:

js>s='15u00f8C'
15°C
js>JSON.stringify(s)
"15°C"

I would have expected "15u00f8C' which leads me to believe that Spidermonkey's JSON implementation isn't doing the right thing... except that the JSON homepage's syntax description (is that the spec?) says that a char can be

any-Unicode-character- except-"-or--or- control-character"

so maybe it passes the string along as-is without encoding it as u00f8... in which case I would think the problem is with the gson library.

Can anyone help?

I suppose my workaround is to use either a different JSON library, or manually escape strings myself after calling JSON.stringify() -- but if this is a bug then I'd like to file a bug report.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.

Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats; whereas encoding them all makes them six or twelve bytes).

It is more likely that you have a text transcoding bug somewhere in your code and escaping everything in the ASCII subset masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...