D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 12897 - std.json.toJSON doesn't translate unicode chars(>=0x80) to "\uXXXX"
Summary: std.json.toJSON doesn't translate unicode chars(>=0x80) to "\uXXXX"
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P1 critical
Assignee: basile-z
URL:
Keywords:
Depends on:
Blocks:
Reported: 2014-06-12 11:48 UTC by egustc
Modified: 2020-03-21 03:56 UTC
CC List: 1 user

See Also:


Attachments
json bug (175 bytes, text/x-csrc)
2014-06-12 11:48 UTC, egustc

Description egustc 2014-06-12 11:48:10 UTC
Created attachment 1362
json bug

As the attachment shows, Unicode characters at or above 0x80 (e.g. Chinese or Japanese text) should be converted to "\uXXXX" in JSON, but Phobos doesn't do this. This causes problems when transmitting JSON from D to other languages.

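An implementation of std.json.appendJSONChar along these lines can fix this bug:
import std.array : Appender;
import std.ascii : isControl;
import std.format : format;

// Escape control characters and every non-ASCII UTF-16 code unit as \uXXXX;
// pass plain ASCII through unchanged.
private void appendJSONChar(Appender!string* dst, wchar c)
{
    if (isControl(c) || c >= 0x80)
        dst.put(format("\\u%04x", cast(uint) c)); // e.g. U+4E2D -> \u4e2d
    else
        dst.put(c);
}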
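The proposed function works on one UTF-16 code unit at a time and leaves quoting of '"' and '\' to its caller. A minimal standalone sketch of the whole idea, assuming we start from a complete D string rather than std.json's private internals: decode each code point, escape '"' and '\', and write every control or non-ASCII code point as \uXXXX, using a surrogate pair for code points above U+FFFF. The name toAsciiJSONString is hypothetical and is not part of Phobos.

```d
import std.array : appender;
import std.ascii : isControl;
import std.format : formattedWrite;
import std.utf : encode;

// Hypothetical helper (not part of Phobos): quote a D string as a JSON
// string literal whose output is pure ASCII.
string toAsciiJSONString(string s)
{
    auto buf = appender!string();
    buf.put('"');
    foreach (dchar c; s) // decode UTF-8 into code points
    {
        if (c == '"' || c == '\\')
        {
            buf.put('\\'); // these two must always be escaped in JSON
            buf.put(c);
        }
        else if (isControl(c) || c >= 0x80)
        {
            // Split into UTF-16 code units: one for the BMP,
            // a surrogate pair for code points above U+FFFF.
            wchar[2] units;
            immutable n = encode(units, c);
            foreach (wc; units[0 .. n])
                buf.formattedWrite("\\u%04x", cast(uint) wc);
        }
        else
        {
            buf.put(c); // plain printable ASCII
        }
    }
    buf.put('"');
    return buf.data;
}
```

For example, "中" becomes "\u4e2d" and U+1F600 becomes the pair "\ud83d\ude00", which any JSON parser decodes back to the original text.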
Comment 1 Justin Whear 2014-07-11 18:13:16 UTC
Looking at the spec (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf), it appears that while strings _may_ encode characters using the escape sequence, they are not _required_ to do so for any range of characters. On the face of it, std.json is conformant and it is the other languages' parsers that are not. Which parsers are unable to handle raw UTF-8?
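To illustrate the spec point: a conforming parser must yield the same string from the escaped form and from the raw UTF-8 form. A quick check using Phobos's own std.json.parseJSON (a sketch; the helper sameJSONString is hypothetical):

```d
import std.json : parseJSON;

// Hypothetical helper: true when two JSON string literals decode to the
// same D string, e.g. an escaped form and its raw UTF-8 form.
bool sameJSONString(string a, string b)
{
    return parseJSON(a).str == parseJSON(b).str;
}
```

Here sameJSONString(`"\u4e2d\u6587"`, `"中文"`) holds, so either output is valid JSON; the difference only matters to consumers that treat the payload as bytes instead of decoding it.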
Comment 2 egustc 2014-07-12 01:13:01 UTC
OK... I was using Python and didn't decode the data first, which is what caused the problem.

(In reply to Justin Whear from comment #1)
> Looking at the spec
> (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf)
> it appears that while strings _may_ encode characters using the escape
> sequence, they are not _required_ to for any range of characters.  On the
> face of it it seems that std.json is conformant and other languages are not.
> Which parsers are unable to handle the raw UTF-8?
Comment 3 basile-z 2016-03-22 17:22:50 UTC
(In reply to egustc from comment #2)
> OK... I used Python but didn't decode first and got a problem. 

I have proposed a PR for this (https://github.com/D-Programming-Language/phobos/pull/4106), but it was not clear whether you considered the problem fixed or not.

Maybe it can even be closed without any modification. Let's see what people say.
Comment 4 github-bugzilla 2016-04-10 08:23:38 UTC
Commit pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/b5cd354a05033ade13ae376377bec590bef62212
Merge pull request #4106 from BBasile/issue-12897

fix issue 12897 - toJSON, add the escapeNonAsciiChars option
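With this change, escaping is opt-in through a JSONOptions flag and raw UTF-8 remains the default. A minimal usage sketch, assuming a Phobos release that includes the merged PR; the helper name escapedJSON is hypothetical. JSON treats the hex digits of \uXXXX case-insensitively, so the letter case Phobos emits should not be relied on.

```d
import std.json : JSONOptions, JSONValue, toJSON;

// Hypothetical helper: serialise a string with the escapeNonAsciiChars
// option, so every non-ASCII character is written as a \uXXXX escape.
string escapedJSON(string s)
{
    auto v = JSONValue(s); // toJSON takes its root by ref, so keep an lvalue
    return toJSON(v, false, JSONOptions.escapeNonAsciiChars);
}
```

For example, escapedJSON("中文") yields a pure-ASCII literal of the form "\u4e2d\u6587" (modulo hex-digit case), while plain ASCII such as "abc" passes through unchanged.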