Issue 5722 - Regression(2.052): Appending code-unit from multi-unit code-point at compile-time gives wrong result.
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: Other Windows
Importance: P2 regression
Assignee: No Owner
Keywords: wrong-code
Reported: 2011-03-08 17:46 UTC by Nick Sabalausky
Modified: 2015-06-09 05:15 UTC
Description Nick Sabalausky 2011-03-08 17:46:13 UTC
static assert( (""~"\&copy;"[0]).length == 1 );

That passes on 2.051, but fails on 2.052 with "static assert (2u == 1u) is false"

This is likely *not* related to issue 5717, since that test case works in 2.052.
Comment 1 Nick Sabalausky 2011-03-08 17:50:42 UTC
Rather, this is *not the same as* issue 5717. They may be related.
Comment 2 Don 2011-03-11 04:58:38 UTC
Like bug 5717, this was caused by the fix to bug 4389 (char[]~dchar and wchar[]~dchar *never* worked).
The problem is in constfold.c, Cat(). 

It erroneously assumes that all concatenation is equivalent to string ~ dchar. But this isn't true for char[]~char or wchar[]~wchar. (This happens during constant-folding optimization, which is how it manifests in the test case.) In such cases the dchar encoding should not occur: the appended element has an encoding length of 1 and should be copied with a simple memcpy.
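
To illustrate where the "2u" in the failing assert comes from, here is a minimal C++ sketch. It is not dmd's code: utf8CodeLength is a hypothetical stand-in for utf_codeLength restricted to UTF-8, and the other names are invented for the example. The first code unit of "\&copy;" is 0xC2; if that value is misread as a code point (U+00C2), encoding it takes two UTF-8 code units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for dmd's utf_codeLength, UTF-8 only (sz == 1).
// Returns how many UTF-8 code units are needed to encode code point c.
std::size_t utf8CodeLength(std::uint32_t c)
{
    assert(c <= 0x10FFFF);
    if (c <= 0x7F)   return 1;
    if (c <= 0x7FF)  return 2;
    if (c <= 0xFFFF) return 3;
    return 4;
}

// U+00A9 is stored in UTF-8 as the two code units 0xC2 0xA9,
// so indexing the string with [0] yields the single code unit 0xC2.
const std::uint8_t firstUnitOfCopyright = 0xC2;

// Folded length when the code unit is wrongly treated as a dchar.
std::size_t lengthIfTreatedAsDchar(std::size_t prefixLen, std::uint8_t unit)
{
    return prefixLen + utf8CodeLength(unit);
}

// Folded length when the code unit is appended as-is (char[] ~ char).
std::size_t lengthIfTreatedAsCodeUnit(std::size_t prefixLen)
{
    return prefixLen + 1;
}
```

With an empty left-hand side, the first function gives 2 and the second gives 1, which matches the values in the reported assert message.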

It applies to everything of the form (e2->op == TOKint64) in that function.

(1)     size_t len = es1->len + utf_codeLength(sz, v);
        s = mem.malloc((len + 1) * sz);
        memcpy(s, es1->string, es1->len * sz);
(2)     utf_encode(sz, (unsigned char *)s + (sz * es1->len), v);

Lines (1) and (2) are valid for heterogeneous concatenation (e.g. char[] ~ dchar), but when both element types are the same the lines should be:

(1)     size_t len = es1->len + 1;

(2)     memcpy((unsigned char *)s + (sz * es1->len), &v, sz);

This should definitely be factored out into a helper function -- it's far too repetitive already.
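
One possible shape for such a helper, as a rough C++ sketch only. It is not the fix that went into dmd: catElement, appendCodePoint and the sameElementType flag are hypothetical names, and std::string stands in for the compiler's string buffer, so only the UTF-8 (sz == 1) case is covered.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode code point c as UTF-8 and append it (the char[] ~ dchar case).
void appendCodePoint(std::string &s, std::uint32_t c)
{
    assert(c <= 0x10FFFF);
    if (c <= 0x7F) {
        s += static_cast<char>(c);
    } else if (c <= 0x7FF) {
        s += static_cast<char>(0xC0 | (c >> 6));
        s += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c <= 0xFFFF) {
        s += static_cast<char>(0xE0 | (c >> 12));
        s += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (c >> 18));
        s += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical helper: append the folded integer v to lhs.
// sameElementType == true  : v is already a code unit, copy it unchanged.
// sameElementType == false : v is a code point, encode it first.
std::string catElement(const std::string &lhs, std::uint32_t v,
                       bool sameElementType)
{
    std::string result = lhs;
    if (sameElementType)
        result += static_cast<char>(v);
    else
        appendCodePoint(result, v);
    return result;
}
```

Each (e2->op == TOKint64) branch would then reduce to a single call, with the same-type check done once by the caller.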