D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 14459 - String literal merge bug causes incorrect runtime program behavior
Summary: String literal merge bug causes incorrect runtime program behavior
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: All All
: P1 critical
Assignee: No Owner
URL:
Keywords: pull, wrong-code
Depends on:
Blocks:
 
Reported: 2015-04-17 22:04 UTC by Dan Olson
Modified: 2017-07-22 12:36 UTC (History)
2 users (show)

See Also:


Attachments
strlit16bug.d (1.07 KB, text/plain)
2015-04-17 22:04 UTC, Dan Olson
Details

Note You need to log in before you can comment on or make changes to this issue.
Description Dan Olson 2015-04-17 22:04:53 UTC
Created attachment 1514 [details]
strlit16bug.d

The front end propagates const char* string literals in such a way that backend must merge identical constants or else funny things may happen at runtime.  The DMD backend merges string literals but only caches 16 unique strings, so a carefully crafted program with more than 16 string literals can produce erroneous results.  See attached code, which follows this pattern:

{
    const char* s0 = "hi0";
    const(char)* p0 = s0;
    assert(p0 == s0);
    const char* s1 = "hi1";
    const(char)* p1 = s1;
...
    const char* s15 = "hi15";
    const(char)* p15 = s15;
    assert(p0 == s0);                        // ok
    const char* s16 = "hi16";
    const(char)* p16 = s16;
    assert(p0 == s0);                        // fails
}



This was uncovered while digging into an LDC issue #898 and trying to understand how DMD front end propagates string literals for const char*.

https://github.com/ldc-developers/ldc/issues/898

Output from attached strlit16bug.d

$ dmd --version
DMD64 D Compiler v2.067.0
Copyright (c) 1999-2014 by Digital Mars written by Walter Bright
$ dmd -run strlit16bug
core.exception.AssertError@strlit16bug.d(40): Assertion failure
----------------
5   dmd_runN2D2U4                       0x000000010855ee54 void strlit16bug.__assert(int) + 44
6   dmd_runN2D2U4                       0x000000010855ee21 _Dmain + 193
7   dmd_runN2D2U4                       0x0000000108573b94 D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv + 40
8   dmd_runN2D2U4                       0x0000000108573ad9 void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) + 45
9   dmd_runN2D2U4                       0x0000000108573b39 void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll() + 45
10  dmd_runN2D2U4                       0x0000000108573ad9 void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).tryExec(scope void delegate()) + 45
11  dmd_runN2D2U4                       0x0000000108573a4c _d_run_main + 504
12  dmd_runN2D2U4                       0x000000010855ee6c main + 20
13  libdyld.dylib                       0x00007fff8dd375c9 start + 1
14  ???                                 0x0000000000000001 0x0 + 1
$
Comment 1 yebblies 2015-04-19 02:48:53 UTC
Why do you think string literals must be merged?  I would guess it's in 'implementation defined' territory, and identity equality should not be relied on.
Comment 2 Ketmar Dark 2015-04-19 02:51:52 UTC
yet in the case when i'm assigning pointer to pointer i'm expecting that two pointers are the same. i'd never expect the following fail, under no circumstances:

    const char* s16 = "hi16";
    const(char)* p16 = s16;
    assert(p0 == s0);                        // fails
Comment 3 Dan Olson 2015-04-19 16:48:06 UTC
(In reply to yebblies from comment #1)
> Why do you think string literals must be merged?  I would guess it's in
> 'implementation defined' territory, and identity equality should not be
> relied on.

Yes I agree in general it is implementation defined for any D compiler to merge string literals.  In this case, one pointer is initialized to the the value of another, so they do need to be equal.  The dmd front end code propagates const char* literals into unique StringExps when used.  Now the backend must merge StringExps with identical literals or else bad code can be generated.  In LDC which uses dmd front end, it was worse because string literals were merged late and even simpler code was wrong.  See https://github.com/ldc-developers/ldc/issues/898 for details.

In the snippet, it leads to p0 == s0, as they should be, then a few lines later, suddenly p0 != s0.

Anyway, it is in the backend el.c stable[16] where the cache is defined.  Also, the code in the frond end that does the const propagation of string literals is fromConstInitializer() in optimize.c
Comment 4 David Nadlinger 2015-04-23 11:24:02 UTC
Indeed, p0 == s0 must never fail as they are both pointers and one is initialized from the other, irrespective of what we guarantee or don't guarantee about string literal merging.
Comment 5 Kenji Hara 2015-08-06 16:01:15 UTC
https://github.com/D-Programming-Language/dmd/pull/4866(In reply to Dan Olson from comment #3)
> Anyway, it is in the backend el.c stable[16] where the cache is defined. 
> Also, the code in the frond end that does the const propagation of string
> literals is fromConstInitializer() in optimize.c

Fixed the bug in optimize.c:
https://github.com/D-Programming-Language/dmd/pull/4866

Honestly I didn't know about this issue until now (my random bugzila walk fortunately hit). I'm curious why the detailed analysis result couldn't put out actual PR.
Comment 6 David Nadlinger 2015-08-06 16:36:51 UTC
(In reply to Kenji Hara from comment #5)
> I'm curious why the detailed analysis result couldn't put
> out actual PR.

As for me, that's because Daniel already knew how to fix the issue, but I couldn't convince him that the behavior is actually wrong at DConf (or rather, the day after).
Comment 7 Kenji Hara 2015-08-06 17:05:43 UTC
I think it's clearly a bug. Because:

1. The identiry of the string literal is saved into a variable, then copied to one another variable. The two pointer variables comparison must match.
2. But the excessive front end 'optimization' didn't save the identity. It's a problem.

A point is the variable s0 is a stack allocated variable. Its value is evaluated and determined at runtime. The optimization wrongly beyond the compile time/runtime evaluation boundary.
Comment 8 github-bugzilla 2015-08-06 19:33:36 UTC
Commits pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/3252ddcf36f53928d04b62a601c0b7bb9d911cba
fix Issue 14459 - String literal merge bug causes incorrect runtime program behavior

https://github.com/D-Programming-Language/dmd/commit/fd8e527b2a950eaa437426e8201572e0da51000f
Merge pull request #4866 from 9rnsr/fix14459

Issue 14459 - String literal merge bug causes incorrect runtime program behavior
Comment 9 github-bugzilla 2017-07-22 12:36:20 UTC
Commits pushed to dmd-cxx at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/3252ddcf36f53928d04b62a601c0b7bb9d911cba
fix Issue 14459 - String literal merge bug causes incorrect runtime program behavior

https://github.com/dlang/dmd/commit/fd8e527b2a950eaa437426e8201572e0da51000f
Merge pull request #4866 from 9rnsr/fix14459