D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 5995 - string append negative integer causes segfault
Summary: string append negative integer causes segfault
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: All All
: P2 normal
Assignee: Lucia Cojocaru
URL:
Keywords: bootcamp
: 16545 (view as issue list)
Depends on:
Blocks:
 
Reported: 2011-05-13 01:48 UTC by Andriy
Modified: 2017-07-05 20:57 UTC (History)
11 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Andriy 2011-05-13 01:48:27 UTC
Digital Mars D Compiler v2.052

$ dmd SpawnBug.d 
$ ./SpawnBug 
Segmentation fault (core dumped)
$ cat SpawnBug.d
void main(){
	string ret;
	int i = -1;
	ret ~= i;
}
Comment 1 kennytm 2011-05-13 02:14:37 UTC
This may or may not be expected. Appending any non-Unicode (> 0x10ffff) character will halt the program. In _d_arrayappendcd (https://github.com/D-Programming-Language/druntime/blob/master/src/rt/lifetime.d#L1762, BTW why this is in rt/lifetime.d?!):

    else if (c <= 0x10FFFF)
    {
        ...
    }
    else
        assert(0);      // invalid utf character - should we throw an exception instead?
Comment 2 Vladimir Panteleev 2011-05-13 04:14:04 UTC
Calling onUnicodeError would be more appropriate.
Comment 3 Steven Schveighoffer 2011-05-16 10:31:52 UTC
Hm... I think in general it is a design flaw to allow int to implicitly cast to dchar.

I think that is the source of the problem.

Going from (d|w)char to the appropriate width integer should be fine, but going the other way seems prone to error.

Note that in lifetime.d, the assert(0) should not lead to a segmentation fault.
Comment 4 Jonathan M Davis 2011-05-16 10:44:52 UTC
Implicitly converting to the same-size _unsigned_ integral type might be fine, but converting to a signed type would be a narrowing conversion. I'd still argue that converting between any of the character types and any of the integral types should require a cast though simply because they're not only different types, they're different types of types. The character types are for characters and the integral types are for integers. Regardless, no implicit conversion should be permitted when it's a narrowing conversion. Narrowing conversions should require casts.
Comment 5 timon.gehr 2011-05-16 11:01:47 UTC
(In reply to comment #3)
> Hm... I think in general it is a design flaw to allow int to implicitly cast to
> dchar.
> 
> I think that is the source of the problem.
> 
> Going from (d|w)char to the appropriate width integer should be fine, but going
> the other way seems prone to error.
> 
> Note that in lifetime.d, the assert(0) should not lead to a segmentation fault.

assert(0) emits asm{hlt;} when compiled in release mode. Encountering hlt _is_ a segmentation fault, so this is just fine.
Comment 6 timon.gehr 2011-05-16 11:07:36 UTC
(In reply to comment #4)
> Implicitly converting to the same-size _unsigned_ integral type might be fine,
> but converting to a signed type would be a narrowing conversion. I'd still
> argue that converting between any of the character types and any of the
> integral types should require a cast though simply because they're not only
> different types, they're different types of types. The character types are for
> characters and the integral types are for integers. Regardless, no implicit
> conversion should be permitted when it's a narrowing conversion. Narrowing
> conversions should require casts.

How is that "narrowing"? No information is lost.

@Topic:

void main(){
    uint i=-1;  //fine
    dchar c=-1; //compile time error
}
Comment 7 Jonathan M Davis 2011-05-16 11:19:03 UTC
dchar is unsigned. int is signed. They don't cover the same range of values. Converting from one to the other in either direction is a narrowing conversion. I expect that the only reason that uint i = -1; compiles is to make it easy to create the unsigned value whose equivalent is -1 or some other reason related to C code. But personally, I don't think that it should compile without a cast, because you cannot represent -1 in a uint.
Comment 8 Steven Schveighoffer 2011-05-16 11:26:52 UTC
(In reply to comment #6)
> void main(){
>     uint i=-1;  //fine
>     dchar c=-1; //compile time error
> }

Just tried this and it indeed produces an error:

Error: cannot implicitly convert expression (-1) of type int to dchar

So I wonder why this works?  Seems inconsistent:

int i = -1;
dchar c = i;

Also, the reporter's issue seems to be inconsistent with that error.
Comment 9 briancschott 2014-06-06 20:12:58 UTC
Still present in 2.065.
Comment 10 hsteoh 2014-09-11 19:04:57 UTC
Still present in git HEAD (2.067b).
Comment 11 hsteoh 2014-10-30 22:19:32 UTC
Should appending invalid codepoints append the Unicode replacement character instead?
Comment 12 Steven Schveighoffer 2014-10-31 11:59:34 UTC
(In reply to hsteoh from comment #11)
> Should appending invalid codepoints append the Unicode replacement character
> instead?

I think implicit casting of int to dchar should be invalid altogether. See my 2011 comment.
Comment 13 ET 2015-04-09 01:55:25 UTC
Still present in 2.067
Comment 14 Lucia Cojocaru 2016-11-22 14:19:44 UTC
This is a problem in the compiler.

https://github.com/dlang/dmd/blob/master/src/dcast.d#L66
https://github.com/dlang/dmd/blob/master/src/mtype.d#L4150

I will open a PR shortly to disable implicit cast of int -> dchar.
Should we disable the implicit cast of all integral types to chars? For example, is it expected to make an implicit cast from uint to dchar?

(The compiler itself seems to rely on implicit casts of unit -> dchar. Compiling the compiler with this cast disabled produces some errors.)


This is also enabled: bool -> dchar. Not sure if it is desirable.
Expression.implicitCastTo(z of type bool) => dchar
Comment 15 Steven Schveighoffer 2016-11-22 15:27:39 UTC
Lucia, I think nothing should implicitly cast to dchar. Not bool, int, or even char or wchar. But something this drastic needs approval from Walter and Andrei.

Of course, we definitely need a deprecation step before completely banning it -- this will certainly break a lot of code.
Comment 16 Andrei Alexandrescu 2016-11-22 16:16:08 UTC
This bug has a simple fix - throw a runtime exception (e.g. by onUnicodeError) instead of assert(0). We shouldn't change language rules on account of this. Thanks!
Comment 17 Steven Schveighoffer 2016-11-22 17:02:04 UTC
There are two problems, one is that the OP's code compiles, the other is that it segfaults. Arguably, fixing the first problem will fix the second. But just fixing the second leaves other problems still intact.

Also, note that this succeeds, but likely does not do what the writer wants:

string s = "123456";
s ~= 7;

Guess what this does (yes, it compiles)?

s ~= 123456;
Comment 18 Lucia Cojocaru 2016-11-24 10:59:49 UTC
As Andrei suggested, here is the quick fix:
PR https://github.com/dlang/druntime/pull/1696

Language design changes should be discussed with Walter and Andrei in depth.
Comment 19 github-bugzilla 2016-12-06 05:04:51 UTC
Commits pushed to master at https://github.com/dlang/druntime

https://github.com/dlang/druntime/commit/316e6d2607b4b22794ef75a331ad27d970717cda
fix issue 5995

https://github.com/dlang/druntime/commit/6dbbadbac4a0567ba49f0e616fccc8c597fec771
Merge pull request #1696 from somzzz/issue_5995

fix issue 5995 - string append negative integer causes segfault
Comment 23 Vladimir Panteleev 2017-07-05 20:57:43 UTC
*** Issue 16545 has been marked as a duplicate of this issue. ***