D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 7942 - Appending different string types corrupts memory
Summary: Appending different string types corrupts memory
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: All All
: P2 critical
Assignee: yebblies
URL:
Keywords: accepts-invalid, pull, wrong-code
Depends on:
Blocks:
 
Reported: 2012-04-19 03:45 UTC by James Miller
Modified: 2015-02-18 03:37 UTC (History)
6 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description James Miller 2012-04-19 03:45:23 UTC
Appending a regular `string` to a `dstring` does not cause a compile time error.

This case:

  string a = "abc";
  dstring b = "abc"d;

  b ~= a;

Causes a runtime error: array cast misalignment.

This case:

  string a = "abcd";
  dstring b = "abcd"d;

  b ~= a;
  writeln(b);

causes a segmentation fault. Given that

  string a = "abcd";
  dstring b = "abcd"d;

  a ~= b;

causes a compile time error, and many other operations do not allow implicit string -> dstring casting, it should be picked up as a type error.
Comment 1 yebblies 2013-11-26 19:56:27 UTC
I've hit this before, can't find the report.  The compiler casts the rhs to the type of the lhs, then tried to append.  Obviously this doesn't work.
Comment 2 Stewart Gordon 2013-11-27 10:31:41 UTC
This sounds like a case of a more general bug, whereby x ~= y silently casts y to the type of x before appending.  Given that array casts are a matter of reinterpreting the sequence of bytes, this doesn't make sense.  It should either error at compiletime or convert the string from UTF-8 to UTF-32.
Comment 3 yebblies 2014-07-29 17:12:31 UTC
I'm not so sure about this being accepts-invalid.  D allows implicit string decoding/encoding in the single-char case, and it seems reasonable to support it here.

These patches change it from a wrong-code to a mere performance issue.

https://github.com/D-Programming-Language/dmd/pull/3831
https://github.com/D-Programming-Language/druntime/pull/915
Comment 4 github-bugzilla 2014-07-30 13:18:17 UTC
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/d0f8bca7ad1eea1b8e9e2774fc84a1af7ce73815
Issue 7942 - Appending different string types corrupts memory

https://github.com/D-Programming-Language/druntime/commit/d0e3b25d74e1f73cd12a916d0ccf00da2f4082ab
Merge pull request #915 from yebblies/issue7942

Issue 7942 - Appending different string types corrupts memory
Comment 5 Kenji Hara 2014-07-30 14:50:16 UTC
(In reply to yebblies from comment #3)
> I'm not so sure about this being accepts-invalid.  D allows implicit string
> decoding/encoding in the single-char case, and it seems reasonable to
> support it here.

In contrast to the one character transcoding on foreach iteration, the implicit transcoding cost on appending will be bigger when appended string is very long, and it would be sometimes difficult to find it.

I think accepting it is not reasonable.
Comment 6 yebblies 2014-07-30 15:14:04 UTC
(In reply to Kenji Hara from comment #5)
> 
> In contrast to the one character transcoding on foreach iteration, the
> implicit transcoding cost on appending will be bigger when appended string
> is very long, and it would be sometimes difficult to find it.
> 
> I think accepting it is not reasonable.

Appending an array of values to an array will take longer than appending a single value.  Even with implicit re-encoding it will still be O(N), same as other arrays.

Appending an array of structs with postblits could easily cause the same sort of performance problems.
Comment 7 Martin Nowak 2014-07-30 16:02:12 UTC
It's only a performance problem if it was easy to do inadvertently.
Where would this be the case?

When done deliberately the cost implications are fairly obvious.
Comment 8 hsteoh 2014-08-06 01:07:27 UTC
I see two solutions to this bug: (1) transcode the string to be appended (or, more generally, convert each value to be appended into the target type), which is costly (but still only O(n)); (2) reject this kind of array appending at compile-time.

I'm leaning towards (1) as being more user-friendly, going along D's motto of "correct by default, efficient if you ask for it". (2) is a bit too restrictive IMO, given the amount of implicit castings that we already have.

One possible compromise is to have (2) suggest std.array.appender for concatenating strings of different element widths.
Comment 9 Vladimir Panteleev 2014-08-06 13:56:21 UTC
Another option would be to fix but deprecate the functionality.

> going along D's motto of "correct by default, efficient if you ask for it"

I don't think that means that hidden costs are OK.
Comment 10 yebblies 2014-08-06 14:41:02 UTC
(In reply to Vladimir Panteleev from comment #9)
> Another option would be to fix but deprecate the functionality.
> 
> > going along D's motto of "correct by default, efficient if you ask for it"
> 
> I don't think that means that hidden costs are OK.

O(n) append is not a hidden cost - it's the standard.  If we're ok with the single-char case, we should be fine with the string case too.
Comment 11 hsteoh 2014-08-12 04:59:57 UTC
Looks like we're at an impasse. I think the best thing we can do right now is to make appending strings of different widths illegal for now. That still leaves open the option that in the future, we could support auto-conversion. But in any case, the current situation should not continue, since it introduces memory corruption.
Comment 12 Vladimir Panteleev 2014-08-12 05:41:03 UTC
Even though I don't think all O(n)s are equal, I don't disagree with yebblies to the point where I'd object to accepting the code silently.
Comment 13 Walter Bright 2014-08-15 18:35:29 UTC
I agree with Kenji. Make it an error for the corruption case, and we'll see how that goes. Better than enabling a feature that we may regret later.
Comment 14 yebblies 2014-09-09 08:29:51 UTC
Pull to make it an error:

https://github.com/D-Programming-Language/dmd/pull/3966
Comment 15 github-bugzilla 2014-09-09 16:39:22 UTC
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/3482cf5d42a1e2f23b18c3367efdedcfeb7db2e9
Revert "Issue 7942 - Appending different string types corrupts memory"

This reverts commit d0f8bca7ad1eea1b8e9e2774fc84a1af7ce73815.

https://github.com/D-Programming-Language/druntime/commit/7a18c79f2fc027a130e802af0d2dc9ed04b6e49d
Merge pull request #953 from yebblies/revert7942

Revert "Issue 7942 - Appending different string types corrupts memory"
Comment 16 github-bugzilla 2014-09-10 11:19:50 UTC
Commits pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/07ad79d1965ec92b88f3da3296d2877dbca86aed
Fix Issue 7942 - Appending different string types corrupts memory

https://github.com/D-Programming-Language/dmd/commit/aad21a5fa4f73af8e11ea74493e579c9c66da8ba
Merge pull request #3966 from yebblies/issue7942

Issue 7942 - Appending different string types corrupts memory
Comment 18 github-bugzilla 2015-02-18 03:37:58 UTC
Commits pushed to 2.067 at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/3482cf5d42a1e2f23b18c3367efdedcfeb7db2e9
Revert "Issue 7942 - Appending different string types corrupts memory"

https://github.com/D-Programming-Language/druntime/commit/7a18c79f2fc027a130e802af0d2dc9ed04b6e49d
Merge pull request #953 from yebblies/revert7942