Issue 24324 - A default-initialized variable is not identical to its init value when it contains a default-initialized member variable that is a dynamic array
Summary: A default-initialized variable is not identical to its init value when it con...
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: All All
: P1 normal
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-08 05:36 UTC by Jonathan M Davis
Modified: 2024-01-09 16:00 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Jonathan M Davis 2024-01-08 05:36:53 UTC
struct S
{
    int[] arr = [1, 2, 3];
}

void main()
{
    S s1;
    S s2;

    assert(s1 is s2); // passes
    assert(s1 is S.init); //fails
}

For some reason, the member variable ends up pointing to a different block of memory in the init value than it does in default-initialized values of the struct type. The elements are identical in both the init value and the default-initialized structs, but the arrays themselves are not identical.

For plenty of code, the difference won't matter, but it does create the bizarre situation where a default-initialized struct is not identical to its init value, and you lose the ability to check whether a value of that struct type has been mutated, which you can normally do with all types with "value is typeof(value).init".
Comment 1 RazvanN 2024-01-08 11:19:17 UTC
My first impression on this was that the compiler probably stores the struct initializer in the data/rodata segment and when an instance is created, it allocates heap memory with the GC and copies over the dynamic array elements. However, looking at the compiler code, things get weirder. It seems that in this case, the compiler lowers `S.init` to a Struct Literal Expression "S([1, 2, 3])" (i.e. it does not store a literal representation in the data/rodata segment), whereas when it encounters `S s1;` is uses a global symbol to initialize s1. That is why s1 and s2 have the same bit representation but `S.init` does not.

A consequence of this is that `assert(S.init is S.init)` fails in this case. Which is kind of absurd. However, I would argue that when dynamic arrays are involved the backend is free to do whatever it sees fit with regards to the addresses of dynamic arrays, so using `is` in this case (as opposed to opEquals) is brittle.
Comment 2 Bastiaan Veelo 2024-01-08 12:08:10 UTC
I argue that it should be made an error to define an initializer of a non-shared non-static field that allocates. Thread: https://forum.dlang.org/post/ogvubzgprghefclgluce@forum.dlang.org

If you *want* the array to be shared across all instances, the field should be declared `shared static`.

In https://forum.dlang.org/post/t7vm2o$p4q$1@digitalmars.com, Steven notes that the array is shared among different threads, even, as discovered in https://issues.dlang.org/show_bug.cgi?id=2947.
Comment 3 Jonathan M Davis 2024-01-08 18:05:22 UTC
(In reply to RazvanN from comment #1)
> A consequence of this is that `assert(S.init is S.init)` fails in this case.
> Which is kind of absurd. However, I would argue that when dynamic arrays are
> involved the backend is free to do whatever it sees fit with regards to the
> addresses of dynamic arrays, so using `is` in this case (as opposed to
> opEquals) is brittle.

My expectation is that 

SomeStruct s;
assert(s is SomeStruct.init);

will pass for every type, and IMHO, if it doesn't, we should rethink what we're doing.
Comment 4 Jonathan M Davis 2024-01-08 18:12:06 UTC
(In reply to Bastiaan Veelo from comment #2)
> I argue that it should be made an error to define an initializer of a
> non-shared non-static field that allocates. Thread:
> https://forum.dlang.org/post/ogvubzgprghefclgluce@forum.dlang.org
> 
> If you *want* the array to be shared across all instances, the field should
> be declared `shared static`.
> 
> In https://forum.dlang.org/post/t7vm2o$p4q$1@digitalmars.com, Steven notes
> that the array is shared among different threads, even, as discovered in
> https://issues.dlang.org/show_bug.cgi?id=2947.

It may very well be the best call to make cases like this illegal. However, I think that requirement is going to have to take into account whether mutable indirections are involved rather than just whether any allocations take place. Otherwise, cases like SysTime's Rebindable!(immutable(TimeZone)) (which is supposed to be default-initialized to a specific time zone so that the code doesn't have to check for null, but it isn't due to another bug) become illegal. The fact that the TimeZone itself is immutable avoids the issue, whereas it would be an issue if TimeZone weren't immutable.
Comment 5 Steven Schveighoffer 2024-01-09 16:00:19 UTC
Hm... what appears to be happening here is `S.init` is treated like an enum.

If you have an enum of an array, then it is like you typed in the literal directly.

Mark main as `@nogc` and it fails (Because S.init allocates!)

Even `S s1 = S.init` allocates a new array on the heap.

This is quite unexpected, and I can't believe we haven't seen this before.

I tested via run.dlang.io, and this has happened at least as far back as 2.060.