D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 6421 - Require initialization of static arrays with array literals not to allocate
Summary: Require initialization of static arrays with array literals not to allocate
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: All All
Importance: P2 enhancement
Assignee: No Owner
URL:
Keywords: performance
Depends on: 2356
Blocks:
Reported: 2011-07-31 16:56 UTC by bearophile_hugs
Modified: 2020-05-15 03:53 UTC
CC List: 5 users

See Also:


Description bearophile_hugs 2011-07-31 16:56:49 UTC
From a comment by Peter Alexander:

> int[3] a = [1, 2, 3]; // in D, this allocates then copies
> int a[3] = {1, 2, 3}; // in C++, this doesn't allocate
>
> Apparently, to avoid the allocation in D, you must do:
>
> static const int[3] staticA = [1, 2, 3]; // in data segment
> int[3] a = staticA; // non-allocating copy
>
> These little 'behind your back' allocations are good examples of my previous two points.

Memory allocations caused by this, inside an inner loop, have given me performance troubles.
I suggest adding an optimization to the DMD front-end to avoid this problem.
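For reference, here is the workaround from the quote as a complete program. This is a minimal sketch: `makeTriple` is a hypothetical name, and a module-level `immutable` array stands in for the `static const` declaration in the quote. Initializing a fixed-size array from another one is a plain element-wise copy into the local, so the function can be marked @nogc.

```d
// Sketch of the quoted workaround: the literal lives in static storage,
// and the local fixed-size array is filled by copying from it.
immutable int[3] staticA = [1, 2, 3]; // module-level data, built at compile time

int[3] makeTriple() @nogc nothrow
{
    int[3] a = staticA; // element-wise copy into the stack array, no GC
    a[0] += 10;         // the copy is mutable even though staticA is not
    return a;
}

void main() @nogc nothrow
{
    int[3] t = makeTriple();
    assert(t[0] == 11 && t[1] == 2 && t[2] == 3);
}
```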


Some comments received:

Don:

> Yeah, it's not fundamental, and not even very complicated. The current
> implementation was a quick hack to provide the functionality, that
> hasn't been replaced with a proper implementation yet. All that's
> required to fix it is a bit of code in e2ir.c.


Peter Alexander:

> Also, I think it
> would be worthwhile adding it to the language definition so that it's
> not merely an implementation detail.


Timon Gehr:

> I think it should be more than an implementation detail, as it can severely affect
> performance.


How do you specify this in the D language definition? What are the corner cases?
Comment 1 bearophile_hugs 2011-08-01 06:36:11 UTC
This bug is related to bug 2356; the difference is that this enhancement request also asks for a change to the language definition.
Comment 2 Andrej Mitrovic 2014-05-04 08:43:51 UTC
With recent changes we can now use this syntax for initialization of a variable:

-----
void main()
{
    float x = float(1.0);
}
-----

And in GLSL (the OpenGL Shading Language), you can use this syntax for array initializers:

-----
float a[5] = float[5](3.4, 4.2, 5.0, 5.2, 1.1);
-----

This got me thinking: that initializer looks very much like the new construction syntax we introduced in D. With a parser fix we could implement this in D:

-----
void main()
{
    float[3] arr = float[3](1.0, 2.0, 3.0);
}
-----

Even though Issue 2356 is fixed, the above might help in some other contexts, perhaps in array appends.
Comment 3 bearophile_hugs 2014-05-04 09:13:42 UTC
(In reply to Andrej Mitrovic from comment #2)

> With a parser fix we could implement this in D:
>     float[3] arr = float[3](1.0, 2.0, 3.0);

I also like this syntax (composed of two parts usable in different situations):

float[$] arr = [1.0, 2.0, 3.0]s;

Or:

auto arr = [1.0f, 2.0f, 3.0f]s;
Comment 4 rswhite4 2014-05-04 09:27:40 UTC
(In reply to bearophile_hugs from comment #3)
> float[$] arr = [1.0, 2.0, 3.0]s;
> 
> Or:
> 
> auto arr = [1.0f, 2.0f, 3.0f]s;

I had PRs for both of them, but they were rejected because no sufficient DIP exists. You could write one.
Comment 5 Andrej Mitrovic 2014-05-04 10:10:19 UTC
(In reply to bearophile_hugs from comment #3)
> float[$] arr = [1.0, 2.0, 3.0]s;
> 
> Or:
> 
> auto arr = [1.0f, 2.0f, 3.0f]s;

I don't like them, they're too much of a special case. Re-using existing syntax is better IMO.
Comment 6 Kenji Hara 2014-05-04 10:34:17 UTC
(In reply to bearophile_hugs from comment #0)
> From a comment by Peter Alexander:
> 
> > int[3] a = [1, 2, 3]; // in D, this allocates then copies
> > int a[3] = {1, 2, 3}; // in C++, this doesn't allocate

This was already fixed by issue 2356.

And in git-head, more cases will be fixed.

int[3] a = [1, 2, 3];  // not allocated
a = [1, 2, 3];         // not allocated in git-head

> Some comments received:
[snip]

> How do you specify this in the D language definition? What are the corner
> cases?

I think this is the most better definition about the issue.

"If an array literal could be deduced as static array, and it won't escape from its context, it would be allocated on stack."

For example:

int[3] a = [1, 2, 3];
-> OK: The array initializer could be typed as int[3] from the variable type

a = [1, 2, 3];
-> OK: The right-hand side of an assignment must have the same type as the assigned lvalue, so the array literal can be typed as int[3].

void foo(int[3] a);
foo([1, 2, 3]);
-> OK: The required argument type is int[3].

int[] a = [1, 2, 3];
-> Cannot be allocated on stack, because the memory can escape via the indirection 'a'.
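A compilable sketch of the four cases, assuming a recent dmd (where, as noted at the end of this thread, these literals no longer allocate). `sum3`, `demo` and `escaping` are hypothetical names. The first three cases sit inside a @nogc function; the last one is kept outside @nogc, because the literal's memory escapes through the returned slice.

```d
// Case 3: the parameter type int[3] gives the literal its type.
int sum3(int[3] a) @nogc nothrow pure
{
    return a[0] + a[1] + a[2];
}

int demo() @nogc nothrow pure
{
    int[3] a = [1, 2, 3];          // case 1: typed from the variable's type
    a = [4, 5, 6];                 // case 2: rhs takes the lvalue's type, int[3]
    return sum3([7, 8, 9]) + a[0]; // case 3: typed from the parameter
}

// Case 4: a dynamic-array target lets the memory escape through the slice,
// so this function needs the GC and could not be marked @nogc.
int[] escaping() pure nothrow
{
    int[] d = [1, 2, 3];
    return d;
}

void main()
{
    assert(demo() == 28); // 7 + 8 + 9 + 4
    assert(escaping() == [1, 2, 3]);
}
```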
Comment 7 Kenji Hara 2014-05-04 10:38:14 UTC
(In reply to Andrej Mitrovic from comment #5)
> I don't like them, they're too much of a special case. Re-using existing
> syntax is better IMO.

I also think that a "static array literal syntax" (e.g. DIP34) is not a good feature.

But "length inference" on variable declaration is a useful syntax.

float[$] arr = [1, 2, 3];          // typeof(arr) == float[3]
auto[$] arr = [1.0f, 2.0f, 3.0f];  // ditto
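The `T[$]` form is only a proposal here. As a sketch of how length inference can be approximated in a library, the hypothetical helper `fixed` below lets template argument deduction pick up both the element type and the length from the literal (assuming a compiler that deduces `T[n]` from an array literal, as recent dmd versions do):

```d
// Hypothetical helper approximating `auto[$] arr = [...]`: template
// argument deduction infers the element type T and the length n from
// the array literal, and the fixed-size result is returned by value.
T[n] fixed(T, size_t n)(T[n] values)
{
    return values;
}

void main()
{
    auto arr = fixed([1.0f, 2.0f, 3.0f]);
    static assert(is(typeof(arr) == float[3]));
    assert(arr[1] == 2.0f);
}
```

Unlike the proposed `float[$] arr = [1, 2, 3];`, the element type here comes from the literal itself, hence the `f` suffixes. Later Phobos versions ship a comparable function, `std.array.staticArray`.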
Comment 8 Andrej Mitrovic 2014-05-04 11:06:45 UTC
(In reply to Kenji Hara from comment #7)
> But "length inference" on variable declaration is a useful syntax.
> 
> float[$] arr = [1, 2, 3];          // typeof(arr) == float[3]
> auto[$] arr = [1.0f, 2.0f, 3.0f];  // ditto

What do you think about my extension to the new type construction syntax?

float[3] arr = float[3]([1, 2, 3]);

I'm thinking it could be a more generic solution (more composable in template/generic code) since you could do things like:

-----
float[3] arr;
arr = float[3]([1, 2, 3]);
-----

-----
float[3] arr;
arr = float[arr.length]([1, 2, 3]);
-----

-----
float[3] arr;
arr = typeof(arr)([1, 2, 3]);
-----

-----
int[] arr;
arr.length = 3;
arr[] += int[3]([1, 2, 3]);
arr[] += int[3]([1, 2, 3]);
assert(arr == [2, 4, 6]);
-----

-----
void foo(Arr)(ref Arr arr) if ( isStaticArray!Arr) { }
void foo(Arr)(Arr arr)     if (!isStaticArray!Arr) { }
foo(int[2]([1, 2]));  // explicitly pick overload
-----

And things like that.
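For comparison, here is how the last two overloads behave in current D, with a return value added so the choice is visible (the name `which` is hypothetical). A bare literal is inferred as `int[]` and picks the dynamic-array overload. In this sketch the static-array overload is reached through a local variable with a declared `int[2]` type, since that overload takes its argument by `ref` and therefore needs an lvalue.

```d
import std.traits : isStaticArray;

// The two overloads from the comment above, returning a tag so the
// selected overload can be observed.
string which(Arr)(ref Arr arr) if ( isStaticArray!Arr) { return "static"; }
string which(Arr)(Arr arr)     if (!isStaticArray!Arr) { return "dynamic"; }

void main()
{
    int[2] s = [1, 2];
    assert(which(s) == "static");       // typed lvalue: int[2], binds to ref
    assert(which([1, 2]) == "dynamic"); // bare literal is inferred as int[]
}
```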
Comment 9 rswhite4 2014-05-04 11:10:06 UTC
(In reply to Andrej Mitrovic from comment #8)
> float[3] arr = float[3]([1, 2, 3]);

I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter has too many parentheses.
Comment 10 Andrej Mitrovic 2014-05-04 11:14:26 UTC
(In reply to rswhite4 from comment #9)
> I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter
> has too many parentheses.

Easier on the eyes, sure. But the latter is simpler to interpret with multidimensional static arrays:

float[2][3] arr = float[2][3]([[1, 2], [3, 4], [5, 6]]);

I'm not sure what this would look like with the former syntax.
Comment 11 rswhite4 2014-05-04 11:16:13 UTC
(In reply to Andrej Mitrovic from comment #10)
> (In reply to rswhite4 from comment #9)
> > I would prefer float[3](1, 2, 3) instead of float[3]([1, 2, 3]). The latter
> > has too many parentheses.
> 
> Easier on the eyes, sure. But the latter is simpler to interpret with
> multidimensional static arrays:
> 
> float[2][3] arr = float[2][3]([[1, 2], [3, 4], [5, 6]]);
> 
> I'm not sure what this would look like with the former syntax.

float[2][3]([1, 2], [3, 4], [5, 6]);
Three elements, each of them an array with two elements.
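A short sketch confirming that reading with an ordinary declaration (`grid` is a hypothetical name): in D, `float[2][3]` is an array of three `float[2]` elements, so the nested literal holds three inner arrays of two values each.

```d
// float[2][3]: three elements, each of type float[2].
float[2][3] grid()
{
    float[2][3] m = [[1, 2], [3, 4], [5, 6]];
    return m;
}

void main()
{
    auto m = grid();
    static assert(m.length == 3);
    static assert(is(typeof(m[0]) == float[2]));
    assert(m[2][1] == 6);
}
```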
Comment 12 bearophile_hugs 2014-05-04 13:18:10 UTC
(In reply to rswhite4 from comment #9)

> The latter has too many parentheses.

But it's more uniform with the current D array syntax.
Comment 13 rswhite4 2014-05-04 13:20:06 UTC
(In reply to bearophile_hugs from comment #12)
> (In reply to rswhite4 from comment #9)
> 
> > The latter has too many parentheses.
> 
> But it's more uniform with the current D array syntax.

It is ugly and redundant.
Comment 14 bearophile_hugs 2014-05-05 11:35:57 UTC
(In reply to Kenji Hara from comment #6)

> I think this is the most better definition about the issue.
> 
> "If an array literal could be deduced as static array, and it won't escape
> from its context, it would be allocated on stack."

This rule should become part of the D language, so that all conforming D compilers respect it. Then functions containing such cases can be marked @nogc.

(By the way, "most better" is better written as "best".)
Comment 15 bearophile_hugs 2014-05-05 11:42:51 UTC
(In reply to Andrej Mitrovic from comment #5)

> I don't like them, they're too much of a special case. Re-using existing
> syntax is better IMO.

The $ syntax can't be replaced by the float[3](...) syntax. For longer arrays, counting the items is a bug-prone chore:

auto a = ubyte[47]([9,2,6,4,3,3,4,2,3,6,6,4,1,9,1,5,8,0,9,3,2,5,4,
                    4,8,2,2,6,0,1,9,1,1,5,3,9,9,1,6,3,7,4,5,3,0,3,4]);

Vs:

ubyte[$] a = [9,2,6,4,3,3,4,2,3,6,6,4,1,9,1,5,8,0,9,3,2,5,4,
              4,8,2,2,6,0,1,9,1,1,5,3,9,9,1,6,3,7,4,5,3,0,3,4];
Comment 16 bearophile_hugs 2014-05-05 11:56:43 UTC
(In reply to Kenji Hara from comment #7)

> I also think that a "static array literal syntax" (e.g. DIP34) is not a good
> feature.

I still don't know what the best solution is. The $ syntax to infer the number of items seems good enough.

Regarding the []s syntax, if you have a function template like:

import std.traits : ForeachType;

ForeachType!Items sum(Items)(auto ref Items sequence) {
    typeof(return) total = 0;
    foreach (x; sequence)
        total += x;
    return total;
}


If you call it like this, it will allocate an array on the heap (it's the default behaviour, I guess):
immutable tot = sum([1, 2, 3]);

If you use a fixed-size literal, there is no need for heap allocation and you can use @nogc:
immutable tot = sum([1, 2, 3]s);


An advantage of the []s syntax is that it always allocates the data on the stack, so @nogc functions can easily accept such literals.
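Until something like `[]s` exists, one way to get the stack-allocated behaviour is to give the literal a fixed-size type through a typed local. A sketch, assuming a variant of `sum` whose parameter is `auto ref` (so that rvalues are accepted as well); `stackSum` is a hypothetical name:

```d
import std.traits : ForeachType;

// Variant of the sum template with an `auto ref` parameter, so both
// lvalues (passed by ref) and rvalues (passed by value) are accepted.
ForeachType!Items sum(Items)(auto ref Items sequence)
{
    typeof(return) total = 0;
    foreach (x; sequence)
        total += x;
    return total;
}

// The typed local gives the literal a fixed-size type, so Items is
// deduced as int[3] and the call does not touch the GC.
int stackSum() @nogc nothrow
{
    int[3] items = [1, 2, 3];
    return sum(items);
}

void main()
{
    assert(stackSum() == 6);
    // Items is deduced as int[] here, so the literal is normally a
    // GC-allocated dynamic array.
    assert(sum([1, 2, 3]) == 6);
}
```

The cost is one extra declaration per call site, which is the verbosity the `[]s` suffix is meant to remove.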

You can use it in a line of code like:

auto t1 = tuple([1, 2]s, "values");

That defines a Tuple!(int[2], string). Currently, to do that you must specify the type:

auto t2 = Tuple!(int[2], string)([1, 2], "values");

With the syntax suggested elsewhere in this thread:

auto t3 = tuple(int[2]([1, 2]), "values");
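The same typed-local approach also works for the `tuple` case without spelling out the full `Tuple` type. A minimal sketch (`makePair` is a hypothetical name):

```d
import std.typecons : Tuple, tuple;

// With a fixed-size local, tuple() infers Tuple!(int[2], string)
// without the full type being written out.
auto makePair()
{
    int[2] pair = [1, 2];
    return tuple(pair, "values");
}

void main()
{
    auto t = makePair();
    static assert(is(typeof(t) == Tuple!(int[2], string)));
    assert(t[0] == [1, 2] && t[1] == "values");
}
```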

When you have array literals nested in other literals (or nested in other generic function calls), having the []s syntax is a clear way to tell the compiler what you want:

auto aa1 = ["key": [1, 2]s];

Instead of:

int[2][string] aa2 = ["key": [1, 2]];

If you have to pass such an associative array literal to a function:

foo(["key": [1, 2]s]);

Currently you need to use an ugly and bug-prone cast:

void foo(TK, TV)(TV[TK] aa) {
    pragma(msg, TK, " ", TV);
}
void main() {
    foo(["key": cast(int[2])[1, 2]]);
}
Comment 17 Andrej Mitrovic 2014-05-05 12:30:50 UTC
(In reply to bearophile_hugs from comment #15)
> For longer arrays counting the items is a bug-prone chore.

You have another bug report opened for exactly that. I'd like to have both options on the table though. Sometimes count inference is nice, other times you may want a diagnostic if you miss the count.
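As a sketch of the diagnostic side: with an explicit length, a literal with the wrong number of items is rejected at compile time instead of silently producing an array of a different size. The names `digits` and `bad` are illustrative only.

```d
// Explicit length: the item count is checked against the declared size.
enum int[5] digits = [3, 1, 4, 1, 5]; // count matches, so this compiles
static assert(digits.length == 5);

// One item short: the initialization does not compile.
static assert(!__traits(compiles, { int[5] bad = [3, 1, 4, 1]; }));

void main() {}
```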
Comment 18 Mathias LANG 2020-05-15 03:53:33 UTC
```
void main () @nogc
{
    int[3] a = [1, 2, 3];
    a = [4, 5, 6];
}
```

This does not allocate anymore.
The static array literal syntax would be nice, but Walter has vetoed it IIRC.
Closing as fixed.