D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 8536 - OPTLINK crash with large fixed-size array
Status: RESOLVED DUPLICATE of issue 6678
Alias: None
Product: D
Classification: Unclassified
Component: tools
Version: D2
Hardware: x86 Windows
Importance: P2 enhancement
Assignee: No Owner
Keywords: Optlink
Reported: 2012-08-10 15:43 UTC by bearophile_hugs
Modified: 2017-01-10 01:45 UTC
Attachments
Three C programs that show one effect of static 2D arrays (2.94 KB, application/octet-stream)
2012-08-10 19:43 UTC, bearophile_hugs
Version 4 of the C program (1.05 KB, application/octet-stream)
2012-08-11 05:50 UTC, bearophile_hugs

Description bearophile_hugs 2012-08-10 15:43:19 UTC
This program:

uint[1 << 24] a;
void main() {}


Gives this error:
test.d(2): Error: index 16777216 overflow for static array



While this program:

struct Foo { uint x; }
Foo[1 << 24] a;
void main() {}


Causes an OPTLINK crash.


I sometimes translate to D C programs that, for performance reasons, use large global 2D arrays. In D, a global __gshared dynamic array of dynamic arrays is an option, but it kills some optimizations the compiler can perform when it knows the 2D matrix sizes at compile-time. In my opinion, asking for 50-100 MB static 2D arrays is not that much for a PC with 2+ GB RAM.
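To put the sizes in perspective, here is a minimal C sketch of the size arithmetic. The dimensions ROWS and COLS are hypothetical, chosen only so that the element count equals the 1 << 24 used in the D programs above; with 4-byte elements that comes to 64 MiB, inside the 50-100 MB range mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions: 4096 * 4096 == 1 << 24 elements. */
enum { ROWS = 4096, COLS = 4096 };

/* Fully static 2D matrix: both dimensions are compile-time constants. */
uint32_t matrix[ROWS][COLS];

/* Size of the matrix in bytes, as computed by the compiler. */
size_t matrix_bytes(void) {
    assert((size_t)ROWS * COLS == ((size_t)1 << 24));
    return sizeof matrix;
}

/* Size of the matrix in MiB (1 MiB == 1024 * 1024 bytes). */
size_t matrix_mib(void) {
    return matrix_bytes() / (1024u * 1024u);
}
```

Being zero-initialized, such an array normally adds nothing to the size of the executable; the memory is reserved when the program is loaded.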
Comment 1 Walter Bright 2012-08-10 16:01:07 UTC
This is a well known Optlink bug, though I don't have the bugzilla number handy.

You're wrong about it impeding optimizations compared with dynamically allocating it, for a couple reasons:

1. Static data is often accessed indirectly through a register anyway: either in explicit code generated by the compiler, implicitly in how the CPU does virtual memory, or because there is no way to do it other than offsetting the program counter register.

2. There is no performance penalty for offsetting a base address register versus an addressing mode with just an address.

D knows the static compile time sizes of arrays if you use static arrays. That's what they're for.
Comment 2 bearophile_hugs 2012-08-10 19:43:48 UTC
Created attachment 1138
Three C programs that show one effect of static 2D arrays
Comment 3 bearophile_hugs 2012-08-10 19:49:18 UTC
(In reply to comment #1)

> This is a well known Optlink bug, though I don't have the bugzilla number
> handy.

OK.

> You're wrong about it impeding optimizations compared with dynamically
> allocating it, for a couple reasons:

This is a discussion better fit for the D newsgroup.

Attached are 3 nearly identical C programs that use a global 2D cache matrix to perform a certain simple (but not stupid) computation.

test0 uses a dynamically allocated "array" of pointers to "arrays", test1 uses a static array of dynamically allocated rows, and test2 uses a fully static 2D matrix. Compiled with GCC 4.7.1 using "-std=c99 -Ofast -flto -s", the run-times are 6.52, 6.07 and 4.95 seconds respectively. The more GCC knows statically about the arrays, the more efficient the binary it produces.
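For readers without the attachment, the three layouts can be sketched roughly as follows. This is not the attached code: the names cache0, cache1 and cache2, the dimensions NR and NC, and the helper functions are all hypothetical, and error handling is kept minimal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical dimensions; the attached programs may use different ones. */
enum { NR = 64, NC = 128 };

/* Layout 0 (as in test0): heap array of pointers to heap rows. */
uint32_t **cache0;

/* Layout 1 (as in test1): static array of pointers to heap rows. */
uint32_t *cache1[NR];

/* Layout 2 (as in test2): fully static 2D matrix, no pointers to follow. */
uint32_t cache2[NR][NC];

/* Allocate the heap parts of layouts 0 and 1. Returns 0 on success. */
int cache_init(void) {
    cache0 = malloc(NR * sizeof *cache0);
    if (cache0 == NULL) return -1;
    for (size_t r = 0; r < NR; r++) {
        cache0[r] = calloc(NC, sizeof **cache0);
        cache1[r] = calloc(NC, sizeof *cache1[r]);
        if (cache0[r] == NULL || cache1[r] == NULL) return -1;
    }
    return 0;
}

/* Store the same value at (r, c) in all three layouts. */
void cache_store(size_t r, size_t c, uint32_t v) {
    assert(r < NR && c < NC);
    cache0[r][c] = v;  /* load cache0, load the row pointer, then index */
    cache1[r][c] = v;  /* load the row pointer, then index */
    cache2[r][c] = v;  /* single offset: r * NC + c elements from the base */
}

/* Sum all elements reached through layout k (0, 1, or anything else for 2). */
uint64_t cache_sum(int k) {
    uint64_t total = 0;
    for (size_t r = 0; r < NR; r++)
        for (size_t c = 0; c < NC; c++)
            total += k == 0 ? cache0[r][c]
                   : k == 1 ? cache1[r][c]
                            : cache2[r][c];
    return total;
}
```

The source syntax `cache[r][c]` is identical in all three cases; what differs is how many dependent loads are needed to reach an element and how much of the address computation is fixed at compile time.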
Comment 4 Walter Bright 2012-08-10 20:51:43 UTC
Your test is incorrectly written.

Use one array, not an array of arrays, and use a macro to compute the r*row+c index.
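A minimal sketch of that suggestion, assuming a row-major layout: one heap block plus a macro that folds the row and column into a single index. The names flat, flat_nc, FLAT and the helper functions are hypothetical and do not come from the attached programs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: one flat block and the length of a row. */
static uint32_t *flat;
static size_t flat_nc;

/* Row-major access: element (r, c) lives at index r * flat_nc + c. */
#define FLAT(r, c) (flat[(r) * flat_nc + (c)])

/* Allocate a zeroed nr x nc matrix as one block. Returns 0 on success. */
int flat_init(size_t nr, size_t nc) {
    assert(nr > 0 && nc > 0);
    flat = calloc(nr * nc, sizeof *flat);
    flat_nc = nc;
    return flat == NULL ? -1 : 0;
}

/* Index of element (r, c) within the flat block. */
size_t flat_index(size_t r, size_t c) {
    return (size_t)(&FLAT(r, c) - flat);
}
```

Because all rows sit in one allocation, reaching an element takes one multiply-add and a single memory access, with no row pointer to load first.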
Comment 5 bearophile_hugs 2012-08-11 05:47:52 UTC

*** This issue has been marked as a duplicate of issue 6678 ***
Comment 6 bearophile_hugs 2012-08-11 05:50:58 UTC
Created attachment 1139
Version 4 of the C program
Comment 7 bearophile_hugs 2012-08-11 05:58:53 UTC
(In reply to comment #4)
> Your test is incorrectly written.
> 
> Use one array, not an array of arrays, and use a macro to compute the r*row+c
> index.

Using your suggestions, the run-time of the attached test3.c is 4.84 seconds.

In D there are no macros, so I think you have to replace:

size_t cache_nc;
#define CACHE(r, c) (cache[(r)*cache_nc + (c)])

With something like:

__gshared size_t cache_nc;
ref CACHE(in size_t r, in size_t c) nothrow {
    return cache[r * cache_nc + c];
}

Or maybe use a custom matrix type with an overloaded [] and avoid global variables, but keep cache_nc global, possibly as an enum, so that loop unrolling stays possible: many static compilers don't unroll a loop if they don't statically know the loop count. JIT compilers such as the Oracle Java one are able to unroll on dynamic values too.
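The same idea can be sketched on the C side, assuming the row length is fixed at compile time. An enum constant stands in for the D enum, and an accessor returning a pointer stands in for the ref-returning function; grid, GRID_NR, GRID_NC and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, both known at compile time. */
enum { GRID_NR = 16, GRID_NC = 8 };

/* One flat, statically sized block in row-major order. */
static uint32_t grid[GRID_NR * GRID_NC];

/* Accessor returning a pointer, so callers can both read and write. */
uint32_t *grid_at(size_t r, size_t c) {
    assert(r < GRID_NR && c < GRID_NC);
    return &grid[r * GRID_NC + c];
}

/* The loop count GRID_NC is a constant, so the compiler may unroll it. */
uint32_t grid_row_sum(size_t r) {
    uint32_t total = 0;
    for (size_t c = 0; c < GRID_NC; c++)
        total += *grid_at(r, c);
    return total;
}
```

Whether a given compiler actually unrolls the loop depends on its optimization settings; the constant only makes the loop count visible to it.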