Issue 11821 - dmd backend: redundant x86 instruction in a simple loop
Summary: dmd backend: redundant x86 instruction in a simple loop
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86 All
Importance: P4 enhancement
Assignee: No Owner
URL: http://forum.dlang.org/thread/nfobptp...
Keywords: performance
Depends on:
Blocks:
 
Reported: 2013-12-26 02:12 UTC by Ivan Kazmenko
Modified: 2022-12-17 10:42 UTC
CC List: 2 users

See Also:


Attachments
source code of the demonstrating example (132 bytes, application/octet-stream)
2013-12-26 02:13 UTC, Ivan Kazmenko
disassembly of the demonstrating example (1.75 KB, text/plain)
2013-12-26 02:14 UTC, Ivan Kazmenko

Description Ivan Kazmenko 2013-12-26 02:12:34 UTC
I am trying to figure out why win32 executables compiled from D source by dmd are usually somewhat slower than similar win32 programs compiled from C++ source by, for example, mingw-gcc.

I believe I found a relatively simple case where dmd puts a redundant instruction into the object code.

I have this simple D program:

-----
immutable int MAX_N = 1_000_000;
int main () {
    int [MAX_N] a;
    foreach (i; 0..MAX_N)
        a[i] = i;
    return a[7];
}
-----

The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) contains the following piece corresponding to the loop:

-----
L2C:		mov	-03D0900h[EDX*4][EBP],EDX
		mov	ECX,EDX
		inc	EDX
		cmp	EDX,0F4240h
		jb	L2C
-----

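Here EDX holds i: the first line stores it into a[i] (the array starts at EBP-03D0900h, i.e. 4,000,000 bytes below EBP), and the loop runs until EDX reaches 0F4240h = 1,000,000. The second line, "mov ECX,EDX", does not seem to serve any purpose at all: ECX is written on every iteration but never read inside the loop. If this observation is correct, the instruction points to a bug in code generation, and fixing that bug may improve performance in more general cases.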

The "return a[7]" statement ensures that the loop cannot be optimized out entirely. The ldmd2 compiler reportedly removes the loop when no such return is present. DMD does not, but that is irrelevant to this issue.

Previous discussion:
http://forum.dlang.org/thread/nfobptpqpiueelhehbfy@forum.dlang.org

Will attach source and disassembly in comments.

Ivan Kazmenko.
Comment 1 Ivan Kazmenko 2013-12-26 02:13:47 UTC
Created attachment 1307
source code of the demonstrating example
Comment 2 Ivan Kazmenko 2013-12-26 02:14:21 UTC
Created attachment 1308
disassembly of the demonstrating example
Comment 3 Ivan Kazmenko 2013-12-26 02:27:09 UTC
I should note that the compile command must be something like:

dmd a0.d -O -release -inline -noboundscheck -L/STACK:268435456

Otherwise, the default stack limit makes the program crash at runtime.

The "-L/STACK:268435456" option does not affect the generated object file, since it is only passed to the linker.
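The linker flag is needed only because the 4,000,000-byte array lives on the stack, which is larger than the default Windows stack size. As an alternative not taken from the report, a minimal sketch that allocates the array on the GC heap runs without the flag. The function name `run` is hypothetical. Note that the loop then indexes through a heap pointer rather than EBP, so its disassembly will differ from the one in the description; it is not a substitute for the original test case.

```d
// Sketch: a variant of the test program that does not need -L/STACK,
// because the array is allocated on the GC heap instead of the stack.
// The loop's machine code will differ from the stack-based original.
immutable int MAX_N = 1_000_000;

int run() // hypothetical helper name
{
    auto a = new int[](MAX_N); // heap-allocated, zero-initialized
    foreach (i; 0 .. MAX_N)
        a[i] = i;
    return a[7];
}

int main()
{
    return run(); // exit code 7, as in the original program
}
```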
Comment 4 Maxim Fomin 2013-12-26 08:08:29 UTC
This may be a remnant of internally created variables. The compiler often rewrites high-level constructs into lower-level ones, implicitly introducing new variables. What you see in the asm is their use.

By the way, this is not a 'code generation bug'; it is poor optimization.
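The explanation in Comment 4 can be illustrated with a sketch of how a `foreach` over an integer range may be rewritten into a plain `for` loop. The exact lowering is internal to dmd; the names `key` and `limit` below are hypothetical stand-ins for its hidden variables, and `N` is shrunk to 16 for illustration. If the compiler keeps a separate loop counter and copies it into `i` on each iteration, and the optimizer then uses the counter directly in the store but fails to delete the now-unused copy, the result would look like the `mov ECX,EDX` in the report. This is a plausible reading, not something confirmed in the thread.

```d
// Sketch: `foreach (i; 0 .. N)` next to a hand-written lowering of it.
// `key` and `limit` are hypothetical names for compiler-internal variables.
import std.stdio : writeln;

enum int N = 16; // small size for illustration; the report uses 1_000_000

int[N] fillForeach()
{
    int[N] a;
    foreach (i; 0 .. N)
        a[i] = i;
    return a;
}

int[N] fillLowered()
{
    int[N] a;
    int limit = N;                        // upper bound evaluated once
    for (int key = 0; key < limit; ++key) // hidden loop counter
    {
        int i = key;                      // user-visible copy of the counter
        a[i] = i;
    }
    return a;
}

void main()
{
    writeln(fillForeach() == fillLowered()); // prints "true"
}
```

Both functions fill the array identically; the only difference is the explicit copy `int i = key;`, which is the kind of extra variable a later optimization pass would need to eliminate.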