When compiled with no flags, the following program gives wrong results:

import std.stdio;
import core.simd;

double2 * v(double* a)
{
    return cast(double2*)a;
}

void main()
{
    double2 a;
    auto p = cast(double*) &a;
    p[0] = 1;
    p[1] = 2;
    double2 b = v(p)[0];
    v(p)[0] = b;
    writeln(p[0 .. 2]);   // prints [1, 0]
}

Disassembly of the relevant part of the code:

call   426344 <_D3tmp1vFPdZPNhG2d>
movapd xmm0,XMMWORD PTR [rax]
movapd XMMWORD PTR [rbp-0x10],xmm0
movapd xmm1,XMMWORD PTR [rbp-0x10]
movsd  QWORD PTR [rbp-0x40],xmm1     ; should be movapd
mov    rdi,QWORD PTR [rbp-0x20]
call   426344 <_D3tmp1vFPdZPNhG2d>
movsd  xmm1,QWORD PTR [rbp-0x40]     ; should be movapd
movapd XMMWORD PTR [rax],xmm1

This happens with both DMD 2.060 and the latest version of 2.061 from GitHub. It doesn't happen if I use either the -O or the -inline flag. It doesn't happen with LDC or GDC. I have only tested this on Linux.
I managed to reduce it a bit further:

import std.stdio;
import core.simd;

double2 * v(double2* a)
{
    return a;
}

void main()
{
    double2 a = [1, 2];
    *v(&a) = a;
    writeln(a.array);
}

And the disassembly:

movsd  QWORD PTR [rbp-0x20],xmm1
lea    rdi,[rbp-0x10]
call   4263f4 <_D3tmp1vFPNhG2dZPNhG2d>
movsd  xmm1,QWORD PTR [rbp-0x20]
movapd XMMWORD PTR [rax],xmm1
This is happening in cod3.c REGSAVE::save() and REGSAVE::restore(). Unfortunately, just changing the opcodes doesn't work because MOVAPD requires 16-byte alignment of its memory operands. Fixing that exposes further problems. Essentially, it'll have to wait a bit.
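As a rough illustration of the alignment constraint mentioned above, here is a minimal sketch that checks whether an address falls on a 16-byte boundary, which is what MOVAPD needs for a memory operand. The helper name isAligned16 is hypothetical and is not part of DMD or druntime. The sketch only inspects addresses of ordinary variables and says nothing about the register save slots that REGSAVE uses.

```d
import core.simd;

// Hypothetical helper (not part of DMD or druntime): true if p lies on
// a 16-byte boundary, i.e. could be used as a MOVAPD memory operand.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    import std.stdio : writeln;

    double2 a = [1, 2];

    // Address of the vector variable itself.
    writeln(isAligned16(&a));

    // Address of its second element, 8 bytes further on.
    writeln(isAligned16(cast(double*) &a + 1));
}
```

A slot that is only 8-byte aligned is fine for MOVSD but not for MOVAPD, which is why swapping the opcodes alone would not be enough.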
(In reply to comment #2)
> This is happening in cod3.c REGSAVE::save() and REGSAVE::restore().
> Unfortunately, just changing the opcodes doesn't work because MOVAPD requires
> 16-byte alignment of the operands. Fixing that exposes further problems.
>
> Essentially, it'll have to wait a bit.

I know nothing about the DMD back end, so this may be an obviously bad idea, but if alignment is the main problem, wouldn't using MOVUPD work in the meantime?
MOVUPD is terribly slow.
(In reply to comment #4)
> MOVUPD is terribly slow.

Terribly slow is still much better than wrong-code.
Fixed here: https://github.com/D-Programming-Language/dmd/commit/c33809cc201b4697b384209eb3a7a623e8e871e9#src/backend/cod3.c