For the follwing small test program: import core.simd; import std.stdio; void main(string[] args) { float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); result = __simd(XMM.ADDPS, result, result); __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); writefln("%s", value1); } Dmd will generate the follwing assembly: mov rax,qword ptr [rbp-68h] movups xmm0,xmmword ptr [rax] movaps xmmword ptr [rbp-60h],xmm0 movdqa xmm0,xmmword ptr [rbp-60h] addps xmm0,xmmword ptr [rbp-60h] movaps xmmword ptr [rbp-60h],xmm0 As you can clearly see the last instruction is a movaps but it should be a movups because XMM.STOUPS was given (unaligned store)
(In reply to comment #0) > For the follwing small test program: > > import core.simd; > import std.stdio; > > void main(string[] args) > { > float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; > float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); > result = __simd(XMM.ADDPS, result, result); > __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); > writefln("%s", value1); > } > > Dmd will generate the follwing assembly: > mov rax,qword ptr [rbp-68h] > movups xmm0,xmmword ptr [rax] > movaps xmmword ptr [rbp-60h],xmm0 > movdqa xmm0,xmmword ptr [rbp-60h] > addps xmm0,xmmword ptr [rbp-60h] > movaps xmmword ptr [rbp-60h],xmm0 > > As you can clearly see the last instruction is a movaps but it should be a > movups because XMM.STOUPS was given (unaligned store) Indeed. Wrong opcode >_< I never use unaligned vectors, so I missed it! Haven't widely tested the unaligned bit of std.simd yet.
I think it's worse than that actually. import core.simd; __gshared float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f]; void func(); void main(string[] args) { float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr); result = __simd(XMM.ADDPS, result, result); __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result); func(); } gives mov rax, qword ptr [_D5testx6value1Af+8H] ; 0008 _ 48: 8B. 05, 00000008(rel) movups xmm0, xmmword ptr [rax] ; 000F _ 0F 10. 00 movaps xmmword ptr [rbp-10H], xmm0 ; 0012 _ 0F 29. 45, F0 movdqa xmm1, xmmword ptr [rbp-10H] ; 0016 _ 66: 0F 6F. 4D, F0 addps xmm1, xmmword ptr [rbp-10H] ; 001B _ 0F 58. 4D, F0 movaps xmmword ptr [rbp-10H], xmm1 ; 001F _ 0F 29. 4D, F0 So the value is loaded from value1, then stored into a stack slot, then loaded, added, and stored back into the stack slot. The result is never written back into value1. With -O the whole thing is dropped, despite the call to func() which could easily access value1. Elevating to critical.
I don't know anyone who is ussing core.simd because of its broken state. So its not really that critical, its just another feature that doesn't work as advertised.
Commits pushed to master at https://github.com/D-Programming-Language/dmd https://github.com/D-Programming-Language/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto - add missing STO* ops to codegen https://github.com/D-Programming-Language/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c Merge pull request #5625 from MartinNowak/fix10225 fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto
Commits pushed to stable at https://github.com/dlang/dmd https://github.com/dlang/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto https://github.com/dlang/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c Merge pull request #5625 from MartinNowak/fix10225