D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto
Summary: core.simd wrong codegen for XMM.STOUPS with __simd_sto
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: All All
: P2 critical
Assignee: No Owner
URL:
Keywords: SIMD, wrong-code
Depends on:
Blocks:
 
Reported: 2013-06-01 03:05 UTC by Benjamin Thaut
Modified: 2016-10-01 11:45 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Benjamin Thaut 2013-06-01 03:05:09 UTC
For the follwing small test program:

import core.simd;
import std.stdio;

void main(string[] args)
{
	float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
	float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
	result = __simd(XMM.ADDPS, result, result);
	__simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
	writefln("%s", value1);
}

Dmd will generate the follwing assembly:
mov         rax,qword ptr [rbp-68h]  
movups      xmm0,xmmword ptr [rax]  
movaps      xmmword ptr [rbp-60h],xmm0  
movdqa      xmm0,xmmword ptr [rbp-60h]  
addps       xmm0,xmmword ptr [rbp-60h]  
movaps      xmmword ptr [rbp-60h],xmm0  

As you can clearly see the last instruction is a movaps but it should be a movups because XMM.STOUPS was given (unaligned store)
Comment 1 Manu 2013-06-03 03:02:18 UTC
(In reply to comment #0)
> For the follwing small test program:
> 
> import core.simd;
> import std.stdio;
> 
> void main(string[] args)
> {
>     float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];
>     float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
>     result = __simd(XMM.ADDPS, result, result);
>     __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
>     writefln("%s", value1);
> }
> 
> Dmd will generate the follwing assembly:
> mov         rax,qword ptr [rbp-68h]  
> movups      xmm0,xmmword ptr [rax]  
> movaps      xmmword ptr [rbp-60h],xmm0  
> movdqa      xmm0,xmmword ptr [rbp-60h]  
> addps       xmm0,xmmword ptr [rbp-60h]  
> movaps      xmmword ptr [rbp-60h],xmm0  
> 
> As you can clearly see the last instruction is a movaps but it should be a
> movups because XMM.STOUPS was given (unaligned store)

Indeed. Wrong opcode >_<
I never use unaligned vectors, so I missed it!
Haven't widely tested the unaligned bit of std.simd yet.
Comment 2 yebblies 2013-11-21 05:33:18 UTC
I think it's worse than that actually.

import core.simd;

__gshared float[] value1 = [1.0f, 2.0f, 3.0f, 4.0f];

void func();

void main(string[] args)
{
    float4 result = __simd(XMM.LODUPS, *cast(float4*)value1.ptr);
    result = __simd(XMM.ADDPS, result, result);
    __simd_sto(XMM.STOUPS, *cast(float4*)value1.ptr, result);
    func();
}

gives

mov     rax, qword ptr [_D5testx6value1Af+8H]   ; 0008 _ 48: 8B. 05, 00000008(rel)
movups  xmm0, xmmword ptr [rax]                 ; 000F _ 0F 10. 00
movaps  xmmword ptr [rbp-10H], xmm0             ; 0012 _ 0F 29. 45, F0
movdqa  xmm1, xmmword ptr [rbp-10H]             ; 0016 _ 66: 0F 6F. 4D, F0
addps   xmm1, xmmword ptr [rbp-10H]             ; 001B _ 0F 58. 4D, F0
movaps  xmmword ptr [rbp-10H], xmm1             ; 001F _ 0F 29. 4D, F0

So the value is loaded from value1, then stored into a stack slot, then loaded, added, and stored back into the stack slot.  The result is never written back into value1.

With -O the whole thing is dropped, despite the call to func() which could easily access value1.  Elevating to critical.
Comment 3 Benjamin Thaut 2013-11-21 06:59:06 UTC
I don't know anyone who is ussing core.simd because of its broken state. So its not really that critical, its just another feature that doesn't work as advertised.
Comment 4 github-bugzilla 2016-04-03 18:47:20 UTC
Commits pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a
fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto

- add missing STO* ops to codegen

https://github.com/D-Programming-Language/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c
Merge pull request #5625 from MartinNowak/fix10225

fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto
Comment 5 github-bugzilla 2016-10-01 11:45:57 UTC
Commits pushed to stable at https://github.com/dlang/dmd

https://github.com/dlang/dmd/commit/087a4ca9b5c1ccda6b6963cb9d7e06074185a90a
fix Issue 10225 - core.simd wrong codegen for XMM.STOUPS with __simd_sto

https://github.com/dlang/dmd/commit/7133d4a17ce5767a3427ccc47ec01468115e6d6c
Merge pull request #5625 from MartinNowak/fix10225