D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 4046 - [CTFE] std.intrinsic
Summary: [CTFE] std.intrinsic
Status: RESOLVED WONTFIX
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86 Windows
Importance: P2 normal
Assignee: No Owner
URL:
Keywords: rejects-valid
Depends on:
Blocks:
 
Reported: 2010-04-02 14:25 UTC by bearophile_hugs
Modified: 2015-06-09 05:11 UTC
CC List: 2 users

See Also:


Description bearophile_hugs 2010-04-02 14:25:39 UTC
import std.intrinsic: bt;
int foo() {
    uint x = uint.max;
    return bt(&x, 5);
}
int _ = foo();
void main() {}


dmd 2.042 gives:

test.d(4): Error: cannot evaluate bt((& x),5u) at compile time
test.d(6): Error: cannot evaluate foo() at compile time
test.d(6): Error: cannot evaluate foo() at compile time


In CTFE, the compiler could replace the intrinsics with small functions that have the same semantics.
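As an illustration of the "small functions with the same semantics" idea, here is a minimal sketch in plain D that CTFE can evaluate. `ctfeBt` is a hypothetical name, not part of Phobos. It takes a slice instead of std.intrinsic's raw pointer (an assumption made to keep the CTFE case simple) but uses the same bit layout as bt: bit `bitnum` lives in word `bitnum / 32` at position `bitnum % 32`.

```d
// Hypothetical CTFE-friendly stand-in for std.intrinsic.bt (not Phobos code).
// Treats `words` as one long bit array: bit `bitnum` is in words[bitnum >> 5]
// at position bitnum & 31. Returns 1 if the bit is set, else 0
// (bt itself only promises some non-zero value for a set bit).
int ctfeBt(const(uint)[] words, size_t bitnum) pure nothrow @safe
{
    return (words[bitnum >> 5] >> (bitnum & 31)) & 1;
}

// The reporter's foo(), rewritten to call the helper instead of the intrinsic.
int foo()
{
    uint[1] x = [uint.max];
    return ctfeBt(x[], 5);
}

// An enum initializer forces compile-time evaluation, like `int _ = foo();`.
enum fooAtCompileTime = foo();
static assert(fooAtCompileTime == 1);

void main() {}
```

Because the helper only uses indexing, shifts and a mask, the compiler's interpreter can run it; the original failure came from `bt` being a compiler intrinsic with no D body for CTFE to evaluate.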
Comment 1 Don 2011-09-22 06:27:33 UTC
I'm not sure why the btXX functions (bt, btc, btr, etc) exist at all.
Although they are a single instruction, they are MUCH slower than the equivalent code using shifts or AND/OR/XOR.
For example, on Core i7 (Sandy Bridge), with a memory operand, they take 6 clock cycles!!!! You can execute 24 integer instructions in that time. On AMD K10, they're even slower. On Pentium 4 they have a latency of EIGHTEEN clock cycles. They're even slow on VIA processors as well -- they're not good anywhere.

I think they should be completely removed. There's a case for the intrinsics mentioned in bug 5703, but I think this should be a WONTFIX. To support them would just encourage slow, non-portable code.
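For reference, the "equivalent code using shifts or AND/OR/XOR" mentioned in this comment could look roughly like the sketch below. The names `setBit`, `clearBit` and `flipBit` are hypothetical stand-ins for bts, btr and btc; they follow the same word-indexing convention as bt and return the bit's previous value. Nothing here measures speed; it only shows the portable, CTFE-evaluable form.

```d
// Hypothetical shift/mask replacements for the bts, btr and btc intrinsics.
// Bit `bitnum` lives in words[bitnum >> 5] at position bitnum & 31.
// Each returns the bit's previous value (1 or 0), then sets, clears or flips it.

int setBit(uint[] words, size_t bitnum) nothrow @safe
{
    immutable size_t i = bitnum >> 5;          // index of the 32-bit word
    immutable uint mask = 1u << (bitnum & 31); // single-bit mask inside it
    immutable int old = (words[i] & mask) != 0;
    words[i] |= mask;                          // OR sets the bit
    return old;
}

int clearBit(uint[] words, size_t bitnum) nothrow @safe
{
    immutable size_t i = bitnum >> 5;
    immutable uint mask = 1u << (bitnum & 31);
    immutable int old = (words[i] & mask) != 0;
    words[i] &= ~mask;                         // AND with the inverted mask clears it
    return old;
}

int flipBit(uint[] words, size_t bitnum) nothrow @safe
{
    immutable size_t i = bitnum >> 5;
    immutable uint mask = 1u << (bitnum & 31);
    immutable int old = (words[i] & mask) != 0;
    words[i] ^= mask;                          // XOR toggles it
    return old;
}

// Exercise the helpers at compile time to show they are CTFE-friendly.
uint[2] demo()
{
    uint[2] w = [0, 0];
    setBit(w[], 33);  // bit 1 of w[1]: w[1] becomes 2
    flipBit(w[], 0);  // bit 0 of w[0]: w[0] becomes 1
    clearBit(w[], 0); // and back to 0
    return w;
}

static assert(demo() == [0u, 2u]);

void main() {}
```

For a single word known to be in range, the whole thing reduces to the one-liners `x |= 1u << n`, `x &= ~(1u << n)` and `x ^= 1u << n`.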
Comment 2 Dmitry Olshansky 2011-09-22 10:42:25 UTC
(In reply to comment #1)
> I'm not sure why the btXX functions (bt, btc, btr, etc) exist at all.
> Although they are a single instruction, they are MUCH slower than the
> equivalent code using shifts or AND/OR/XOR.
> For example, on Core i7 (Sandy Bridge), with a memory operand, they take 6
> clock cycles!!!! You can execute 24 integer instructions in that time. On AMD
> K10, they're even slower. On Pentium 4 they have a latency of EIGHTEEN clock
> cycles. They're even slow on VIA processors as well -- they're not good
> anywhere.

Damn, and I used them at the heart of important loops in FReD....
Thanks, I'm getting rid of them ASAP %)
This should probably be mentioned somewhere; and then there are the problematic bsr/bsf you mentioned before.

> 
> I think they should be completely removed. There's a case for the intrinsics
> mentioned in bug 5703, but I think this should be a WONTFIX. To support them
> would just encourage slow, non-portable code.
Comment 3 bearophile_hugs 2011-09-24 18:29:33 UTC
(In reply to comment #1)

> I think they should be completely removed. There's a case for the intrinsics
> mentioned in bug 5703, but I think this should be a WONTFIX. To support them
> would just encourage slow, non-portable code.

OK, then I'll close this.
But I suggest you open an "enhancement" request that asks to deprecate the useless/bad intrinsics :-)