Issue 24411 - [CODEGEN] bad shl codegen
Summary: [CODEGEN] bad shl codegen
Status: RESOLVED INVALID
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: x86 All
: P1 major
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-26 06:19 UTC by Manu
Modified: 2024-02-28 09:10 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Manu 2024-02-26 06:19:14 UTC
I just uncovered a very surprising bug.

I have a function like this:

bool validCode(ubyte code)
{
	enum validCodes = 0b1111100111001100111111110;
	return (1 << code) & validCodes;
}

void test()
{
    assert(validCode(76) == false); // FAILS!
}

Where `code` is an enum with some sparse values close to zero, and only specified code values are valid. I detect invalid code values by comparing the code value bit against a bitfield.

`validCodes` is 32bit, and I made the assumption that `1<<x` where x is greater than 32 would result in 0, and so the function above would return false for `code` values larger than 32.
It turns out this is NOT the case, and I get surprising results.

This code compiles to:
00007FF6D1FE781B  mov         eax,1  
00007FF6D1FE7820  movzx       ecx,byte ptr [code]  
00007FF6D1FE7824  shl         eax,cl  
00007FF6D1FE7826  test        eax,1F399FEh  

In this case, `code` is 76 (an invalid code), and it turns out that the x86 `shl` doesn't shift left by 75 (resulting in 0), what actually happens is that shl takes the lower 5 bits from cl, and shifts by that number, which happens to be 12, so the result is `1 << 12`, which coincides with a 1-bit, and this function returns TRUE in this case!

Is the << operator in the language specified to take the lower 5 bits of the operand?
I think this is a codegen bug... the language shouldn't assume that the user has clamped the value into the range required by the x86 opcode.
Comment 1 anonymous4 2024-02-27 09:41:54 UTC
Yes, that's how shift normally works: https://dlang.org/spec/expression.html#shift_expressions

I believe, the goto solution here is checked int.
Comment 2 Manu 2024-02-27 11:19:10 UTC
Okay, my bad. It's in the spec!

Surprising; dlang uses prides itself on not having surprise invisible undefined behaviour littered around your code.
This seems like a safety concern; it's conceivable an exploit could be written taking advantage of this undefined behaviour.
Comment 3 Dennis 2024-02-27 11:37:09 UTC
The key here is that it's specified as "implementation defined behavior", not "undefined behavior". It could give a bogus integer and lead to logic bugs, but it can't result in memory corruption in `@safe` code. D's 'safety' is specifically targeting memory safety, not logic bugs in general (e.g. unintentional integer overflow). It's still a systems programming language with similar performance to C. Introducing bounds checks to shift expressions is a big performance hit, especially considering shifts are usually found in bit-twiddling performance sensitive code.
Comment 4 anonymous4 2024-02-28 09:10:15 UTC
Shifts are often hardcoded. If you shift by untrusted amount, then you probably have bit arrays, and if you use bit arrays with untrusted indexes, then you need bound checking, not clear what you try to do, try https://dlang.org/phobos/std_bitmanip.html#BitArray

AFAIK most processors simply mask the shift amount. If some processor traps on overflow here, it would be safe, but probably not very useful for you.