Issue 23814 - [Codegen] Calling member function of extern(C++) class with multiple inheritance doesn't preserve the EBX register in some cases
Summary: [Codegen] Calling member function of extern(C++) class with multiple inherita...
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86 Linux
Importance: P1 normal
Assignee: No Owner
URL:
Keywords: pull
Depends on:
Blocks:
 
Reported: 2023-03-29 17:04 UTC by naydef
Modified: 2023-06-16 10:55 UTC
CC List: 2 users

See Also:


Description naydef 2023-03-29 17:04:46 UTC
I'm testing with the following code (although the compiler doesn't seem to use this register every time, which is what the bug needs in order to manifest):

------------------------------
extern(C++) interface BaseInterface1
{
public:
    const(char)* func1();
    const(char)* func2();
}

extern(C++) abstract class BaseInterface2
{
public:
    const(char)* func3() {return "func3";}
    const(char)* func4() {return "func4";}
}

extern(C++) class MainClass : BaseInterface2, BaseInterface1
{
    override const(char)* func1() {return "func1_overriden";}
    override const(char)* func2() {return "func2_overriden";}
    override const(char)* func3() {return "func3_overriden";}
    override const(char)* func4() {return "func4_overriden";}
}


void main()
{
    auto cls = new MainClass();

    import core.stdc.stdio;
    printf("We'll now call func4");

    cls.func2();
}
------------------------------

The call to func2(), which is a virtual call, redirects execution to something like this (from IDA):

------------------------------
sub     [esp+arg_0], 4
call    $+5
pop     ebx ; EBX value not saved...
add     ebx, 0x1234
jmp     __somememberfunction
------------------------------
Comment 1 naydef 2023-03-29 17:14:52 UTC
Corrected code (the previous version does not generate the bad code for the call):

extern(C++) interface BaseInterface1
{
public:
    const(char)* func1();
    const(char)* func2();
}

extern(C++) abstract class BaseInterface2
{
public:
    const(char)* func3() {return "func3";}
    const(char)* func4() {return "func4";}
}

extern(C++) class MainClass : BaseInterface2, BaseInterface1
{
    override const(char)* func1() {return "func1_overriden";}
    override const(char)* func2() {return "func2_overriden";}
    override const(char)* func3() {return "func3_overriden";}
    override const(char)* func4() {return "func4_overriden";}
}


void main()
{
    BaseInterface1 cls = new MainClass();

    import core.stdc.stdio;
    printf("We'll now call func4");

    cls.func1();
}


Assembly code

The assembly of the callee (IDA):
-----------------------------------------------------------
.text:00034790 _THUNK0         proc near               ; DATA XREF: .data:off_87304↓o
.text:00034790
.text:00034790 arg_0           = dword ptr  4
.text:00034790
.text:00034790                 sub     [esp+arg_0], 4
.text:00034795                 call    $+5
.text:0003479A
.text:0003479A loc_3479A:                              ; DATA XREF: _THUNK0+B↓o
.text:0003479A                 pop     ebx
.text:0003479B                 add     ebx, (offset _GLOBAL_OFFSET_TABLE_ - offset loc_3479A)
.text:000347A1                 jmp     _ZN9MainClass5func1Ev ; MainClass::func1(void)
.text:000347A1 _THUNK0         endp

-----------------------------------------------------------

Code of the caller:

-----------------------------------------------------------
.text:000348C5                 lea     ecx, (aWeLlNowCallFun - 86FF4h)[eax] ; "We'll now call func4"
.text:000348CB                 push    ecx
.text:000348CC                 mov     ebx, [ebp+_LOCALGOT6]
.text:000348CF                 call    _printf
.text:000348D4                 add     esp, 10h
.text:000348D7                 sub     esp, 0Ch
.text:000348DA                 push    [ebp+cls]
.text:000348DD                 mov     ebx, [ebp+_LOCALGOT6]
.text:000348E0                 mov     edx, [ebp+cls]
.text:000348E3                 mov     eax, [edx]
.text:000348E5                 call    ds:(_GLOBAL_OFFSET_TABLE_ - 86FF4h)[eax] ;  Call to cls.func1
.text:000348E7                 add     esp, 10h

-----------------------------------------------------------


The issue appears with DMD 2.102.2 on Linux when compiling with the dub parameter --arch=x86.
Comment 2 Dlang Bot 2023-03-31 16:46:38 UTC
@naydef created dlang/dmd pull request #15063 "Fix Issue 23814 - [Codegen] Calling member function of extern(C++) cl…" fixing this issue:

- Fix Issue 23814 - [Codegen] Calling member function of extern(C++) class with...
  
  ... multiple inheritance doesn't preserve the EBX register in some cases

https://github.com/dlang/dmd/pull/15063
Comment 3 RazvanN 2023-04-03 08:13:47 UTC
What command line options are you using? I cannot reproduce this on either 32-bit or 64-bit.

32-bit output:

00000000 <_ZN9MainClass5func4Ev>:
   0:   55                      push   ebp
   1:   8b ec                   mov    ebp,esp
   3:   83 ec 08                sub    esp,0x8
   6:   e8 00 00 00 00          call   b <_ZN9MainClass5func4Ev+0xb>
   b:   58                      pop    eax
   c:   05 02 00 00 00          add    eax,0x2
  11:   89 45 fc                mov    DWORD PTR [ebp-0x4],eax
  14:   8b 4d fc                mov    ecx,DWORD PTR [ebp-0x4]
  17:   8d 81 40 00 00 00       lea    eax,[ecx+0x40]
  1d:   c9                      leave  
  1e:   c3                      ret   


Does not use ebx.

64-bit output:

0000000000000000 <_ZN9MainClass5func4Ev>:
   0:   48 8d 05 00 00 00 00    lea    rax,[rip+0x0]        # 7 <_ZN9MainClass5func4Ev+0x7>
   7:   c3                      ret

Does not use rbx.

I don't think this issue is valid.
Comment 4 naydef 2023-04-03 13:34:24 UTC
I'm using DMD64 D Compiler v2.102.2 on Linux, compiling with: "dmd app.d -m32"

You checked func4, while my example code calls func1.
Please use the second example. In a debugger, break where func1 is called in _Dmain and step into the call: you'll see EBX being used in a function called _THUNK0, which ends with a jump to _ZN9MainClass5func1Ev.
Comment 5 naydef 2023-04-03 19:16:13 UTC
I've made the following example (without the patch, the generated executable crashes):

app.d
--------------------------------------
extern(C++) interface BaseInterface1
{
public:
    int func1();
    int func2();
}

extern(C++) abstract class BaseInterface2
{
public:
    int func3() {return 3;}
    int func4() {return 4;}
}

extern(C++) class MainClass : BaseInterface2, BaseInterface1
{
    override int func1() {return 1;}
    override int func2() {return 2;}
}

extern(C++) void cppFunc1(BaseInterface1 obj);


void main()
{
    BaseInterface1 cls = new MainClass();
    cppFunc1(cls);
}
--------------------------------------

app2.cpp
--------------------------------------
class BaseInterface1
{
public:
    virtual int func1();
    virtual int func2();
};

class BaseInterface2
{
public:
    virtual int func3();
    virtual int func4();
};

class MainClass : BaseInterface2, BaseInterface1
{
    virtual int func1();
    virtual int func2();
};

void cppFunc1(BaseInterface1* obj)
{
    int a = obj->func1();
    int b = obj->func2();
}
--------------------------------------

The executable is generated with the following command:
gcc -m32 -O -c app2.cpp -o app2.o; dmd app.d app2.o -m32

Feel free to comment on the code.
Comment 6 Walter Bright 2023-06-16 01:11:57 UTC
(In reply to RazvanN from comment #3)
>    6:   e8 00 00 00 00          call   b <_ZN9MainClass5func4Ev+0xb>
>    b:   58                      pop    eax
>    c:   05 02 00 00 00          add    eax,0x2

What this code is doing (regardless of whether it is EAX or EBX) is:

1. CALL to the next instruction: this pushes the address of the next instruction onto the stack.

2. POP reg: puts that address into reg.

3. ADD reg,xxxx: reg now points to data that is relative to the code section, likely a virtual function or a "thunk" to one.

Can you try it with D classes rather than C++ classes?
Comment 7 naydef 2023-06-16 10:55:12 UTC
Hm, I don't know how to try it with a D class. The reproduction relies on a C++ compiler: GCC uses the EBX register right after calling the D virtual function, while DMD doesn't seem to do that. Also, the value loaded into EBX is not used in this THUNK, so nothing depends on it.

I don't understand what "register clobbering" in the question refers to. I assume Walter means the correct fix is to save the register on THUNK entry and restore it on exit. If so, I don't know how to achieve that (I'm not familiar with DMD). Also, as the example shows, no code relies on the EBX contents; the thunk simply JMPs to the regular function, so this CALL + POP + ADD sequence seems redundant.

Yeah, I'm open to a better fix...