Issue 21919 - darwin: SEGV in core.thread tests on OSX 11
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: druntime
Version: D2
Hardware: x86_64 Mac OS X
Importance: P1 major
Assignee: No Owner
Keywords: pull
Duplicates: 22025
Reported: 2021-05-13 12:37 UTC by Iain Buclaw
Modified: 2021-11-22 14:14 UTC

Description Iain Buclaw 2021-05-13 12:37:17 UTC
Native configuration is x86_64-apple-darwin20

        === gdc tests ===
Running target unix
FAIL: gdc.test/runnable/eh.d   execution test
FAIL: gdc.test/runnable/eh.d -O2   execution test
FAIL: gdc.test/runnable/eh.d -O2 -fPIC   execution test
FAIL: gdc.test/runnable/eh.d -O2 -fPIC -shared-libphobos   execution test
FAIL: gdc.test/runnable/eh.d -O2 -shared-libphobos   execution test
FAIL: gdc.test/runnable/eh.d -fPIC   execution test
FAIL: gdc.test/runnable/eh.d -fPIC -shared-libphobos   execution test
FAIL: gdc.test/runnable/eh.d -shared-libphobos   execution test
FAIL: gdc.test/runnable/test4.d   execution test
FAIL: gdc.test/runnable/test4.d -shared-libphobos   execution test
FAIL: gdc.test/runnable/testdstress.d   execution test
FAIL: gdc.test/runnable/testdstress.d -shared-libphobos   execution test

        === gdc Summary ===
# of expected passes        10388
# of unexpected failures    12

        === libphobos tests ===
Running target unix
FAIL: libphobos.druntime/core/thread.d execution test
FAIL: libphobos.exceptions/chain.d execution test
FAIL: libphobos.phobos/std/concurrency.d execution test

        === libphobos Summary ===
# of expected passes        394
# of unexpected failures    3
# of unsupported tests      1
Comment 1 Iain Buclaw 2021-05-13 12:37:45 UTC
Confirmed on DMD when running the unittests.

generated/osx/release/64/unittest/test_runner core.thread.threadgroup
make[1]: *** [generated/osx/release/64/unittest/core/thread/fiber] Bus error: 10
make[1]: *** Deleting file `generated/osx/release/64/unittest/core/thread/fiber'
make[1]: *** Waiting for unfinished jobs....
generated/osx/release/64/unittest/test_runner core.thread.types
make: *** [unittest-release] Error 2
Comment 2 Iain Buclaw 2021-05-13 12:38:12 UTC
$ sw_vers
ProductName:	macOS
ProductVersion:	11.1
BuildVersion:	20C69

$ clang --version
Apple clang version 12.0.0 (clang-1200.0.32.27)

$ xcodebuild -version
Xcode 12.2
Build version 12B45b

$ uname -v
Darwin Kernel Version 20.2.0: Wed Dec  2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64

druntime: a79bb0eb0424f77159eb72e1c527db3b2ae2a57d

dmd: 97aa2ae5ee19ce6a2979ca1627479df713f99252
Comment 3 Iain Buclaw 2021-05-13 12:38:38 UTC
Process 65900 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100c8cc70)
    frame #0: 0x00007fff203794e0 libsystem_pthread.dylib`___chkstk_darwin + 96
libsystem_pthread.dylib`___chkstk_darwin:
->  0x7fff203794e0 <+96>:  testq  %rcx, (%rcx)
    0x7fff203794e3 <+99>:  popq   %rcx
    0x7fff203794e4 <+100>: retq
libsystem_pthread.dylib`pthread_getspecific:
    0x7fff203794e5 <+0>:   movq   %gs:(,%rdi,8), %rax
Target 0: (test_runner) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x100c8cc70)
  * frame #0: 0x00007fff203794e0 libsystem_pthread.dylib`___chkstk_darwin + 96
    frame #1: 0x00007fff20379480 libsystem_pthread.dylib`thread_start + 20
    frame #2: 0x00007fff2a542a9c libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 476
    frame #3: 0x00007fff2a5446ee libunwind.dylib`_Unwind_RaiseException + 189
    frame #4: 0x00000001001bbeb8 test_runner`_d_throwdwarf at dwarfeh.d:317
    frame #5: 0x0000000100188b84 test_runner`_D4core6thread5fiber19__unittest_L1679_C1FZ9__lambda1MFNaNbNfZv at fiber.d:1686
    frame #6: 0x000000010018fb85 test_runner`_D4core6thread7context8Callable6opCallMFZv at context.d:46
    frame #7: 0x0000000100187ac5 test_runner`_D4core6thread5fiber5Fiber3runMFZv at fiber.d:869
    frame #8: 0x000000010018749f test_runner`fiber_entryPoint at fiber.d:157
Comment 4 Iain Buclaw 2021-05-13 12:39:33 UTC
This was discovered in December, hence the git commit hashes are 5 months old.
Comment 5 Iain Buclaw 2021-05-13 14:34:27 UTC
To describe what appears to be happening:

1. A D fiber context switch occurs.
2. An exception is thrown.
3. libunwind's entry point for raising exceptions is called.
4. Segfault somewhere deep in libc/pthread.

The unittest block that matches the encoded line numbers in the function name is:
---
// Test exception handling inside fibers.
unittest
{
    enum MSG = "Test message.";
    string caughtMsg;
    (new Fiber({
        try
        {
            throw new Exception(MSG);
        }
        catch (Exception e)
        {
            caughtMsg = e.msg;
        }
    })).call();
    assert(caughtMsg == MSG);
}
---
Comment 6 Lionello Lunesu 2021-09-11 23:13:58 UTC
I suspect I'm running into this same bug while running the DMD 2.097.2 test suite on Big Sur:

$ test_results/runnable/test15779_0
fish: “test_results/runnable/test15779…” terminated by signal SIGBUS (Misaligned address error)

$ lldb test_results/runnable/test15779_0
(lldb) target create "test_results/runnable/test15779_0"
Current executable set to '/Users/llunesu/repos/d/dmd/test/test_results/runnable/test15779_0' (x86_64).
(lldb) r
Process 854 launched: '/Users/llunesu/repos/d/dmd/test/test_results/runnable/test15779_0' (x86_64)
Process 854 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001edc60)
    frame #0: 0x00007fff2031b4a8 libsystem_pthread.dylib`___chkstk_darwin + 96
libsystem_pthread.dylib`___chkstk_darwin:
->  0x7fff2031b4a8 <+96>:  testq  %rcx, (%rcx)
    0x7fff2031b4ab <+99>:  popq   %rcx
    0x7fff2031b4ac <+100>: retq

libsystem_pthread.dylib`pthread_getspecific:
    0x7fff2031b4ad <+0>:   movq   %gs:(,%rdi,8), %rax
Target 0: (test15779_0) stopped.
(lldb)
Comment 7 Lionello Lunesu 2021-09-11 23:15:21 UTC
Stack trace for previous crash:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001e9c60)
  * frame #0: 0x00007fff2031b4a8 libsystem_pthread.dylib`___chkstk_darwin + 96
    frame #1: 0x00007fff2031b448 libsystem_pthread.dylib`thread_start + 20
    frame #2: 0x00007fff2a4bfb2d libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long, libunwind::UnwindInfoSections const&, unsigned int) + 191
    frame #3: 0x00007fff2a4bfa01 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) + 999
    frame #4: 0x00007fff2a4c1ec9 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() + 461
    frame #5: 0x00007fff2a4c3a18 libunwind.dylib`_Unwind_RaiseException + 189
    frame #6: 0x000000010002fca5 test15779_0`_d_throwdwarf + 185
    frame #7: 0x0000000100002410 test15779_0`_D9test157793barFZ9__lambda1FNaNfZv + 80
    frame #8: 0x000000010002cc2f test15779_0`_D4core6thread7context8Callable6opCallMFZv + 27
    frame #9: 0x00000001000293a7 test15779_0`fiber_entryPoint + 99
Comment 8 Lionello Lunesu 2021-09-11 23:17:42 UTC
$ sw_vers
ProductName:	macOS
ProductVersion:	11.5.2
BuildVersion:	20G95

$ clang --version
Apple clang version 12.0.5 (clang-1205.0.22.11)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ xcodebuild -version
Xcode 12.5.1
Build version 12E507

$ uname -v
Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64

dmd, druntime, Phobos: tag v2.097.2
Comment 9 Iain Buclaw 2021-11-07 18:44:18 UTC
After some prodding around, the root cause is that darwin's libunwind now overflows the Fiber's small 16k stack.

The fix, then, is to bump the stack allocated for Fibers.

     version (Windows)
         // exception handling walks the stack, invoking DbgHelp.dll which
         // needs up to 16k of stack space depending on the version of DbgHelp.dll,
         // the existence of debug symbols and other conditions. Avoid causing
         // stack overflows by defaulting to a larger stack size
         enum defaultStackPages = 8;
+    else version (OSX)
+    {
+        version (X86_64)
+            enum defaultStackPages = 8;
+        else
+            enum defaultStackPages = 4;
+    }
     else
         enum defaultStackPages = 4;

Darwin x86 pagesize is 4k, whilst arm64 is 16k (so 4 pages there is already 64k), so this fix should only be applied to x86_64 code.
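
For anyone stuck on a druntime without the patch, a rough sketch of the arithmetic and of a possible workaround follows. It assumes the page sizes quoted above and relies on the `Fiber` constructor accepting an explicit stack size as its second argument. `catchInFiber`, `x86PageSize` and `arm64PageSize` are names made up for illustration, and the 32k size simply mirrors what the patched default works out to; it is not a tested recommendation.

```d
import core.thread : Fiber;

// Page sizes as quoted in this comment (assumed, not queried at runtime).
enum size_t x86PageSize   = 4 * 1024;
enum size_t arm64PageSize = 16 * 1024;

static assert(4 * x86PageSize   == 16 * 1024); // old default: the 16k stack that overflowed
static assert(8 * x86PageSize   == 32 * 1024); // patched default on OSX x86_64
static assert(4 * arm64PageSize == 64 * 1024); // arm64 keeps 4 pages, already 64k

// Hypothetical helper: runs the same throw/catch body as the failing
// unittest, but on a fiber whose stack size is passed in explicitly.
string catchInFiber(size_t stackSize)
{
    enum MSG = "Test message.";
    string caughtMsg;
    auto fib = new Fiber({
        try
        {
            throw new Exception(MSG);
        }
        catch (Exception e)
        {
            caughtMsg = e.msg;
        }
    }, stackSize);
    fib.call();
    return caughtMsg;
}

unittest
{
    // 32k matches the patched default for OSX x86_64.
    assert(catchInFiber(8 * x86PageSize) == "Test message.");
}
```

Building with `dmd -unittest -main` runs the check. Code that creates its own fibers could pass a larger size this way until the runtime default is updated.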
Comment 10 Dlang Bot 2021-11-07 22:13:25 UTC
@ibuclaw updated dlang/druntime pull request #3612 "fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11" fixing this issue:

- fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3612
Comment 11 Dlang Bot 2021-11-08 06:33:49 UTC
dlang/druntime pull request #3612 "fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11" was merged into stable:

- ad6583ff842694a07ecb0464aaf5efde13f5c67c by Iain Buclaw:
  fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3612
Comment 12 Dlang Bot 2021-11-08 15:54:11 UTC
dlang/druntime pull request #3615 "Merge `stable` in `mater`" was merged into master:

- 17f51ab99725494c449257a89637e827721becde by Iain Buclaw:
  fix Issue 21919 - darwin: SEGV in core.thread tests on OSX 11

https://github.com/dlang/druntime/pull/3615
Comment 13 Iain Buclaw 2021-11-22 14:14:15 UTC
*** Issue 22025 has been marked as a duplicate of this issue. ***