///////////// test.d /////////////
import std.parallelism;
import std.process;
import std.range;

void main()
{
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
//////////////////////////////////

This program has a roughly 60% chance to deadlock and never finish executing on my machine.

Inspecting the program's state with a debugger shows that the threads are generally in one of these states:

Thread 11 (Thread 0x7f2a80ff9700 (LWP 424924)):
#0  0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1  0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2  0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3  0x0000563bd079ba95 in thread_suspendHandler ()
#4  <signal handler called>
#5  0x00007f2a89e8da6a in read () from /usr/lib/libpthread.so.0
...

Thread 10 (Thread 0x7f2a817fa700 (LWP 424923)):
#0  0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1  0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2  0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3  0x0000563bd079ba95 in thread_suspendHandler ()
#4  <signal handler called>
#5  0x00007f2a89c11414 in fork () from /usr/lib/libc.so.6
...

Thread 9 (Thread 0x7f2a81ffb700 (LWP 424922)):
#0  0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1  0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2  0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3  0x0000563bd079ba95 in thread_suspendHandler ()
#4  <signal handler called>
#5  0x00007f2a89c515c9 in __lll_lock_wait_private () from /usr/lib/libc.so.6
#6  0x00007f2a89c51a88 in __run_fork_handlers () from /usr/lib/libc.so.6
#7  0x00007f2a89c113e9 in fork () from /usr/lib/libc.so.6
...
Thread 8 (Thread 0x7f2a827fc700 (LWP 424921)):
#0  0x00007f2a89b82b12 in sigsuspend () from /usr/lib/libc.so.6
#1  0x0000563bd079bb08 in core.thread.thread_suspendHandler(int).op(void*) ()
#2  0x0000563bd079bb68 in core.thread.callWithStackShell(scope void(void*) nothrow delegate) ()
#3  0x0000563bd079ba95 in thread_suspendHandler ()
#4  <signal handler called>
#5  0x00007f2a89e8e145 in nanosleep () from /usr/lib/libpthread.so.0
#6  0x0000563bd077370e in _D4core6thread6Thread5sleepFNbNiSQBf4time8DurationZv ()
#7  0x0000563bd07b3e2e in core.internal.spinlock.SpinLock.yield(ulong) shared ()
#8  0x0000563bd07b3dc4 in core.internal.spinlock.SpinLock.lock() shared ()
#9  0x0000563bd07c9307 in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#10 0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#11 0x0000563bd0787fe7 in gc_qalloc ()
...

Thread 7 (Thread 0x7f2a82ffd700 (LWP 424920)):
#0  0x00007f2a89c515cb in __lll_lock_wait_private () from /usr/lib/libc.so.6
#1  0x00007f2a89bd06b3 in calloc () from /usr/lib/libc.so.6
#2  0x0000563bd07c61ad in _D2gc4impl12conservativeQw3Gcx16startScanThreadsMFNbZv ()
#3  0x0000563bd07c5f44 in _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv ()
#4  0x0000563bd07c5862 in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm ()
#5  0x0000563bd07c4050 in _D2gc4impl12conservativeQw3Gcx8bigAllocMFNbmKmkxC8TypeInfoZPv ()
#6  0x0000563bd07c935a in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl ()
#7  0x0000563bd07c1456 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ ()
#8  0x0000563bd0787fe7 in gc_qalloc ()
...
May (or may not) be related to https://issues.dlang.org/show_bug.cgi?id=20256, if the scan threads do not block SIGUSR1 and SIGUSR2.
(In reply to igor.khasilev from comment #1)
> May (or may not) be related https://issues.dlang.org/show_bug.cgi?id=20256
> if scanthread do not block SIGUSR1 and SIGUSR2

Unfortunately `digger run stable+druntime#2813 -- dmd -run test` still hangs.
I cannot reproduce locally in a VM. Does the problem go away with --DRT-gcopt=parallel:0 ?
(In reply to Rainer Schuetze from comment #3)
> Does the problem go away with --DRT-gcopt=parallel:0 ?

Yes.
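For anyone who needs to sidestep the hang until a fix lands, the same option can probably be baked into the binary instead of being passed on the command line. The sketch below is the test program from the report with druntime's `rt_options` hook added. It assumes the embedded option behaves the same as `--DRT-gcopt=parallel:0`, and it only turns off parallel marking rather than fixing the deadlock.

```d
import std.parallelism;
import std.process;
import std.range;

// Assumption: embedding the option is equivalent to passing
// --DRT-gcopt=parallel:0 on the command line.
// rt_options is druntime's hook for default runtime options.
extern(C) __gshared string[] rt_options = ["gcopt=parallel:0"];

void main()
{
    // Same reproduction loop as test.d in the original report.
    foreach (i; 200.iota.parallel)
        execute(["true"]);
}
```

Whether this avoids the hang on a given machine has only been checked with the command-line form in this thread, not with the embedded form.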
(In reply to Rainer Schuetze from comment #3)
> I cannot reproduce locally in a VM.

From experimenting with taskset, it seems that there need to be at least 5 physical cores to run threads on for this bug to be reproduced. (Does not reproduce with `taskset f` but does reproduce with `taskset 1f`.)
@rainers created dlang/druntime pull request #2816 "fix Issue 20270 - [REG2.087] Deadlock in garbage collection when runn…" fixing this issue:

- fix Issue 20270 - [REG2.087] Deadlock in garbage collection when running processes in parallel

  start scan threads while the world isn't suspended

https://github.com/dlang/druntime/pull/2816
I have reproduced the issue by running the test a larger number of times. Not sure why it doesn't appear more often here. Please try https://github.com/dlang/druntime/pull/2816
Not sure why this wasn't closed by the bot when https://github.com/dlang/druntime/pull/2816 got merged.