Issue 13416 - dead-lock in FreeBSD suspend handler
Summary: dead-lock in FreeBSD suspend handler
Status: RESOLVED DUPLICATE of issue 15939
Alias: None
Product: D
Classification: Unclassified
Component: druntime
Version: D2
Hardware: All FreeBSD
Importance: P1 blocker
Assignee: No Owner
URL:
Keywords: pull
Depends on:
Blocks:
 
Reported: 2014-09-01 22:42 UTC by Brad Roberts
Modified: 2022-12-30 23:35 UTC
Description Brad Roberts 2014-09-01 22:42:12 UTC
The unit test "obj/64/test_runner core.thread" deadlocks fairly often on the new build server. It's an 8-core system, whereas the older boxes are 2-core systems.

(gdb) bt
#0  0x0000000800a0a7ac in sigsuspend () from /lib/libc.so.7
#1  0x0000000800786db5 in sigsuspend () from /lib/libthr.so.3
#2  0x000000000048c55d in core.thread.thread_suspendHandler() ()
#3  0x000000000048db2c in core.thread.callWithStackShell() ()
#4  0x000000000048c4c9 in thread_suspendHandler ()
#5  <signal handler called>
#6  0x000000080078956c in ?? () from /lib/libthr.so.3
#7  0x000000080078c5f0 in pthread_attr_get_np () from /lib/libthr.so.3
#8  0x000000000048e64d in core.thread.getStackBottom() ()
#9  0x000000000048c34a in thread_entryPoint ()
#10 0x00000008007835e1 in ?? () from /lib/libthr.so.3
#11 0x00007ffffeffa000 in ?? ()
Cannot access memory at address 0x7fffff1fa000

(gdb) thr 2
[Switching to thread 2 (Thread 800c041c0 (LWP 100229 initial thread))]
#0  0x000000080078d64c in ?? () from /lib/libthr.so.3
(gdb) bt
#0  0x000000080078d64c in ?? () from /lib/libthr.so.3
#1  0x000000080078d33c in ?? () from /lib/libthr.so.3
#2  0x00000008007894bd in ?? () from /lib/libthr.so.3
#3  0x000000080078902d in pthread_kill () from /lib/libthr.so.3
#4  0x000000000048db6b in core.thread.suspend() ()
#5  0x000000000048dd67 in thread_suspendAll ()
#6  0x00000000004e565a in gc.gc.Gcx.fullcollect() ()
#7  0x00000000004e3d5b in gc.gc.GC.fullCollect() ()
#8  0x00000000004e7df3 in gc_collect ()
#9  0x000000000048b22d in core.memory.GC.collect() ()
#10 0x000000000048fb5a in core.thread.__unittestL4780_99() ()
#11 0x00000000004900d6 in core.thread.__modtest() ()
#12 0x0000000000472d22 in test_runner.tester() ()
#13 0x000000000048b90a in runModuleUnitTests ()
#14 0x0000000000503903 in rt.dmain2._d_run_main() ()
#15 0x00000000005038b6 in rt.dmain2._d_run_main() ()
#16 0x0000000000503837 in _d_run_main ()
#17 0x0000000000472e53 in main ()
Comment 1 Brad Roberts 2014-09-02 06:17:26 UTC
Same behavior and stack traces on the new 8-core FreeBSD 32-bit box as well. Both are running FreeBSD 8.4, same as the other FreeBSD testers.
Comment 2 anonymous4 2014-09-10 11:56:55 UTC
pthread_kill hangs? Shouldn't it be asynchronous?
Comment 3 monarchdodra 2014-10-10 20:29:19 UTC
Upgraded to "BLOCKER", as this (relatively frequently) trips up the auto-testers.
Comment 4 Martin Nowak 2014-11-22 22:52:04 UTC
That's a dead-lock in the pthread library.
Both pthread_attr_get_np and pthread_kill lock the same thread mutex.

_pthread_attr_get_np:
https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_attr.c#L154
_pthread_kill:
https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_kill.c#L64

_thr_find_thread:
https://github.com/freebsd/freebsd/blob/428b45aa532260e8c6ddf0217ec31db2234d29a8/lib/libthr/thread/thr_list.c#L351

We should try to use pthread_suspend_np or pthread_suspend_all_np instead.
Without a signal handler we'd still need to obtain the stack top.
OpenBSD seems to have a function for this, pthread_stackseg_np; I'm not sure how to do it on FreeBSD.
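
The lock interaction described in this comment can be illustrated with a minimal, portable C sketch. It is a model built on assumptions, not libthr code: `thread_lock` stands in for the per-thread mutex that both pthread_attr_get_np and pthread_kill take, the worker parked on a condition variable stands in for a thread stuck in its suspend handler, and the helper names (`probe_lock_while_parked`, `set_stage`, `wait_stage`) are hypothetical. A trylock is used so that the sketch reports the contention instead of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for the per-thread libthr mutex (an assumption of this sketch). */
static pthread_mutex_t thread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake state: 0 = not started, 1 = worker holds thread_lock,
 * 2 = worker may continue. Protected by gate. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static int stage = 0;

static void set_stage(int s)
{
    pthread_mutex_lock(&gate);
    stage = s;
    pthread_cond_broadcast(&gate_cv);
    pthread_mutex_unlock(&gate);
}

static void wait_stage(int s)
{
    pthread_mutex_lock(&gate);
    while (stage != s)
        pthread_cond_wait(&gate_cv, &gate);
    pthread_mutex_unlock(&gate);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&thread_lock);   /* models pthread_attr_get_np taking the lock */
    set_stage(1);
    wait_stage(2);                      /* models the handler parked in sigsuspend */
    pthread_mutex_unlock(&thread_lock);
    return NULL;
}

/* Returns the trylock result seen while the worker is parked holding the
 * lock, or -1 if the worker thread could not be created. */
int probe_lock_while_parked(void)
{
    pthread_t t;
    int rc;

    stage = 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    wait_stage(1);
    /* Models pthread_kill needing the same lock; a blocking lock here
     * would wait until the worker is released. */
    rc = pthread_mutex_trylock(&thread_lock);
    if (rc == 0)
        pthread_mutex_unlock(&thread_lock);
    set_stage(2);                       /* release the parked worker */
    pthread_join(t, NULL);
    return rc;
}
```

In this model `probe_lock_while_parked()` should return EBUSY. If the probing thread used a blocking pthread_mutex_lock and were also the only one able to release the worker, neither side could make progress, which is the shape of the hang in the backtraces.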
Comment 5 Martin Nowak 2014-12-07 00:26:57 UTC
Fairly simple to reproduce the problem.

cat > bug.d << CODE
import core.thread, core.sys.posix.pthread, core.stdc.stdio;

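void loop()
{
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  auto thr = pthread_self();
  // Query this thread's own attributes in a tight loop; pthread_attr_get_np
  // takes the libthr thread mutex (see comment 4).
  while (true)
    pthread_attr_get_np(thr, &attr);
}

void main()
{
  auto thr = new Thread(&loop).start();
  // Suspend and resume all threads repeatedly; suspending signals the worker
  // via pthread_kill, which needs the same mutex.
  while (true)
  {
      thread_suspendAll();
      thread_resumeAll();
      printf(".");
  }
}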
CODE

dmd -run bug
Comment 6 Martin Nowak 2014-12-07 01:37:53 UTC
Using pthread_suspend_np didn't work out, because there is no way to get the current stack top of a suspended thread. I also tried to override SIGCANCEL which is used for pthread_suspend_np but that didn't work.

https://github.com/D-Programming-Language/druntime/pull/1061
Comment 7 github-bugzilla 2014-12-15 16:55:55 UTC
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/ad8662d65fe8f24be2c64c721eabe4da7f78b31f
fix Issue 13416 - dead-lock in FreeBSD suspend handler

- use pthread internal THR_IN_CRITICAL to retry suspend

https://github.com/D-Programming-Language/druntime/commit/513ba191f3e8b78aeb99336e27212dfdcacb39c5
Merge pull request #1061 from MartinNowak/fix13416

fix Issue 13416 - dead-lock in FreeBSD suspend handler
Comment 9 Joakim 2015-05-12 21:20:40 UTC
This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test introduced in this PR hangs 90+% of the time.
Comment 10 Martin Nowak 2015-10-31 03:58:20 UTC
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.

> I also tried to override SIGCANCEL
> which is used for pthread_suspend_np but that didn't work.

SIGCANCEL is the signal used by pthread_suspend_np internally.
Its signal handler already deals with being in critical regions, hence it doesn't suffer from the deadlock. As it isn't allowed to override SIGCANCEL, we imitated the behavior by poking in pthread guts (THR_IN_CRITICAL).
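
The retry idea mentioned here can be sketched as a single-threaded C simulation. Everything in it is hypothetical and for illustration only: `sim_thread`, `critical_depth`, `sim_suspend_handler` and `sim_suspend_with_retry` are invented names, no real signals are sent, and it does not reproduce druntime's or libthr's actual code. It only shows the control flow: a handler that finds its thread inside a critical region declines to park, and the suspender tries again later.

```c
#include <stdbool.h>

/* Hypothetical model of a target thread (not a libthr structure). */
struct sim_thread {
    int  critical_depth;  /* > 0 while inside a lock-holding library call */
    bool suspended;       /* set once the handler has parked the thread */
    bool suspend_again;   /* set when the handler declined to park */
};

/* Models the suspend handler running on the target thread. */
static void sim_suspend_handler(struct sim_thread *t)
{
    if (t->critical_depth > 0) {
        /* Parking now would leave a library lock held: ask for a retry. */
        t->suspend_again = true;
        return;
    }
    t->suspend_again = false;
    t->suspended = true;
}

/* Models the suspender. Returns the number of attempts needed,
 * or -1 if the thread was still in a critical region after max_attempts. */
int sim_suspend_with_retry(struct sim_thread *t, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        sim_suspend_handler(t);      /* stands in for pthread_kill + handler */
        if (!t->suspend_again)
            return attempt;
        /* Simulation only: between attempts the target makes progress and
         * leaves one level of its critical region. */
        if (t->critical_depth > 0)
            t->critical_depth--;
    }
    return -1;
}
```

With a thread that starts two levels deep in a critical region, the simulated suspender needs three attempts; a thread outside any critical region is suspended on the first attempt.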
Comment 11 Martin Nowak 2015-10-31 03:59:43 UTC
(In reply to Joakim from comment #9)
> This fix doesn't seem to work on 9.1 i386, as the new FreeBSD test
> introduced in this PR hangs 90+% of the time.

Any further details? It doesn't seem like the pthread layout changed from 8.x to 9.1. Is it easy to reproduce w/ the test case of comment 5?
Comment 12 Joakim 2018-08-26 04:07:58 UTC
Sorry, only seeing your question now. I was simply checking the D tests on FreeBSD back then but I haven't used that OS in years, so can't look into it further now.
Comment 13 Dlang Bot 2022-01-18 18:23:38 UTC
@ibuclaw created dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" mentioning this issue:

- Issue 13416: Remove libthr hack from core.thread.osthread

https://github.com/dlang/druntime/pull/3682
Comment 14 Dlang Bot 2022-01-20 12:36:15 UTC
dlang/druntime pull request #3682 "Issue 13416: Remove libthr hack from core.thread.osthread" was merged into master:

- 3ac665c49d7aae1893c4e4535f60d1b4e2d427a3 by Iain Buclaw:
  Issue 13416: Remove libthr hack from core.thread.osthread

https://github.com/dlang/druntime/pull/3682
Comment 15 Iain Buclaw 2022-12-30 23:35:02 UTC
Suspend signals changed to SIGRTMIN.

https://github.com/dlang/druntime/pull/3617

*** This issue has been marked as a duplicate of issue 15939 ***