Issue 20434 - Drastic GC slowdown with repeated large allocations
Summary: Drastic GC slowdown with repeated large allocations
Status: RESOLVED INVALID
Alias: None
Product: D
Classification: Unclassified
Component: druntime (show other issues)
Version: D2
Hardware: x86_64 Linux
: P1 normal
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-06 00:10 UTC by Gregor Mückl
Modified: 2021-08-14 19:57 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Gregor Mückl 2019-12-06 00:10:19 UTC
I accidentally found a huge slowdown in the GC when running this synthetic benchmark. I ran this program on Windows and Linux as 32 and 64 bit program with largely the same result:

----
module gcbench;

import core.memory;
import core.time;
import std.array;
import std.datetime;
import std.random;
import std.stdio;

int main()
{
	for(int i = 0; i < 160; i++) {
		SysTime startTime = Clock.currTime();
		for(int j = 0; j < 1000; j++) {
			int size = uniform!"[]"(1, 20);
			int[] trash = uninitializedArray!(int[])(i * 1024*1024);
			if(j % 10 == 0) GC.collect();
		}
		SysTime endTime = Clock.currTime();
		Duration duration = endTime - startTime;
		double msecs = duration.total!"msecs"();
		writeln(i, " ", msecs);
	}
	return 0;
}
----

Runtime of the inner loop increases roughly linearly with each new iteration. It should become constant after some time. Instead, runtime can increase by 100 times and more. perf seems to identify a plausible culprit:

    55.65%  gcbench  gcbench             [.] _D2gc4impl12conservativeQw15LargeObjectPool10allocPagesMFNbmZm
    12.46%  gcbench  gcbench             [.] _D2gc4impl12conservativeQw3Gcx12collectRootsMFNbNlPvQcZv
    11.33%  gcbench  libc-2.30.so        [.] __memset_avx2_erms
     4.52%  gcbench  gcbench             [.] _D2gc4impl12conservativeQw3Gcx5sweepMFNbZm
     1.66%  gcbench  libpthread-2.30.so  [.] pthread_cond_timedwait@@GLIBC_2.3.2
     1.43%  gcbench  libpthread-2.30.so  [.] __pthread_mutex_unlock_usercnt
     1.39%  gcbench  libpthread-2.30.so  [.] __lll_lock_wait

Looking at trace for LargeObjectPool.allocPages, it shows that a single small loop is responsible. Unfortunately, perf doesn't show the corresponding source lines.

I suspect that large arrays are kept alive across GC runs. Maybe Issue #13558 is related because I do not see any arrays getting freed at all, That issue is still marked as NEW.

Zeroing the arrays doesn't change the result.
Comment 1 Steven Schveighoffer 2021-08-04 14:43:09 UTC
> 			int[] trash = uninitializedArray!(int[])(i * 1024*1024);

Should this be `size * 1024*1024` instead? I would expect doing 1000 allocations of +1MB each iteration to get slower over time.
Comment 2 Gregor Mückl 2021-08-14 19:57:29 UTC
Steven, 

You are right. The example I posted was wrong. This bug is just plain invalid. Sorry for wasting time with it.