Issue 6498 - [CTFE] copy-on-write is slow and causes huge memory usage
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: All All
Importance: P2 critical
Assignee: No Owner
URL:
Keywords: bounty, CTFE
Depends on:
Blocks: 7442
Reported: 2011-08-15 00:54 UTC by Don
Modified: 2022-06-10 11:53 UTC
Description Don 2011-08-15 00:54:10 UTC
This is the main reason why CTFE is so slow.

int bug6498(int x)
{
    int n = 0;
    while (n < x)
        ++n;
    return n;
}
static assert(bug6498(10_000_000)==10_000_000);

--> Fails with an 'out of memory' error.
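
For anyone reproducing this, scaling the iteration count down is one way to watch how compile time and memory grow with the loop length. A minimal sketch, assuming a hypothetical name `countTo` and an arbitrary count of 100_000 (neither is from the original report):

```d
// Same loop as the report, under a hypothetical name.
int countTo(int x)
{
    int n = 0;
    while (n < x)
        ++n;
    return n;
}

// Initializing an `enum` forces the call through CTFE,
// just as `static assert` does in the original test case.
enum atCompileTime = countTo(100_000);
static assert(atCompileTime == 100_000);

unittest
{
    // The same function called at run time, for comparison.
    assert(countTo(100_000) == 100_000);
}
```

Compiling with `-c` only exercises the CTFE path; adding `-unittest -main` and running the result also exercises the run-time call.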
Comment 1 Don 2012-11-26 07:14:57 UTC
Upgrading severity. I've made several commits that move towards a solution, but I still need to do more restructuring to fix this properly.
Comment 2 camille 2014-02-11 17:24:58 UTC
There is a $105 bounty on this issue at Bountysource: https://www.bountysource.com/issues/1325927.
Comment 3 Per Nordlöw 2014-06-28 13:43:38 UTC
Don: Is there a GitHub PR or branch for your changes, or are these things normally kept secret because this issue has a bounty?
Comment 4 Iain Buclaw 2014-06-28 16:08:41 UTC
FYI, all PRs have been merged in.

I won't bother listing them all (there's a lot that was done over 2012/2013).  There has been no work on this since June 2013 IIRC.

https://github.com/D-Programming-Language/dmd/pull/1778#issuecomment-19964496


What should be focused on (thanks to Walter's idea of allocating but not freeing memory) is limiting just how much memory CTFE allocates, possibly by finding ways to re-use rather than re-allocate memory, or maybe by giving CTFE its own allocator (it is a backend in its own right, after all).
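
To make the "own allocator" idea concrete, here is a rough sketch of a bump (region) allocator with mark/release, which is one way memory could be re-used across loop iterations instead of being allocated afresh each time. This is not dmd's actual code; `Region`, `allocate`, `mark` and `release` are hypothetical names chosen for illustration.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical bump ("region") allocator; not dmd's actual CTFE allocator.
struct Region
{
    private ubyte* base;   // start of the backing buffer
    private size_t used;   // bytes handed out so far
    private size_t cap;    // total size of the backing buffer

    @disable this(this);   // the region owns its buffer, so forbid copies

    this(size_t capacity)
    {
        base = cast(ubyte*) malloc(capacity);
        assert(base !is null, "backing allocation failed");
        cap = capacity;
    }

    ~this() { free(base); }   // free(null) is a no-op

    /// Hand out `size` bytes, or null when the region is full.
    void* allocate(size_t size)
    {
        if (size > cap - used)
            return null;
        // Round up so each allocation starts on a 16-byte step from `base`.
        immutable rounded = (size + 15) & ~cast(size_t) 15;
        if (rounded > cap - used)
            return null;
        void* p = base + used;
        used += rounded;
        return p;
    }

    /// Remember the current high-water mark.
    size_t mark() const { return used; }

    /// Drop everything allocated since `m` in a single step.
    void release(size_t m)
    {
        assert(m <= used, "mark is newer than the current position");
        used = m;
    }
}

unittest
{
    // A 1 KB region survives 100_000 iterations because each
    // iteration releases its temporary before the next one starts.
    auto region = Region(1024);
    foreach (i; 0 .. 100_000)
    {
        immutable m = region.mark();
        auto tmp = cast(int*) region.allocate(int.sizeof);
        assert(tmp !is null);
        *tmp = i;
        region.release(m);
    }
    assert(region.mark() == 0);
}
```

The trade-off in this sketch is that nothing allocated after a mark may outlive the matching release, so any value that escapes the scope would have to be copied out first.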
Comment 5 RazvanN 2022-06-09 14:27:33 UTC
This seems to have been fixed. On my machine it takes 5 seconds to run this and it appears to use 2-3% of my 16 GB RAM. Should we close this?
Comment 6 mhh 2022-06-10 05:16:30 UTC
The memory usage has improved a lot but this is still ridiculously slow.

Compare with the soon-to-be-upstreamed -preview=newCTFE: https://asciinema.org/a/zTHuVmXbsZ4ryWGfCd2bXoJG5 (roughly 10x faster)

SDC does this in about 0.04 sec on my machine, so 50x to 80x faster.
Comment 7 Iain Buclaw 2022-06-10 11:45:45 UTC
Metrics for the code in this report, as run by v2.080:
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.44
System time (seconds): 0.29
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.75 
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1104116 
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 274715
Voluntary context switches: 1
Involuntary context switches: 256
Swaps: 0
File system inputs: 246
File system outputs: 6
Socket messages sent: 0 
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.085.0 - when most of dinterpret had been converted over to returning UnionExp on the stack.
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.64
System time (seconds): 0.19
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.84
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 636044
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 157878
Voluntary context switches: 1
Involuntary context switches: 231
Swaps: 0
File system inputs: 386
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.089.0 - when a ctfeRegion allocator was introduced to free memory after exiting an interpret "scope".
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 6.88
System time (seconds): 0.14
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.03
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 637204
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 158019
Voluntary context switches: 1
Involuntary context switches: 17
Swaps: 0
File system inputs: 474
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---

As of v2.100.0
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c"
User time (seconds): 7.13
System time (seconds): 0.07
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 482504
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 119238
Voluntary context switches: 1
Involuntary context switches: 223
Swaps: 0
File system inputs: 833
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---


With -lowmem.
---
Command being timed: "./generated/linux/release/64/dmd issue6498.d -c -lowmem"
User time (seconds): 7.64
System time (seconds): 0.05
Percent of CPU this job got: 103%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.42
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 28760
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 5679
Voluntary context switches: 2376
Involuntary context switches: 774
Swaps: 0
File system inputs: 833
File system outputs: 6
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
---
Comment 8 Iain Buclaw 2022-06-10 11:53:21 UTC
(In reply to Iain Buclaw from comment #7)
> v2.080:
> Maximum resident set size (kbytes): 1104116  
> v2.085.0:
> Maximum resident set size (kbytes): 636044
> v2.089.0:
> Maximum resident set size (kbytes): 637204
> v2.100.0:
> Maximum resident set size (kbytes): 482504
> -lowmem (as of v2.090):
> Maximum resident set size (kbytes): 28760
It's still nearly 500MB, so only 2x better than where we were 4 years ago, and still a far cry from the possible 30MB we could instead be managing with.

I also note that the compiler has slowed down by roughly half a second to a second since v2.080 as well (6.44s to 7.13s user time, 7.64s with -lowmem), so CTFE is not getting faster at all...
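
The memory ratios can be checked directly from the maximum resident set sizes quoted above. A small sketch, with constant names and the `ratio` helper invented for illustration; the checks themselves run through CTFE:

```d
// Peak RSS in kilobytes, copied from the measurements in comment 7.
enum rssV2080  = 1_104_116;
enum rssV2100  =   482_504;
enum rssLowmem =    28_760;

// Hypothetical helper: how many times larger `before` is than `after`.
double ratio(long before, long after)
{
    return cast(double) before / after;
}

// v2.080 -> v2.100.0: a little over 2x less memory.
static assert(ratio(rssV2080, rssV2100) > 2.2 && ratio(rssV2080, rssV2100) < 2.3);

// v2.100.0 -> -lowmem: roughly 17x less again.
static assert(ratio(rssV2100, rssLowmem) > 16.5 && ratio(rssV2100, rssLowmem) < 17.0);
```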