Issue 14937 - Slow code compared to ldc/gdc on calculation with real variables
Summary: Slow code compared to ldc/gdc on calculation with real variables
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86_64 Linux
Importance: P4 enhancement
Assignee: No Owner
URL:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2015-08-19 19:26 UTC by secondaryAccount
Modified: 2022-12-17 10:32 UTC

See Also:


Attachments
benchmarked code (1.06 KB, text/plain)
2015-08-19 19:26 UTC, secondaryAccount
compressed benchmark input file (968.42 KB, application/x-bzip)
2015-08-19 20:37 UTC, secondaryAccount
full benchmark code for reduced input file (2.20 KB, text/plain)
2015-08-19 20:45 UTC, secondaryAccount

Description secondaryAccount 2015-08-19 19:26:26 UTC
Created attachment 1542
benchmarked code

http://forum.dlang.org/thread/mqv2ct$1cpp$1@digitalmars.com?page=9#post-mr2ef5:241e2a:241:40digitalmars.com

The code is in the attachment.

Timings with some test input described below:
dmd -O -inline -release -noboundscheck              3700-3800 ms
gdc -O3 -march=native -frelease -fno-bounds-check   ~1000 ms
ldc2 -O3 -release -disable-boundscheck              ~800 ms
                                          
versions:
dmd 2.068
gdc based on 4.9.2
ldc 0.15.2beta1

OS: Linux x86_64


Some notes:
- The benchmark calls cosineSimilarity 1 million times with different inputs and sums the return values. (The large text input file and the IO functions are omitted here; I can add them if helpful.)
- Timing is done with std.datetime around the loop, so no IO is included.
- pragma(inline, true) shows that dmd is unable to inline scalarProduct and normSquared. Disabling inlining for ldc causes no noticeable slowdown.
- The elements of a SparseVector are sorted by index.
- SparseVector.length is usually between 50 and 100, and the maximum index is 47,000.
- v1 and v2 do not point to the same data.
- The gap between dmd and ldc/gdc is much smaller when "real" is replaced with double.
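
For readers without the attachment, here is a rough sketch of what a kernel of this shape might look like. It is not the attached code: the SparseElement layout, the two-index merge in scalarProduct, and the function signatures are all assumptions reconstructed from the notes above (elements sorted by index, "real" arithmetic, three functions named cosineSimilarity, scalarProduct and normSquared).

```d
import std.math : sqrt;
import std.stdio : writeln;

// Hypothetical element layout; the attachment's actual definition may differ.
struct SparseElement
{
    size_t index;
    real value;
}

alias SparseVector = SparseElement[];

// Dot product of two index-sorted sparse vectors, walking both in lockstep.
// (Assumed algorithm; the notes only say the elements are sorted by index.)
real scalarProduct(const SparseVector v1, const SparseVector v2) pure nothrow @nogc
{
    real sum = 0;
    size_t i = 0, j = 0;
    while (i < v1.length && j < v2.length)
    {
        if (v1[i].index == v2[j].index)
        {
            sum += v1[i].value * v2[j].value;
            ++i;
            ++j;
        }
        else if (v1[i].index < v2[j].index)
            ++i;
        else
            ++j;
    }
    return sum;
}

// Sum of squared values of one sparse vector.
real normSquared(const SparseVector v) pure nothrow @nogc
{
    real sum = 0;
    foreach (e; v)
        sum += e.value * e.value;
    return sum;
}

real cosineSimilarity(const SparseVector v1, const SparseVector v2) pure nothrow @nogc
{
    return scalarProduct(v1, v2) / sqrt(normSquared(v1) * normSquared(v2));
}

void main()
{
    // Tiny made-up vectors, only to show the call shape.
    SparseVector a = [SparseElement(1, 1.0L), SparseElement(5, 2.0L)];
    SparseVector b = [SparseElement(1, 3.0L), SparseElement(7, 4.0L)];
    writeln(cosineSimilarity(a, b));
}
```

Swapping "real" for double in a sketch like this is the quickest way to check the last note above, since on x86_64 Linux "real" is the 80-bit x87 type while double can use SSE registers.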
Comment 1 secondaryAccount 2015-08-19 20:37:07 UTC
Created attachment 1543
compressed benchmark input file

Reduced the benchmark input file to fit under the attachment size limit.
Comment 2 secondaryAccount 2015-08-19 20:45:30 UTC
Created attachment 1544
full benchmark code for reduced input file

The attached input file contains 1400 example vectors. This benchmark program calls cosineSimilarity 700 x 700 times (controlled by slices / foreach loops in main). That is sufficient to reproduce the slowdown.

The timings in the bug report are based on 1000 x 1000 calls of cosineSimilarity and 2000 example vectors.

The path to the input file is hard-coded (same directory) on line 11.
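
The shape of the timing loop described in this comment can be sketched roughly as follows. This is not the attached program: the kernel function is a dense stand-in, the vectors are synthetic rather than read from the input file, the exact slice bounds are a guess, and std.datetime.stopwatch is the current module name (dmd 2.068 shipped StopWatch in std.datetime itself).

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Stand-in for cosineSimilarity (hypothetical); a plain dense dot product on reals.
real kernel(const real[] a, const real[] b) pure nothrow @nogc
{
    real s = 0;
    foreach (i; 0 .. a.length)
        s += a[i] * b[i];
    return s;
}

void main()
{
    // Synthetic data in place of the 1400 vectors from the input file.
    enum n = 1400;
    auto vectors = new real[][](n, 64);
    foreach (i, ref v; vectors)
        foreach (j, ref x; v)
            x = (i + j) % 7;

    real total = 0;

    // Only the nested loops are timed; data setup and output stay outside.
    auto sw = StopWatch(AutoStart.yes);
    foreach (v1; vectors[0 .. 700])          // assumed slice bounds
        foreach (v2; vectors[700 .. 1400])   // assumed slice bounds
            total += kernel(v1, v2);
    sw.stop();

    // Printing the sum keeps the optimizer from discarding the loop.
    writefln("sum = %s, elapsed = %s ms", total, sw.peek.total!"msecs");
}
```

Widening both slices is how the call count would scale from 700 x 700 toward the 1000 x 1000 used for the timings in the report.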