Created attachment 1542: benchmarked code

http://forum.dlang.org/thread/mqv2ct$1cpp$1@digitalmars.com?page=9#post-mr2ef5:241e2a:241:40digitalmars.com

The code is in the attachment. Timings with the test input described below:

dmd  -O -inline -release -noboundscheck              3700-3800 ms
gdc  -O3 -march=native -frelease -fno-bounds-check   ~1000 ms
ldc2 -O3 -release -disable-boundscheck               ~800 ms

Versions:
dmd 2.068
gdc based on GCC 4.9.2
ldc 0.15.2beta1
OS: Linux x86_64

Some notes:
- The benchmark calls cosineSimilarity 1 million times with different inputs and sums the return values. (The large text input file and the I/O functions are omitted here; I can add them if helpful.)
- Timing is done with std.datetime around the loop only; no I/O is included.
- pragma(inline, true) shows that dmd is unable to inline scalarProduct and normSquared. Disabling inlining for ldc causes no noticeable slowdown.
- Elements of SparseVector are sorted by index.
- SparseVector.length is usually between 50 and 100, and the maximal index is 47,000.
- v1 and v2 do not point to the same data.
- The gap between dmd and ldc/gdc is much smaller when "real" is replaced with double.
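For readers without the attachment, here is a minimal sketch of what the three functions might compute, assuming each SparseVector is an index-sorted array of (index, value) pairs. The `Entry` struct and this layout are hypothetical and not taken from attachment 1542. Because the entries are sorted, the dot product can be a single merge-style walk over both vectors. Instantiating the sketch with `Entry!double` or `Entry!real` makes it possible to compare the two element types from the last note.

```d
import std.math : sqrt;

// Hypothetical element layout (not from attachment 1542):
// one (index, value) pair per non-zero component, sorted by index.
struct Entry(F)
{
    uint index;
    F value;
}

// Dot product of two index-sorted sparse vectors. The merge-style walk
// advances whichever side has the smaller index, so the cost is
// O(a.length + b.length), independent of the maximal index.
auto scalarProduct(E)(const(E)[] a, const(E)[] b)
{
    alias F = typeof(E.init.value);
    F sum = 0;
    size_t i, j;
    while (i < a.length && j < b.length)
    {
        if (a[i].index < b[j].index)
            ++i;
        else if (a[i].index > b[j].index)
            ++j;
        else
        {
            // Only indices present in both vectors contribute.
            sum += a[i].value * b[j].value;
            ++i;
            ++j;
        }
    }
    return sum;
}

// Sum of the squared values (the squared Euclidean norm).
auto normSquared(E)(const(E)[] v)
{
    alias F = typeof(E.init.value);
    F sum = 0;
    foreach (ref e; v)
        sum += e.value * e.value;
    return sum;
}

// Cosine similarity; returns 0 if either vector has zero norm
// (an assumed convention, chosen here to avoid dividing by zero).
auto cosineSimilarity(E)(const(E)[] a, const(E)[] b)
{
    alias F = typeof(E.init.value);
    const F denom = sqrt(normSquared(a) * normSquared(b));
    return denom == 0 ? cast(F) 0 : scalarProduct(a, b) / denom;
}
```

With `v1` and `v2` of type `Entry!real[]`, `cosineSimilarity(v1, v2)` returns a `real`. With `Entry!double[]` it returns a `double`, so the same source can be timed with both element types.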
Created attachment 1543: compressed benchmark input file

Reduced the benchmark input file to stay within the attachment size limit.
Created attachment 1544: full benchmark code for the reduced input file

The attached input file contains 1400 example vectors. This benchmark program calls cosineSimilarity 700 x 700 times (controlled by the slices / foreach loops in main), which is sufficient to reproduce the problem. The timings in the bug report are based on 1000 x 1000 calls of cosineSimilarity with 2000 example vectors. The path to the input file is hard-coded (same directory) on line 11.
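The benchmark structure described above can be sketched as a small generic harness: every vector of one slice is compared against every vector of another, the results are summed, and only the loop is timed. `benchPairs` and `BenchResult` are hypothetical names, not taken from attachment 1544. The sketch also assumes a Phobos recent enough to provide std.datetime.stopwatch, which is newer than the dmd 2.068 used in this report. Older releases had a StopWatch directly in std.datetime with a different API.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import core.time : Duration;

// Outcome of one run: the summed similarities (keeping the calls'
// results observable) and the wall-clock time spent in the loop.
struct BenchResult
{
    real sum;
    Duration elapsed;
}

// Calls `sim` once for every (x, y) pair with x from `rows` and y from
// `cols`, i.e. rows.length * cols.length times. No I/O happens inside
// the timed region.
BenchResult benchPairs(alias sim, V)(V[] rows, V[] cols)
{
    real sum = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (ref x; rows)
        foreach (ref y; cols)
            sum += sim(x, y);
    sw.stop();
    return BenchResult(sum, sw.peek());
}
```

As an illustration, suppose `vectors` is a hypothetical array holding the 1400 loaded vectors. Then `benchPairs!cosineSimilarity(vectors[0 .. 700], vectors[700 .. $])` would make 700 x 700 calls. This particular split of the slices is an assumption, not necessarily the one used in the attachment.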