D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 4125 - std.numeric.gcd can use a binary GCD
Summary: std.numeric.gcd can use a binary GCD
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P2 enhancement
Assignee: No Owner
Keywords: bootcamp
Reported: 2010-04-24 16:57 UTC by bearophile_hugs
Modified: 2017-01-16 23:25 UTC

Attachments
Faster GCD (6.39 KB, application/octet-stream)
2014-01-07 18:04 UTC, bearophile_hugs

Description bearophile_hugs 2010-04-24 16:57:35 UTC
std.numeric.gcd can use the faster binary GCD algorithm, especially when the input type is unsigned. This page has C code (and asm too, but the C code is probably enough in many situations):

http://en.wikipedia.org/wiki/Binary_GCD_algorithm
Comment 1 Marco Leise 2012-01-31 12:11:35 UTC
Replace uint with ulong for the longer version ;)
I use this and it is notably faster than what I used before.

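uint gcd(uint u, uint v)
{
	int shift;
	// gcd(0, x) == x, and u | v is the non-zero operand (or 0 if both are 0).
	if (u == 0 || v == 0) return u | v;
	// Factor out the powers of two common to u and v.
	for (shift = 0; ((u | v) & 1) == 0; ++shift) {
		u >>= 1;
		v >>= 1;
	}
	// Make u odd; it stays odd from here on.
	while ((u & 1) == 0) u >>= 1;
	do {
		// Make v odd as well.
		while ((v & 1) == 0) v >>= 1;
		// Keep the smaller value in u and the (even) difference in v.
		if (u < v) {
			v -= u;
		} else {
			uint diff = u - v;
			u = v;
			v = diff;
		}
		// The difference of two odd numbers is even, so halve it.
		v >>= 1;
	} while (v != 0);
	// Restore the common powers of two.
	return u << shift;
}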
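
The "replace uint with ulong" remark could also be expressed as a single template. The following is only a sketch of that idea: the name binaryGcd and the restriction to uint and ulong are assumptions made here for illustration, not anything taken from Phobos or from the code above.

```d
/// Hypothetical generic form of the binary GCD shown in this comment.
/// `binaryGcd` is an illustrative name, not a Phobos symbol. The constraint
/// is limited to uint and ulong so that no integer promotion gets in the way.
T binaryGcd(T)(T u, T v)
if (is(T == uint) || is(T == ulong))
{
    // gcd(0, x) == x; u | v is the non-zero operand (or 0 if both are 0).
    if (u == 0 || v == 0) return u | v;

    // Factor out the powers of two common to both operands.
    uint shift = 0;
    while (((u | v) & 1) == 0)
    {
        u >>= 1;
        v >>= 1;
        ++shift;
    }

    // Make u odd; it stays odd for the rest of the function.
    while ((u & 1) == 0) u >>= 1;

    do
    {
        // Make v odd, then replace (u, v) by (min, difference).
        while ((v & 1) == 0) v >>= 1;
        if (u < v)
        {
            v -= u;
        }
        else
        {
            immutable T diff = u - v;
            u = v;
            v = diff;
        }
        // The difference of two odd numbers is even.
        v >>= 1;
    } while (v != 0);

    // Restore the common powers of two.
    return u << shift;
}
```

Called as binaryGcd(48u, 18u) the template instantiates for uint; passing two ulong values selects the 64-bit version without any source change.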
Comment 2 Peter Alexander 2013-01-05 11:15:04 UTC
(In reply to comment #1)
> Replace uint with ulong for the longer version ;)
> I use this and it is notably faster than what I used before.

I implemented this (exactly as you have it) and it was slower than the algorithm that is already there. I tested on all pairs of integers below 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50% slower, at worst 3x slower.

All tests used dmd -O -release -inline

I suspect the performance reduction is due to poor pipelining. The binary version involves a lot more branching and more loop iterations than the standard algorithm. Also, the branches taken are highly unpredictable.

Maybe I'll look at this again in the future to try and make it faster, but it's pretty low on my priority list.
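
For anyone wanting to repeat this kind of measurement, a harness along the following lines might do. It is a sketch and not the code used for the numbers above: timeAllPairs, Timing and euclidGcd are hypothetical names, the checksum is there only so the optimizer cannot discard the calls, and std.datetime.stopwatch requires a reasonably recent Phobos.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.numeric : gcd;

/// Elapsed wall-clock time plus the sum of all results. Two
/// implementations that agree must produce the same checksum.
struct Timing
{
    long msecs;
    ulong checksum;
}

/// Plain Euclid with the modulo operator, used here as a reference.
uint euclidGcd(uint a, uint b)
{
    while (b != 0)
    {
        immutable t = b;
        b = a % b;
        a = t;
    }
    return a;
}

/// Calls `fun` on every pair (a, b) with 0 <= a, b < limit and times the run.
Timing timeAllPairs(alias fun)(uint limit)
{
    ulong sum = 0;
    auto sw = StopWatch(AutoStart.yes);
    foreach (uint a; 0 .. limit)
        foreach (uint b; 0 .. limit)
            sum += fun(a, b);
    sw.stop();
    return Timing(sw.peek.total!"msecs", sum);
}
```

Running timeAllPairs!gcd(10_000) and timeAllPairs!euclidGcd(10_000) and comparing both the msecs and the checksum fields gives a rough comparison; the absolute numbers depend heavily on compiler, flags and CPU.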
Comment 3 bearophile_hugs 2013-01-06 03:44:33 UTC
(In reply to comment #2)

> I implemented this (exactly as you have it) and it was slower than the
> algorithm that is already there. I tested on all pairs of integers below
> 10,000, and also on the pairs (x^2, y) for all x,y < 10,000. At best it was 50%
> slower, at worst 3x slower.
> ...
> Maybe I'll look at this again in the future to try and make it faster, but it's
> pretty low on my priority list.

Thank you for doing some experiments. Once the experiments are conclusive, this enhancement can be closed.

(At the moment, a more important function for Phobos would be an efficient GCD for bigints.)
Comment 4 Don 2013-01-08 02:37:41 UTC
FWIW, you can get rid of most of the conditional branches by using:

min(u,v) = v + (((cast(int)(u-v)) >> (8*int.sizeof - 1)) & (u-v))

The shift smears the sign bit of u-v so that it makes a mask of either 0x0000_0000 or 0xFFFF_FFFF.

I think the general consensus is that (at least if you use asm), binary GCD is faster on all known processors, but not necessarily by a large amount.
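
To make the masking trick concrete, here is one way it might be wired into the inner loop. This is a sketch under assumptions: branchlessMin and gcdBranchless are hypothetical names, and the trick only works while u-v fits in a signed int, so the sketch asserts that both operands are below 2^31. The absolute difference is computed with the same mask.

```d
/// Branch-free min(u, v). Valid only when the true difference u - v
/// fits in a signed 32-bit int.
uint branchlessMin(uint u, uint v)
{
    immutable uint diff = u - v;                              // wraps modulo 2^32
    immutable uint mask = cast(int) diff >> (8 * int.sizeof - 1); // 0 or 0xFFFF_FFFF
    return v + (mask & diff);                                 // u if u < v, else v
}

/// Binary GCD whose compare-and-swap step uses the mask instead of an if.
/// The loops that strip trailing zero bits still branch.
uint gcdBranchless(uint u, uint v)
{
    assert(u <= int.max && v <= int.max, "sketch assumes operands below 2^31");
    if (u == 0 || v == 0) return u | v;

    // Factor out the powers of two common to both operands.
    uint shift = 0;
    while (((u | v) & 1) == 0)
    {
        u >>= 1;
        v >>= 1;
        ++shift;
    }
    while ((u & 1) == 0) u >>= 1;

    do
    {
        while ((v & 1) == 0) v >>= 1;
        immutable uint diff = u - v;
        immutable uint mask = cast(int) diff >> (8 * int.sizeof - 1);
        u = v + (mask & diff);        // min(u, v)
        v = (diff ^ mask) - mask;     // |u - v|, which is even here
    } while (v != 0);

    return u << shift;
}
```

Whether this actually beats the branching version depends on the compiler and CPU; it would need to be measured rather than assumed.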
Comment 5 Artem Tarasov 2013-01-08 06:42:51 UTC
(In reply to comment #0)
> std.numeric.gcd can use a faster Binary GCD algorithm, especially when the
> input type is unsigned. This page has both C code (and asm, but the C code is
> probably enough in many situations):
> 
> http://en.wikipedia.org/wiki/Binary_GCD_algorithm

Maybe, instead of reinventing the wheel, the LibTomMath library should be used? It is in the public domain, has decent performance, and is stable enough to provide the big-integer implementation in Tcl and Rubinius.
Comment 6 bearophile_hugs 2014-01-07 18:04:17 UTC
Created attachment 1313
Faster GCD

Code for a binary GCD that on LDC2 is about twice as fast on uint values, and is noticeably faster with dmd too.

Timings ("gcd" is the Phobos one):

DMD 2.061alpha (32 bit):
              gcd, time = 1924
    gcd_recursive, time = 1937
gcd_iterative_sub, time = 2931
gcd_iterative_mod, time = 1948
       gcd_binary, time = 3357
   gcd_binary_mod, time = 2136
     gcd_binary_2, time = 1406
     gcd_binary_2, time = 1391
Results sum: 1284258816


LDC2 (32 bit):
              gcd, time = 1930
    gcd_recursive, time = 1924
gcd_iterative_sub, time = 3148
gcd_iterative_mod, time = 1926
       gcd_binary, time = 2635
   gcd_binary_mod, time = 2036
     gcd_binary_2, time = 1461
     gcd_binary_3, time = 1026
Results sum: 1284258816
Comment 7 Alexandru Razvan Caciulescu 2016-12-09 15:46:15 UTC
After running some benchmarks, we concluded that the dmd compiler currently works best with Euclid's algorithm for GCD; for ldc and gdc we use Stein's algorithm instead.

I tested both the benchmarks previously shared on the forum and a couple of my own, and noted that gcd_binary2, implemented by bearophile_hugs@eml.cc, had the best results, so I used that.

PR: https://github.com/dlang/phobos/pull/4940
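
As a rough illustration of the per-compiler choice described in this comment (and not a copy of the code in the PR), the selection could be sketched with a version block. gcdSketch is a hypothetical name, the function is fixed to uint for brevity, and the use of core.bitop.bsf to strip trailing zero bits is an assumption of this sketch.

```d
import core.bitop : bsf;
import std.algorithm.mutation : swap;

/// Hypothetical sketch: Euclid's algorithm under dmd, Stein's (binary)
/// algorithm under any other compiler such as ldc or gdc.
uint gcdSketch(uint a, uint b)
{
    version (DigitalMars)
    {
        // Euclid's algorithm with the modulo operator.
        while (b != 0)
        {
            immutable t = b;
            b = a % b;
            a = t;
        }
        return a;
    }
    else
    {
        // Stein's algorithm. bsf is undefined for 0, so handle zeros first.
        if (a == 0) return b;
        if (b == 0) return a;

        immutable shift = bsf(a | b);   // common powers of two
        a >>= bsf(a);                   // make a odd
        do
        {
            b >>= bsf(b);               // make b odd
            if (a > b) swap(a, b);      // keep the smaller value in a
            b -= a;                     // even (or zero) difference
        } while (b != 0);
        return a << shift;
    }
}
```

Both branches return the same values, so callers see no difference apart from speed.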
Comment 8 github-bugzilla 2016-12-12 14:20:02 UTC
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253
Fix Issue 4125 - std.numeric.gcd can use a binary GCD

https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39
Merge pull request #4940 from Darredevil/issue-4125

Fix Issue 4125 - std.numeric.gcd can use a binary GCD
Comment 9 github-bugzilla 2017-01-07 03:03:03 UTC
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253
Fix Issue 4125 - std.numeric.gcd can use a binary GCD

https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39
Merge pull request #4940 from Darredevil/issue-4125
Comment 10 github-bugzilla 2017-01-16 23:25:44 UTC
Commits pushed to newCTFE at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/6b4c2585fe5d25488a2df6a7b9b7c49031c63253
Fix Issue 4125 - std.numeric.gcd can use a binary GCD

https://github.com/dlang/phobos/commit/19445fc71e8aabdbd42f0ad8a571a57601a5ff39
Merge pull request #4940 from Darredevil/issue-4125