D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 6788 - std.algorithm.combinations
Summary: std.algorithm.combinations
Status: RESOLVED DUPLICATE of issue 7128
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P2 enhancement
Assignee: No Owner
URL:
Keywords: bootcamp
: 13039 (view as issue list)
Depends on:
Blocks:
 
Reported: 2011-10-07 16:19 UTC by bearophile_hugs
Modified: 2022-04-03 19:01 UTC (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description bearophile_hugs 2011-10-07 16:19:10 UTC
This D2 program contains a very common iteration patten:


import std.stdio, std.typecons;
void main() {
  auto data = [1, 2, 3, 4];
  foreach (i, x1; data)
    foreach (x2; data[i+1 .. $])
      writeln(tuple(x1, x2)); // do something with x1 and x2
}


It prints (dmd 2.056head):

Tuple!(int,int)(1, 2)
Tuple!(int,int)(1, 3)
Tuple!(int,int)(1, 4)
Tuple!(int,int)(2, 3)
Tuple!(int,int)(2, 4)
Tuple!(int,int)(3, 4)


As you see it is quite different from "zip".

So I suggest to add a std.range.pairwise range that does the same thing (same output):


import std.stdio, std.range;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (tup; pairwise(data))
        writeln(tup);
}


This coding pattern is very common. It happens when you have a sequence of items, and you want to work (apply a function) on all the pairs, and your diadic function (function with two arguments) doesn't care for its arguments order (symmetric function).

Often O(n ** 2) algorithms have to see all the pairs. But often the diadic function is symmetric. So you need to scan only half of the matrix, an upper triangle, or lower one. This is where pairwise is useful for. It's not a necessary range (even zip is not necessary in D), because two loops are enough to replace it, but it helps a bit because it avoids mistakes with the array indexes. I have done similar mistakes more than once in past.

pairwise(sequence) is mostly meant to be used with a random-access input sequence. It's not hard to make it work with forward ranges too, but it becomes less efficient.


In Python I use a similar generator, defined just like this:


from itertools import combinations
def pairwise(seq):
    return combinations(seq, 2)


Usage examples:

>>> list( pairwise([]) )
[]
>>> list(pairwise([1,2]))
[(1, 2)]
>>> list( pairwise("abc") )
[('a', 'b'), ('a', 'c'), ('b', 'c')]
>>> list(pairwise([0, 1, 2, 3]))
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
>>> list(pairwise(range(5)))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


I expect Phobos to eventually have two very useful lazy ranges that generate permutations and combinations. But pairwise is supposed to be very efficient, here I'd like the compiler to generate code similar to two loops inside each other, like in the first code example. This is probably hard to do if pairwise is implemented with a call to a combinations() function.


Why is pairwise just about pairs? Isn't a more general solution better?


[20:33]	bearophile	A general function that takes n ranges and yields all possible unsorted n-tuples is possible, but in practice this is a very uncommon thing in my code. This is why I have limited it to pairs. It is just about pairs because this is the most common case. 

A potential problem: some users might wrongly think pairwise(sequence) is the same as std.range.chunks(sequence,2). The documentation has to specify they are two quite different things, to avoid some of such mistakes.


This is an alternative design, similar to how lockstep works:

import std.stdio, std.range;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (x1, x2; pairwise(data))
        writeln(x1, " ", x2);
}
Comment 1 bearophile_hugs 2013-02-07 09:33:12 UTC
See also Issue 7128
Comment 2 hsteoh 2013-02-07 09:41:04 UTC
This is the same as the cartesianProduct of two ranges, which is already checked into git HEAD.

*** This issue has been marked as a duplicate of issue 7128 ***
Comment 3 bearophile_hugs 2013-02-07 10:03:25 UTC
(In reply to comment #2)
> This is the same as the cartesianProduct of two ranges, which is already
> checked into git HEAD.
> 
> *** This issue has been marked as a duplicate of issue 7128 ***

See this test code:

import std.stdio, std.algorithm;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (xy; cartesianProduct(data, data))
        writeln(xy);
}

Tuple!(int, int)(1, 1)
Tuple!(int, int)(2, 1)
Tuple!(int, int)(3, 1)
Tuple!(int, int)(4, 1)
Tuple!(int, int)(1, 2)
Tuple!(int, int)(2, 2)
Tuple!(int, int)(3, 2)
Tuple!(int, int)(4, 2)
Tuple!(int, int)(1, 3)
Tuple!(int, int)(2, 3)
Tuple!(int, int)(3, 3)
Tuple!(int, int)(4, 3)
Tuple!(int, int)(1, 4)
Tuple!(int, int)(2, 4)
Tuple!(int, int)(3, 4)
Tuple!(int, int)(4, 4)



import std.stdio, std.range;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (tup; pairwise(data))
        writeln(tup);
}


Should print:

Tuple!(int,int)(1, 2)
Tuple!(int,int)(1, 3)
Tuple!(int,int)(1, 4)
Tuple!(int,int)(2, 3)
Tuple!(int,int)(2, 4)
Tuple!(int,int)(3, 4)
Comment 4 bearophile_hugs 2013-03-12 12:04:54 UTC
Basic version for a random access range:


import std.range: ForeachType, isRandomAccessRange, hasLength;
import std.traits: Unqual, isNarrowString;
import std.typecons: Tuple;

struct Pairwise(Range) {
    alias R = Unqual!Range;
    alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b");
    R _input;
    size_t i, j;

    this(Range r_) {
        this._input = r_;
        j = 1;
    }

    @property bool empty() {
        return j >= _input.length;
    }

    @property Pair front() {
        return Pair(_input[i], _input[j]);
    }

    void popFront() {
        if (j >= _input.length - 1) {
            i++;
            j = i + 1;
        } else {
            j++;
        }
    }
}

Pairwise!Range pairwise(Range)(Range r)
if (isRandomAccessRange!Range && hasLength!Range && !isNarrowString!Range) {
    return typeof(return)(r);
}

import std.stdio: writeln;

void main() {
    (new int[0]).pairwise.writeln;
    [10].pairwise.writeln;
    [10, 20].pairwise.writeln;
    [10, 20, 30].pairwise.writeln;
    [10, 20, 30, 40].pairwise.writeln;
}



Once combinations() is implemented that code can be replaced with something like:

auto pairwise(Range)(Range r) if (...) {
    alias R = Unqual!Range;
    alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b");    
    return combinations(r, 2).map(t => Pair(t.tupleof));
}
Comment 5 hsteoh 2014-12-05 21:14:19 UTC
Hmm. Sounds like what you want is n-element subsets, rather than tuples specifically. I'll see if I can find any good algorithms for n-subset enumeration...
Comment 6 bearophile_hugs 2014-12-05 23:39:03 UTC
(In reply to hsteoh from comment #5)
> Hmm. Sounds like what you want is n-element subsets, rather than tuples
> specifically. I'll see if I can find any good algorithms for n-subset
> enumeration...

Isn't the code in comment #4 good enough?

(A pairwise range needs to be very efficient, so people can use it instead of two nested loops in more cases).
Comment 7 hsteoh 2014-12-06 18:34:26 UTC
If order doesn't matter, then a series of nested loops should do what you want. However, we can't actually use nested loops for a generic algorithm, since (1) the nesting level depends on how many elements you want in the subset, and (2) we have to lazy-fy the loops to conform to the range API anyway. So I've to think of some equivalent way of enumerating these subsets...

I'm thinking that a forward range should be sufficient for such an enumeration. So if we have a stack of .save'd ranges, that should do the trick. First, to generate k-element subsets, we prime the stack with .save'd copies of the range up to the k'th element, then .front simply walks the stack and retrieves .front from each item on the stack. In popFront(), we simply advance the range at the top of the stack. If it's empty, we pop it off, and advance the previous range on the stack, then fill up the stack again with the subsequent element. Each time, if advancing a range makes it empty, we pop it off and advance the previous range on the stack instead, then fill up the stack again. Yes, I think this will work... PR time! :-)
Comment 8 bearophile_hugs 2014-12-06 19:00:52 UTC
(In reply to hsteoh from comment #7)

> If order doesn't matter,

The order matters, this:

foreach (tup; iota(1, 5).pairwise)
    tup.writeln;

Should print only in this order:

Tuple!(int,int)(1, 2)
Tuple!(int,int)(1, 3)
Tuple!(int,int)(1, 4)
Tuple!(int,int)(2, 3)
Tuple!(int,int)(2, 4)
Tuple!(int,int)(3, 4)


> However, we can't actually use nested loops for a generic algorithm,
> since (1) the nesting level depends on how many elements you want in the
> subset, and (2) we have to lazy-fy the loops to conform to the range API
> anyway.

The function name "pairwise" refers to "pairs", because 95% of the times I only need two indexes. And code like "[1, 2, 3, 4].pairwise" doesn't contain a "2" to specify two indexes.

So if you want to reduce your implementation work, you can just use the version that yields pairs, like in #Comment 4.

But if you want to also implement the generation of 3-tuples or more, then I guess you can use a syntax like:

iota(1, 5).pairwise!3

But now the name "pairwise" sounds like a falsity.


> I'm thinking that a forward range should be sufficient for such an
> enumeration.

We can make the range more powerful later, if necessary.


> So if we have a stack of .save'd ranges, that should do the
> trick. First, to generate k-element subsets, we prime the stack with .save'd
> copies of the range up to the k'th element, then .front simply walks the
> stack and retrieves .front from each item on the stack. In popFront(), we
> simply advance the range at the top of the stack. If it's empty, we pop it
> off, and advance the previous range on the stack, then fill up the stack
> again with the subsequent element. Each time, if advancing a range makes it
> empty, we pop it off and advance the previous range on the stack instead,
> then fill up the stack again. Yes, I think this will work... PR time! :-)

I suggest to add a compile-time optimization for the most general case (n=2) to make it fast, because it's the most common and it should be very fast.
Comment 9 bearophile_hugs 2014-12-06 20:18:57 UTC
This shows a simple use case of pairwise, to compute the pairs of closest points of an array 2D points.

The function closestPair1 uses just a pair of loops.

closestPair2 uses cartesianProduct of all index pairs plus a filter to avoid comparing already seen pairs (because a distance is comutative), the code is slow, it looks bad and it's noisy.

closestPair3 uses pairwise, and it's much nicer and simpler to write.

closestPair4 is similar, but it uses the optional "key" function argument (as in the Python min/max functions) to further simplify the code. I'd really like min/max of Phobos to be extended to support such optional key function.

Currently some /*in*/ are commented out to avoid troubles with immutable seeds of reduce.



import std.algorithm, std.traits, std.typecons, std.range;

struct Pairwise(Range) {
    alias R = Unqual!Range;
    alias Pair = Tuple!(ForeachType!R, "a", ForeachType!R, "b");
    R _input;
    size_t i, j;

    this(Range r_) {
        this._input = r_;
        j = 1;
    }

    @property bool empty() {
        return j >= _input.length;
    }

    @property Pair front() {
        return Pair(_input[i], _input[j]);
    }

    void popFront() {
        if (j >= _input.length - 1) {
            i++;
            j = i + 1;
        } else {
            j++;
        }
    }
}

Pairwise!Range pairwise(Range)(Range r)
if (isRandomAccessRange!Range && hasLength!Range && !isNarrowString!Range) {
    return typeof(return)(r);
}

//-----------------------

struct V2 { double x, y; }

double distance(in V2 p1, in V2 p2) pure nothrow @nogc @safe {
    return (p1.x - p2.x) ^^ 2 + (p1.y - p2.y) ^^ 2;
}

auto closestPair1(in V2[] points) pure nothrow @nogc @safe {
    auto minD = Unqual!(typeof(V2.x)).infinity;
    V2 minP1, minP2;
    foreach (immutable i, const p1; points.dropBackOne)
        foreach (const p2; points[i + 1 .. $]) {
            immutable d = distance(p1, p2);
            if (d < minD) {
                minD = d;
                minP1 = p1;
                minP2 = p2;
            }
        }
    return tuple(minP1, minP2);
}

auto closestPair2(/*in*/ V2[] points) pure nothrow @nogc @safe {
    auto pairs = cartesianProduct(points.length.iota, points.length.iota)
                 .filter!(ij => ij[1] > ij[0]);
    immutable ij = reduce!((ij1, ij2) =>
            distance(points[ij1[0]], points[ij1[1]]) <
            distance(points[ij2[0]], points[ij2[1]]) ? ij1 : ij2)
        (pairs.front, pairs.dropOne);
    return tuple(points[ij[0]], points[ij[1]]);
}

auto closestPair3(/*in*/ V2[] points) pure @safe {
    return points
           .pairwise
           .reduce!((t1, t2) => t1[].distance < t2[].distance ? t1 : t2);
}


template min(alias F) {
    T min(T)(T x, T y) {
        return F(x) < F(y) ? x : y;
    }
}

auto closestPair4(/*in*/ V2[] points) pure @safe {
    return points.pairwise.reduce!(min!(t => t[].distance));
}

void main() {
    import std.stdio, std.random;

    auto points = new V2[10];
    foreach (ref p; points)
        p = V2(uniform01, uniform01);

    points.writeln;
    points.closestPair1.writeln;
    points.closestPair2.writeln;
    points.closestPair3.writeln;
    points.closestPair4.writeln;
}
Comment 10 bearophile_hugs 2015-01-05 12:23:37 UTC
Elsewhere I have suggested to add to cartesianProduct an optional template argument that specifies the "repetitions", just like the Python version of product:

https://docs.python.org/2/library/itertools.html#itertools.product

Python code:

product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy

product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111


Equivalently for the D code (the "repeat" argument "3" needs to be a template argument because cartesianProduct yields tuples, that must have their structure known at compile-time):

cartesianProduct!3([0, 1]) => 000 001 010 011 100 101 110 111

Now I've seen the same optional repeat template argument is useful for pairwise too, because if you are inside a UFCS chain and you want to perform a pairwise on the sequence items, you don't need to break the chain:


auto seq = 10.iota
           .map!(...)
           .filter!(...);

auto result = pairwise(seq, seq)
              .filter!(...);


Becomes a single nicer chain:

auto result = 10.iota
              .map!(...)
              .filter!(...)
              .pairwise!2
              .filter!(...);


Such use case "pairwise(seq,seq)" with repeated argument is probably common.
Comment 11 bearophile_hugs 2015-01-05 13:03:20 UTC
(In reply to bearophile_hugs from comment #10)

>               .pairwise!2

pairwise takes only one argument sequence, so that comment was bad. The template argument is used to denote how many items you want in your "pairs" (where 2 is the default).

So:

[1, 2, 3, 4].pairwise
Or:
[1, 2, 3, 4].pairwise!2

should yield:

(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)


[1, 2, 3, 4].pairwise!3

should yield:

(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)


Like in this lower level code:

void main() {
    import std.stdio, std.range, std.algorithm;
    auto a = [1, 2, 3, 4];

    foreach (immutable i; 0 .. a.length)
        foreach (immutable j; i + 1 .. a.length)
            writefln("(%d, %d)", a[i], a[j]);

    writeln;

    foreach (immutable i; 0 .. a.length)
        foreach (immutable j; i + 1 .. a.length)
            foreach (immutable k; j + 1 .. a.length)
                writefln("(%d, %d, %d)", a[i], a[j], a[k]);
}
Comment 12 Seb 2016-10-15 11:50:10 UTC
@andralex: I think there is also a decision-blocked PR pending in the Phobos queue:

https://github.com/dlang/phobos/pull/4027
Comment 13 Andrei Alexandrescu 2017-11-07 17:11:36 UTC
I see in https://docs.python.org/2/library/itertools.html that the recipe given for pairwise would produce the pairs (1, 2), (2, 3), and (3, 4) starting from [1, 2, 3, 4]. So defining it with different semantics is potentially confusing.

The functionality proposed is really "combinations of 2 elements in the given range". In general we'd be looking at combinations of k elements of the given range, yielding a range of k-tuples. See https://en.wikipedia.org/wiki/Combination

Fittingly enough, Python offers https://docs.python.org/2/library/itertools.html#itertools.combinations and https://docs.python.org/2/library/itertools.html#itertools.combinations_with_replacement

I think ws should define a function as follows:

auto combinations(uint k, Flag!"Duplicates" f = Duplicates.no, R)(R range);

The function returns a range of k-tuples containing combinations of elements in the range. Depending on the flag, duplicates are present or not.
Comment 14 Seb 2018-02-09 11:54:47 UTC
> I see in https://docs.python.org/2/library/itertools.html that the recipe given for pairwise would produce the pairs (1, 2), (2, 3), and (3, 4) starting from [1, 2, 3, 4]. So defining it with different semantics is potentially confusing.


Yes, especially now that we have pairwise as slides.

> I think ws should define a function as follows:
> auto combinations(uint k, Flag!"Duplicates" f = Duplicates.no, R)(R range);

> The function returns a range of k-tuples containing combinations of elements in the range. Depending on the flag, duplicates are present or not.

+1 (renamed accordingly)


@ students - see also: http://docs.mir.dlang.io/latest/mir_combinatorics.html (this is my module, it can be ported)
Comment 15 Seb 2018-02-09 12:07:02 UTC
*** Issue 13039 has been marked as a duplicate of this issue. ***
Comment 16 tyckesak 2022-04-03 19:01:52 UTC
Obviated by the introduction of `std.algorithm.setops.cartesianProduct`
into the standard library.

*** This issue has been marked as a duplicate of issue 7128 ***