D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 6718 - "nWayUnion" => "nWayMerge", plus true nWayUnion
Summary: "nWayUnion" => "nWayMerge", plus true nWayUnion
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P2 normal
Assignee: No Owner
URL:
Keywords: bootcamp
Depends on:
Blocks:
 
Reported: 2011-09-22 14:44 UTC by bearophile_hugs
Modified: 2018-01-05 13:30 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description bearophile_hugs 2011-09-22 14:44:22 UTC
This is a bug report plus an optional enhancement request.

In std.algorithm there is a section of set functions that work on ordered iterables, like nWayUnion:
http://www.d-programming-language.org/phobos/std_algorithm.html#nWayUnion

This is an example of nWayUnion usage:


import std.algorithm;
void main() {
    auto data = [[1, 2, 3], [1, 3, 5]];
    auto expected = [1, 1, 2, 3, 3, 5];
    assert(equal(nWayUnion(data), expected));
}


But this is not a set operation, and the result is not a set. A set never contains repeated items (a certain definition of "duplication" is possible only if you define your own equality relation).

So in my opinion a nWayUnion function has to return:

auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 2, 3, 5];
assert(equal(nWayUnion(data), expected));


While the current nWayUnion function has to be renamed "nWayMerge".

auto data = [[1, 2, 3], [1, 3, 5]];
auto expected = [1, 1, 2, 3, 3, 5];
assert(equal(nWayMerge(data), expected));

In my opinion both nWayUnion and nWayMerge are very useful operations useful in various situations, but they are different things. In my opinion confusing the two leads to bugs in programs that use arrays to represent sets (I have just hit a but caused by this).

nWayUnion is probably uniq(nWayMerge). If you don't want to add the "nWayUnion" function, then I suggest to just rename "nWayUnion" => "nWayMerge" (and put it outside the documentation section about set functions) to avoid some mistakes. Names are important, and classifications too.
Comment 1 Andrei Alexandrescu 2013-02-27 05:32:27 UTC
Here "union" has the meaning in relational algebra, not in set theory. I'll keep this opened until I add a mention to the documentation.
Comment 2 bearophile_hugs 2013-02-27 09:54:36 UTC
(In reply to comment #1)
> Here "union" has the meaning in relational algebra, not in set theory. I'll
> keep this opened until I add a mention to the documentation.

Thank you for the answer. So all those set functions turn out not being about _sets_. Adding a mention in the documentation is not enough.

So:

1) I suggest to rename "nWayUnion" as "nWayMerge", to avoid any confusion in D programmers like me. In the Merge Sort that algorithm is named "Merge", so calling it "merge" has a very long tradition in computer science (http://en.wikipedia.org/wiki/Merge_algorithm ).


2) In this page:
http://dlang.org/phobos/std_algorithm.html
The "Set operations" column is misnamed because those are NOT sets. In Computer Science and in Mathematics those are named bags or multisets (http://en.wikipedia.org/wiki/Multiset ). But those functions work with a comparison operator and work on ordered sequences, so I suggest to call that column "Multiset operations" or "Ordered bags operations".


3) setIntersection is equally badly named. And it should be renamed "bagsIntersection", or something similar.
Also the examples given in the site for setIntersection show only true sets, so this contribute to the confusion, it's even more easy to miss that it is actually a bags intersection:
    int[] a = [ 1, 2, 4, 5, 7, 9 ];
    int[] b = [ 0, 1, 2, 4, 7, 8 ];
    int[] c = [ 0, 1, 4, 5, 7, 8 ];
    assert(equal(setIntersection(a, a), a));
    assert(equal(setIntersection(a, b), [1, 2, 4, 7][]));
    assert(equal(setIntersection(a, b, c), [1, 4, 7][]));

Those examples need to show clearly some repetitions, like this:
    int[] a = [1, 1, 2, 4, 5, 5, 5, 5, 7, 9];
    int[] b = [0, 1, 2, 4, 7, 7, 8];
    int[] c = [0, 1, 1, 1, 1, 4, 4, 5, 7, 8];


Extra) Is it worth to add a true nWayUnion (in set theory sense) to std.algorithm? It's a useful operation. A nWayUnion is possibly computed as:
seqs.nWayMerge.uniq.array
So maybe it's not worth it, I don't know.
Comment 3 bearophile_hugs 2013-02-27 09:55:47 UTC
Raised importance to Major, because this naming cancer in std.algorithm is worse than I expected in 2011.
Comment 4 Andrei Alexandrescu 2013-02-27 10:02:00 UTC
No need to raise importance out of proportion. Thanks.
Comment 5 bearophile_hugs 2013-02-27 10:06:41 UTC
(In reply to comment #4)
> No need to raise importance out of proportion. Thanks.

OK, sorry.
Comment 6 Denis Shelomovskii 2013-09-30 07:00:41 UTC
Bearophile, what about to start NG thread about it? Bad names are always causes of user code bugs so this issue should be thoroughly considered.

Also restored "normal" importance as terminology isn't a minor issue.
Comment 7 Andrei Alexandrescu 2016-10-14 23:00:20 UTC
nWayUnion shall stay, but we can add new and better names and alias nWayUnion to one of them.
Comment 8 RazvanN 2017-07-13 10:39:34 UTC
So, how should we proceed here? Rename nWayUnion to nWayMerge and alias nWayUnion to it? If we do that we should probably deprecate nWayUnion.
Comment 9 Andrei Alexandrescu 2017-07-16 19:21:20 UTC
(In reply to RazvanN from comment #8)
> So, how should we proceed here? Rename nWayUnion to nWayMerge and alias
> nWayUnion to it? If we do that we should probably deprecate nWayUnion.

@RazvanN, let's do this:

* rename nWayUnion to multiwayMerge

* rename struct NWayUnion to MultiwayMerge

* leave nWayUnion as an undocumented alias of multiwayMerge. Mention in the documentation that "...for backward compatibility, `multiwayMerge` is also available under the name `nWayUnion`, which will be deprecated. New code should use `multiwayMerge`." Coordinate with Sebastian and Vladimir about the exact text and layout.

That would be one PR. The second PR:

* add function multiwayUnion that actually performs the union (no repeats). A simple implementation is to pipe the result of multiwayMerge through uniq.

* add function multiwayIntersection that performs the intersection of multiple sets.

These pull requests would close this issue.
Comment 10 RazvanN 2017-07-17 09:24:49 UTC
(In reply to Andrei Alexandrescu from comment #9)
> (In reply to RazvanN from comment #8)
> > So, how should we proceed here? Rename nWayUnion to nWayMerge and alias
> > nWayUnion to it? If we do that we should probably deprecate nWayUnion.
> 
> @RazvanN, let's do this:
> 
> * rename nWayUnion to multiwayMerge
> 
> * rename struct NWayUnion to MultiwayMerge
> 
> * leave nWayUnion as an undocumented alias of multiwayMerge. Mention in the
> documentation that "...for backward compatibility, `multiwayMerge` is also
> available under the name `nWayUnion`, which will be deprecated. New code
> should use `multiwayMerge`." Coordinate with Sebastian and Vladimir about
> the exact text and layout.
> 
> That would be one PR. The second PR:
> 
> * add function multiwayUnion that actually performs the union (no repeats).
> A simple implementation is to pipe the result of multiwayMerge through uniq.
> 
> * add function multiwayIntersection that performs the intersection of
> multiple sets.
> 
> These pull requests would close this issue.

We already have setIntersection which, as bearophile pointed out, computes the bags intersection. Maybe we should also rename setIntersection to multiwayIntersection and use setIntersection for the newly added function. I know that this may break code but setIntersection is very misleading
Comment 11 Andrei Alexandrescu 2017-07-17 11:56:30 UTC
>Maybe we should also rename setIntersection to multiwayIntersection

setIntersection has two inputs. "multiwayIntersection" would have multiple inputs (a range of ranges) and would be a distinct function with important uses and its own theory and practice (http://www.cs.toronto.edu/~tl/papers/wea06.pdf is a good overview).
Comment 12 RazvanN 2017-07-17 12:10:26 UTC
(In reply to Andrei Alexandrescu from comment #11)
> >Maybe we should also rename setIntersection to multiwayIntersection
> 
> setIntersection has two inputs. "multiwayIntersection" would have multiple
> inputs (a range of ranges) and would be a distinct function with important
> uses and its own theory and practice
> (http://www.cs.toronto.edu/~tl/papers/wea06.pdf is a good overview).

If I understand correctly, setIntersection is going to do a bags intersection for just 2 ranges, while multiwayIntersection is going to be a generalization of set intersection (i.e. it will still do bags intersection)
Comment 13 RazvanN 2017-07-17 12:26:32 UTC
(In reply to Andrei Alexandrescu from comment #11)
> >Maybe we should also rename setIntersection to multiwayIntersection
> 
> setIntersection has two inputs. "multiwayIntersection" would have multiple
> inputs (a range of ranges) and would be a distinct function with important
> uses and its own theory and practice
> (http://www.cs.toronto.edu/~tl/papers/wea06.pdf is a good overview).

Actually, looking at the implementation, it seems that setIntersection also has a variable number of inputs : https://dlang.org/phobos/std_algorithm_setops.html#setIntersection
Comment 14 Andrei Alexandrescu 2017-07-17 12:42:45 UTC
(In reply to RazvanN from comment #13)
> (In reply to Andrei Alexandrescu from comment #11)
> > >Maybe we should also rename setIntersection to multiwayIntersection
> > 
> > setIntersection has two inputs. "multiwayIntersection" would have multiple
> > inputs (a range of ranges) and would be a distinct function with important
> > uses and its own theory and practice
> > (http://www.cs.toronto.edu/~tl/papers/wea06.pdf is a good overview).
> 
> Actually, looking at the implementation, it seems that setIntersection also
> has a variable number of inputs :
> https://dlang.org/phobos/std_algorithm_setops.html#setIntersection

It has a number of inputs known during compilation. True "multi" set intersection takes a range of ranges, i.e. an arbitrary number of sets.
Comment 15 github-bugzilla 2017-07-17 19:08:29 UTC
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/9efa504bdca2b2644375e86d3b1527170512b726
Fix Issue 6718 - nWayUnion => nWayMerge, plus true nWayUnion

https://github.com/dlang/phobos/commit/bdae5f08f3cf4ed153063ad1d9a07fbb5aa12668
Merge pull request #5620 from RazvanN7/Issue_6718

[WIP] Fix Issue 6718 - nWayUnion => nWayMerge, plus true nWayUnion
merged-on-behalf-of: Andrei Alexandrescu <andralex@users.noreply.github.com>
Comment 16 github-bugzilla 2017-08-16 13:24:12 UTC
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/9efa504bdca2b2644375e86d3b1527170512b726
Fix Issue 6718 - nWayUnion => nWayMerge, plus true nWayUnion

https://github.com/dlang/phobos/commit/bdae5f08f3cf4ed153063ad1d9a07fbb5aa12668
Merge pull request #5620 from RazvanN7/Issue_6718
Comment 17 github-bugzilla 2018-01-05 13:30:30 UTC
Commits pushed to dmd-cxx at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/9efa504bdca2b2644375e86d3b1527170512b726
Fix Issue 6718 - nWayUnion => nWayMerge, plus true nWayUnion

https://github.com/dlang/phobos/commit/bdae5f08f3cf4ed153063ad1d9a07fbb5aa12668
Merge pull request #5620 from RazvanN7/Issue_6718