Issue 9842 - std.algorithm.hashGroup
Summary: std.algorithm.hashGroup
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P4 enhancement
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-30 14:32 UTC by bearophile_hugs
Modified: 2024-12-01 16:17 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description bearophile_hugs 2013-03-30 14:32:38 UTC
A little Strems/LINQ challenge, "Grouping by a Criterium":

Group the elements of a collection of strings by their length.

A C# LINQ solution:

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",  "Barsha"};
var groups = names.GroupBy(c => c.Length);


Using the groupBy written by Andrei (https://github.com/D-Programming-Language/phobos/pull/1186 ) a D solution is:

    auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names
                  .schwartzSort!q{ a.length }
                  .groupBy!q{ a.length == b.length };


But group/groupBy work by sorting. Hash-based O(n) hashGroup/hashGroupBy are also conceivable, potentially faster, and leading to simpler code, because you don't need to sort the items first:

    auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names.hashGroupBy!q{ a.length == b.length };


(Alternative names for hashGroup/hashGroupBy are possible.)
Comment 1 hsteoh 2014-11-05 05:49:57 UTC
Isn't this just the same as data.hashSort.group? For hash sort to work, the key must map to an integral value and must be bounded by known min/max values.
Comment 2 bearophile_hugs 2014-11-05 10:01:42 UTC
(In reply to hsteoh from comment #1)
> Isn't this just the same as data.hashSort.group? For hash sort to work, the
> key must map to an integral value and must be bounded by known min/max
> values.

The output and working is different. The hashGroupBy doesn't return a range of pairs as the function "group".

If you write:

auto names = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
auto grps = names.hashGroupBy!(n => n.length);

Now grps is:

assert(grps == [3:["Sam"], 4:["Samu", "Ravi"], 5:["Ratna"], 6:["Samuel", "Barsha"]]);

The keys can be anything, not just integers in an interval.
Comment 3 bearophile_hugs 2014-12-17 09:54:27 UTC
Another example usage. Here the "properDivs" function returns the proper divisors of the given number n. The function "classify" returns one of the three classes (Deficient number, Perfect number, Abundant number). Then a hashGroup is handy to create an associative array to count how many numbers there are of each class in the [1,10000] interval:


void main() {
    import std.stdio, std.algorithm, std.range;
 
    static immutable properDivs = (in uint n) pure nothrow @safe =>
        iota(1, (n + 1) / 2 + 1).filter!(x => n % x == 0 && n != x);
 
    enum Class { deficient, perfect, abundant }
 
    static Class classify(in uint n) pure nothrow @safe {
        immutable p = properDivs(n).sum;
        with (Class)
            return (p < n) ? deficient : ((p == n) ? perfect : abundant);
    }
 
    iota(1, 10000).map!classify.hashGroup.writeln;
}
Comment 4 bearophile_hugs 2015-01-06 16:13:14 UTC
Now I think a "hashGroup" suffices (hashGroup can also be used without template argument, it uses an x=>x identity function on default).

In Phobos we have std.array.assocArray that allows to build a built-in associative array from a range of key-value Phobos Tuples (it will become mostly legacy when we will have a good associative array in Phobos and built-in tuples in D, that is the opposite of the data structures we have today).

But there is another very common pattern of creation of associative arrays, where you want to "accumulate" inside the values.

The basic form of accumulation is counting:

//A
const a = [1, 2, 1, 3, 5, 1, 2];
uint[int] aa1;
foreach (x; a)
    aa1[x]++;
aa1.writeln; // [1:3, 5:1, 2:2, 3:1]

Another example is just emumeration in an array:

//B
const b = [-1, 2, 1, 3, 5, -1, -2];
int[][int] aa2;
foreach (x; b)
    aa2[x.abs] ~= x;
aa2.writeln; // [1:[-1, 1, -1], 5:[5], 2:[2, -2], 3:[3]]

Or even in a set (here implemented with another associative array of int-bools):

//C
bool[int][int] aa3;
foreach (x; b)
    aa3[x.abs][x] = true;
aa3.writeln; // [1:[1:true, -1:true], 5:[5:true], 2:[2:true, -2:true], 3:[3:true]]


There are ways to perform those operations with Phobos today, but the code is so bad looking that it's better to skip this.


So I suggest to add a std.array.hashGroup function to Phobos that as argument takes a range, and as template argument takes one or two functions.

The first function is a mapping key unary function.

The second function is optional and takes as argument the sub-range of grouped values.

The patterns above become:

//A
a.hashGroup!(id, count).writeln;

//B
a.hashGroup!abs.writeln;

//C
a.hashGroup!(abs, items => items.zip(true.repeat).assocArray)).writeln;


Where "id" is the identity function (not present in Phobos):

//A
a.hashGroup!(x => x, count).writeln;

Once Phobos has a Set data structure with a constructor helper function that accepts a range as argument you can also write just:

//C
a.hashGroup!(abs, set).writeln;
Comment 5 dlangBugzillaToGithub 2024-12-01 16:17:12 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9966

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB