D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 5968 - std.algorithm.group by key function + groupFull
Summary: std.algorithm.group by key function + groupFull
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P2 enhancement
Assignee: No Owner
URL:
Keywords: bootcamp
: 11097 (view as issue list)
Depends on:
Blocks:
 
Reported: 2011-05-09 03:14 UTC by bearophile_hugs
Modified: 2017-05-07 10:18 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description bearophile_hugs 2011-05-09 03:14:50 UTC
Andrej Mitrovic has asked to split the following array in three arrays/ranges, according to the splitting predicate x<32:

[64, 64, 64, 32, 31, 16, 32, 33, 64]


A solution using group():


import std.stdio, std.algorithm;

void main() {
    auto arr = [64, 64, 64, 32, 31, 16, 32, 33, 64];

    int last = 0;
    foreach (g; group!q{ (a < 32) == (b < 32) }(arr)) {
        writeln(arr[last .. last+g[1]]);
        last += g[1];
    }
}


Output:
[64, 64, 64, 32]
[31, 16]
[32, 33, 64]


Andrei has suggested the second item of the tuples that group() yields to be a lazy range instead just of counter (untested code). This is an improvement, and it makes group() closer to the Python itertools.groupby(). With this change the code becomes simpler:


import std.stdio, std.algorithm;

void main() {
    auto arr = [64, 64, 64, 32, 31, 16, 32, 33, 64];

    foreach (g; group!q{ (a < 32) == (b < 32) }(arr))
        writeln(g[1]); // g[1] is lazy  
}


In Python groupby uses a key mapping function, like D schwartzSort(), that's more handy:

>>> from itertools import groupby
>>> arr = [64, 64, 64, 32, 31, 16, 32, 33, 64]
>>> [list(g) for h,g in groupby(arr, key = lambda x: x < 32)]
[[64, 64, 64, 32], [31, 16], [32, 33, 64]]


I suggest to change D group() to follow the Python groupby() design on this too. With this the D code improves further (untested):

import std.stdio, std.algorithm;

void main() {
    auto arr = [64, 64, 64, 32, 31, 16, 32, 33, 64];

    foreach (g; group!q{ a < 32 }(arr))
        writeln(g[1]);
}


Implementation note: unlike schwartzSort() there isn't a need to memorize the results of all the key mapping functions, this avoids slow memory allocations.


With tuple unpacking syntax sugar:


import std.stdio, std.algorithm;

void main() {
    auto arr = [64, 64, 64, 32, 31, 16, 32, 33, 64];

    foreach ((h, g); group!q{ a < 32 }(arr))
        writeln(g);
}
Comment 1 bearophile_hugs 2013-02-12 17:42:44 UTC
I suggest to introduce a new function std.algorithm.groupFull that yields tuples where the second field of the tuple is a lazy range of all the grouped items, as in Python groupby.

This Python2 program returns all the longest words that have ordered chars:


from itertools import groupby
o = (w for w in map(str.strip, open("words.txt")) if sorted(w)==list(w))
print list(next(groupby(sorted(o, key=len, reverse=True), key=len))[1])


A similar program using groupFull:


import std.stdio, std.algorithm, std.range, std.file, std.string;

void main() {
    "words.txt"
    .readText()
    .splitter()
    .filter!isSorted()
    .array()
    .sort!q{a.length > b.length}()
    .groupFull!q{a.length == b.length}()
    .front[1]
    .writeln();
}
Comment 2 Peter Alexander 2013-09-22 07:17:04 UTC
*** Issue 11097 has been marked as a duplicate of this issue. ***
Comment 3 Ulrich Küttler 2017-05-07 10:18:25 UTC
There is chunkBy now.