Issue 20184 - String maxsplit
Summary: String maxsplit
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P4 enhancement
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-31 16:17 UTC by srpen6
Modified: 2024-12-01 16:35 UTC

See Also:


Description srpen6 2019-08-31 16:17:43 UTC
D seems to have no way to limit the number of splits done on a string. This is
possible with Go:

    strings.SplitN("one two three", " ", 2)

also Nim:

   "one two three".split(maxsplit = 1)

also Python:

    'one two three'.split(maxsplit = 1)

also PHP:

    explode(' ', 'one two three', 2);

also Ruby:

    'one two three'.split(nil, 2)
Comment 1 Jon Degenhardt 2019-09-01 20:16:52 UTC
This can be achieved using 'splitter' and 'take' or another range iteration algorithm that limits the number of candidates selected.

e.g.

assert("a|bc|def".splitter('|').take(4).equal([ "a", "bc", "def" ]));
assert("a|bc|def".splitter('|').take(3).equal([ "a", "bc", "def" ]));
assert("a|bc|def".splitter('|').take(2).equal([ "a", "bc" ]));
assert("a|bc|def".splitter('|').take(1).equal([ "a" ]));

'splitter' (from std.algorithm) is a lazy version of 'split', which is eager. It produces an input range. 'take' (from std.range) takes the first N elements from an input range and is also lazy. To convert the result to a fully realized array similar to the result of 'split', use 'array' (from std.array) or another eager range algorithm, e.g.

auto x = "a|bc|def".splitter('|').take(2).array;
assert(x.length == 2);
assert (x[0] == "a");
assert (x[1] == "bc");
Comment 2 srpen6 2019-09-01 20:24:46 UTC
(In reply to Jon Degenhardt from comment #1)
> This can be achieved using 'splitter' and 'take' or another range iteration
> algorithm that limits the number of candidates selected.
> 
> e.g.
> 
> assert("a|bc|def".splitter('|').take(4).equal([ "a", "bc", "def" ]));
> assert("a|bc|def".splitter('|').take(3).equal([ "a", "bc", "def" ]));
> assert("a|bc|def".splitter('|').take(2).equal([ "a", "bc" ]));

It seems you have a profound misunderstanding of what split limiting is. Here is a
result with Python:

    >>> 'one two three'.split(maxsplit = 1)
    ['one', 'two three']

As you can see, it doesn't discard any part of the original input. Instead, it
stops splitting after the specified number of splits and puts the rest of the
string in the final element.
Comment 3 Jon Degenhardt 2019-09-01 21:43:11 UTC
(In reply to svnpenn from comment #2)
> (In reply to Jon Degenhardt from comment #1)
> Here is a result with Python:
> 
>     >>> 'one two three'.split(maxsplit = 1)
>     ['one', 'two three']
> 
> as you can see, it doesn't discard any part of the original input, instead it
> stops splitting after the specified amount, and puts the rest of the string
> as the final element.

Thanks for clarifying what you are looking for. This is a useful refinement of the original description, which is:

> D seems to have no way to limit the number of splits done on a string.

D does have a way to limit the number of splits, but as you point out, this mechanism doesn't preserve the remainder of the string in the fashion available in a number of other libraries.
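
For the single-split case, the remainder-preserving behavior can be sketched with 'findSplit' (from std.algorithm.searching), which returns the text before the separator, the separator itself, and everything after it. This is only a minimal sketch: the helper name 'splitOnce' is hypothetical and not part of Phobos, and it handles strings with an explicit separator only.

```d
import std.algorithm.searching : findSplit;

// Hypothetical helper (not in Phobos): split at the first occurrence of
// sep only, keeping the unsplit remainder as the last element.
string[] splitOnce(string s, string sep)
{
    auto parts = s.findSplit(sep);
    // parts[1] is the matched separator; it is empty when sep was not found.
    if (parts[1].length == 0)
        return [s];
    return [parts[0], parts[2]];
}
```

Unlike 'splitter' combined with 'take', nothing from the input is dropped: splitOnce("one two three", " ") keeps "two three" as its second element.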
Comment 4 Alex 2019-09-02 06:56:02 UTC
As a workaround, this is possible: 

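```
import std;

void main()
{
    "one two three four".fun1(1).writeln;
    "one two three four".fun2(2).writeln;
}

// Both functions assume that s contains more than num space-separated
// pieces; otherwise the slice of s below goes out of bounds.

auto fun1(string s, size_t num)
{
    // Count the consumed characters (piece plus one separator) while taking.
    size_t summe;
    auto r = s.splitter(' ').take(num).tee!(a => summe += a.length + 1).array;
    return r ~ s[summe .. $];
}

auto fun2(string s, size_t num)
{
    // Same idea, but the consumed length is computed in a second pass.
    auto i = s.splitter(' ').take(num);
    return i.array ~ s[i.map!(el => el.length).sum + num .. $];
}
```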

If the splitter construct allowed public access to its underlying range, more convenient solutions would be possible.
Comment 5 srpen6 2019-09-07 19:33:29 UTC
Here is a better workaround:

    import std.format, std.stdio;
    void main() {
       string s1 = "one two three", s2, s3;
       s1.formattedRead("%s %s", s2, s3);
       writeln(s2);
       writeln(s3);
    }
Comment 6 Berni 2019-09-19 18:55:04 UTC
I've had a look at this. I think it's not feasible to add another parameter "maxsplit" to split. Internally, split uses splitter, and splitter works with BidirectionalRange. That means that, to implement back, splitter would have to go through all elements from the front to find the correct breakpoint. That breaks laziness, which in my eyes is not desirable.

Therefore I think it would be better to implement separate functions splitN and splitterN. splitterN would then be restricted to ForwardRange.
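
A rough sketch of what such a pair could look like, for string inputs with an explicit separator only. The names 'SplitterN' and 'splitN', their signatures, and the choice to count splits (as Python's maxsplit does) rather than resulting pieces (as Go's SplitN does) are all assumptions made for illustration; this is not an existing or agreed Phobos API.

```d
import std.algorithm.searching : findSplit;
import std.array : array;

// Hypothetical lazy forward range over pieces of a string. It performs at
// most maxSplits splits and yields the unsplit remainder as its last element.
struct SplitterN
{
    private string rest;       // input not yet handed out
    private string sep;        // separator to split on
    private size_t splitsLeft; // remaining split budget
    private string current;    // element returned by front
    private bool exhausted;    // remainder has already been handed out
    private bool done;         // range is empty

    this(string s, string sep, size_t maxSplits)
    {
        this.rest = s;
        this.sep = sep;
        this.splitsLeft = maxSplits;
        popFront(); // compute the first element
    }

    @property bool empty() { return done; }
    @property string front() { return current; }
    @property SplitterN save() { return this; }

    void popFront()
    {
        if (exhausted)
        {
            done = true;
            return;
        }
        if (splitsLeft > 0)
        {
            auto parts = rest.findSplit(sep);
            if (parts[1].length != 0) // separator found
            {
                current = parts[0];
                rest = parts[2];
                --splitsLeft;
                return;
            }
        }
        // Budget used up or no separator left: hand out the remainder once.
        current = rest;
        exhausted = true;
    }
}

// Hypothetical eager counterpart, built on the lazy range.
string[] splitN(string s, string sep, size_t maxSplits)
{
    return SplitterN(s, sep, maxSplits).array;
}
```

With these definitions, splitN("one two three", " ", 1) gives ["one", "two three"]. The range only ever scans forward from the front, so it needs no 'back' and stays lazy.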
Comment 7 dlangBugzillaToGithub 2024-12-01 16:35:30 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/10385

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB