D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 1772 - (D1 only) regexp.split behavior with captures needs to be documented
Summary: (D1 only) regexp.split behavior with captures needs to be documented
Status: RESOLVED WONTFIX
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D1 (retired)
Hardware: All All
Importance: P2 normal
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-07 23:03 UTC by Bill Baxter
Modified: 2015-11-03 17:45 UTC
CC List: 2 users

See Also:


Description Bill Baxter 2008-01-07 23:03:31 UTC
I want to split columns out of a row of numbers. They may be separated by commas or by just whitespace. So I tried this:

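The splitter regexp:
    import std.regexp;
    // Split on runs of whitespace, or on a comma with optional surrounding whitespace.
    auto re_splitter = new RegExp(r"(\s+|\s*,\s*)");
    char[][] numbers = re_splitter.split(line);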

If the input is a line like:
410.90711,352.879

The output from that is the array
[410.90711,,,352.879]

After a bit of debugging, it turns out the problem is the grouping in the regexp.
Removing the parens fixes the problem in this case, but there are cases where you need parens for grouping and not for the capturing side effect. So I think this is a bug. Only match 0 (the whole match) should be considered significant for splitting, not the submatches.
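A minimal sketch of the "grouping without capturing" workaround, assuming D2's std.regex rather than the retired D1 RegExp class used above: a non-capturing group (?:...) groups the alternatives without creating a submatch, so only the pieces between separators come back. The helper name splitColumns is hypothetical, and putting the comma alternative first is an illustrative choice so that an input such as "1 , 2" should be consumed as a single " , " separator instead of being split at the space first.

```d
import std.regex : regex, split;
import std.stdio : writeln;

// Hypothetical helper: split a row of numbers separated by commas and/or whitespace.
string[] splitColumns(string line)
{
    // (?:...) groups the alternatives without creating a submatch.
    // The comma alternative comes first so " , " is consumed as one separator.
    auto re = regex(r"(?:\s*,\s*|\s+)");
    return split(line, re);
}

void main()
{
    writeln(splitColumns("410.90711,352.879")); // ["410.90711", "352.879"]
}
```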
Comment 1 Bill Baxter 2008-01-07 23:04:14 UTC
fixed summary
Comment 2 Bill Baxter 2008-01-08 14:12:41 UTC
It seems I've been duped by writefln's output.
Further investigation shows that this:
   [410.90711,,,352.879]
is actually this:
   ["410.90711",  ",",  "352.879"]
and not a 4-element list with two empty strings as I thought.

I discovered this because I checked what Python's re.split does with captures, and its documentation says this:
"""
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. 
"""
So that made me think maybe D could be trying to do something similar.

Apparently it is.  So please just document it.
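The captures-included behavior described in this comment can be sketched as follows. This is written against D2's std.regex, and splitKeepingCaptures is a hypothetical helper, not a Phobos function: it mirrors the Python rule quoted above (each piece is followed by the text of every capture group of the separator match) rather than reproducing D1's std.regexp source.

```d
import std.regex : Regex, matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): split `input` on `re`, and after
// each piece also append the text of every capture group of the separator.
string[] splitKeepingCaptures(string input, Regex!char re)
{
    string[] result;
    size_t last = 0; // index just past the previous separator
    foreach (m; matchAll(input, re))
    {
        immutable start = m.pre.length; // m.pre is input[0 .. match start]
        result ~= input[last .. start];  // the piece before this separator
        foreach (i; 1 .. m.length)       // group 0 is the whole match; skip it
            result ~= m[i];
        last = start + m.hit.length;
    }
    result ~= input[last .. $]; // trailing piece after the last separator
    return result;
}

void main()
{
    auto re = regex(r"(\s+|\s*,\s*)");
    writeln(splitKeepingCaptures("410.90711,352.879", re)); // ["410.90711", ",", "352.879"]
}
```

With the capturing pattern from the report, the separator "," appears as its own element, which is the three-element result that writefln made look like [410.90711,,,352.879].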

Comment 3 Andrei Alexandrescu 2011-06-04 17:48:10 UTC
Reassigning to Dmitry.
Comment 4 Dmitry Olshansky 2012-03-12 03:28:16 UTC
https://github.com/D-Programming-Language/phobos/pull/491
Comment 5 github-bugzilla 2012-03-14 18:46:11 UTC
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/62b464b48d61b076c89f7585dc0ac7632f57ba49
fix Issue 1772 - regexp.split behavior with captures needs to be documented

A documentation clarification, the report itself is largely outdated.

https://github.com/D-Programming-Language/phobos/commit/6d782c6efd9ba6a7b7a314002a52d3455fa00d8c
Merge pull request #491 from blackwhale/issue-1772

fix Issue 1772 - regexp.split behavior with captures needs to be documen...
Comment 6 yebblies 2012-03-23 04:14:18 UTC
Is this fixed/D1 only now?
Comment 7 Dmitry Olshansky 2012-03-23 05:17:17 UTC
I'm no expert on D1 stuff, but I believe the issue is still applicable to D1.
Come to think of it, I closed a few D1 issues like this in the past; maybe we should close this one too (marked as D1 for now).
D1/D2 regexp is broken in many ways and, AFAIK, nobody is doing any work on Phobos/D1 to fix it; the Tango folks have their own regex anyway.
Comment 8 yebblies 2012-03-23 07:06:04 UTC
I guess it can be closed when D1 is discontinued at the end of the year.
Comment 9 Andrei Alexandrescu 2015-11-03 17:45:49 UTC
It's unlikely this D1 issue will get worked on. If anyone plans to work on it, feel free to reopen.