D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 5674 - AssertError in std.regex
Summary: AssertError in std.regex
Status: RESOLVED WONTFIX
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: Other Mac OS X
Importance: P2 normal
Assignee: No Owner
Reported: 2011-03-01 08:14 UTC by Jacob Carlborg
Modified: 2012-02-24 12:04 UTC

Attachments
This patch fixes the problems with unmatched groups in a match. (3.56 KB, patch)
2011-04-06 12:50 UTC, Matt Peterson

Description Jacob Carlborg 2011-03-01 08:14:43 UTC
The following code results in an AssertError or a RangeError (I don't know whether the RangeError is expected behavior):

import std.regex;
import std.stdio;

void main ()
{
    auto m = "abc".match(`a(\w)b`);

    writeln(m.hit); // AssertError in regex.d:1795
    writeln(m.captures); // RangeError in regex.d:1719
}

Can't "hit" just return an empty string and "captures" an empty range?
Comment 1 Magnus Lie Hetland 2011-03-31 07:09:07 UTC
I have similar problems with stuff like this:

import std.stdio, std.regex;
void main() {
    foreach (m; match("abc", "a|(x)")) {
        foreach (e; m.captures) {
            writeln(e);
        }
    }
}

Here it prints out "a" and then I get a range violation. Whether or not m.captures[1] exists, shouldn't iterating over m.captures be possible?

Also: Checking whether m.captures[1] exists would be highly useful -- to see what has matched. (Doing this by length wouldn't work in general, of course.)
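
A minimal sketch of how such a check might look, assuming a current Phobos std.regex in which matchFirst is available and an unmatched group comes back as an empty slice instead of raising a range violation (this was not the behavior of the version discussed in this report). The helper name groupsOf is hypothetical and exists only for illustration.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: collect every group of the first match,
// or return an empty array when the pattern does not match at all.
string[] groupsOf(string input, string pattern)
{
    string[] result;
    auto c = matchFirst(input, regex(pattern));
    if (c.empty)
        return result;
    foreach (i; 0 .. c.length)
        result ~= c[i]; // assumed: an unmatched group yields an empty slice
    return result;
}

void main()
{
    // Group 1 cannot take part in the match when the "a" branch is taken.
    foreach (i, g; groupsOf("abc", "a|(x)"))
        writefln("group %s: %s", i, g.length ? g : "<unmatched or empty>");
}
```

As noted above, an empty slice alone cannot distinguish a group that matched the empty string from one that did not participate, so this only approximates an "exists" check.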
Comment 2 Matt Peterson 2011-04-06 11:41:18 UTC
After some debugging, it looks like Captures stops at the first unmatched group when computing the length of the captures, which I believe is the cause of the assert error.

The second problem is that when a group is unmatched, its startIdx and endIdx are stored as size_t.max. When Captures.front/opIndex or RegexMatch.hit then try to slice the input with those values, the result is a range violation. Most regex engines handle this by returning null if a group is unmatched.

I'll try to submit a patch soon if I get it working.
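
To illustrate the sentinel handling described in this comment, here is a self-contained sketch. It is not the code from std.regex or from the attached patch: the Group struct and the sliceGroup function are hypothetical stand-ins that only mirror the startIdx/endIdx convention mentioned above.

```d
import std.stdio;

// Hypothetical stand-in for the engine's per-group bookkeeping.
struct Group
{
    size_t startIdx = size_t.max; // size_t.max marks "group did not match"
    size_t endIdx = size_t.max;

    bool matched() const { return startIdx != size_t.max; }
}

// Slice the input only when the group took part in the match;
// otherwise return null instead of slicing with the sentinel values.
string sliceGroup(string input, Group g)
{
    return g.matched ? input[g.startIdx .. g.endIdx] : null;
}

void main()
{
    auto input = "abc";
    writeln(sliceGroup(input, Group(0, 1)));     // a
    writeln(sliceGroup(input, Group()) is null); // true
}
```

Slicing with the sentinel directly (input[size_t.max .. size_t.max]) is what triggers the range violation, so the check has to happen before the slice.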
Comment 3 Matt Peterson 2011-04-06 12:50:32 UTC
Created attachment 939
This patch fixes the problems with unmatched groups in a match.
Comment 4 Dmitry Olshansky 2011-04-20 04:04:53 UTC
(In reply to comment #3)
> Created an attachment (id=939)
> This patch fixes the problems with unmatched groups in a match.

Actually, I'm working on fixing all of the issues in std.regex; see this pull request: https://github.com/D-Programming-Language/phobos/pull/22

There is a little problem with your patch.
If the match is empty (there are such regexes) or there is no match,
RegexMatch.hit still happily returns "". Maybe it's better to let hit assert on no match, just as it did before, to enforce checking of empty.
Comment 5 Dmitry Olshansky 2012-02-24 12:04:19 UTC
Things got a bit mixed up here, but the initial issue is a clear WONTFIX because it works as designed. One should test a RegexMatch for empty, just like any other range.
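
A minimal sketch of that check, using the pattern from the original report. The wrapper name hitOrEmpty is hypothetical; it only shows how a caller who wants the "empty string on no match" behavior requested in the description could get it on their own side, assuming the match, regex, empty and hit members of a current std.regex.

```d
import std.regex;
import std.stdio;

// Hypothetical caller-side wrapper: test the match for empty first,
// and only then ask for the hit.
string hitOrEmpty(string input, string pattern)
{
    auto m = match(input, regex(pattern));
    return m.empty ? "" : m.hit;
}

void main()
{
    // `a(\w)b` does not match "abc", so hit must not be touched.
    auto m = match("abc", regex(`a(\w)b`));
    if (m.empty)
        writeln("no match");
    else
        writeln(m.hit);

    writeln(hitOrEmpty("abc", `a(\w)c`)); // abc
}
```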

The second issue here was fixed with pull 22 for the previous version of std.regex, and never existed in the new one.