D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 1750 - RegExp: lack of support for wchar, dchar; lack of lookingAt() method
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P2 enhancement
Assignee: Dmitry Olshansky
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-26 10:26 UTC by Marcin Kuszczak
Modified: 2015-06-09 01:14 UTC

See Also:


Attachments

Description Marcin Kuszczak 2007-12-26 10:26:08 UTC
1. RegExp should work for at least wchar & dchar. Maybe also for integral array types (e.g. int[]).

2. There is no bool lookingAt() method, which tries to match the pattern at the beginning of the input and returns immediately if it does not match. For reference: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html
Currently it is very inefficient to match a pattern against an incoming stream of data.
A solution with lookingAt() would be much faster.
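For readers unfamiliar with the Java method referenced above, here is a minimal sketch of how lookingAt() differs from matches() and find() in java.util.regex. The class name LookingAtDemo, its helper methods, and the digit pattern are hypothetical and chosen only for illustration; Pattern and Matcher are the real Java API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher are real java.util.regex API.
class LookingAtDemo {
    // Illustrative pattern: one or more digits.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // True if the pattern matches a prefix of the input (no scanning ahead).
    static boolean prefixMatches(CharSequence input) {
        return DIGITS.matcher(input).lookingAt();
    }

    // True only if the pattern matches the entire input.
    static boolean wholeMatches(CharSequence input) {
        return DIGITS.matcher(input).matches();
    }

    // True if the pattern matches anywhere, scanning forward from the start.
    static boolean foundAnywhere(CharSequence input) {
        return DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123abc", "abc123", "123"}) {
            System.out.printf("%-8s lookingAt=%b matches=%b find=%b%n",
                    s, prefixMatches(s), wholeMatches(s), foundAnywhere(s));
        }
    }
}
```

Note that all three methods take a CharSequence that is already in memory; nothing in this sketch reads from a stream.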
Comment 1 Andrei Alexandrescu 2010-09-26 11:37:42 UTC
The new RegEx supports wchar and dchar. Regarding lookingAt(), I'm unclear: how is it different from searching for a pattern starting with the anchor "^"?
Comment 2 Marcin Kuszczak 2010-09-27 11:02:35 UTC
lookingAt() can be used on streams without needing to read the whole string from the stream. Also, ^ cannot be used for matching a specific pattern in a stream: you cannot assume that your input starts right after a line end. The input might not even be split into lines.
Comment 3 Andrei Alexandrescu 2011-06-04 17:45:52 UTC
Reassigning to GSoC student Dmitry. Dmitry, please close when you think the issue has been addressed. Thanks!
Comment 4 Dmitry Olshansky 2012-03-12 01:45:41 UTC
Ok. Meant to do it for ages.
The second point raised in this bug report has no proof and is, in fact, invalid.
The truth of the matter is that, looking through all of Java's regex documentation, I observe:
1. There is no such thing as regex on a stream in Java; all objects it works on are 3 variants of character buffers, i.e. wrapped arrays and their ilk.
2. lookingAt is indeed equivalent to prepending '^' to a regex pattern, and as far as performance is concerned both versions should use the same optimization, namely the "no search" optimization. And at least the current std.regex does optimize for '^' _somewhere_ at the start, e.g. silly things like "(^...)..." still get optimized.
3. Due to implementation details of Java-style regex, there is no way it can work directly on a stream and keep all its syntax features even if it tried to; this is a problem common to all backtracking engines. And yes, in some cases it has to walk the entire input to make sure it matched what it should match.
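
The equivalence claimed in point 2 can be checked with a small Java sketch. It assumes the default flags (in particular, no MULTILINE, under which '^' would also match after line terminators) and a matcher that starts at the beginning of the input. The class AnchorEquivalence and its two helpers are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing lookingAt() with a '^'-anchored find().
class AnchorEquivalence {
    // End offset of a prefix match via lookingAt(), or -1 if there is none.
    static int endViaLookingAt(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    // End offset via find() on "^(?:regex)", or -1 if there is none.
    // The non-capturing group keeps every alternative under the anchor.
    static int endViaAnchor(String regex, CharSequence input) {
        Matcher m = Pattern.compile("^(?:" + regex + ")").matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String regex = "ab|cd";
        for (String s : new String[] {"abxx", "xxcd", "cd\nab"}) {
            System.out.println(endViaLookingAt(regex, s) + " " + endViaAnchor(regex, s));
        }
    }
}
```

The wrapping group matters: a bare "^ab|cd" anchors only the first alternative, so it would find "cd" in the middle of "xxcd" and would no longer behave like lookingAt().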

Marking as fixed, as the first point of the report was solved long ago and the second is invalid as is. It does, however, raise a good point, one that was already accounted for.