D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 8084 - std.stdio.ByLine is not true input range
Summary: std.stdio.ByLine is not true input range
Status: RESOLVED WONTFIX
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P2 normal
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-11 10:55 UTC by Kenji Hara
Modified: 2012-05-12 19:50 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Kenji Hara 2012-05-11 10:55:07 UTC
version(A) and version(B) should output same result, but A is completely broken.

import std.array;
import std.stdio;
void main()
{
    string fname = __FILE__;
    version(A)
    {
        auto lines = File(fname).byLine().array();
        foreach (ln; lines)
            writeln(ln);
    }
    version(B)
    {
        foreach (ln; File(fname).byLine())
            writeln(ln);
    }
}
Comment 1 Steven Schveighoffer 2012-05-11 12:10:30 UTC
In order for byLine to be efficient, it must reuse the buffer it's reading.  In order for the above code to be equivlent, it must dup every line, which is hugely inefficient if you aren't going to use them beyond the scope of the foreach statement.
Comment 2 timon.gehr 2012-05-12 07:11:54 UTC
This is true, but byLine is not a true InputRange, therefore it is questionable whether or not it should expose an InputRange interface. int opApply(scope int delegate(scope string ln)) would be more descriptive.
Comment 3 Brad Roberts 2012-05-12 09:49:25 UTC
What part of the definition of an InputRange does it violate?
Comment 4 Kenji Hara 2012-05-12 09:55:49 UTC
(In reply to comment #1)
> In order for byLine to be efficient, it must reuse the buffer it's reading.  In
> order for the above code to be equivlent, it must dup every line, which is
> hugely inefficient if you aren't going to use them beyond the scope of the
> foreach statement.

OK. ByLine.front returns the slice of internal line buffer, so it specifies 'temporary' data. We should get dup in each iteration.
Comment 5 Jonathan M Davis 2012-05-12 11:38:24 UTC
> OK. ByLine.front returns the slice of internal line buffer, so it specifies
'temporary' data.

So? It's returning a reference type, so it could change on you. It _is_ potentially problematic in some cases, but I don't see how that makes it so that it isn't an input range

> We should get dup in each iteration.

It's specifically _avoiding_ that for efficiency purposes.

Now, maybe having ByLine's front return the same array with changes values rather than a new array makes it so that it doesn't work with a lot of range-based functions, effectively making it useless as a range, even if it _does_ follow the API. If that's the case, maybe we should make it use opApply for the loop case (having it use the current non-duping behavior) and have front do a dup (and if front as pure, that result could even be implicitly converted to an immutable result, avoiding having to dup it twice to get an immutable copy).
Comment 6 Steven Schveighoffer 2012-05-12 19:38:27 UTC
ByLine is a true input range, it has empty, popFront and front.

If it was not a true input range, foreach would not work on it.

I also think isInputRange would pass.

The major issue I see with ByLine is that its default type of front is char[].  It really should be const(char)[].  And in fact, I don't think char[] should be possible.

Perhaps, this bug can be redone to address that.

(In reply to comment #4)
> OK. ByLine.front returns the slice of internal line buffer, so it specifies
> 'temporary' data. We should get dup in each iteration.

If you mean ByLine should dup for each iteration, that is extremely wasteful.  What if you only need the data during the loop and not afterward?
Comment 7 Kenji Hara 2012-05-12 19:50:30 UTC
(In reply to comment #5)
(In reply to comment #6)
> (In reply to comment #4)
> > OK. ByLine.front returns the slice of internal line buffer, so it specifies
> > 'temporary' data. We should get dup in each iteration.

I mean we should dup the front of ByLine in *user code* if it is used after whole iteration, not ByLine should return dup-ed front.

And I can agree that ByLine is *true* input range because it has empty, front, and popFront. Therefore, the summary was wrong.

Now I think it is correct that this issue was marked as 'resolved wontfix'.