D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 13829 - std.uni.byCodePoint for strings has length
Summary: std.uni.byCodePoint for strings has length
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P1 normal
Assignee: Dmitry Olshansky
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-07 13:42 UTC by Marc Schütz
Modified: 2017-10-16 09:57 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Marc Schütz 2014-12-07 13:42:03 UTC
import std.uni;
    static assert(__traits(compiles, "é".byCodePoint.length));
    pragma(msg, typeof("é".byCodePoint)); // => string

The problem is that `byCodePoint(w?string.init)` returns its argument (string/wstring) which of course defines `length`, instead of a wrapper that doesn't.

The reason is once again auto-decoding. In std/uni.d(6644):

    Range byCodePoint(Range)(Range range)
        if(isInputRange!Range && is(Unqual!(ElementType!Range) == dchar))
    {
        return range;
    }

`Unqual!(ElementType!string)` is of course `dchar`.

Brought up in this discussion:
http://forum.dlang.org/thread/ovzcetxbrdblpmyizdjr@forum.dlang.org#post-ovzcetxbrdblpmyizdjr:40forum.dlang.org
Comment 1 Marc Schütz 2014-12-07 13:48:55 UTC
In case it wasn't clear:

For strings and wstrings, determining the actual number of code points is an O(n) operation and should therefore not be available via length at all. The current implementation returns the number of code units, not of code points.
Comment 2 hsteoh 2014-12-10 15:38:05 UTC
The documentation of byCodePoint states that it's the identity function when given a range of code points, and currently, strings are ranges of code points (due to autodecoding), so it simply returns the string as-is.

Should this be changed so that it returns a wrapper around the string that suppresses .length instead?
Comment 3 Peter Alexander 2014-12-14 19:19:28 UTC
In a perfect world, I think it should return a different range, but it's now a breaking change, and even breaks its documented behaviour. So I'm voting that this shouldn't be fixed.

Note: hasLength will still return false.
Comment 4 Marc Schütz 2014-12-17 14:18:45 UTC
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.

I strongly disagree with this. The status quo is clearly wrong.
Comment 5 Dmitry Olshansky 2017-09-11 09:01:49 UTC
(In reply to Peter Alexander from comment #3)
> In a perfect world, I think it should return a different range, but it's now
> a breaking change, and even breaks its documented behaviour. So I'm voting
> that this shouldn't be fixed.
> 
> Note: hasLength will still return false.

Let's us not replicate the broken 'string has no length except it does' stuff even more.
If the user says byCodePoint he definetely expects a proper range.
I'll change the documentation to reflect this.
Comment 6 Dmitry Olshansky 2017-09-12 14:20:30 UTC
https://github.com/dlang/phobos/pull/5733
Comment 7 github-bugzilla 2017-09-25 16:47:11 UTC
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829

Fix issue 13829 - byCodePoint has length
merged-on-behalf-of: Dmitry Olshansky <dmitry@olshansky.me>
Comment 8 github-bugzilla 2017-10-16 09:57:56 UTC
Commits pushed to stable at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/d46bd62bcaa080ea1bfa19fc8d80359226f304a6
Fix issue 13829 - byCodePoint has length

https://github.com/dlang/phobos/commit/4cc17371b0994fe5aa494b800105dcae30ada674
Merge pull request #5733 from DmitryOlshansky/fix-issue-13829