Issue 19428 - std.string.indexOf wrong result with bad unicode
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P3 normal
Assignee: No Owner
Reported: 2018-11-23 22:39 UTC by Vladimir Panteleev
Modified: 2024-12-01 16:34 UTC
Description Vladimir Panteleev 2018-11-23 22:39:35 UTC
//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

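void main()
{
    // Haystack: three literal U+FFFD replacement characters.
    // Needle: three stray continuation bytes, which are not valid UTF-8.
    // Expected: no match (-1). The assertion fails.
    assert(indexOf(
            only('\uFFFD', '\uFFFD', '\uFFFD'),
            "\x83\x84\x85",
            CaseSensitive.yes) == -1);
}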
///////////////////////////////////////////////

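Looks like indexOf is replacing bad Unicode in the needle with replacement characters (U+FFFD) under the hood, so the invalid needle compares equal to a haystack of literal replacement characters.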

This becomes worse when something causes the same thing to happen to the haystack, as in this unit test:

https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

Note that this unittest is incorrectly annotated as nothrow/@nogc. We can't use the kind of decoding that substitutes errors with replacement characters, as that will introduce bugs like these.
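
To illustrate the difference between the two kinds of decoding, here is a minimal sketch. It is not the code path indexOf takes internally; it only assumes std.utf's byDchar (lossy, substitutes U+FFFD) and validate (strict, throws UTFException). The helper decodesToReplacementOnly is a hypothetical name introduced for this example.

```d
import std.algorithm.searching : all;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : byDchar, replacementDchar, UTFException, validate;

/// Hypothetical helper: true if lossy decoding turns every
/// code point of `s` into U+FFFD.
bool decodesToReplacementOnly(string s)
{
    return s.byDchar.all!(c => c == replacementDchar);
}

void main()
{
    // Three stray continuation bytes: not valid UTF-8.
    string bad = "\x83\x84\x85";

    // Lossy decoding hides the error: every decoded code point is U+FFFD,
    // which makes the input indistinguishable from real replacement characters.
    assert(decodesToReplacementOnly(bad));

    // Strict validation reports the error instead of masking it.
    assertThrown!UTFException(validate(bad));

    // At the byte level the two strings differ: U+FFFD is encoded as EF BF BD.
    assert(bad.representation != "\uFFFD\uFFFD\uFFFD".representation);
}
```

With strict handling, the caller learns that the needle is malformed; with lossy handling, a search can report a match between inputs whose bytes never matched.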
Comment 1 dlangBugzillaToGithub 2024-12-01 16:34:33 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9766

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB