Issue 6125 - to!string doesn't throw on invalid UTF sequence
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: Other All
Importance: P3 normal
Assignee: No Owner
URL:
Keywords: bootcamp
Depends on:
Blocks:
 
Reported: 2011-06-08 11:41 UTC by Andrej Mitrovic
Modified: 2024-12-01 16:14 UTC

Description Andrej Mitrovic 2011-06-08 11:41:02 UTC
I'm not sure if this is a bug or intended behavior:
    auto x = to!string(cast(char)255);

That won't throw. But this will:
    auto x = to!string(cast(char)255);  // or try 128
    auto z = toUTF8(x);  // throws
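
One way to check what `to!string` returned without going through `toUTF8` is `std.utf.validate`, which throws a `UTFException` on malformed input. The following is only a minimal sketch; `isWellFormed` is a hypothetical helper name and not part of Phobos, and the result for the second call depends on how `to!string` behaves in the Phobos version being used.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper (not in Phobos): true if `s` is well-formed UTF-8.
bool isWellFormed(string s)
{
    try
    {
        validate(s); // throws UTFException on the first malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    writeln(isWellFormed(to!string('a'))); // true
    // Reports whether the string produced for a lone byte 255 is valid UTF-8.
    writeln(isWellFormed(to!string(cast(char) 255)));
}
```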

I have this example code, translated from C:

    foreach (y; 0 .. 16)
    foreach (x; 0 .. 16)
    {
        auto buffer = to!string(cast(char)(16 * x + y));
        auto result = buffer.toUTF16z;  // call to utf16z for the winapi
    }

Essentially, the code builds a table of characters that it prints out. But it doesn't take into account that the char values 128 to 255 are not valid UTF-8 sequences on their own.

This leads me to another question, how does one iterate through valid UTF code points, starting from 0? Is there a Phobos function that does that?
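
A possible approach to the question above, sketched under assumptions: iterate over `dchar` code points instead of raw `char` values, filter with `std.utf.isValidDchar`, and encode each one with `std.utf.encode`. The helper name `codePointTable` is hypothetical and only for illustration; this is not a dedicated Phobos range for the job.

```d
import std.stdio : writeln;
import std.utf : encode, isValidDchar, validate;

/// Hypothetical helper: UTF-8 strings for every valid code point below `limit`.
string[] codePointTable(uint limit)
{
    string[] table;
    foreach (cp; 0 .. limit)
    {
        immutable c = cast(dchar) cp;
        // Rejects surrogates (U+D800..U+DFFF) and values above U+10FFFF;
        // nothing below 256 is rejected.
        if (!isValidDchar(c))
            continue;

        char[4] buf;
        immutable len = encode(buf, c); // 1 byte below 128, 2 bytes for 128..255
        table ~= buf[0 .. len].idup;
    }
    return table;
}

void main()
{
    auto table = codePointTable(256);
    foreach (s; table)
        validate(s); // would throw UTFException on malformed UTF-8
    writeln(table.length);      // 256
    writeln(table[255].length); // 2
}
```

Each entry is valid UTF-8 on its own, so it can be passed on to a conversion such as `toUTF16z` without hitting a lone invalid byte.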
Comment 1 Andrej Mitrovic 2016-08-27 21:55:57 UTC
-----
import std.conv;
import std.stdio;

void main()
{
    auto x = to!string(cast(char)255);
    writeln(x);
}
-----

Outputs:
[Decode error - output not utf-8]

I think the to!() routines should be UTF safe, so the call to to!string above should throw an exception. Is this right, Andrei?
Comment 2 Andrei Alexandrescu 2016-10-14 16:55:25 UTC
Well since it doesn't throw we may as well make it nothrow :o) and use the replacement char, or add an overload. I'll bootcamp this.
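
One reading of the replacement-character idea, as a rough sketch only: a `nothrow` conversion that passes ASCII through and substitutes U+FFFD for any `char` that cannot stand alone as UTF-8. The name `charToStringLossy` is hypothetical; it is not an existing Phobos function or the actual change made for this issue.

```d
import std.utf : validate;

/// Hypothetical helper (not in Phobos): never throws, never returns invalid UTF-8.
string charToStringLossy(char c) pure nothrow @safe
{
    // A single code unit is valid UTF-8 on its own only if it is ASCII.
    if (c < 0x80)
        return [c].idup;
    return "\uFFFD"; // U+REPLACEMENT CHARACTER, three bytes in UTF-8
}

void main()
{
    assert(charToStringLossy('A') == "A");

    auto s = charToStringLossy(cast(char) 255);
    assert(s == "\uFFFD");
    validate(s); // well-formed, so no UTFException is thrown
}
```

The trade-off is that the original byte value is lost, which is why an overload that lets the caller choose between throwing and substituting was also suggested.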
Comment 3 Lucia Cojocaru 2016-11-21 13:36:26 UTC
Is this a Windows specific bug?

I tested the following on Linux 64:
    import std.conv;
    import std.stdio;
    import std.utf;

    void main()
    {
        auto x = to!string(cast(char)191);
        auto z = toUTF8(x);
        writeln(x);

        foreach (y; 0 .. 16)
            foreach (r; 0 .. 16)
            {
                auto buffer = to!string(cast(char)(16 * r + y));
                auto b = toUTF8(buffer);
                writeln(b);
    //            auto result = buffer.toUTF16z;  // call to utf16z for the winapi
            }
    }

Only the commented line throws:
core.exception.UnicodeException@src/rt/util/utf.d(292): invalid UTF-8 sequence
Comment 4 berni44 2019-12-11 14:18:40 UTC
The original bug isn't Windows-specific. I don't know if the example from Lucia Cojocaru can be considered the same bug...
Comment 5 dlangBugzillaToGithub 2024-12-01 16:14:11 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9906

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB