D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 8384 - std.conv.to should allow conversion between any pair of string/wstring/dstring/char*/wchar*/dchar*
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P4 enhancement
Assignee: No Owner
Duplicates: 6157
Reported: 2012-07-13 05:23 UTC by Vladimir Panteleev
Modified: 2024-12-01 16:15 UTC
CC List: 5 users

Description Vladimir Panteleev 2012-07-13 05:23:29 UTC
import std.conv;
import std.string;

unittest
{
	static void test(T)(T lp)
	{
		assert(format("%s", lp) == "Hello, world!");
		assert(to!string(lp)    == "Hello, world!");
	}

	test("Hello, world!" .ptr);
	test("Hello, world!"w.ptr);
	test("Hello, world!"d.ptr);
}

wchar* conversion is commonly needed for Windows programming, as UTF-16 is the native encoding for Unicode Windows API functions.
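A minimal sketch of that conversion in user code, assuming only std.conv.to and a hand-written terminator scan; `sliceZ` and `fromWidePtr` are hypothetical names, not Phobos functions. The pointer is sliced up to its terminating zero and the resulting wchar slice is transcoded to UTF-8:

```d
import std.conv : to;
import std.traits : isSomeChar;

/// Hypothetical helper (not a Phobos function): slices a zero-terminated
/// buffer of any character type up to, but not including, the terminator.
inout(C)[] sliceZ(C)(inout(C)* p) @system
if (isSomeChar!C)
{
    if (p is null)
        return null;
    size_t n = 0;
    while (p[n] != 0)   // scan for the terminating zero
        ++n;
    return p[0 .. n];   // no copy: the slice aliases the C buffer
}

/// Hypothetical helper: converts a zero-terminated UTF-16 pointer, such as
/// one filled in by a Windows "W" API, into a UTF-8 D string.
/// std.conv.to performs the UTF-16 to UTF-8 transcoding (and the copy).
string fromWidePtr(const(wchar)* p) @system
{
    return sliceZ(p).to!string;
}
```

Slicing first leaves the transcoding to std.conv.to, which already handles array-to-array conversion between character widths.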
Comment 1 Jonathan M Davis 2012-07-13 12:00:53 UTC
So, you expect %s on a pointer to give you the string that it points to? Why? It's a pointer, not a string. It's going to convert the pointer. That works as expected.

to!string should take a null-terminated string and give you a string, and it does that. This code passes:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
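        // The second argument of assert is only the failure message,
        // so no comparison against "hello world" is performed here.
        assert(to!string(lp), "hello world");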
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

So, I'd say that as far as your code goes, there's nothing wrong with it. It functions exactly as expected. What _doesn't_ work is this:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!wstring(lp), "hello world");
        assert(to!dstring(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

The code doesn't even compile, giving these errors:

/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(819): Error: incompatible types for ((cast(immutable(dchar)[])_adDupT(&_D12TypeInfo_Aya6__initZ,value[cast(ulong)0..strlen(cast(const(char*))value)])) ? (null)): 'immutable(dchar)[]' and 'string'
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(268): Error: template instance std.conv.toImpl!(immutable(dchar)[],immutable(char)*) error instantiating
q.d(8):        instantiated from here: to!(immutable(char)*)
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(8): Error: template instance std.conv.to!(immutable(dchar)[]).to!(immutable(char)*) error instantiating
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(11): Error: template instance q.main.test!(immutable(char)*) error instantiating
Comment 2 Vladimir Panteleev 2012-07-13 13:36:05 UTC
> to!string should take null-terminated string and give you a string, and it does
> that. This code passes:

Is it something that was fixed recently (within the last two weeks)? My two-week-old dmd git build and dpaste still print offsets for wchar* and dchar*: http://dpaste.dzfl.pl/26a2b284

> So, you expect %s on a pointer to give you the string that it points to? Why?

I think that, before all else, we should be looking for good reasons why format("%s", foo) and to!string(foo) produce different results. Why should one format the offset and the other do a conversion?

Second, I believe that the principle of least surprise makes this case rather clear: if the programmer tries to print a char*, it's almost certain that they want to print the null-terminated string at the given address, rather than a hexadecimal representation of the address (which is rarely useful to the end user). Generic code is the only exception I can think of, in which case a cast to void* is in order.

> What _doesn't_ work is this:

I think this should call the appropriate toUTFx functions from std.utf.
Comment 3 Vladimir Panteleev 2012-07-13 13:42:17 UTC
> I think this should call the appropriate toUTFx functions from std.utf.

Sorry about that, misread your example. I guess, ideally, conversion between any pair of {|w|d}{char*|string} should work.
Comment 4 Jonathan M Davis 2012-07-13 13:59:09 UTC
format and writeln are supposed to behave the same, because they both operate on format strings (they _don't_ currently behave 100% the same, but format's current implementation will be replaced with the new xformat's implementation in a few months - after the "scheduled for deprecation" time period). to!string is an entirely different beast.

std.conv.to is asking for an explicit conversion to string, whereas format and writeln are converting according to the format specifiers, and %s indicates the default string representation of the type. char*, wchar*, and dchar* are pointers - _not_ strings - and should not be treated as strings. Pointers print their address with %s. Making char*, wchar*, and dchar* print themselves as strings would be inconsistent with other pointer types, and operating on char*, wchar*, and dchar* should be discouraged, not encouraged.

to!string is treated differently, because you're asking for an explicit conversion, and we _do_ need to be able to convert null-terminated strings to D strings.

So, while I can see your point, I really don't think that having format or writeln treat char*, wchar*, or dchar* as null-terminated strings is a good idea. We should provide a means of converting them to D strings but not do anything to encourage using them as-is without converting them.
Comment 5 Vladimir Panteleev 2012-07-13 14:25:36 UTC
OK, fair enough.

I've updated the enhancement request's title according to my previous comment.

Test:

-----------------------------------------------------------------------------

import std.conv;

void test1(T)(T lp)
{
    test2!( string)(lp);
    test2!(wstring)(lp);
    test2!(dstring)(lp);
    test2!(  char*)(lp);
    test2!( wchar*)(lp);
    test2!( dchar*)(lp);
}

void test2(D, S)(S lp)
{
    D dest = to!D(lp);
    assert(to!string(dest) == "Hello, world!");
}

unittest
{
    test1("Hello, world!" );
    test1("Hello, world!"w);
    test1("Hello, world!"d);
    test1("Hello, world!" .ptr);
    test1("Hello, world!"w.ptr);
    test1("Hello, world!"d.ptr);
}
Comment 6 Vladimir Panteleev 2012-07-13 14:31:04 UTC
Oh, I forgot about constness.

I guess that raises the number of combinations to (2*3*3)^2 = 324.
Comment 7 David Nadlinger 2012-07-13 14:37:07 UTC
Hooray for using "static" foreach to conveniently enumerate all the cases to test!
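A sketch of that idiom, assuming a Phobos recent enough to have std.meta.AliasSeq: a foreach over a sequence of types is unrolled at compile time, so each (source, destination) pair is instantiated separately. Only the three immutable string types are listed, i.e. 9 of the 324 combinations counted above; the pointer and qualifier variants could be appended to `StringTypes`, but this sketch does not claim they all compile. The function name is hypothetical.

```d
import std.conv : to;
import std.meta : AliasSeq;

/// Hypothetical test helper: a foreach over an AliasSeq of types is
/// unrolled at compile time, so every (S, D) pair gets its own instantiation.
bool allStringPairsRoundTrip()
{
    alias StringTypes = AliasSeq!(string, wstring, dstring);
    enum text = "Hello, world!";
    foreach (S; StringTypes)
    {
        foreach (D; StringTypes)
        {
            S src = text.to!S;   // build the source value in type S
            D dst = src.to!D;    // the conversion under test
            if (dst.to!string != text)
                return false;
        }
    }
    return true;
}
```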
Comment 8 Jonathan M Davis 2012-07-13 14:48:31 UTC
> Hooray for using "static" foreach to conveniently enumerate all the cases to
> test!

Yeah. I do that all of the time when I have to test with multiple types (especially with strings), and I always push for string-related tests to do that when I see that someone is looking to submit code to Phobos for a function that takes one or more strings as templated types, and their tests don't do that. It's just one of those things that everyone who writes much in the way of unit tests in D should learn and know about.
Comment 9 Vladimir Panteleev 2012-08-15 13:24:08 UTC
Another case of confusion due to format treating C strings as pointers:

http://stackoverflow.com/q/11975353/21501

I still think that the current behavior, regardless of how much it makes sense from a design/consistency/orthogonality/etc. perspective, is simply not useful and fails the principle of least surprise in most expected cases.

I strongly believe that we should either forbid passing char pointers to format/writeln (and force the user to cast to void* or convert to a D string), or print them as C null-terminated strings.
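The "forbid" option can be sketched as a wrapper with a template constraint. `strictFormat` and `isCharPointer` are hypothetical names, not part of Phobos; the wrapper forwards to std.format.format and rejects any argument whose type is a pointer to char, wchar or dchar, so the caller has to convert with to!string or cast to void* explicitly:

```d
import std.format : format;
import std.meta : anySatisfy;
import std.traits : isSomeChar;

/// Hypothetical trait: true when T is a pointer to char, wchar or dchar
/// (with any type qualifier on the character type).
template isCharPointer(T)
{
    static if (is(T == U*, U))
        enum isCharPointer = isSomeChar!U;
    else
        enum isCharPointer = false;
}

/// Hypothetical wrapper: behaves like format, but the constraint makes
/// any character-pointer argument a compile-time error.
string strictFormat(Args...)(string fmt, Args args)
if (!anySatisfy!(isCharPointer, Args))
{
    return format(fmt, args);
}
```

Casting to void* still compiles, so printing an address remains possible when it is really intended.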
Comment 10 Jonathan M Davis 2012-08-15 13:35:28 UTC
char* acts identically to the other pointer types, and I fully believe that it should stay that way. We've pretty much removed all of the D features which involved either treating a string as char* or a char* as a string (including disallowing implicit conversion of string to const char*). The _only_ feature that the language has which supports that is the fact that string literals have a null character one past their end and will implicitly convert to const char*.

It would be a huge mistake IMHO to support doing _anything_ with character pointers which treats them as strings without requiring an explicit conversion of some kind. Anyone who continues to think of char* as being a string in D is just asking for trouble. They need to learn to use strings correctly.

If you really want to use char* as a string in functions like format or writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].
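Spelled out, the slicing approach looks roughly like this, assuming `strlen` from core.stdc.string; the helper names are illustrative only. The slice aliases the C memory instead of copying it, which is the main difference from to!string:

```d
import core.stdc.string : strlen;
import std.format : format;

/// Hypothetical helper: views a C string as a D slice, up to the terminator.
/// No copy is made, so the slice is only valid while the C memory is.
const(char)[] viewCString(const(char)* p) @system
{
    return p[0 .. strlen(p)];
}

/// Hypothetical helper: format receives a slice, not a pointer,
/// so it prints the characters rather than the address.
string formatCString(const(char)* p) @system
{
    return format("%s", viewCString(p));
}
```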
Comment 11 Vladimir Panteleev 2012-08-15 13:48:30 UTC
Sorry, I don't think that your categorical point of view is constructive. As long as D interfaces with C libraries and programs, people will continue to attempt to use C strings together with or in place of D strings, and issues like the above will continue to appear.

How often would a typical D user want to print / format the address of a character, versus the null-terminated string at that address?

> It would be a huge mistake IMHO to support doing _anything_ with character
> pointers which treats them as strings without requiring an explicit conversion
> of some kind. 

Why would it be a mistake? What exactly do we lose by allowing writeln/format to understand C strings?

> Anyone who continues to think of char* as being a string in D is
> just asking for trouble.

What kind of trouble?

> They need to learn to use strings correctly.

D printing an address when text was expected will sooner generate a "D sucks" reaction than an "Oops, I need to learn to use strings correctly" one.

> If you really want to use char* as a string in functions like format or
> writeln, then simply either use to!string or ptr[0 .. strlen(ptr)].

That's not really simple, considering some spots where that (verbose) modification needs to be made would be discovered only late at runtime, and even then the actual problem is not obvious to identify (as seen in the SO question above).
Comment 12 Vladimir Panteleev 2012-08-15 13:56:00 UTC
I would like to stress a point that I hope could clear up my view of the logic that writeln/format should use.

Printing/formatting memory addresses is extremely rarely useful!

Except for some dirty debugging, I can't imagine a case where the user expects that passing a pointer to something to format would yield the hex representation of that address.

I believe that printing a pointer as a hex address should be the fallback, last-resort behavior, if there is no better representation for the said type. (This also allows discussion of calling toString() on struct pointers.)

For the rare case that the user intends to actually print a pointer, this is easily accomplished by a cast to size_t and using the appropriate hex format specifier.
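As a rough illustration of that cast, assuming std.format.format; `hexAddress` is a hypothetical helper name:

```d
import std.format : format;

/// Hypothetical helper: prints a pointer's address explicitly by casting it
/// to size_t and using the %x (lowercase hex) format specifier.
string hexAddress(T)(T* p)
{
    return format("%x", cast(size_t) p);
}
```

The output has no `0x` prefix or zero padding; both would have to be added through the format string if wanted.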
Comment 13 Jonathan M Davis 2012-08-15 13:57:15 UTC
Anyone who does not understand that char* is _not_ a string will continue to make mistakes like trying to concatenate a char* to a string ( http://stackoverflow.com/questions/11914070/why-can-i-not-concatenate-a-constchar-to-a-string-in-d ) or trying to pass a string directly to a C function. They will constantly run into problems when dealing with strings. char* is _not_ a string and should not be treated as such. Treating it as a string with something like writeln will just help further the misconception that char* is a string and hinder people learning and using D. D programmers need to understand the difference between char* and string. char* should _not_ be treated as special, because it's not.
Comment 14 Vladimir Panteleev 2012-08-15 14:01:42 UTC
First of all, you are conflating my arguments with ignorance of the difference between the two string types. Users who are aware that D has its own way of handling strings are still open to making frustrating mistakes.

Second, getting unexpected output is not a good way to teach people about this. Hence my earlier proposal to make writeln/format REJECT char pointer types, on the basis that the user's intention is ambiguous (I don't think so personally, but obviously that's just my opinion).
Comment 15 Jonathan M Davis 2012-08-15 14:06:49 UTC
I'm saying that we shouldn't treat char* differently from int* just because some newbies expect char* to act like a string. And if you know D, then you know that char* is _not_ a string, and I don't see how you could expect it to be treated as one. Either making char* act like a string or disallowing printing it would make it act differently from other pointer types just to appease the folks who mistakenly think that char* is a string.
Comment 16 Vladimir Panteleev 2012-08-15 14:08:44 UTC
Well, then how about removing the pointer-printing feature entirely, and issue a compile-time error on all pointer types?
Comment 17 Vladimir Panteleev 2012-08-15 14:12:50 UTC
> And if you know D, then you know that char* is _not_ a string,
> and I don't see how you could expect it to be treated as one.

I don't think this argument is valid, because it assumes that all D users are always aware of the types they pass to writeln/format. In the SO case, the argument is a function result, and the function's return type is not explicitly written in the user's code.

People often expect the compiler to shout at them if they try to pass incompatible types to a function. writeln/format accept char pointers, but ultimately do something with them that in 99% of cases is simply not useful, and put the user in search of their mistake all across the data flow.
Comment 18 Adam D. Ruppe 2012-08-15 14:34:54 UTC
I think rejecting might be the best option because if you treat it as a string, what if it doesn't have a 0 terminator? That could easily happen if you pass it a pointer to a D string.
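A small sketch of that hazard, assuming std.string.toStringz and core.stdc.string.strlen; both helper names are hypothetical. `naiveCLength` is deliberately wrong: it only gives a predictable result in the test because D string literals are followed by a zero byte, so strlen runs to the end of the whole literal rather than the end of the slice.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Deliberately wrong: treats s.ptr as a C string. A slice of a D string is
/// not terminated where the slice ends, so strlen keeps reading until it
/// finds some later zero byte (or runs past the allocation).
size_t naiveCLength(const(char)[] s) @system
{
    return strlen(s.ptr);
}

/// toStringz provides a zero-terminated string matching the slice,
/// so the C view has the same length as the D slice.
size_t checkedCLength(const(char)[] s) @system
{
    return strlen(s.toStringz);
}
```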

I don't think that is technically un-@safe, but it could be a problem anyway to get an unexpected crash because of it. At least with to!string(char*) you might think about it for a minute and avoid the problem.


So on one hand, I think it should just work, but on the other hand the compile time error might be the most sane.
Comment 19 Jonathan M Davis 2012-08-15 14:40:14 UTC
> Well, then how about removing the pointer-printing feature entirely, and issue
> a compile-time error on all pointer types?

So, you're suggesting that we remove a useful feature because newbies coming from C/C++ keep mistakenly thinking that char* is a string?
Comment 20 Vladimir Panteleev 2012-08-15 14:44:20 UTC
Your formulation is misrepresenting the weight of the scales. Please seriously take into account the overall benefit for D of both decisions. The feature is nearly useless and does more harm than good, and "newbies coming from C/C++" is, again, a misrepresentation as discussed above. It is also incorrect: someone used to e.g. SDL bindings in another language may expect that the types returned by the binding would be compatible with the language's native functionality.
Comment 21 Andrej Mitrovic 2013-01-13 10:34:43 UTC
*** Issue 6157 has been marked as a duplicate of this issue. ***
Comment 22 Andrej Mitrovic 2013-01-13 10:35:51 UTC
(In reply to comment #21)
> *** Issue 6157 has been marked as a duplicate of this issue. ***

FYI: http://d.puremagic.com/issues/show_bug.cgi?id=6157 has an experimental implementation in the attachment (for conv.to), but I'm not an expert on things Unicode.
Comment 23 anonymous4 2014-02-20 10:21:15 UTC
(In reply to comment #19)
> So, you're suggesting that we remove a useful feature because newbies coming
> from C/C++ keep mistakenly thinking that char* is a string?

char* is the way to represent null-terminated strings and C programmers are not mistaken in that.

As to the useful feature, it can be done with %p format specifier - that's what printf does.
Comment 24 Simen Kjaeraas 2016-04-14 21:41:58 UTC
https://github.com/D-Programming-Language/phobos/pull/4199

The PR covers conversion from {X}char* to {Y}char[], but not the other way around. Conversions in that direction are not currently supported at all, so I took the liberty of not implementing them without a bit more discussion.
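For the missing direction, a sketch of what a D-string-to-pointer conversion might do, assuming only std.conv.to; `toStringzAs` is a hypothetical name, not a Phobos function. It transcodes to the requested character type and appends a terminator:

```d
import std.conv : to;
import std.traits : isSomeChar;

/// Hypothetical helper: transcodes s to character type C and appends a
/// zero terminator, yielding a pointer usable by C or Windows APIs.
immutable(C)* toStringzAs(C, S)(S s)
if (isSomeChar!C)
{
    // Concatenation always allocates a new array, so the terminator never
    // overwrites memory belonging to the source string.
    immutable(C)[] buf = s.to!(immutable(C)[]) ~ cast(immutable(C)) 0;
    return buf.ptr;
}
```

Each call allocates; if C code keeps the pointer beyond the call, the D side must keep a reference to it so the GC does not collect the buffer, which is one of the lifetime questions such a conversion in std.conv would have to settle.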

Are there convincing reasons to support any of those conversions at all?
Comment 25 github-bugzilla 2016-04-26 20:14:12 UTC
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/60a233372a96abab810f030b4e3ff494987aa25e
Partial fix of Issue 8384 - std.conv.to should allow conversion between any pair of string/wstring/dstring/char*/wchar*/dchar*

https://github.com/dlang/phobos/commit/22c7f11265d62ad1ac387bc9aaa90b742f9563b2
Merge pull request #4199 from Biotronic/fix-8384

Partial fix of Issue 8384 - std.conv.to should allow conversion betwe…
Comment 26 dlangBugzillaToGithub 2024-12-01 16:15:18 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9929

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB