D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 10717 - std.ascii.toLower and toUpper should return char instead of dchar and avoid me to use a bad cast(char)
Summary: std.ascii.toLower and toUpper should return char instead of dchar and avoid m...
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P2 enhancement
Assignee: No Owner
URL:
Keywords: pull
Depends on:
Blocks:
 
Reported: 2013-07-26 05:06 UTC by bearophile_hugs
Modified: 2013-09-22 12:43 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description bearophile_hugs 2013-07-26 05:06:14 UTC
If I use functions from std.ascii I am stating that I am working with ASCII chars.

Also std.ascii docs say: "All of the functions in std.ascii accept unicode characters but effectively ignore them."

So if I use std.ascii.toLower on an ASCII char, in most times (all times so far) I want the result to be a char. If std.ascii.toLower returns a dchar it forces me to use a cast(char) in most (or all) cases. Such cast doesn't increase safety at all, it actually decreases it.

So I'd like std.ascii.toLower and std.ascii.toUpper to return char.

(This has a severity 'normal' instead of 'enhancement' because I regard that as a bug in the design of those two functions.)
Comment 1 Jonathan M Davis 2013-07-26 05:17:49 UTC
I really don't think that this is a bug. By taking dchar, they work with ranges of dchar and allow you to operate on strings that contain Unicode in cases where all you care about are particular ASCII characters (like when you only care about ASCII whitespace in a string - not Unicode whitespace - but the string contains Unicode characters). And as soon the functions accept dchar, they must return dchar, or they'll destroy any Unicode characters that get passed in.

A reasonable enhancement request would be to overload these functions with overloads which take char and return char, but there is actual value in having them accept and return dchar, and it would break existing code if they stopped. So, the functions that are there are there to stay, but you may get overloads which operate specifically on char.
Comment 2 bearophile_hugs 2013-07-26 07:33:37 UTC
(In reply to comment #1)

> So, the functions that are there are there to stay, but you may get overloads
> which operate specifically on char.

OK. Thank for your comments and corrections Jonathan.
Comment 4 bearophile_hugs 2013-07-27 05:46:05 UTC
(In reply to comment #3)
> https://github.com/D-Programming-Language/phobos/pull/1436

Thank you again Jonathan :-)
Comment 5 monarchdodra 2013-07-31 02:43:39 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > https://github.com/D-Programming-Language/phobos/pull/1436
> 
> Thank you again Jonathan :-)

The new semantics are:

//----
If $(D c) is an uppercase ASCII character, then its corresponding lowercase
letter is returned. Otherwise, $(D c) is returned.

$(D C) can be any type which implicitly converts to $(D dchar). In the case
where it's a built-in type, $(D Unqual!C) is returned, whereas if it's a
user-defined type, $(D dchar) is returned.
//----

Does this fit the bill, or do you see anything else that needs to be addressed?
Comment 6 bearophile_hugs 2013-07-31 05:15:58 UTC
(In reply to comment #5)

> Does this fit the bill, or do you see anything else that needs to be addressed?

I have tried it, and the semantics seems OK.


But the implementation is:

auto toLower(C)(C c) if(is(C : dchar)) {
    static if(isScalarType!C)
        return isUpper(c) ? cast(Unqual!C)(c + 'a' - 'A') : cast(Unqual!C)c;
    else
        return toLower!dchar(c);
}


So a call to toLower can cause two function calls. toLower() is a tiny function that could be called many many times, so perhaps in debug mode (without inlining) doubling the number of function calls slows down the user code a bit.
Comment 7 monarchdodra 2013-07-31 12:26:58 UTC
(In reply to comment #6)
> So a call to toLower can cause two function calls. toLower() is a tiny function
> that could be called many many times, so perhaps in debug mode (without
> inlining) doubling the number of function calls slows down the user code a bit.

Well, it can cause two function calls only for user defined types, so the case should not be that common. Doing it this way means you'll only end up instanciating `toLower!dchar` for all user types. Which is also a plus.

One of the implementation I had suggested was:

////--------
//Public template that only filters and changes return type
auto toUpper(C)(C c) @safe pure nothrow
    if (is(C : dchar))
out(result)
{
    assert(!isLower(result));
}
body
{
    alias Ret = Select!(isScalarType!C, C, dchar);
    return cast(Ret) toUpperImpl(c);
}

//Non template that does actual job.
private dchar toUpperImpl(dchar c) @safe pure nothrow
{
    return isLower(c) ? cast(dchar)(c - ('a' - 'A')) : c;
}
////--------
Comment 8 github-bugzilla 2013-08-02 01:49:55 UTC
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/28c1a8480a61054e3ea64b3e55b8c90a3c1754a0
Fix issue# 10717.

This makes it so that if you're operating on chars or wchars with
std.ascii functions, you don't have to cast the return values back to
char or wchar, but it still allows you to operate on dchars without
mangling any Unicode values.

https://github.com/D-Programming-Language/phobos/commit/f7a593fe9cb0567bbb59a79653ae3a8538ffc7b0
Merge pull request #1436 from jmdavis/10717

Fix issue# 10717.

Merged.
Comment 9 github-bugzilla 2013-09-22 12:43:57 UTC
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/978250b3aa23b87f08c37dccb7642f5c77c15561
Fix Issue 10717 - std.ascii.toLower and toUpper should return char instead of dchar and avoid me to use a bad cast(char)

It turns out it was incorrectly working for enum types.

New semantics are that enums are also accepted, and their OriginalType is returned.

https://github.com/D-Programming-Language/phobos/commit/f621b5692f575fc83ec588d10d93dbb222709663
Merge pull request #1581 from monarchdodra/asciiToUpper

Fix Issue 10717 - std.ascii.toLower and toUpper should return char inste...