Issue 19518 - std.range.front() returns a dchar when applied to char[]
Status: RESOLVED INVALID
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P1 normal
Assignee: No Owner
Reported: 2018-12-26 23:34 UTC by Vijay Nayar
Modified: 2020-03-21 03:56 UTC
Description Vijay Nayar 2018-12-26 23:34:55 UTC
Consider the following program:
```
import std.range;
void main()
{
	char[] data = ['a', 'b', 'c'];
	char a = data.front();
}
```

While std.range.front() works fine with most array types, there seems to be a problem with using char[] types. The above program actually produces a compiler error:
```
onlineapp.d(5): Error: cannot implicitly convert expression `front(data)` of type `dchar` to `char`
```

The workaround is to not use std.range.front(), but rather use basic array indexing, e.g. `data[0]`.
Comment 1 basile-z 2018-12-27 06:10:19 UTC
This is not a bug. The D standard library auto-decodes input ranges of char and wchar, so their ElementType is dchar (their ElementEncodingType is still char or wchar). The reasoning behind this: imagine an array such as

   ['é','µ','ç'] (which is somewhat equivalent to the string "éµç".dup btw)

You'd expect 3 elements, not 6. So if you want to get rid of decoding, cast your array to ubyte[] (or use std.utf.byCodeUnit).
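
As a rough illustration of the point above, the following sketch checks the element types and counts for a `char[]` and shows `byCodeUnit` giving back a plain `char`. The variable names are only illustrative, and this assumes a Phobos version where auto-decoding is still the default.

```d
import std.range : ElementType, ElementEncodingType, front, walkLength;
import std.utf : byCodeUnit;

void main()
{
    char[] text = "éµç".dup;

    // Range primitives decode UTF-8, so walking the range sees 3 code points.
    assert(text.walkLength == 3);
    // The array itself stores 6 code units (each of these characters takes 2 bytes).
    assert(text.length == 6);

    // The range element type is dchar; the stored (encoding) type is still char.
    static assert(is(ElementType!(char[]) == dchar));
    static assert(is(ElementEncodingType!(char[]) == char));

    // byCodeUnit wraps the array in a range whose front is a char, with no decoding.
    char[] data = ['a', 'b', 'c'];
    char a = data.byCodeUnit.front;
    assert(a == 'a');
}
```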
Comment 2 Vijay Nayar 2018-12-27 09:08:05 UTC
That makes sense for character processing. Perhaps my understanding of what .front() and .popFront() do is incorrect then. I had assumed that they were general purpose range methods that could also be used on arrays to treat them like ranges as well.

In this particular case, I was implementing a DenseHashSet algorithm, optimized for low memory overhead, and my unittests started failing as soon as I made a set of characters. The reason was that my template code was using .front() to manage an internal array.

That may be the dilemma. What does the user have in mind when they use 'char'? Is it strictly for Unicode text processing, or is it a piece of data with a well-defined size? Is it incumbent upon those who use templates to not use 'char' for data in templates (and type-cast bytes), or is it incumbent upon template writers to always consider this special case?

Or is this just the wrong usage of .front(), and array indexing, like data[0], should be preferred?
Comment 3 basile-z 2019-02-14 06:24:24 UTC
It was for Phobos anyway.
Comment 4 anonymous4 2019-02-14 08:58:30 UTC
One possible solution is to publish a fork of std.range that treats text as an array of code units and use it instead of the Phobos std.range.
Comment 5 Seb 2019-02-14 09:06:40 UTC
Or use .byCodeUnit, .byChar, .representation, or the upcoming rcstring ;-)
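
A minimal sketch of two of these alternatives, assuming `std.string.representation` and `std.utf.byChar` as found in current Phobos (the variable names are just for illustration):

```d
import std.range : front;
import std.string : representation;
import std.utf : byChar;

void main()
{
    char[] data = ['a', 'b', 'c'];

    // .representation views the same memory as ubyte[], which is never decoded.
    ubyte[] raw = data.representation;
    static assert(is(typeof(raw.front) == ubyte));
    assert(raw.front == 'a');

    // .byChar yields UTF-8 code units, so front fits in a char.
    auto units = data.byChar;
    char c = units.front;
    assert(c == 'a');
}
```

Both keep the element type the same size as the stored data, which is usually what a container wants.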
Comment 6 Vijay Nayar 2019-02-14 09:54:39 UTC
I think the tricky case is not so much when one begins and ends thinking of character processing, but when one is writing a generic algorithm using templates that makes use of std.range.front.

A template that takes a range type and an element and works with them will function fine in most cases for most types when they make use of ".front()" in their algorithms.

But as it stands right now, if anyone attempts to use said template with a `char` type, the template will no longer compile, because '.front()' returns a different element type than the range.

This means that either '.front()' shouldn't be used in generic algorithms that need to pull an element out of the range, in favor of something like '[0]', or it means that algorithm writers need to make `char` a special case in any algorithm they write.

I don't actually have a good answer for what approach is best.
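
To make the mismatch concrete, here is a small sketch. `firstStored` is a hypothetical helper, not part of Phobos; it only shows that indexing keeps the stored element type while `.front` does not for `char[]`.

```d
import std.range : front;

// Hypothetical helper: indexing returns the stored element type for any array.
T firstStored(T)(T[] items)
{
    assert(items.length > 0, "items must not be empty");
    return items[0];
}

void main()
{
    int[] numbers = [1, 2, 3];
    char[] letters = ['a', 'b', 'c'];

    // For most arrays, front has the stored element type...
    static assert(is(typeof(numbers.front) == int));
    // ...but for char[] it is the decoded dchar, while indexing stays char.
    static assert(is(typeof(letters.front) == dchar));
    static assert(is(typeof(letters[0]) == char));

    assert(firstStored(numbers) == 1);
    char a = firstStored(letters);
    assert(a == 'a');
}
```

This only works for arrays, of course; a template that accepts arbitrary ranges cannot rely on indexing.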
Comment 7 Seb 2019-02-14 10:11:08 UTC
Well, we all agree that it's not super nice, but it also has advantages.
Take e.g. 'ü'. If .front only returned a char, you would get an invalid UTF-8 fragment instead of the character. Try printing "ü"[0]

Yes, it has downsides, but with ElementType!R or auto most generic algorithms don't care about the actual return type of .front, and if they do, they need special casing for strings anyhow.
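
A sketch of both points, assuming a hypothetical generic helper `firstOf` (not a Phobos function) that declares its return type with `ElementType!R`:

```d
import std.range : ElementType, front, isInputRange;

// Hypothetical helper: the return type follows whatever front yields.
ElementType!R firstOf(R)(R range)
if (isInputRange!R)
{
    return range.front;
}

void main()
{
    string s = "\u00FC"; // "ü"

    // The string stores two UTF-8 code units; s[0] is only the lead byte.
    assert(s.length == 2);
    assert(s[0] == 0xC3);

    // front decodes the whole code point, so the helper returns a dchar.
    auto c = firstOf(s);
    static assert(is(typeof(c) == dchar));
    assert(c == '\u00FC');

    // The same helper works unchanged for non-character ranges.
    assert(firstOf([1, 2, 3]) == 1);
}
```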

Auto-decoding by default is considered one of the top two design errors of D, but it's super hard to fix now. The solutions so far are:
- fork std.range
- use byCodeUnit or similar
- use rcstring (or similar)

If you come up with a better idea, please share it in the NG, but we can't change std.range.front because it would break a lot of code. Thanks!