D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 21038 - wchar and dchar string alignment should be 2 and 4, respectively
Summary: wchar and dchar string alignment should be 2 and 4, respectively
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D2
Hardware: x86_64 Linux
Importance: P1 major
Assignee: No Owner
Keywords: pull, wrong-code
Reported: 2020-07-11 17:15 UTC by Tim
Modified: 2020-08-13 14:34 UTC

Description Tim 2020-07-11 17:15:35 UTC
The result of wcslen can be too low when compiling with dmd 2.093. Consider the following two files:

//////////////////// testabcd.d ///////////////////
import core.stdc.stddef;
import core.stdc.stdio;
import core.stdc.wchar_;

const(wchar_t)* name = "abcd";

void test()
{
	size_t length = wcslen(name);
	printf("length: %zu\n", length); // size_t is unsigned, so %zu rather than %zd
	printf("data: \"");
	for(const(wchar_t)* s = name; *s; s++)
		printf("%c", *s);
	printf("\"\n");
}
///////////////////////////////////////////////////

//////////////////// testxyzw.d ///////////////////
import testabcd;

void main()
{
	test();
}
///////////////////////////////////////////////////

Running it results in the following output:
length: 3
data: "abcd"

The correct length would be 4. When compiling with ldc it works as expected.

The filenames are important. When testxyzw.d is renamed to testx.d it produces the expected output.
Comment 1 ag0aep6g 2020-07-11 20:12:20 UTC
The string data is getting misaligned. wcslen assumes properly aligned data. testabcd.d can be reduced to this:

----
alias wchar_t = dchar;
const(wchar_t)* name = "abcd";
void test()
{
    assert((cast(size_t) name) % wchar_t.sizeof == 0); /* Fails. Should pass. */
}
----
Comment 2 Walter Bright 2020-08-07 02:39:44 UTC
For the program:

alias wchar_t = dchar;
const(wchar_t)* x = "xz";
const(wchar_t)* name = "abcd";
void test()
{
    assert((cast(size_t) name) % wchar_t.sizeof == 0); /* Fails. Should pass. */
}

the output generated is:

Section 6  .rodata  PROGBITS,ALLOC,SIZE=0x0030(48),OFFSET=0x0040,ALIGN=16
 0040:  78  0  0  0 7a  0  0  0  0  0  0  0 61  0  0  0   x...z.......a...
 0050:  62  0  0  0 63  0  0  0 64  0  0  0  0  0  0  0   b...c...d.......
 0060:   4 10  0  0  0  0  0  0 74 65 73 74  0  0  0  0   ........test....

It's a surprise to me that a 4 byte element array is supposed to be aligned to 8 bytes. I'm not seeing where this is a requirement?
Comment 3 Walter Bright 2020-08-07 02:43:55 UTC
Oh, I see now. For:

alias wchar_t = dchar;
const(wchar)* x = "xz";
const(wchar_t)* name = "abcd";
void test()
{
    assert((cast(size_t) name) % wchar_t.sizeof == 0); /* Fails. Should pass. */
}

the result is:

Section 6  .rodata  PROGBITS,ALLOC,SIZE=0x0030(48),OFFSET=0x0040,ALIGN=16
 0040:  78  0 7a  0  0  0 61  0  0  0 62  0  0  0 63  0   x.z...a...b...c.
 0050:   0  0 64  0  0  0  0  0  0  0  0  0  0  0  0  0   ..d.............
 0060:   4 10  0  0  0  0  0  0 74 65 73 74  0  0  0  0   ........test....

which puts "abcd" at offset 0x46, wrongly aligned on a 2 byte boundary when dchar data needs a 4 byte boundary.
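
The arithmetic behind the dump above can be sketched as follows. This is only an illustration: `alignUp` is a hypothetical helper, not the code from the dmd fix, and the 0x40 section offset and the 6 byte size of "xz" are read off the dump. It assumes the alignment is a power of two.

```d
import core.stdc.stdio : printf;

// Hypothetical helper (not dmd's actual code): round `offset` up to the
// next multiple of `alignment`, which is assumed to be a power of two.
size_t alignUp(size_t offset, size_t alignment) @safe pure nothrow @nogc
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

void main()
{
    // "xz" as a wchar string: two 2-byte code units plus a 2-byte terminator.
    enum size_t xzBytes = ("xz"w.length + 1) * wchar.sizeof; // 6 bytes
    enum size_t sectionOffset = 0x40; // start of .rodata in the dump

    // Where "abcd" lands if it is placed directly after "xz".
    enum size_t unpadded = sectionOffset + xzBytes;
    // Where it should land when dchar data is aligned to dchar.sizeof (4).
    enum size_t padded = alignUp(unpadded, dchar.sizeof);

    assert(unpadded == 0x46);
    assert(unpadded % dchar.sizeof == 2); // the misalignment seen in the dump
    assert(padded == 0x48);
    assert(padded % dchar.sizeof == 0);

    printf("unpadded: 0x%zx, padded: 0x%zx\n", unpadded, padded);
}
```

With a wchar string of 6 bytes in front, the dchar string needs 2 bytes of padding to reach a 4 byte boundary. In Comment 2 the preceding string was itself a dchar string of 12 bytes, so "abcd" happened to start at 0x4c and no padding was needed.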
Comment 4 Dlang Bot 2020-08-07 03:58:35 UTC
@WalterBright created dlang/dmd pull request #11528 "fix Issue 21038 - wchar and dchar string alignment should be 2 and 4,…" fixing this issue:

- fix Issue 21038 - wchar and dchar string alignment should be 2 and 4, respectively

https://github.com/dlang/dmd/pull/11528
Comment 5 Dlang Bot 2020-08-13 14:34:28 UTC
dlang/dmd pull request #11528 "fix Issue 21038 - wchar and dchar string alignment should be 2 and 4,…" was merged into master:

- 46994f578813b365050ce19ed0e0bcc132e7555b by Walter Bright:
  fix Issue 21038 - wchar and dchar string alignment should be 2 and 4, respectively

https://github.com/dlang/dmd/pull/11528