Issue 9173 - std.string.wrap should conform to Unicode line-breaking algorithm
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: All All
Importance: P4 enhancement
Assignee: No Owner
Reported: 2012-12-17 13:24 UTC by hsteoh
Modified: 2024-12-01 16:15 UTC
Description hsteoh 2012-12-17 13:24:08 UTC
Currently, there are some issues with std.string.wrap:

1) It uses std.uni.isWhite as the criterion for line-breaking opportunities, but isWhite also matches characters such as the non-breaking space, which should *not* be treated as a place to wrap. It also matches things like vowel mark separators, which shouldn't be wrapped at, either.

2) It does not take zero-width characters and combining diacritics into account when counting columns, which means that it will sometimes wrap the line at the wrong place.

3) It does not wrap CJK text or Thai text correctly, since these scripts do not rely on spaces to mark line-breaking opportunities.
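
To make items (1) and (2) concrete, here is a small illustration. It is written in Python rather than D purely for demonstration and does not call std.string.wrap or std.uni.isWhite; it only shows that the same two pitfalls appear whenever a generic whitespace test and a code point count are used for wrapping. The variable names are made up for this example.

```python
import unicodedata

nbsp = "\u00a0"        # NO-BREAK SPACE
accented = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Item (1): a generic whitespace test accepts NO-BREAK SPACE, so using it
# to find break opportunities would allow a wrap at U+00A0.
nbsp_is_whitespace = nbsp.isspace()

# Item (2): the string holds two code points but occupies one column,
# because the combining accent is drawn on top of the 'e'.
code_points = len(accented)
base_chars = sum(1 for ch in accented if unicodedata.combining(ch) == 0)

print(nbsp_is_whitespace, code_points, base_chars)  # True 2 1
```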

For reference, here's the Unicode report (UAX #14, the Unicode Line Breaking Algorithm) that describes proper line-breaking of Unicode text:

http://www.unicode.org/reports/tr14/

(After reading through TR14, I was in awe at how insanely complicated line-wrapping in Unicode is. So I'd propose that, at a minimum, we fix items (1) and (2) above, which should be within reach of a relatively simple European-centric line-wrapping algorithm. People who want CJK wrapping or other complicated stuff probably want to write their own algorithm anyway.)
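
A rough sketch of what such a simplified, European-centric wrapper could look like, again in Python for illustration only and not a proposed Phobos implementation. Everything here is an assumption made for the example: the function names are hypothetical, the set of non-breaking spaces is limited to three characters, and zero-width handling is approximated by the general categories Mn, Me and Cf. It does not implement TR14, ignores wide East Asian characters, treats newlines as ordinary spaces, and never splits a word that is longer than the line.

```python
import unicodedata

# Assumption: only these three non-breaking spaces are handled.
NO_BREAK_SPACES = {"\u00a0", "\u2007", "\u202f"}
# Assumption: characters in these general categories occupy no column.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def is_break_space(ch):
    """Whitespace that may be used as a break opportunity."""
    return ch.isspace() and ch not in NO_BREAK_SPACES


def display_width(text):
    """Approximate column count, skipping combining and format characters."""
    return sum(
        1 for ch in text
        if unicodedata.category(ch) not in ZERO_WIDTH_CATEGORIES
    )


def split_words(text):
    """Split only at breakable whitespace, keeping no-break spaces in words."""
    words, current = [], []
    for ch in text:
        if is_break_space(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def simple_wrap(text, columns):
    """Greedy wrap by display width; overlong words are left unsplit."""
    lines, current, width = [], [], 0
    for word in split_words(text):
        w = display_width(word)
        # The +1 accounts for the space that joins this word to the line.
        if current and width + 1 + w > columns:
            lines.append(" ".join(current))
            current, width = [word], w
        else:
            width = w if not current else width + 1 + w
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


# "10<NBSP>km" stays together even though it contains whitespace.
print(repr(simple_wrap("10\u00a0km away", 6)))
# The combining accent is not counted, so "café au" fits in 7 columns.
print(repr(simple_wrap("cafe\u0301 au lait", 7)))
```

The point of splitting the logic into a break test and a width function is that each of items (1) and (2) maps to exactly one small piece that can be refined later without touching the greedy loop.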
Comment 1 dlangBugzillaToGithub 2024-12-01 16:15:57 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9944

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB