D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 11432 - formattedRead and slurp %s format code miss tab as whitespace
Summary: formattedRead and slurp %s format code miss tab as whitespace
Status: RESOLVED INVALID
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: x86 Windows
Importance: P2 normal
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-03 09:12 UTC by bearophile_hugs
Modified: 2019-12-07 09:52 UTC
CC List: 1 user

See Also:


Description bearophile_hugs 2013-11-03 09:12:46 UTC
This is a C program. Note that the string s contains three fields, each separated by a single tab:


#include <stdio.h>
int main() {
    const char* s = "red\t10\t20";
    char t[20];
    int a, b;
    /* pass t, not &t: %s expects a char*, not a pointer to the array */
    sscanf(s, "%s %d %d", t, &a, &b);
    printf(">%s<>%d<>%d<\n", t, a, b);
    return 0;
}


It prints the output I expect:
>red<>10<>20<



The scanf documentation says this about the %s conversion:
http://www.mkssoftware.com/docs/man3/scanf.3.asp


s
A character string is expected; the corresponding argument should be a character pointer pointing to an array of characters large enough to accept the string and a terminating \0, which is added automatically. A white-space character terminates the input field. The conversion specifier hS is equivalent.



I think this is the equivalent D program:


import std.format, std.stdio;
void main() {
    string s = "red\t10\t20";
    string t;
    int a, b;
    formattedRead(s, "%s %d %d", &t, &a, &b);
    writef(">%s<>%d<>%d<\n", t, a, b);
}


But it prints (the gaps inside the first field are the two tab characters of s, echoed unchanged):

>red    10  20<>0<>0<

As you can see, the tabs are treated as part of the first string field. This causes me trouble when I use slurp, as shown below.

If I have this "data.txt" text file with Unix-style newlines, where each string is separated from its integer by a single tab character:


red 10
blue    20


(So the whole file is "red\t10\nblue\t20"; the gaps shown above are single tabs.)


If I run this code:

import std.file: slurp;
void main() {
    slurp!(string, int)("data.txt", "%s %d");
}


I get a stack trace (dmd 2.064beta4):

std.conv.ConvException@...\dmd2\src\phobos\std\conv.d(2009): Unexpected end of input when converting from type char[] to type int
--------
0x0040E269 in pure @safe int std.conv.parse!(int, char[]).parse(ref char[]) at C:\dmd2\src\phobos\std\conv.d(2010)
...


To avoid the stack trace, I have to put a tab between the two format specifiers:

import std.file: slurp;
void main() {
    slurp!(string, int)("data.txt", "%s\t%d");
}
Comment 1 berni44 2019-12-07 09:52:52 UTC
IMHO the problem is that formattedRead does not behave like scanf, and this difference is not well documented. You should use

formattedRead(s, "%s\t%d\t%d", &t, &a, &b);

The same is true for the slurp example, where you already found out that you need \t.