D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 11350 - libphobos2 regex match segfaults when a rare HTTP header is received
Summary: libphobos2 regex match segfaults when a rare HTTP header is received
Status: RESOLVED WORKSFORME
Alias: None
Product: D
Classification: Unclassified
Component: phobos
Version: D2
Hardware: x86 Linux
Importance: P2 normal
Assignee: No Owner
URL:
Keywords: pull
Depends on:
Blocks:
 
Reported: 2013-10-25 03:30 UTC by sha0coder
Modified: 2014-09-17 21:10 UTC
Description sha0coder 2013-10-25 03:30:36 UTC
A simple std.net.curl.get() is performed against a remote host that returns some unusual HTTP headers. I don't define the onReceiveHeader callback, but libphobos2 calls the default onReceiveHeader(), which applies a regex to the header and then crashes.

I connect this way:

	auto conn = HTTP();
	conn.connectTimeout(dur!"seconds"(4));
	conn.addRequestHeader("User-agent","Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
	char[] html = get(url,conn);


It seems the bug is at:

/usr/include/dmd/phobos/std/regex.d  line 6348

6537 public auto match(R, RegEx)(R input, RegEx re)
6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
6539 {
6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
6541 }

Maybe it is an encoding problem; the input seems to be:
>>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
da�H4STeF



(gdb) bt
#0  0xb76c8d13 in rt.deh2.terminate() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#1  0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#2  0x080b04cc in _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (this=0x95ac0774, input=646197483453546546, prog=...)
    at /usr/include/dmd/phobos/std/regex.d:6348
#3  0x080a09a2 in _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch (__HID46=0x95ac0b18, re=..., input=646197483453546546) at /usr/include/dmd/phobos/std/regex.d:6540
#4  0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#5  0xb769125a in std.net.curl.Curl.onReceiveHeader() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#6  0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#7  0xb72a5e7a in Curl_client_write () from /usr/lib/i386-linux-gnu/libcurl.so.4
#8  0xb72a4912 in Curl_http_readwrite_headers () from /usr/lib/i386-linux-gnu/libcurl.so.4
#9  0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
#10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
#11 0xb72be793 in curl_easy_perform () from /usr/lib/i386-linux-gnu/libcurl.so.4
#12 0xb7691093 in std.net.curl.Curl.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#13 0xb768d8e1 in std.net.curl.HTTP._perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#14 0xb768d734 in std.net.curl.HTTP.perform() () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63
#15 0x08081aac in _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa (client=..., sendData=579669917507256320,
    url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
#16 0x08081948 in _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa (conn=..., url=10576998119117946914)
    at /usr/include/dmd/phobos/std/net/curl.d:364
Comment 1 Dmitry Olshansky 2013-10-25 11:21:26 UTC
(In reply to comment #0)
> 
> It seems the bug is at:
> 
> /usr/include/dmd/phobos/std/regex.d  line 6348
> 
> 6537 public auto match(R, RegEx)(R input, RegEx re)
> 6538     if(isSomeString!R && is(RegEx == Regex!(BasicElementOf!R)))
> 6539 {
> 6540     return RegexMatch!(Unqual!(typeof(input)),ThompsonMatcher)(re, input);
> 6541 }
> 
> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF
>

It would be nice to see what the pattern is and exactly what the argument passed to it looks like.

I tried to reproduce with this:

void main()
{
    import std.regex;

    ubyte[] header = [0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46];
    auto m = match(cast(char[]) header, regex("(.*?): (.*)$"));
    assert(m.empty);
}

I get:

std.utf.UTFException@C:\dmd2\windows\bin\..\..\src\phobos\std\utf.d(1113): Invalid UTF-8 sequence (at index 1)

No crashes.
Now it may have to do with shared object / PIC code for all I know, as I'm testing on Win32.

But without a smaller, or at least complete, reproducible test case there is nothing to work on.
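
To check the encoding theory separately from std.regex and curl, the bytes can be passed to std.utf.validate, which throws UTFException on malformed input. This is only a minimal sketch: the helper name isValidUtf8 is made up for illustration, and the byte values are the ones quoted from the report.

```d
import std.utf : UTFException, validate;

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Bytes quoted in the report; 0x97 is a continuation byte with no lead byte.
    ubyte[] header = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];
    assert(!isValidUtf8(cast(const(char)[]) header));
    assert(isValidUtf8("Content-Type: text/html"));
}
```

In plain D code the exception is caught like any other, which fits the "no crashes" result above.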
Comment 2 Dmitry Olshansky 2013-10-25 11:40:08 UTC
(In reply to comment #0)
> It seems the bug is at:

No, and I think I know what it is.

> Maybe it is an encoding problem; the input seems to be:
> >>> print "%c%c%c%c%c%c%c%c%c" % (0x64,0x61,0x97,0x48,0x34,0x53,0x54,0x65,0x46)
> da�H4STeF

Yes, this is broken UTF-8 and hence...
> 
> 
> 
> (gdb) bt
> #0  0xb76c8d13 in rt.deh2.terminate() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63

> #1  0xb76c8ee3 in _d_throwc () from /usr/lib/i386-linux-gnu/libphobos2.so.0.63

it throws an exception ...

> #2  0x080b04cc in
> _D3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch43__T6__ctorTS3std5regex12__T5RegexTaZ5RegexZ6__ctorMFNcNeS3std5regex12__T5RegexTaZ5RegexAaZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (this=0x95ac0774, input=646197483453546546, prog=...)
>     at /usr/include/dmd/phobos/std/regex.d:6348

.. inside of std.regex.match. But the thing is, we are doing it inside a callback invoked by the C library curl (follow the call stack down to curl_easy_perform). It HAS NO IDEA what to do with a D exception, hence the crash.

So the fix would be to insulate it with try/catch inside of that onReceiveHeader callback.
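
The try/catch idea can be sketched roughly as follows. This is not the actual Phobos patch: the callback name safeHeaderCallback and its signature are invented for illustration, and returning 0 to report failure is an assumed convention. The point is only that no D exception is allowed to unwind through C frames.

```d
import std.regex : match, regex;
import std.utf : validate;

// Hypothetical callback with C linkage, as a C library would invoke it.
// Returns the number of bytes handled, or 0 to report a failure.
extern (C) size_t safeHeaderCallback(const(char)* data, size_t len) nothrow
{
    try
    {
        string line = data[0 .. len].idup;
        validate(line);                    // throws UTFException on bad UTF-8
        auto m = match(line, regex("(.*?): (.*)$"));
        // ... a real callback would store m.captures here ...
    }
    catch (Exception e)
    {
        // Swallow the exception so it never crosses into C code.
        return 0;
    }
    return len;
}

void main()
{
    ubyte[] bad = [0x64, 0x61, 0x97, 0x48, 0x34, 0x53, 0x54, 0x65, 0x46];
    assert(safeHeaderCallback(cast(const(char)*) bad.ptr, bad.length) == 0);

    string good = "Server: example";
    assert(safeHeaderCallback(good.ptr, good.length) == good.length);
}
```

Marking the callback nothrow makes the compiler reject any code path that could let an Exception escape.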

> #3  0x080a09a2 in
> _D3std5regex45__T5matchTAaTS3std5regex12__T5RegexTaZ5RegexZ5matchFNfAaS3std5regex12__T5RegexTaZ5RegexZS3std5regex49__T10RegexMatchTAaS273std5regex15ThompsonMatcherZ10RegexMatch
> (__HID46=0x95ac0b18, re=..., input=646197483453546546) at
> /usr/include/dmd/phobos/std/regex.d:6540
> #4  0xb768e20f in std.net.curl.HTTP.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #5  0xb769125a in std.net.curl.Curl.onReceiveHeader() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #6  0xb7691665 in std.net.curl.Curl._receiveHeaderCallback() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #7  0xb72a5e7a in Curl_client_write () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #8  0xb72a4912 in Curl_http_readwrite_headers () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #9  0xb72bbf6d in Curl_readwrite () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #10 0xb72bde4d in ?? () from /usr/lib/i386-linux-gnu/libcurl.so.4
> #11 0xb72be793 in curl_easy_perform () from
> /usr/lib/i386-linux-gnu/libcurl.so.4
> #12 0xb7691093 in std.net.curl.Curl.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #13 0xb768d8e1 in std.net.curl.HTTP._perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #14 0xb768d734 in std.net.curl.HTTP.perform() () from
> /usr/lib/i386-linux-gnu/libphobos2.so.0.63
> #15 0x08081aac in
> _D3std3net4curl18__T10_basicHTTPTaZ10_basicHTTPFAxaAxvS3std3net4curl4HTTPZAa
> (client=..., sendData=579669917507256320,
>     url=10576998119117946914) at /usr/include/dmd/phobos/std/net/curl.d:762
> #16 0x08081948 in
> _D3std3net4curl30__T3getTS3std3net4curl4HTTPTaZ3getFAxaS3std3net4curl4HTTPZAa
> (conn=..., url=10576998119117946914)
>     at /usr/include/dmd/phobos/std/net/curl.d:364
Comment 3 Dmitry Olshansky 2014-01-07 07:26:45 UTC
@sha0coder
Could you try with this fix:
https://github.com/D-Programming-Language/phobos/pull/1842
Comment 4 github-bugzilla 2014-01-07 17:02:24 UTC
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/3b6cc0cb73be19986ef1b8a30036227c98b37bb9
fix issue 11350

Do not throw on bad UTF inside of a C callback

https://github.com/D-Programming-Language/phobos/commit/84dbc9934d4e0e72dc9ce138a0a0771666b51f26
Merge pull request #1842 from blackwhale/issue-11350

Fix issue 11350 libphobos2 regex match segfaults when a rare HTTP header is received
Comment 5 Dmitry Olshansky 2014-09-17 21:10:17 UTC
The original problem was patched in std.net.curl long ago, so closing this.