From: Jeff King <peff@peff.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data
Date: Thu, 8 Sep 2016 04:00:35 -0400
Message-ID: <20160908080035.czwn5y3re5bp5vkg@sigill.intra.peff.net>
In-Reply-To: <alpine.DEB.2.20.1609080921030.129229@virtualbox>
On Thu, Sep 08, 2016 at 09:29:58AM +0200, Johannes Schindelin wrote:
> sorry for the late answer, I was really busy trying to come up with a new
> and improved version of the patch series, and while hunting a bug I
> introduced got bogged down with other tasks.
No problem. I am not in a hurry.
> > I always assumed the _point_ of re_search taking a ptr/len pair was
> > exactly to handle this case. The documentation[1] says:
> >
> > `string` is the string you want to match; it can contain newline and
> > null characters. `size` is the length of that string.
> >
> > Which seems pretty definitive to me (that's for re_match(), but
> > re_search() is defined in the docs in terms of re_match()).
>
> Right. The problem is: I *really* want to avoid using GNU-isms.
I don't think GNU-isms are a problem if we wrap them to give a nice
interface, and if we rely on having compat/regex. But if you mean "I do
not want to rely on using compat/regex everywhere", then OK. I can see
arguments both for and against using a consistent regex library, but I
do not care that much either way myself.
> > We can contain this to the existing compat/regexec/regexec.c, and just
> > provide a wrapper that is similar to regexec but takes a ptr/len pair.
>
> But we can do even better than that: we can provide a wrapper that uses
> REG_STARTEND where available (which is really the majority of platforms we
> care about: Linux, MacOSX, Windows, and even the *BSDs). Where it is not
> available, we simply malloc(), memcpy() and append a NUL.
Doesn't that make things much _worse_ for people on systems without
REG_STARTEND? If we imagine that most regexec calls would operate on a
NUL-terminated buffer, then they are now paying the extra malloc and
copy for each call to regexec_buf(), even if the buffer was already
NUL-terminated (because they have no idea whether it was or not).
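To make the cost concrete, here is a minimal sketch of what such a copying fallback might look like. The name regexec_buf_copy() and its signature are assumptions for illustration only, not code from the patch series.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fallback for platforms without REG_STARTEND: copy the
 * buffer, append a NUL, and hand the copy to plain regexec().
 * Every call pays for malloc() + memcpy(), even when the caller's
 * buffer already happened to be NUL-terminated.
 */
static int regexec_buf_copy(const regex_t *preg, const char *buf, size_t size,
			    size_t nmatch, regmatch_t pmatch[], int eflags)
{
	char *copy = malloc(size + 1);
	int ret;

	if (!copy)
		return REG_ESPACE;
	memcpy(copy, buf, size);
	copy[size] = '\0';
	ret = regexec(preg, copy, nmatch, pmatch, eflags);
	free(copy);
	return ret;
}
```

Note that this variant still stops matching at the first embedded NUL in the copy, since plain regexec() treats its input as a C string.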
I think I'd rather just have:

  #ifndef REG_STARTEND
  #error "Your regex library sucks. Compile with NO_REGEX=NeedsStartEnd"
  #endif

(or you could just use REG_STARTEND and let the compiler complain, but
then the user has to figure out the right knob to twiddle).
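With that guard in place, the wrapper itself can stay tiny. The following is only a sketch under that assumption; the name regexec_buf_startend() is made up for illustration, and the actual function in the series may differ.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "This sketch needs a regex library that supports REG_STARTEND"
#endif

/*
 * Hypothetical ptr/len wrapper around regexec(). No copy is made; the
 * bounds of the buffer are passed to the library via pmatch[0].
 */
static int regexec_buf_startend(const regex_t *preg, const char *buf,
				size_t size, size_t nmatch,
				regmatch_t pmatch[], int eflags)
{
	/* REG_STARTEND reads its bounds from pmatch[0], so one slot is required. */
	assert(nmatch > 0 && pmatch);
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
}
```

A caller holding an mmap()ed region would pass the mapping's address and length directly, with no need for a trailing NUL.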
One other question about REG_STARTEND is: what does it do with NULs
inside the buffer? Certainly glibc (and our compat/regex) treat it as a
buffer with a particular length and ignore embedded NULs, as we want.
But the NetBSD documentation says only:

  REG_STARTEND   The string is considered to start at string +
                 pmatch[0].rm_so and to have a terminating NUL
                 located at string + pmatch[0].rm_eo (there need not
                 actually be a NUL at that location),

Besides avoiding a segfault, one of the benefits of regexec_buf() is
that we will now find pickaxe-regex strings inside mixed binary/text
files. But it's not clear to me that NetBSD's implementation does this.
I guess we can assume it is fine (it is certainly no _worse_ than the
current behavior), and if people's platforms do not handle it, they can
build with NO_REGEX.
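One way to find out how a given platform behaves would be a small probe along these lines. This is a sketch, not part of the series; the function name is hypothetical, and the result depends entirely on the regex library it is built against.

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "This probe needs a regex library that supports REG_STARTEND"
#endif

/*
 * Hypothetical probe: does regexec() with REG_STARTEND keep searching
 * past an embedded NUL? Returns 1 if the match after the NUL is found,
 * 0 if it is not, and -1 if the pattern fails to compile.
 */
static int startend_sees_past_nul(void)
{
	/* "a", a NUL byte, then "needle"; deliberately not NUL-terminated. */
	static const char buf[] = { 'a', '\0', 'n', 'e', 'e', 'd', 'l', 'e' };
	regex_t re;
	regmatch_t m[1];
	int ret;

	if (regcomp(&re, "needle", REG_EXTENDED))
		return -1;
	m[0].rm_so = 0;
	m[0].rm_eo = sizeof(buf);
	ret = regexec(&re, buf, 1, m, REG_STARTEND);
	regfree(&re);
	return ret == 0;
}
```

A platform where this returns 0 would still avoid the segfault, but would miss pickaxe-regex hits that sit behind a NUL in mixed binary/text files.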
-Peff