From: Jeff King <peff@peff.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers
Date: Tue, 6 Sep 2016 14:41:43 -0400
Message-ID: <20160906184143.55a5zoa2mj6c2e5m@sigill.intra.peff.net>
In-Reply-To: <alpine.DEB.2.20.1609061613270.129229@virtualbox>
On Tue, Sep 06, 2016 at 06:02:59PM +0200, Johannes Schindelin wrote:
> It will still be quite tricky, because we have to touch a function that is
> rather at the bottom of the food chain: diff_populate_filespec() is called
> from fill_textconv(), which in turn is called from pickaxe_match(), and
> only pickaxe_match() knows whether we want to call regexec() or not (it
> depends on its regexp parameter).
>
> Adding a flag to diff_populate_filespec() sounds really reasonable until
> you see how many call sites fill_textconv() has.
I was thinking of something quite gross, like a global "switch to using
slower-but-safer NUL termination" flag (but I agree with Junio's point
elsewhere that we do not even know if it is "slower").
> > I thought that operated on the diff content itself, which would always
> > be in a heap buffer (which should be NUL terminated, but if it isn't,
> > that would be a separate fix from this).
>
> That is true.
>
> Except when preimage or postimage does not exist. In which case we call
>
> regexec(regexp, two->ptr, 1, &regmatch, 0);
>
> or the same with one->ptr. Note the notable absence of two->size.
Thanks, I forgot about that case.
> > [1] We do make the assumption elsewhere that git objects are
> > NUL-terminated, but that is enforced by the object-reading code
> > (with the exception of streamed blobs, but those are obviously dealt
> > with separately anyway).
>
> I know. I am the reason you introduced that, because I added code to
> fsck.c that assumes that tag/commit messages are NUL-terminated.
Sort of. I think it has been part of the design since e871b64
(unpack_sha1_file: zero-pad the unpacked object., 2005-05-25), though I
do recall that we missed a code path that did its allocation differently
(in index-pack, IIRC).
Anyway, that is neither here nor there for the diff code, which as you
noticed may operate on things besides git objects.
> So now for the better idea.
>
> While I was researching the code for this reply, I hit upon one thing that
> I never knew existed, introduced in f96e567 (grep: use REG_STARTEND for
> all matching if available, 2010-05-22). Apparently, NetBSD introduced an
> extension to regexec() where you can specify buffer boundaries using
> REG_STARTEND. Which is pretty much what we need.
Yes, and compat/regex supports this, too. My question is whether it is
portable. I see:
> diff --git a/diff.c b/diff.c
> index 534c12e..2c5a360 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -951,7 +951,13 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
>  {
>      if (word_regex && *begin < buffer->size) {
>          regmatch_t match[1];
> -        if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
> +        int f = 0;
> +#ifdef REG_STARTEND
> +        match[0].rm_so = 0;
> +        match[0].rm_eo = *end - *begin;
> +        f = REG_STARTEND;
> +#endif
> +        if (!regexec(word_regex, buffer->ptr + *begin, 1, match, f)) {
What happens to those poor souls on systems without REG_STARTEND? Do
they get to keep segfaulting?
I think the solution is to push them into setting NO_REGEX. So looking
at this versus a "regexecn", it seems:
  - this lets people keep using their native regexec if it supports
    STARTEND
  - this is a bit more clunky to use at the callsites (though we could
    _create_ a portable regexecn wrapper that uses this technique on top
    of the native regex library)
But I much prefer this approach to copying the data just to add a NUL.
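For concreteness, here is a rough sketch of what such a wrapper might
look like. The name regexecn_sketch() is made up for illustration, and
the #else branch (copying into a NUL-terminated buffer) is only an
assumption so the example compiles everywhere; it is exactly the copy
the STARTEND route avoids, and it is not the same as building with
NO_REGEX.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing git function): run regexec()
 * on the `size` bytes starting at `buf`, which need not end in a NUL.
 * The caller must supply at least one regmatch_t, because
 * REG_STARTEND passes the buffer bounds in via pmatch[0].
 */
static int regexecn_sketch(const regex_t *preg, const char *buf, size_t size,
                           size_t nmatch, regmatch_t pmatch[], int eflags)
{
    assert(nmatch > 0 && pmatch);
#ifdef REG_STARTEND
    /* Tell regexec() to look only at bytes [0, size) of buf. */
    pmatch[0].rm_so = 0;
    pmatch[0].rm_eo = size;
    return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
#else
    /*
     * Assumed fallback: copy and NUL-terminate. Unlike the branch
     * above, matching stops at the first NUL byte inside the data.
     */
    char *copy = malloc(size + 1);
    int ret;

    if (!copy)
        return REG_ESPACE;
    memcpy(copy, buf, size);
    copy[size] = '\0';
    ret = regexec(preg, copy, nmatch, pmatch, eflags);
    free(copy);
    return ret;
#endif
}
```

A pickaxe-style caller would then pass two->ptr and two->size together
instead of relying on a terminator being present after the mmap()ed
data.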
-Peff