git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Johannes Schindelin <johannes.schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers
Date: Tue, 6 Sep 2016 03:06:04 -0400	[thread overview]
Message-ID: <20160906070604.i5rojh3kyc7x7kso@sigill.intra.peff.net> (raw)
In-Reply-To: <f899957fa71537aa2686f17ce18aaf16f2fea2ac.1473090278.git.johannes.schindelin@gmx.de>

On Mon, Sep 05, 2016 at 05:45:06PM +0200, Johannes Schindelin wrote:

> It is true that many code paths populate the mmfile_t structure silently
> appending a NUL, e.g. when running textconv on a temporary file and
> reading the results back into an strbuf.
> 
> The assumption is most definitely wrong, however, when mmap()ing a file.
> 
> Practically, we seemed to be lucky that the bytes after mmap()ed memory
> were 1) accessible and 2) somehow contained NUL bytes *somewhere*.
> 
> In a use case reported by Chris Sidi, it turned out that the mmap()ed
> file had the precise size of a memory page, and on Windows the bytes
> after memory-mapped pages are in general not valid.
> 
> This patch works around that issue, giving us time to discuss the best
> course how to fix this problem more generally.

I don't know if we are in that much of a rush. This bug has been around
for many years (the thread I linked earlier is from 2012). Yes, it's bad
and annoying, but we can probably spend a few days discussing the
solution.

> diff --git a/diff.c b/diff.c
> index 534c12e..32f7f46 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2826,6 +2826,15 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  			s->data = strbuf_detach(&buf, &size);
>  			s->size = size;
>  			s->should_free = 1;
> +		} else {
> +			/* data must be NUL-terminated so e.g. for regexec() */
> +			char *data = xmalloc(s->size + 1);
> +			memcpy(data, s->data, s->size);
> +			data[s->size] = '\0';
> +			munmap(s->data, s->size);
> +			s->should_munmap = 0;
> +			s->data = data;
> +			s->should_free = 1;
>  		}

Without having done a complete audit recently, my gut and my
recollection from previous discussions is that regexec() really is the
culprit here for the diff code[1]. If we are going to do a workaround
like this, I think we could limit it only to cases where know it
matters, like --pickaxe-regex.

Can it be triggered with -G? I thought that operated on the diff content
itself, which would always be in a heap buffer (which should be NUL
terminated, but if it isn't, that would be a separate fix from this).

-Peff

[1] We do make the assumption elsewhere that git objects are
    NUL-terminated, but that is enforced by the object-reading code
    (with the exception of streamed blobs, but those are obviously dealt
    with separately anyway).

  reply	other threads:[~2016-09-06  7:06 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-05 15:44 [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Johannes Schindelin
2016-09-05 15:45 ` [PATCH 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-06 18:43   ` Jeff King
2016-09-08  7:53     ` Johannes Schindelin
2016-09-05 15:45 ` [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers Johannes Schindelin
2016-09-06  7:06   ` Jeff King [this message]
2016-09-06 16:02     ` Johannes Schindelin
2016-09-06 18:41       ` Jeff King
2016-09-07 18:31         ` Junio C Hamano
2016-09-08  7:52           ` Johannes Schindelin
2016-09-08  7:49         ` Johannes Schindelin
2016-09-08  8:22           ` Jeff King
2016-09-08 16:57             ` Junio C Hamano
2016-09-08 18:22               ` Johannes Schindelin
2016-09-08 18:48               ` Jeff King
2016-09-05 15:45 ` [PATCH 3/3] diff_grep: add assertions verifying that the buffers are NUL-terminated Johannes Schindelin
2016-09-06  7:08   ` Jeff King
2016-09-06 16:04     ` Johannes Schindelin
2016-09-05 19:10 ` [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Junio C Hamano
2016-09-06  7:12   ` Jeff King
2016-09-06 14:06     ` Johannes Schindelin
2016-09-06 18:29       ` Jeff King
2016-09-08  7:29         ` Johannes Schindelin
2016-09-08  8:00           ` Jeff King
2016-09-09 10:09             ` Johannes Schindelin
2016-09-09 17:46               ` Junio C Hamano
2016-09-06 13:21   ` Johannes Schindelin
2016-09-06  6:58 ` Jeff King
2016-09-06 14:13   ` Johannes Schindelin
2016-09-08  7:31 ` [PATCH v2 " Johannes Schindelin
2016-09-08  7:31   ` [PATCH v2 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers Johannes Schindelin
2016-09-08  8:04     ` Jeff King
2016-09-09  9:45       ` Johannes Schindelin
2016-09-09  9:59         ` Jeff King
2016-09-08  7:31   ` [PATCH v2 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-08  7:31   ` [PATCH v2 3/3] Use the newly-introduced regexec_buf() function Johannes Schindelin
2016-09-08  7:54     ` Johannes Schindelin
2016-09-08  8:10       ` Jeff King
2016-09-08  8:14         ` Jeff King
2016-09-08  8:35           ` Jeff King
2016-09-08 19:06             ` Ramsay Jones
2016-09-08 19:53               ` Jeff King
2016-09-08 21:30                 ` Junio C Hamano
2016-09-08  7:33   ` [PATCH v2 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Johannes Schindelin
2016-09-08  8:13     ` Jeff King
2016-09-08  7:57   ` [PATCH v3 " Johannes Schindelin
2016-09-08  7:57     ` [PATCH v3 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers Johannes Schindelin
2016-09-08  7:58     ` [PATCH v3 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers Johannes Schindelin
2016-09-08 17:03       ` Junio C Hamano
2016-09-08  7:59     ` [PATCH v3 3/3] Use the newly-introduced regexec_buf() function Johannes Schindelin
2016-09-08 17:09       ` Junio C Hamano
2016-09-09  9:52         ` Johannes Schindelin
2016-09-09  9:57           ` Jeff King
2016-09-09 10:41             ` Johannes Schindelin
2016-09-09 17:49           ` Junio C Hamano
2016-09-21 18:23     ` [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Johannes Schindelin
2016-09-21 18:23       ` [PATCH v4 1/3] regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails Johannes Schindelin
2016-09-21 18:24       ` [PATCH v4 2/3] regex: add regexec_buf() that can work on a non NUL-terminated string Johannes Schindelin
2016-09-21 19:17         ` Junio C Hamano
2016-09-22 18:38           ` Johannes Schindelin
2016-09-21 18:24       ` [PATCH v4 3/3] regex: use regexec_buf() Johannes Schindelin
2016-09-21 19:18         ` Junio C Hamano
2016-09-21 20:09           ` Junio C Hamano
2016-09-21 22:03         ` Jeff King
2016-09-25 14:01           ` Johannes Schindelin
2016-09-21 22:04       ` [PATCH v4 0/3] Fix a segfault caused by regexec() being called on mmap()ed data Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160906070604.i5rojh3kyc7x7kso@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).