git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Peter Oberndorfer <kumbayo84@arcor.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Tue, 30 Oct 2012 08:17:48 -0400
Message-ID: <20121030121747.GA4231@sigill.intra.peff.net>
In-Reply-To: <20121029224705.GA32148@sigill.intra.peff.net>

On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
> 
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> > 
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
> 
> Hmm. That is problematic for two reasons.
> 
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
> 
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like "git
>      diff HEAD^".
> 
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zeroed bytes. Unfortunately, when the size does fall on a page
     boundary, requesting an extra byte does not do what we want; you
     get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast or faster than
     calling read() repeatedly.

I am really tempted by (1).
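
A minimal sketch of what (1) could look like at the point of use. It
assumes the REG_STARTEND extension to regexec() (available in glibc and
the BSDs, not in POSIX), which may or may not be the "slightly different
interface" referred to above. regex_matches_buf() is a hypothetical
helper name for illustration, not an existing git function.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
#error "this sketch assumes a regexec() that supports REG_STARTEND"
#endif

/*
 * Hypothetical helper: match a compiled regex against the bytes
 * buf[0..len) without reading buf[len], so the buffer does not need
 * to be NUL-terminated (e.g. it may be an mmap'd file).
 * Returns 1 on a match, 0 otherwise.
 */
int regex_matches_buf(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t m[1];

	assert(re != NULL && buf != NULL);

	/* With REG_STARTEND, pmatch[0] is an input: the range to search. */
	m[0].rm_so = 0;
	m[0].rm_eo = (regoff_t)len;
	return regexec(re, buf, 1, m, REG_STARTEND) == 0;
}
```

The caller passes the pointer/size pair it already has from the
filespec; no copy is made, at the cost of depending on a non-POSIX flag.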

Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.
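
For comparison, a rough sketch of (3), using only POSIX regexec(). The
helper name regex_matches_copy() is made up for this example, and the
error handling is deliberately simplistic; this is not the actual patch.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy buf[0..len) into a temporary
 * NUL-terminated buffer and run plain regexec() on the copy.
 * Returns 1 on a match, 0 on no match, -1 if allocation fails.
 *
 * Note that regexec() stops at the first NUL, so data with embedded
 * NUL bytes is only searched up to that point.
 */
int regex_matches_copy(const regex_t *re, const char *buf, size_t len)
{
	char *tmp;
	int hit;

	assert(re != NULL && (buf != NULL || len == 0));

	tmp = malloc(len + 1);
	if (!tmp)
		return -1;
	if (len)
		memcpy(tmp, buf, len);
	tmp[len] = '\0';	/* the terminator the mmap'd data lacks */

	hit = regexec(re, tmp, 0, NULL, 0) == 0;
	free(tmp);
	return hit;
}
```

The copy is paid only when a regex is actually run against the data,
which is the "only incur the cost when we need it" property of (3).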

-Peff
