git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>, Peter Oberndorfer <kumbayo84@arcor.de>
Cc: git@vger.kernel.org
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Tue, 30 Oct 2012 21:46:01 +0900	[thread overview]
Message-ID: <da24b6ea-ac9b-46dd-b591-25fd4e8e6504@email.android.com> (raw)
In-Reply-To: <20121030121747.GA4231@sigill.intra.peff.net>

(1) sounds attractive for more than one reason. In addition to avoidance of this issue, it would bring bug-to-bug compatibility across platforms.

(4), if we can run grep on streaming data (tweak interface we have for checking out a large blob to the working tree), would let us work on dataset larger than fit in core. Even though it would be much more work, it might turn out to be a better option in the longer run.

Jeff King <peff@peff.net> wrote:

>On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:
>
>> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
>> 
>> > The patch below fixes it, but it's terribly inefficient (it just
>detects
>> > the situation and reallocates). It would be much better to disable
>the
>> > reuse_worktree_file mmap when we populate the filespec, but it is
>too
>> > late to pass an option; we may have already populated from an
>earlier
>> > diffcore stage.
>> > 
>> > I guess if we teach the whole diff code that "-G" (and
>--pickaxe-regex)
>> > is brittle, we can disable the optimization from the beginning
>based on
>> > the diff options. I'll take a look.
>> 
>> Hmm. That is problematic for two reasons.
>> 
>>   1. The whole diff call chain will have to be modified to pass the
>>      options around, so they can make it down to the
>>      diff_populate_filespec level. Alternatively, we could do some
>kind
>>      of global hack, which is ugly but would work OK in practice.
>> 
>>   2. Reusing a working tree file is only half of the reason a
>filespec
>>      might be mmap'd. It might also be because we are literally
>diffing
>>      the working tree. "-G" was meant to be used to limit log
>traversal,
>>      but it also works to reduce the diff output for something like
>"git
>>      diff HEAD^".
>> 
>> I really wish there were an alternate regexec interface we could use
>> that took a pointer/size pair. Bleh.
>
>Thinking on it more, my patch, hacky thought it seems, may not be the
>worst solution. Here are the options that I see:
>
>  1. Use a regex library that does not require NUL termination. If we
>     are bound by the regular regexec interface, this is not feasible.
>     But the GNU implementation works on arbitrary-length buffers (you
>     just have to use a slightly different interface), and we already
>    carry it in compat. It would mean platforms which provide a working
>     but non-GNU regexec would have to start defining NO_REGEX.
>
>  2. Figure out a way to get one extra zero byte via mmap. If the
>     requested size does not fall on a page boundary, you get extra
>     zero-ed bytes. Unfortunately, requesting an extra byte does not
>     do what we want; you get SIGBUS accessing it.
>
> 3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
>     way we only incur the cost when we need it.
>
>  4. Avoid mmap-ing in the first place when we are using -G or
>     --pickaxe-regex (e.g., by doing a big read()). At first glance,
>     this sounds more efficient than loading the data one way and then
>     making another copy. But mmap+memcpy, aside from the momentary
>    doubled memory requirement, is probably just as fast or faster than
>     calling read() repeatedly.
>
>I am really tempted by (1).
>
>Given that (2) does not work, unless somebody comes up with something
>clever there, that would make (3) the next best choice.
>
>-Peff

  reply	other threads:[~2012-10-30 12:46 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-27 18:37 crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-28 12:01 ` Jeff King
2012-10-28 12:45   ` [PATCH 0/2] textconv support for "log -S" Jeff King
2012-10-28 12:46     ` [PATCH 1/2] pickaxe: hoist empty needle check Jeff King
2012-10-28 12:47     ` [PATCH 2/2] pickaxe: use textconv for -S counting Jeff King
2012-11-13 23:13       ` Junio C Hamano
2012-11-15  1:21         ` Jeff King
2012-11-20  0:31           ` Junio C Hamano
2012-11-20  0:48             ` Junio C Hamano
2012-11-21 20:27               ` Jeff King
2012-10-28 19:56   ` crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-29  6:05     ` Jeff King
2012-10-29  6:18       ` Jeff King
2012-10-29 20:19       ` Peter Oberndorfer
2012-10-29 22:35         ` Jeff King
2012-10-29 22:47           ` Jeff King
2012-10-30 12:17             ` Jeff King
2012-10-30 12:46               ` Junio C Hamano [this message]
2012-10-30 13:12                 ` Jeff King
2012-11-01 19:19               ` Ramsay Jones
2012-11-07 21:10           ` Peter Oberndorfer
2012-11-07 21:13             ` Jeff King
2013-06-03 17:25               ` Peter Oberndorfer
2013-06-03 22:17                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da24b6ea-ac9b-46dd-b591-25fd4e8e6504@email.android.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=kumbayo84@arcor.de \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).