From: Jeff King <peff@peff.net>
To: Peter Oberndorfer <kumbayo84@arcor.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: crash on git diff-tree -Ganything <tree> for new files with textconv filter
Date: Tue, 30 Oct 2012 08:17:48 -0400
Message-ID: <20121030121747.GA4231@sigill.intra.peff.net>
In-Reply-To: <20121029224705.GA32148@sigill.intra.peff.net>
On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:
> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
>
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> >
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
>
> Hmm. That is problematic for two reasons.
>
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
>
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like "git
>      diff HEAD^".
>
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zeroed bytes. Unfortunately, requesting an extra byte does not
     do what we want; you get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast or faster than
     calling read() repeatedly.

I am really tempted by (1).
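
To make (1) concrete, here is a rough sketch of matching against a
pointer/size pair with the GNU re_search() interface. It assumes a
glibc-style <regex.h> (or the copy in compat) built with _GNU_SOURCE; the
helper name gnu_regex_match_buf() is made up for illustration and is not
anything in git. It also recompiles the pattern on every call, which real
code would not want to do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a git function: return 1 if the extended
 * regex "pattern" matches anywhere in buf[0..len), 0 if it does not,
 * and -1 on error. buf does not need to be NUL-terminated, so it may
 * point straight into an mmap'd file.
 */
static int gnu_regex_match_buf(const char *pattern, const char *buf, size_t len)
{
	struct re_pattern_buffer pat;
	const char *err;
	regoff_t pos;

	assert(pattern != NULL && (buf != NULL || len == 0));
	if (len > INT_MAX)
		return -1; /* the GNU interface takes its lengths as regoff_t */

	/* a zeroed pattern buffer lets the library allocate what it needs */
	memset(&pat, 0, sizeof(pat));
	re_set_syntax(RE_SYNTAX_POSIX_EXTENDED);
	err = re_compile_pattern(pattern, strlen(pattern), &pat);
	if (err)
		return -1;

	/* try every start offset in [0, len]; the explicit length bounds the scan */
	pos = re_search(&pat, buf, (regoff_t)len, 0, (regoff_t)len, NULL);
	regfree(&pat);

	if (pos >= 0)
		return 1;
	return pos == -1 ? 0 : -1; /* -1 is "no match", -2 is an internal error */
}
```

Because the length is passed explicitly, a buffer with no trailing NUL (or
with NULs in the middle) is searched in full, which is exactly what the
plain regexec() interface cannot promise.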
Given that (2) does not work, unless somebody comes up with something
clever there, that would make (3) the next best choice.
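
For comparison, a minimal sketch of (3): copy the buffer into a
NUL-terminated allocation only at the point where we call regexec(). The
helper name regexec_buf_copy() is hypothetical, not an existing git
function, and the sketch ignores the diff machinery entirely.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not a git function: return 1 if the compiled
 * regex matches somewhere in buf[0..len), 0 if it does not, and -1 if
 * the temporary copy cannot be allocated. buf need not be
 * NUL-terminated (e.g., it may be an mmap'd working tree file).
 */
static int regexec_buf_copy(const regex_t *re, const char *buf, size_t len)
{
	char *copy;
	int ret;

	assert(re != NULL && (buf != NULL || len == 0));

	/* pay for the copy only here, where a NUL terminator is required */
	copy = malloc(len + 1);
	if (!copy)
		return -1;
	if (len)
		memcpy(copy, buf, len);
	copy[len] = '\0';

	ret = regexec(re, copy, 0, NULL, 0) == 0;
	free(copy);
	return ret;
}
```

Note that regexec() still stops at the first NUL byte inside the data, so
this only guarantees we never read past the end of the buffer; it does not
make the match see content after an embedded NUL.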
-Peff