git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@student.ethz.ch>, Jehan Bing <jehan@orb.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v3 0/3]
Date: Sat, 18 Feb 2012 03:52:21 -0500	[thread overview]
Message-ID: <20120218085221.GA13922@sigill.intra.peff.net> (raw)
In-Reply-To: <7v8vk0r481.fsf@alter.siamese.dyndns.org>

On Fri, Feb 17, 2012 at 11:27:26PM -0800, Junio C Hamano wrote:

> > I'd be happy if we just ignored SIGPIPE everywhere, but turned it on for
> > the log family.
> 
> Hmmmm, now you confused me...  What is special about the log family?  Do
> you mean "when we use pager"?  But then we are writing into the pager,
> which the user can make it exit, which in turn causes us to write into the
> pipe, so I would expect that we would want to ignore SIGPIPE --- ah, then
> we explicitly catch error in xwrite() and say die() which we do not want.

Sort of. I mentioned the log family because those are the commands that
tend to generate large amounts of input, and which are likely to hit
SIGPIPE. But let me elaborate on my thinking a bit.

There are basically three types of writing callsites in git:

  1. Careful calls to write() or fwrite().

     These are the majority of calls. In these, we check the return
     value of write() and die, return the error code, or whatever is
     appropriate for the callsite. SIGPIPE kills the program before the
     careful code gets the chance to make a decision about what to do.
     At best, this is a nuisance; even if the program is going to die(),
     it's likely that the custom code could produce a useful error
     message. At worst it causes bugs. For example, we may write to a
     subprocess that only needs part of our input to make a decision
     (e.g., some hooks and credential helpers). With SIGPIPE, we end up
     dying even though no error has occurred. Worse, these are often
     annoying heisenbugs that depend on the timing of write() and
     close() between the two processes.

  2. Non-careful calls of small, fixed-size output.

     Things like "git remote" will write some output to stdout via
     printf and not bother checking the return code. As the main purpose
     of the program is to create output, it's OK to be killed by
     SIGPIPE if it happens due to a write to stdout. But respecting
     SIGPIPE doesn't buy as all that much. It might save us from making
     a few write() calls that will go to nowhere, but most of the work
     has already been done by the time we are outputting.

     And it has a cost to the other careful write calls in the same
     program, because respecting SIGPIPE is a per-process thing. So for
     something like "git remote show foo", while we would like SIGPIPE
     to kick in for output to stdout, we would not want it for a pipe we
     opened to talk to ls-remote.

  3. Non-careful calls of large, streaming data.

     Commands like "git log" will non-carefully output to stdout, as
     well, but they will generate tons of data, consuming possibly
     minutes of CPU time (e.g., "git log -p"). If whoever is reading the
     output stops doing so, we really want to kill the program to avoid
     wasting CPU time.  In this instance, SIGPIPE is a big win.

     It still has the downside that careful calls in the same program
     will be subject to using SIGPIPE. For "log" and friends, this is
     probably OK with the current code, as we don't make a lot of pipes.
     But that is somewhat of an implementation detail. E.g., "git log
     -p" with external diff or textconv writes to a tempfile, and then
     runs a subprocess with the tempfile as input. But we could just as
     easily have used pipes, and may choose to do so in the future. You
     may even be able to trigger a convert_to_git filter in the current
     code, which does use pipes.

So basically, I find SIGPIPE to be a simplistic too-heavy hammer that
ends up affecting all writes to pipes, when in reality we are only
interested in affecting ones to some "main" output (usually stdout).
That works OK for small Unix-y programs like "head", but is overly
simplistic for something as big as git.

In an ideal world, we could set a per-descriptor version of SIGPIPE, and
just turn it on for stdout (or somehow find out which descriptor caused
the SIGPIPE after the fact). But that's not possible.

So our next best thing would be to actually check the results of
these non-careful writes. Unfortunately, this means either:

  a. wrapping every printf with a function that will appropriately die
     on error. This makes the code more cumbersome.

  or

  b. occasionally checking ferror(stdout) while doing long streams
     (e.g., checking it after each commit is written in git log, and
     aborting if we saw a write error). This is less cumbersome, but it
     does mean that errno may or may not still be accurate by the time
     we notice the error. So it's hard to reliably differentiate EPIPE
     from other errors. It would be nice, for example, to have git log
     exit silently on EPIPE, but properly print an error for something
     like ENOSPC. But perhaps that isn't a big deal, as I believe right
     now that we would silently ignore something like ENOSPC.

Less robust than that is to just ignore SIGPIPE in most git programs
(which don't benefit from it, and where it is only a liability), but
then manually enable it for the few that care (the log family, and
perhaps diff. Maybe things like "git tag -l", though that output tends
to be pretty small). But I think it would work OK in practice, because
those commands don't tend to make pipes other than the "main" output.
And it's quite easy to implement.

> So you want to let SIGPIPE silently kill us when the pager is in use; is
> that it?

That's an OK heuristic, as the pager being in use is a good sign that we
will generate long, streaming output. It has two downsides, though. One
is that it suffers from the too-heavy hammer of the preceding paragraph.
The other is that it only catches when _we_ create the pipe. You would
also want to catch something like:

  git rev-list | head

and stop the traversal. So I think it is less about whether a pager is
in use and more about whether we are creating long output that would
benefit from being cut off early. The pager is an indicator of that, but
it's not a perfect one; I think we'd do better to mark those spots
manually (i.e., by re-enabling SIGPIPE in commands that we deem
appropriate).

-Peff

  reply	other threads:[~2012-02-18  8:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-16 21:41 [PATCH v2 0/3] Adding a performance framework Thomas Rast
2012-02-16 21:41 ` [PATCH v2 1/3] Move the user-facing test library to test-lib-functions.sh Thomas Rast
2012-02-16 22:14   ` Junio C Hamano
2012-02-17  9:00     ` Thomas Rast
2012-02-17 10:25     ` [PATCH v3 0/3] Thomas Rast
2012-02-17 10:25       ` [PATCH v3 1/3] Move the user-facing test library to test-lib-functions.sh Thomas Rast
2012-02-17 10:25       ` [PATCH v3 2/3] Introduce a performance testing framework Thomas Rast
2012-02-17 10:25       ` [PATCH v3 3/3] Add a performance test for git-grep Thomas Rast
2012-02-17 17:03       ` [PATCH v3 0/3] Junio C Hamano
2012-02-17 23:28         ` Junio C Hamano
2012-02-18  0:51           ` Jeff King
2012-02-18  7:27             ` Junio C Hamano
2012-02-18  8:52               ` Jeff King [this message]
2012-02-18 10:06                 ` SIGPIPE handling (Re: [PATCH v3 0/3]) Jonathan Nieder
2012-02-18 10:10                   ` Jonathan Nieder
2012-02-18 10:24                   ` Jeff King
2012-02-16 21:41 ` [PATCH v2 2/3] Introduce a performance testing framework Thomas Rast
2012-02-17  7:45   ` Jeff King
2012-02-16 21:41 ` [PATCH v2 3/3] Add a performance test for git-grep Thomas Rast
  -- strict thread matches above, loose matches on Subject: below --
2018-11-11 23:58 [PATCH v2 0/2] refactoring branch colorization to ref-filter nbelakovski
2018-12-16 21:57 ` [PATCH v3 0/3] nbelakovski
2018-12-18 17:25   ` Jeff King
2021-06-21 15:16 [PATCH v2 0/3] bundle.c: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-06-30 14:06 ` [PATCH v3 0/3] Ævar Arnfjörð Bjarmason
2021-06-30 17:34   ` Jeff King
2021-06-30 17:45     ` Jeff King
2021-06-30 18:00       ` Ævar Arnfjörð Bjarmason
2021-07-01 10:53         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120218085221.GA13922@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jehan@orb.com \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).