git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] do not require filters to consume stdin
@ 2011-08-29 20:31 Joey Hess
  2011-08-29 22:53 ` Junio C Hamano
  2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
  0 siblings, 2 replies; 7+ messages in thread
From: Joey Hess @ 2011-08-29 20:31 UTC (permalink / raw
  To: git

A clean filter that uses %f to examine a file may not need to consume
the entire file content from stdin every time it's run, due to caching
or other optimisations to support large files.

Ignore the SIGPIPE that may result when writing to the filter
if it exits without consuming stdin, and do not check that all
content is sent to it. The filter is still required to exit
successfully, so a crash in the filter should still be handled
correctly.
---

There has been discussion before about using clean and smudge filters
with %f to handle big files in git, with the file content stored outside
git somewhere.  A simplistic clean filter for large files could look
like this:

#!/bin/sh
file="$1"
ln -f $file ~/.big/$file
echo $file

But trying to use this will fail on truely large files. For example:

$ ls -l sorta.huge 
-rw-r--r-- 3 joey joey 524288000 Aug 29 15:19 sorta.huge
$ git add sorta.huge 
broken pipe  git add sorta.huge
$ echo $?
141

The SIGPIPE occurs because git expects the clean filter to read
the full file content from stdin. (Although if the content is small
enough, a SIGPIPE may not occur.) So the clean filter needs to
look like this:

#!/bin/sh
file="$1"
cat >/dev/null
ln -f $file ~/.big/$file
echo $file

But this means much more work has to be done whenever the clean filter
is run. Including every time git status is run. So it's currently
impractical to use clean/smudge filters like this for large files.
This patch should close that gap and allow such filters to be developed.

 convert.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/convert.c b/convert.c
index 416bf83..3d528c4 100644
--- a/convert.c
+++ b/convert.c
@@ -330,7 +330,7 @@ static int filter_buffer(int in, int out, void *data)
 	 */
 	struct child_process child_process;
 	struct filter_params *params = (struct filter_params *)data;
-	int write_err, status;
+	int write_err = 0, status;
 	const char *argv[] = { NULL, NULL };
 
 	/* apply % substitution to cmd */
@@ -360,9 +360,11 @@ static int filter_buffer(int in, int out, void *data)
 	if (start_command(&child_process))
 		return error("cannot fork to run external filter %s", params->cmd);
 
-	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+	signal(SIGPIPE, SIG_IGN);
+	write_in_full(child_process.in, params->src, params->size);
 	if (close(child_process.in))
 		write_err = 1;
+	signal(SIGPIPE, SIG_DFL);
 	if (write_err)
 		error("cannot feed the input to external filter %s", params->cmd);
 
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] do not require filters to consume stdin
  2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess
@ 2011-08-29 22:53 ` Junio C Hamano
  2011-08-30  1:20   ` Joey Hess
  2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2011-08-29 22:53 UTC (permalink / raw
  To: Joey Hess; +Cc: git

Joey Hess <joey@kitenet.net> writes:

> There has been discussion before about using clean and smudge filters
> with %f to handle big files in git, with the file content stored outside
> git somewhere.  A simplistic clean filter for large files could look
> like this:
>
> #!/bin/sh
> file="$1"
> ln -f $file ~/.big/$file
> echo $file

Isn't this filter already broken if clean request is for a blob contents
that is different from what is on the filesystem?  The name %f is passed
to give the filter a _hint_ on what the path is about (so that the filter
can choose to work differently depending on the extension, for example),
but the data may or may not come from the filesystem, depending on what is
calling the filter, no?

Most notably, renormalize_buffer() would call convert_to_git() on a buffer
that is internal, possibly quite different from what is in the working
tree.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] do not require filters to consume stdin
  2011-08-29 22:53 ` Junio C Hamano
@ 2011-08-30  1:20   ` Joey Hess
  0 siblings, 0 replies; 7+ messages in thread
From: Joey Hess @ 2011-08-30  1:20 UTC (permalink / raw
  To: git

[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]

Junio C Hamano wrote:
> Isn't this filter already broken if clean request is for a blob contents
> that is different from what is on the filesystem?  The name %f is passed
> to give the filter a _hint_ on what the path is about (so that the filter
> can choose to work differently depending on the extension, for example),
> but the data may or may not come from the filesystem, depending on what is
> calling the filter, no?
> 
> Most notably, renormalize_buffer() would call convert_to_git() on a buffer
> that is internal, possibly quite different from what is in the working
> tree.

So during a merge.

gitattributes(5) is not very clear about this, it would probably be good
to add a caveat there about what %f is not.

This seems to make it impractical to build the sort of thing described here:
http://lists-archives.org/git/737857-fwd-git-and-large-binaries-a-proposed-solution.html

Arguably that thread already reached the same conclusion about using
smudge/clean for handling large files, for other reasons. Since I
already have something that works without smudge/clean, perhaps I should
give up on them.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* hooks that do not consume stdin sometimes crash git with SIGPIPE
  2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess
  2011-08-29 22:53 ` Junio C Hamano
@ 2011-12-05 19:29 ` Joey Hess
  2011-12-05 21:43   ` Jeff King
  2011-12-06  1:39   ` Junio C Hamano
  1 sibling, 2 replies; 7+ messages in thread
From: Joey Hess @ 2011-12-05 19:29 UTC (permalink / raw
  To: git; +Cc: Lars Wirzenius

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

We had a weird problem where, after moving to a new, faster server,
"git push" would sometimes fail like this:

Unpacking objects: 100% (3/3), done.
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly

Turns out that git-receive-pack was dying due to an uncaught SIGPIPE.
The SIGPIPE occurred when it tried to write to the pre-receive hook's
stdin. The pre-receive hook, in this case, was able to do all the checks
it needed to do[1] without the input, and so did exit(0) without
consuming it.

Apparently that causes a race. Most of the time, git forks the hook,
writes output to the hook, and then the hook runs, ignores it, and exits.
But sometimes, on our new faster (and SMP) server, git forked the hook, and
it ran, and exited, before git got around to writing to it, resulting in
the SIGPIPE.

write(7, "c9f98c67d70a1cfeba382ec27d87644a"..., 100) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---

I think git should ignore SIGPIPE when writing to hooks. Otherwise,
hooks may have to go out of their way to consume all input, and as I've
seen, the races when they fail to do this can lurk undiscovered.

Note that I encountered this same sort of problem from another direction
(involving smudge filters) not long ago, and sent a patch, in
<20110829203107.GA4946@gnu.kitenet.net>. That wasn't applied, and is
in different code than the case I outlined above.

-- 
see shy jo

[1] If you're wondering, it only needed to check that the push was
    coming from a trusted UID. With an untrusted UID, it did further
    checks that consumed the stdin.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE
  2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
@ 2011-12-05 21:43   ` Jeff King
  2011-12-06  1:39   ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2011-12-05 21:43 UTC (permalink / raw
  To: Joey Hess; +Cc: git, Lars Wirzenius

On Mon, Dec 05, 2011 at 03:29:30PM -0400, Joey Hess wrote:

> We had a weird problem where, after moving to a new, faster server,
> "git push" would sometimes fail like this:
> 
> Unpacking objects: 100% (3/3), done.
> fatal: The remote end hung up unexpectedly
> fatal: The remote end hung up unexpectedly
> 
> Turns out that git-receive-pack was dying due to an uncaught SIGPIPE.
> The SIGPIPE occurred when it tried to write to the pre-receive hook's
> stdin. The pre-receive hook, in this case, was able to do all the checks
> it needed to do[1] without the input, and so did exit(0) without
> consuming it.

Ugh. Yeah, receive-pack should probably just be ignoring SIGPIPE
entirely. It checks the return from write() properly where it matters,
so SIGPIPE is just an annoyance.

I would go so far as to say that git should be ignoring SIGPIPE 99% of
the time. We have crap like this sprinkled throughout the code base:

  $ git grep -C1 SIGPIPE
  builtin/tag.c-  /* When the username signingkey is bad, program could be terminated
  builtin/tag.c:   * because gpg exits without reading and then write gets SIGPIPE. */
  builtin/tag.c:  signal(SIGPIPE, SIG_IGN);
  [...]
  upload-pack.c-   * If rev-list --stdin encounters an unknown commit, it
  upload-pack.c:   * terminates, which will cause SIGPIPE in the write loop
  upload-pack.c-   */
  upload-pack.c:  sigchain_push(SIGPIPE, SIG_IGN);

but I find it highly unlikely that they are covering all of the cases.
You found one already, and these things can quite often be
race-condition heisenbugs.

The one place where SIGPIPE _is_ useful is for things like "git log"
which are just dumping to a pager over stdout. When the pager dies, we
can stop bothering to produce output.

It would be really nice if we could write a sigpipe handler that knew
which fd caused the the signal, and then we could do something like:

  void sigpipe_handler(int sig)
  {
          /* If we're writing to a pager over stdout, then there is
           * little use in writing more; nobody is interested in our
           * output. */
          if (get_fd_that_caused_sigpipe() == 1 && pager_in_use)
                  exit(1);

          /* Otherwise, ignore it, as it's a write to some auxiliary
           * process and we will be careful about checking the return
           * code from write(). */
  }

But I don't think such a function exists. We could just check
pager_in_use, which would cover most cases (e.g., programs like
receive-pack don't use a pager). But it would fail in the case of
something like "git log" using an auxiliary process that closes the pipe
early. Maybe that would be good enough, though. I dunno.

Another option is to just ignore SIGPIPE entirely, and convince programs
like log to actually bother checking the result of write(). It would be
a slight pain to check every printf() call we make in log-tree.c, but we
could do one of:

  1. Make a set of stdio wrapper scripts that exit gracefully on EPIPE.
     Using the wrapper scripts everywhere is a slight pain, but would
     work pretty well, and adapts easily to other places like printing
     lists of refs, etc.

  2. Don't check every printf. After printing each commit, check
     ferror(stdout) and exit as appropriate. This is a very small amount
     of code, but you'd need to do it in several places (i.e., anywhere
     that produces a lot of output).

> I think git should ignore SIGPIPE when writing to hooks. Otherwise,
> hooks may have to go out of their way to consume all input, and as I've
> seen, the races when they fail to do this can lurk undiscovered.

Yeah, certainly. The question to me is whether we should just stick a
SIG_IGN in the beginning of receive-pack, or whether we should try to
deal with this problem everywhere.

For example, I suspect the same problem exists in the credential helpers
I wrote recently. Generally they will read all of their input, but you
could do something like:

  [credential "https://github.com"]
          username = peff
          helper = "!
                  f() {
                          # if we call this helper, we know we want the
                          # github password,  or else this helper config
                          # would never have been triggered.  so we
                          # don't even have to bother reading our stdin.

                          # We only handle getting the password.
                          test "$1" = "get" || return

                          # Presumably gpg-agent will ask for and cache
                          # your gpg password.
                          p=`gpg -qd --no-tty <~/.github-password.gpg`

                          echo "password=$p"
                  }; f"

This can racily fail if our write happens after the helper has already
finished (unlikely, but possible).

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE
  2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
  2011-12-05 21:43   ` Jeff King
@ 2011-12-06  1:39   ` Junio C Hamano
  2011-12-06  3:11     ` Joey Hess
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2011-12-06  1:39 UTC (permalink / raw
  To: Joey Hess; +Cc: git, Lars Wirzenius

Joey Hess <joey@kitenet.net> writes:

> We had a weird problem where, after moving to a new, faster server,
> "git push" would sometimes fail like this:
>
> Unpacking objects: 100% (3/3), done.
> fatal: The remote end hung up unexpectedly
> fatal: The remote end hung up unexpectedly
>
> Turns out that git-receive-pack was dying due to an uncaught SIGPIPE.

Why do you have a hook that is expected to read from receive-pack that
does _not_ read anything from it in the first place? If you do not care
about the update status given to pre-receive, shouldn't you be using the
update hook and ignoring the command line parameters instead?

I am not saying this is a user configuration error and there is nothing to
fix---Git shouldn't get killed merely because of configuration error.

I am wondering if we would want to have a uniform way to tell run_*_hook()
functions that the hook writer explicitly declines to get any input. E.g.
"hooks/pre-receive-noinput" is called instead of "hooks/pre-receive" and
we do not send any input to it, or something like that.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE
  2011-12-06  1:39   ` Junio C Hamano
@ 2011-12-06  3:11     ` Joey Hess
  0 siblings, 0 replies; 7+ messages in thread
From: Joey Hess @ 2011-12-06  3:11 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Lars Wirzenius

[-- Attachment #1: Type: text/plain, Size: 737 bytes --]

Junio C Hamano wrote:
> Why do you have a hook that is expected to read from receive-pack that
> does _not_ read anything from it in the first place? If you do not care
> about the update status given to pre-receive, shouldn't you be using the
> update hook and ignoring the command line parameters instead?

My hook *does* consume the stdin in one case, but in another case it
does no checks and so can immediately exit. 

Also, I didn't want it to be run once per updated ref as the update hook
is, since the tests it performs are rather expensive -- loading a perl
wiki engine in order to check that the changeset contains only changes to
wiki pages that are allowed based on the wiki's configuration.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-12-06  3:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess
2011-08-29 22:53 ` Junio C Hamano
2011-08-30  1:20   ` Joey Hess
2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
2011-12-05 21:43   ` Jeff King
2011-12-06  1:39   ` Junio C Hamano
2011-12-06  3:11     ` Joey Hess

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).