git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] upload-pack: use buffered I/O to talk to rev-list
@ 2020-08-02 14:38 René Scharfe
  2020-08-02 16:03 ` Chris Torek
  2020-08-12 16:52 ` [PATCH v2] " René Scharfe
  0 siblings, 2 replies; 9+ messages in thread
From: René Scharfe @ 2020-08-02 14:38 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering and handling errors after the loops.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 upload-pack.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 86737410709..9f616c2c6a6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -595,10 +595,9 @@ static int do_reachable_revlist(struct child_process *cmd,
 	static const char *argv[] = {
 		"rev-list", "--stdin", NULL,
 	};
+	FILE *cmd_in;
 	struct object *o;
-	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
 	int i;
-	const unsigned hexsz = the_hash_algo->hexsz;

 	cmd->argv = argv;
 	cmd->git_cmd = 1;
@@ -616,8 +615,8 @@ static int do_reachable_revlist(struct child_process *cmd,
 	if (start_command(cmd))
 		goto error;

-	namebuf[0] = '^';
-	namebuf[hexsz + 1] = '\n';
+	cmd_in = xfdopen(cmd->in, "w");
+
 	for (i = get_max_object_index(); 0 < i; ) {
 		o = get_indexed_object(--i);
 		if (!o)
@@ -626,11 +625,8 @@ static int do_reachable_revlist(struct child_process *cmd,
 			o->flags &= ~TMP_MARK;
 		if (!is_our_ref(o, allow_uor))
 			continue;
-		memcpy(namebuf + 1, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 2) < 0)
-			goto error;
+		fprintf(cmd_in, "^%s\n", oid_to_hex(&o->oid));
 	}
-	namebuf[hexsz] = '\n';
 	for (i = 0; i < src->nr; i++) {
 		o = src->objects[i].item;
 		if (is_our_ref(o, allow_uor)) {
@@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
 		}
 		if (reachable && o->type == OBJ_COMMIT)
 			o->flags |= TMP_MARK;
-		memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
-			goto error;
+		fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
 	}
-	close(cmd->in);
 	cmd->in = -1;
+	if (fclose(cmd_in))
+		goto error;
 	sigchain_pop(SIGPIPE);

 	return 0;
@@ -653,8 +648,6 @@ static int do_reachable_revlist(struct child_process *cmd,
 error:
 	sigchain_pop(SIGPIPE);

-	if (cmd->in >= 0)
-		close(cmd->in);
 	if (cmd->out >= 0)
 		close(cmd->out);
 	return -1;
--
2.28.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] upload-pack: use buffered I/O to talk to rev-list
  2020-08-02 14:38 [PATCH] upload-pack: use buffered I/O to talk to rev-list René Scharfe
@ 2020-08-02 16:03 ` Chris Torek
  2020-08-03 14:00   ` René Scharfe
  2020-08-03 18:15   ` Johannes Sixt
  2020-08-12 16:52 ` [PATCH v2] " René Scharfe
  1 sibling, 2 replies; 9+ messages in thread
From: Chris Torek @ 2020-08-02 16:03 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano

One suggestion here:

On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
> 2016-06-08), significantly reduce the number of system calls and
> simplify the code for sending object IDs to rev-list by using stdio's
> buffering and handling errors after the loops.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  upload-pack.c | 21 +++++++--------------
>  1 file changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 86737410709..9f616c2c6a6 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c

[snip]

> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>                 }
>                 if (reachable && o->type == OBJ_COMMIT)
>                         o->flags |= TMP_MARK;
> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
> -                       goto error;
> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));

The fprintf() call here *can* return an error, e.g., if the
connection has died.  If it does, it should set things up so that
a later ferror(cmd_in) returns true.

>         }
> -       close(cmd->in);
>         cmd->in = -1;
> +       if (fclose(cmd_in))
> +               goto error;

The fclose() call doesn't necessarily check ferror().  (The
FreeBSD stdio in particular definitely does not.)  It might
be better to use:

    failure = ferror(cmd_in);
    failure |= fclose(cmd_in);
    if (failure) ...

here, or similar.  (The temporary variable is not needed,
but someone might assume `if (ferror(fp) | fclose(fp))` is
a typo for `if (ferror(fp) || fclose(fp))`.)

(Note: my sample isn't properly indented as gmail does not
let me insert tabs easily)

Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] upload-pack: use buffered I/O to talk to rev-list
  2020-08-02 16:03 ` Chris Torek
@ 2020-08-03 14:00   ` René Scharfe
  2020-08-03 15:54     ` Chris Torek
  2020-08-03 18:15   ` Johannes Sixt
  1 sibling, 1 reply; 9+ messages in thread
From: René Scharfe @ 2020-08-03 14:00 UTC (permalink / raw)
  To: Chris Torek; +Cc: Git Mailing List, Junio C Hamano

Am 02.08.20 um 18:03 schrieb Chris Torek:
> One suggestion here:
>
> On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
>> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
>> 2016-06-08), significantly reduce the number of system calls and
>> simplify the code for sending object IDs to rev-list by using stdio's
>> buffering and handling errors after the loops.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  upload-pack.c | 21 +++++++--------------
>>  1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index 86737410709..9f616c2c6a6 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
>
> [snip]
>
>> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>>                 }
>>                 if (reachable && o->type == OBJ_COMMIT)
>>                         o->flags |= TMP_MARK;
>> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
>> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
>> -                       goto error;
>> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
>
> The fprintf() call here *can* return an error, e.g., if the
> connection has died.  If it does, it should set things up so that
> a later ferror(cmd_in) returns true.
>
>>         }
>> -       close(cmd->in);
>>         cmd->in = -1;
>> +       if (fclose(cmd_in))
>> +               goto error;
>
> The fclose() call doesn't necessarily check ferror().  (The
> FreeBSD stdio in particular definitely does not.)  It might
> be better to use:
>
>     failure = ferror(cmd_in);
>     failure |= fclose(cmd_in);
>     if (failure) ...
>
> here, or similar.  (The temporary variable is not needed,
> but someone might assume `if (ferror(fp) | fclose(fp))` is
> a typo for `if (ferror(fp) || fclose(fp))`.)

Thanks, didn't know that.  That's awful.  So the sentence "The fclose()
and fdclose() functions may also fail and set errno for any of the
errors specified for fflush(3)." from the FreeBSD manpage for fclose(3)
actually means that while it will flush, it is free to ignore any
flush errors?

Or do you mean that fflush() can succeed on a stream that has its error
indicator set?

In any case, we'd then better add a function that flushes the buffer,
closes the stream and reports errors in its return code and errno --
i.e. a sane fclose().

René


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] upload-pack: use buffered I/O to talk to rev-list
  2020-08-03 14:00   ` René Scharfe
@ 2020-08-03 15:54     ` Chris Torek
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Torek @ 2020-08-03 15:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano

On Mon, Aug 3, 2020 at 7:00 AM René Scharfe <l.s.r@web.de> wrote:
>
> Am 02.08.20 um 18:03 schrieb Chris Torek:
> > The fclose() call doesn't necessarily check ferror().  (The
> > FreeBSD stdio in particular definitely does not.)
>
> Thanks, didn't know that.  That's awful.  So the sentence "The fclose()
> and fdclose() functions may also fail and set errno for any of the
> errors specified for fflush(3)." from the FreeBSD manpage for fclose(3)
> actually means that while it will flush, it is free to ignore any
> flush errors?
>
> Or do you mean that fflush() can succeed on a stream that has its error
> indicator set?

The latter.  You have to get particularly (un?)lucky here: say the
buffer size is n, and the *last* write operation (fprintf, whatever)
filled the buffer mod n and called fflush internally and that failed.
Then at the time you call fclose() the buffer is empty.  There is
nothing to flush, so we just get the result of the system's close()
call.

> In any case, we'd then better add a function that flushes the buffer,
> closes the stream and reports errors in its return code and errno --
> i.e. a sane fclose().

Unfortunately the global-variable nature of errno means it may be
clobbered by the time you inspect it here, again with the same sort
of issue above.  It might be nice if C's error conventions were more
like Go's, and stdio retained an earlier error, but this stuff all dates
back to 16-bit "int" and 512-byte buffers and no room for nice stuff. :-)

Chris

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] upload-pack: use buffered I/O to talk to rev-list
  2020-08-02 16:03 ` Chris Torek
  2020-08-03 14:00   ` René Scharfe
@ 2020-08-03 18:15   ` Johannes Sixt
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Sixt @ 2020-08-03 18:15 UTC (permalink / raw)
  To: Chris Torek, René Scharfe; +Cc: Git Mailing List, Junio C Hamano

Am 02.08.20 um 18:03 schrieb Chris Torek:
> One suggestion here:
> 
> On Sun, Aug 2, 2020 at 7:41 AM René Scharfe <l.s.r@web.de> wrote:
>> Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
>> 2016-06-08), significantly reduce the number of system calls and
>> simplify the code for sending object IDs to rev-list by using stdio's
>> buffering and handling errors after the loops.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  upload-pack.c | 21 +++++++--------------
>>  1 file changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index 86737410709..9f616c2c6a6 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
> 
> [snip]
> 
>> @@ -640,12 +636,11 @@ static int do_reachable_revlist(struct child_process *cmd,
>>                 }
>>                 if (reachable && o->type == OBJ_COMMIT)
>>                         o->flags |= TMP_MARK;
>> -               memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
>> -               if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
>> -                       goto error;
>> +               fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid));
> 
> The fprintf() call here *can* return an error, e.g., if the
> connection has died.  If it does, it should set things up so that
> a later ferror(cmd_in) returns true.

True. We need an explicit test after each fprintf anyway because SIGPIPE
may be ignored, and then writing fails with EPIPE. On Windows, this is
doubly important because we do not have SIGPIPE at all (and always see
EPIPE), but we see EPIPE only on the first failed write; subsequent
writes produce EINVAL.

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] upload-pack: use buffered I/O to talk to rev-list
  2020-08-02 14:38 [PATCH] upload-pack: use buffered I/O to talk to rev-list René Scharfe
  2020-08-02 16:03 ` Chris Torek
@ 2020-08-12 16:52 ` René Scharfe
  2020-08-13  5:17   ` Christian Couder
  1 sibling, 1 reply; 9+ messages in thread
From: René Scharfe @ 2020-08-12 16:52 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Chris Torek, Johannes Sixt

Like f0bca72dc77 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.

Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.

Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
---
Changes since v1:
- Handle fprintf() errors immediately.
- Call ferror() and fflush() explicitly before calling fclose().
- Ignore fclose() errors, similar to the original code.
- Initialize the stream pointer and use a NULL check in the error
  handler to avoid a bogus compiler warning about it being used
  without initialization.

 upload-pack.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 80ad9a38d81..811fdd47b58 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -603,9 +603,8 @@ static int do_reachable_revlist(struct child_process *cmd,
 		"rev-list", "--stdin", NULL,
 	};
 	struct object *o;
-	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
+	FILE *cmd_in = NULL;
 	int i;
-	const unsigned hexsz = the_hash_algo->hexsz;

 	cmd->argv = argv;
 	cmd->git_cmd = 1;
@@ -623,8 +622,8 @@ static int do_reachable_revlist(struct child_process *cmd,
 	if (start_command(cmd))
 		goto error;

-	namebuf[0] = '^';
-	namebuf[hexsz + 1] = '\n';
+	cmd_in = xfdopen(cmd->in, "w");
+
 	for (i = get_max_object_index(); 0 < i; ) {
 		o = get_indexed_object(--i);
 		if (!o)
@@ -633,11 +632,9 @@ static int do_reachable_revlist(struct child_process *cmd,
 			o->flags &= ~TMP_MARK;
 		if (!is_our_ref(o, allow_uor))
 			continue;
-		memcpy(namebuf + 1, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 2) < 0)
+		if (fprintf(cmd_in, "^%s\n", oid_to_hex(&o->oid)) < 0)
 			goto error;
 	}
-	namebuf[hexsz] = '\n';
 	for (i = 0; i < src->nr; i++) {
 		o = src->objects[i].item;
 		if (is_our_ref(o, allow_uor)) {
@@ -647,11 +644,12 @@ static int do_reachable_revlist(struct child_process *cmd,
 		}
 		if (reachable && o->type == OBJ_COMMIT)
 			o->flags |= TMP_MARK;
-		memcpy(namebuf, oid_to_hex(&o->oid), hexsz);
-		if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
+		if (fprintf(cmd_in, "%s\n", oid_to_hex(&o->oid)) < 0)
 			goto error;
 	}
-	close(cmd->in);
+	if (ferror(cmd_in) || fflush(cmd_in))
+		goto error;
+	fclose(cmd_in);
 	cmd->in = -1;
 	sigchain_pop(SIGPIPE);

@@ -660,8 +658,8 @@ static int do_reachable_revlist(struct child_process *cmd,
 error:
 	sigchain_pop(SIGPIPE);

-	if (cmd->in >= 0)
-		close(cmd->in);
+	if (cmd_in)
+		fclose(cmd_in);
 	if (cmd->out >= 0)
 		close(cmd->out);
 	return -1;
--
2.28.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] upload-pack: use buffered I/O to talk to rev-list
  2020-08-12 16:52 ` [PATCH v2] " René Scharfe
@ 2020-08-13  5:17   ` Christian Couder
  2020-08-13  5:57     ` René Scharfe
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Couder @ 2020-08-13  5:17 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Junio C Hamano, Chris Torek, Johannes Sixt

On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:

> -       close(cmd->in);
> +       if (ferror(cmd_in) || fflush(cmd_in))
> +               goto error;
> +       fclose(cmd_in);
>         cmd->in = -1;

I wonder if setting cmd->in to -1 is still useful...

>         sigchain_pop(SIGPIPE);
>
> @@ -660,8 +658,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>  error:
>         sigchain_pop(SIGPIPE);
>
> -       if (cmd->in >= 0)
> -               close(cmd->in);
> +       if (cmd_in)
> +               fclose(cmd_in);

...as we don't check cmd->in anymore at the end of the function, but
we now check cmd_in instead. So should cmd_in have been set to -1
instead of cmd->in?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] upload-pack: use buffered I/O to talk to rev-list
  2020-08-13  5:17   ` Christian Couder
@ 2020-08-13  5:57     ` René Scharfe
  2020-08-13  9:01       ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: René Scharfe @ 2020-08-13  5:57 UTC (permalink / raw)
  To: Christian Couder
  Cc: Git Mailing List, Junio C Hamano, Chris Torek, Johannes Sixt

Am 13.08.20 um 07:17 schrieb Christian Couder:
> On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:
>
>> -       close(cmd->in);
>> +       if (ferror(cmd_in) || fflush(cmd_in))
>> +               goto error;
>> +       fclose(cmd_in);
>>         cmd->in = -1;
>
> I wonder if setting cmd->in to -1 is still useful...

The patch doesn't change its usefulness.  It probably was not necessary
to begin with.

>
>>         sigchain_pop(SIGPIPE);
>>

The next line right here (not shown in the diff) returns 0.

>> @@ -660,8 +658,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>>  error:
>>         sigchain_pop(SIGPIPE);
>>
>> -       if (cmd->in >= 0)
>> -               close(cmd->in);
>> +       if (cmd_in)
>> +               fclose(cmd_in);
>
> ...as we don't check cmd->in anymore at the end of the function, but
> we now check cmd_in instead. So should cmd_in have been set to -1
> instead of cmd->in?

This error handler is not reached from the place that sets cmd->in back
to -1.  It can be reached from a place where cmd_in is still NULL, so
this check is necessary and setting cmd_in to NULL above is not needed.

René

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] upload-pack: use buffered I/O to talk to rev-list
  2020-08-13  5:57     ` René Scharfe
@ 2020-08-13  9:01       ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2020-08-13  9:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: Christian Couder, Git Mailing List, Junio C Hamano, Chris Torek,
	Johannes Sixt

On Thu, Aug 13, 2020 at 07:57:20AM +0200, René Scharfe wrote:

> Am 13.08.20 um 07:17 schrieb Christian Couder:
> > On Wed, Aug 12, 2020 at 6:54 PM René Scharfe <l.s.r@web.de> wrote:
> >
> >> -       close(cmd->in);
> >> +       if (ferror(cmd_in) || fflush(cmd_in))
> >> +               goto error;
> >> +       fclose(cmd_in);
> >>         cmd->in = -1;
> >
> > I wonder if setting cmd->in to -1 is still useful...
> 
> The patch doesn't change its usefulness.  It probably was not necessary
> to begin with.

Right. We exit immediately after setting it, but it's not clearly a dead
load: the "cmd" struct is passed in from the caller. Prior to 2997178ee6
(upload-pack: split check_unreachable() in two, prep for
get_reachable_list(), 2016-06-12), it was part of a longer function
which did have more error handling.

But after the split, has_unreachable() knows that it is always closed
after do_reachable_revlist() returns, so it never looks at cmd.in again.
It might be worth removing just to avoid confusion.

I thought this also implied that the conditional close() in the error
block was not necessary, but it is. In the existing code we could "goto
error" from start_command() failing, before we ever assign to cmd->in.
The run-command API clears the struct in that case (so we'd see cmd->in
== 0, and avoid closing).

> >> -       if (cmd->in >= 0)
> >> -               close(cmd->in);
> >> +       if (cmd_in)
> >> +               fclose(cmd_in);
> >
> > ...as we don't check cmd->in anymore at the end of the function, but
> > we now check cmd_in instead. So should cmd_in have been set to -1
> > instead of cmd->in?
> 
> This error handler is not reached from the place that sets cmd->in back
> to -1.  It can be reached from a place where cmd_in is still NULL, so
> this check is necessary and setting cmd_in to NULL above is not needed.

Right, that makes sense. Your NULL cmd_in is replacing the zero'd
cmd->in from start_command().

So I think your patch is correct. It might be worth removing the "-1"
assignment on top.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-13  9:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-02 14:38 [PATCH] upload-pack: use buffered I/O to talk to rev-list René Scharfe
2020-08-02 16:03 ` Chris Torek
2020-08-03 14:00   ` René Scharfe
2020-08-03 15:54     ` Chris Torek
2020-08-03 18:15   ` Johannes Sixt
2020-08-12 16:52 ` [PATCH v2] " René Scharfe
2020-08-13  5:17   ` Christian Couder
2020-08-13  5:57     ` René Scharfe
2020-08-13  9:01       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).