git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: peff@peff.net, me@ttaylorr.com, git@vger.kernel.org, ps@pks.im
Subject: Re: [PATCH 2/2] upload-pack: use stdio in send_ref callbacks
Date: Thu, 26 Aug 2021 09:33:08 -0700	[thread overview]
Message-ID: <xmqqpmu0f9ob.fsf@gitster.g> (raw)
In-Reply-To: <20210826100648.10333-2-jacob@gitlab.com> (Jacob Vosmaer's message of "Thu, 26 Aug 2021 12:06:48 +0200")

Jacob Vosmaer <jacob@gitlab.com> writes:

> In both protocol v0 and v2, upload-pack writes one pktline packet per
> advertised ref to stdout. That means one or two write(2) syscalls per
> ref. This is problematic if these writes become network sends with
> high overhead.
>
> This commit changes both send_ref callbacks to use buffered IO using
> stdio. The default buffer size is usually 4KB, but with musl libc it
> is only 1KB. So for consistent results we need to set a buffer size
> ourselves. During startup of upload-pack, we set the stdout buffer
> size to LARGE_PACKET_MAX, i.e. just shy of 64KB.
>
> To give an example of the impact: I set up a single-threaded loop that
> calls ls-remote (with HTTP and protocol v2) on a local GitLab
> instance, on a repository with 11K refs. When I switch from Git
> v2.32.0 to this patch, I see a 50% reduction in CPU time for Git, and
> 75% for Gitaly (GitLab's Git RPC service).
>
> So using buffered IO not only saves syscalls in upload-pack, it also
> saves time in things that consume upload-pack's output.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---

Nicely explained.

> +	/*
> +	 * Increase the stdio buffer size for stdout, for the benefit of ref
> +	 * advertisement writes. We are only allowed to call setvbuf(3) "after
> +	 * opening a stream and before any other operations have been performed
> +	 * on it", so let's call it before we have written anything to stdout.
> +	 */
> +	if (setvbuf(stdout, xmalloc(LARGE_PACKET_MAX), _IOFBF,
> +			LARGE_PACKET_MAX))
> +		die_errno("failed to grow stdout buffer");

Nice to see a comment on the tricky part.  I do not think we mind if
we rounded up the allocation size to the next power of two here, but
there probably won't be any measurable benefit for doing so.

> diff --git a/ls-refs.c b/ls-refs.c
> index 88f6c3f60d..83f2948fc3 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -105,7 +105,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
>  	}
>  
>  	strbuf_addch(&refline, '\n');
> -	packet_write(1, refline.buf, refline.len);
> +	packet_fwrite(stdout, refline.buf, refline.len);
>  
>  	strbuf_release(&refline);
>  	return 0;
> @@ -171,6 +171,9 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  		strvec_push(&data.prefixes, "");
>  	for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v,
>  				     send_ref, &data, 0);
> +	/* Call fflush because send_ref uses stdio. */
> +	if (fflush(stdout))
> +		die_errno(_("write failure on standard output"));

There is maybe_flush_or_die() helper that does a bit too much (I do
not think this code path needs to worry about GIT_FLUSH) but does
call check_pipe(errno) like packet_write_fmt() does upon seeing a
write failure.

>  	packet_flush(1);

I briefly wondered if all the operations on refline can be turned
into direct operations on stdout, but the answer obviously is no.
We still need to have a strbuf to assemble a single packet and
measure its final length before we can send it out to stdout.

> diff --git a/upload-pack.c b/upload-pack.c
> index 297b76fcb4..b592ac6cfb 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -58,6 +58,7 @@ enum allow_uor {
>   */
>  struct upload_pack_data {
>  	struct string_list symref;				/* v0 only */
> +	struct strbuf send_ref_buf;				/* v0 only */
>  	struct object_array want_obj;
>  	struct object_array have_obj;
>  	struct oid_array haves;					/* v2 only */
> @@ -126,6 +127,7 @@ static void upload_pack_data_init(struct upload_pack_data *data)
>  	struct string_list uri_protocols = STRING_LIST_INIT_DUP;
>  	struct object_array extra_edge_obj = OBJECT_ARRAY_INIT;
>  	struct string_list allowed_filters = STRING_LIST_INIT_DUP;
> +	struct strbuf send_ref_buf = STRBUF_INIT;

IIUC, the most notable difference between this round and the
previous one is that now we are no longer buffering more than one
packet worth of data because we let the stdio to accumulate them.
I was a bit surprised that we still want to have a strbuf inside
this structure (which is there only because it wants to persist
during the whole conversation with the other side).

Ahh, sorry, scratch that.  I do remember the discussion/patch that
it was hurting to make calls to strbuf-init/strbuf-release once per
ref, and it is an easy way to have the structure here to reuse it.

OK.  This step looks quite sensible, other than the same "do we want
check_pipe() before dying upon fflush() failure?" we see in the last
hunk below.  I didn't mention this in the review of 1/2, but the new
fwrite_or_die() helper function may also have the same issue.

Thanks.

> @@ -1348,6 +1354,9 @@ void upload_pack(struct upload_pack_options *options)
>  		reset_timeout(data.timeout);
>  		head_ref_namespaced(send_ref, &data);
>  		for_each_namespaced_ref(send_ref, &data);
> +		/* Call fflush because send_ref uses stdio. */
> +		if (fflush(stdout))
> +			die_errno(_("write failure on standard output"));
>  		advertise_shallow_grafts(1);
>  		packet_flush(1);
>  	} else {

  reply	other threads:[~2021-08-26 16:33 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 14:02 [PATCH 1/1] upload-pack: buffer ref advertisement writes Jacob Vosmaer
2021-08-24 21:07 ` Taylor Blau
2021-08-24 21:42   ` Junio C Hamano
2021-08-25  0:44     ` Jeff King
2021-08-26 10:02       ` Jacob Vosmaer
2021-08-26 10:06         ` [PATCH 1/2] pkt-line: add packet_fwrite Jacob Vosmaer
2021-08-26 10:06           ` [PATCH 2/2] upload-pack: use stdio in send_ref callbacks Jacob Vosmaer
2021-08-26 16:33             ` Junio C Hamano [this message]
2021-08-26 20:21               ` Junio C Hamano
2021-08-26 22:35               ` Taylor Blau
2021-08-26 23:24               ` Jeff King
2021-08-27 16:15                 ` Junio C Hamano
2021-08-31  9:34                   ` [PATCH v3 0/2] send_ref buffering Jacob Vosmaer
2021-08-31  9:34                     ` [PATCH v3 1/2] pkt-line: add stdio packet write functions Jacob Vosmaer
2021-08-31 10:37                       ` Jeff King
2021-08-31 18:13                       ` Junio C Hamano
2021-09-01 12:54                         ` [PATCH v4 0/2] send_ref buffering Jacob Vosmaer
2021-09-01 12:54                           ` [PATCH v4 1/2] pkt-line: add stdio packet write functions Jacob Vosmaer
2021-09-01 12:54                           ` [PATCH v4 2/2] upload-pack: use stdio in send_ref callbacks Jacob Vosmaer
2021-09-02  9:18                           ` [PATCH v4 0/2] send_ref buffering Jeff King
2021-08-31  9:34                     ` [PATCH v3 2/2] upload-pack: use stdio in send_ref callbacks Jacob Vosmaer
2021-08-31 10:25                     ` [PATCH v3 0/2] send_ref buffering Jeff King
2021-08-31 13:08                       ` Jacob Vosmaer
2021-08-31 17:44                         ` Jacob Vosmaer
2021-09-01  0:15                         ` Jeff King
2021-08-26 23:32             ` [PATCH 2/2] upload-pack: use stdio in send_ref callbacks Jeff King
2021-08-26 16:33           ` [PATCH 1/2] pkt-line: add packet_fwrite Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqpmu0f9ob.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).