git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] ls-refs: reuse buffer when sending refs
@ 2021-08-25 13:49 Patrick Steinhardt
  2021-08-25 14:10 ` Patrick Steinhardt
  2021-08-25 14:50 ` Derrick Stolee
  0 siblings, 2 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2021-08-25 13:49 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3739 bytes --]

In the initial reference advertisement, the Git server will first
announce all of its references to the client. The logic is handled in
`send_ref()`, which will allocate a new buffer for each refline it is
about to send. This is quite wasteful: instead of allocating a new
buffer each time, we can just reuse a buffer.

Improve this by passing in a buffer via the `ls_refs_data` struct which
is then reused on each reference. In a repository with about 2.3M refs,
this speeds up local mirror fetches by about 2%:

    Benchmark #1: HEAD~: git-fetch
      Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
      Range (min … max):   25.240 s … 25.543 s    5 runs

    Benchmark #2: HEAD: git-fetch
      Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
      Range (min … max):   24.825 s … 25.081 s    5 runs

    Summary
      'HEAD: git-fetch' ran
        1.02 ± 0.01 times faster than 'HEAD~: git-fetch'

Signed-off-by: Patrick Steinhardt <ps@kps.im>
---

Note that while this topic applies on top of "master", I've done the
benchmark on top of my other optimizations for fetches. It's cheating a
bit, but it's easier to see that the optimization does something when
the remaining constant part is lower.

 ls-refs.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index 88f6c3f60d..84021416ca 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -65,6 +65,7 @@ struct ls_refs_data {
 	unsigned peel;
 	unsigned symrefs;
 	struct strvec prefixes;
+	struct strbuf buf;
 	unsigned unborn : 1;
 };
 
@@ -73,7 +74,8 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	struct ls_refs_data *data = cb_data;
 	const char *refname_nons = strip_namespace(refname);
-	struct strbuf refline = STRBUF_INIT;
+
+	strbuf_reset(&data->buf);
 
 	if (ref_is_hidden(refname_nons, refname))
 		return 0;
@@ -82,9 +84,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		return 0;
 
 	if (oid)
-		strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+		strbuf_addf(&data->buf, "%s %s", oid_to_hex(oid), refname_nons);
 	else
-		strbuf_addf(&refline, "unborn %s", refname_nons);
+		strbuf_addf(&data->buf, "unborn %s", refname_nons);
 	if (data->symrefs && flag & REF_ISSYMREF) {
 		struct object_id unused;
 		const char *symref_target = resolve_ref_unsafe(refname, 0,
@@ -94,20 +96,19 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		if (!symref_target)
 			die("'%s' is a symref but it is not?", refname);
 
-		strbuf_addf(&refline, " symref-target:%s",
+		strbuf_addf(&data->buf, " symref-target:%s",
 			    strip_namespace(symref_target));
 	}
 
 	if (data->peel && oid) {
 		struct object_id peeled;
 		if (!peel_iterated_oid(oid, &peeled))
-			strbuf_addf(&refline, " peeled:%s", oid_to_hex(&peeled));
+			strbuf_addf(&data->buf, " peeled:%s", oid_to_hex(&peeled));
 	}
 
-	strbuf_addch(&refline, '\n');
-	packet_write(1, refline.buf, refline.len);
+	strbuf_addch(&data->buf, '\n');
+	packet_write(1, data->buf.buf, data->buf.len);
 
-	strbuf_release(&refline);
 	return 0;
 }
 
@@ -145,6 +146,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
 
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
+	strbuf_init(&data.buf, 0);
 
 	ensure_config_read();
 	git_config(ls_refs_config, NULL);
@@ -173,6 +175,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
 				     send_ref, &data, 0);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
+	strbuf_release(&data.buf);
 	return 0;
 }
 
-- 
2.33.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] ls-refs: reuse buffer when sending refs
  2021-08-25 13:49 [PATCH] ls-refs: reuse buffer when sending refs Patrick Steinhardt
@ 2021-08-25 14:10 ` Patrick Steinhardt
  2021-08-25 14:50 ` Derrick Stolee
  1 sibling, 0 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2021-08-25 14:10 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1350 bytes --]

On Wed, Aug 25, 2021 at 03:49:51PM +0200, Patrick Steinhardt wrote:
> In the initial reference advertisement, the Git server will first
> announce all of its references to the client. The logic is handled in
> `send_ref()`, which will allocate a new buffer for each refline it is
> about to send. This is quite wasteful: instead of allocating a new
> buffer each time, we can just reuse a buffer.
> 
> Improve this by passing in a buffer via the `ls_refs_data` struct which
> is then reused on each reference. In a repository with about 2.3M refs,
> this speeds up local mirror fetches by about 2%:
> 
>     Benchmark #1: HEAD~: git-fetch
>       Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
>       Range (min … max):   25.240 s … 25.543 s    5 runs
> 
>     Benchmark #2: HEAD: git-fetch
>       Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
>       Range (min … max):   24.825 s … 25.081 s    5 runs
> 
>     Summary
>       'HEAD: git-fetch' ran
>         1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
> 
> Signed-off-by: Patrick Steinhardt <ps@kps.im>

I accidentally mis-typed my mail here. Really shows I should be using
`--signoff` instead of manually signing things. Anyway, I'll fix this in
case I'll need to re-roll.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ls-refs: reuse buffer when sending refs
  2021-08-25 13:49 [PATCH] ls-refs: reuse buffer when sending refs Patrick Steinhardt
  2021-08-25 14:10 ` Patrick Steinhardt
@ 2021-08-25 14:50 ` Derrick Stolee
  2021-08-25 15:43   ` Patrick Steinhardt
  1 sibling, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2021-08-25 14:50 UTC (permalink / raw)
  To: Patrick Steinhardt, git

On 8/25/2021 9:49 AM, Patrick Steinhardt wrote:
> In the initial reference advertisement, the Git server will first
> announce all of its references to the client. The logic is handled in
> `send_ref()`, which will allocate a new buffer for each refline it is
> about to send. This is quite wasteful: instead of allocating a new
> buffer each time, we can just reuse a buffer.

Reusing a buffer makes perfect sense and is a clear improvement.
 
> Improve this by passing in a buffer via the `ls_refs_data` struct which
> is then reused on each reference. In a repository with about 2.3M refs,
> this speeds up local mirror fetches by about 2%:
> 
>     Benchmark #1: HEAD~: git-fetch
>       Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
>       Range (min … max):   25.240 s … 25.543 s    5 runs
> 
>     Benchmark #2: HEAD: git-fetch
>       Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
>       Range (min … max):   24.825 s … 25.081 s    5 runs
> 
>     Summary
>       'HEAD: git-fetch' ran
>         1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
> 
> Signed-off-by: Patrick Steinhardt <ps@kps.im>
> ---
> 
> Note that while this topic applies on top of "master", I've done the
> benchmark on top of my other optimizations for fetches. It's cheating a
> bit, but it's easier to see that the optimization does something when
> the remaining constant part is lower.

I don't mind demonstrating an optimization using the other work.

Perhaps this would be better grouped with those other changes?
I know that the text is independent and merges cleanly without it,
but it can be helpful to think about the effort as one unified
topic instead of juggling multiple, especially because I don't
see the other one needing many revisions.

> -	struct strbuf refline = STRBUF_INIT;
> +
> +	strbuf_reset(&data->buf);

It's nice that this is the only _real_ change, and everything
else is a find-and-replace.

> @@ -145,6 +146,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  
>  	memset(&data, 0, sizeof(data));
>  	strvec_init(&data.prefixes);
> +	strbuf_init(&data.buf, 0);
>  
>  	ensure_config_read();
>  	git_config(ls_refs_config, NULL);
> @@ -173,6 +175,7 @@ int ls_refs(struct repository *r, struct strvec *keys,
>  				     send_ref, &data, 0);
>  	packet_flush(1);
>  	strvec_clear(&data.prefixes);
> +	strbuf_release(&data.buf);
>  	return 0;
>  }

Except, of course, these two lines.

I think this patch is good to go!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ls-refs: reuse buffer when sending refs
  2021-08-25 14:50 ` Derrick Stolee
@ 2021-08-25 15:43   ` Patrick Steinhardt
  2021-08-25 17:23     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Steinhardt @ 2021-08-25 15:43 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

On Wed, Aug 25, 2021 at 10:50:30AM -0400, Derrick Stolee wrote:
> On 8/25/2021 9:49 AM, Patrick Steinhardt wrote:
> > Improve this by passing in a buffer via the `ls_refs_data` struct which
> > is then reused on each reference. In a repository with about 2.3M refs,
> > this speeds up local mirror fetches by about 2%:
> > 
> >     Benchmark #1: HEAD~: git-fetch
> >       Time (mean ± σ):     25.415 s ±  0.131 s    [User: 22.722 s, System: 4.740 s]
> >       Range (min … max):   25.240 s … 25.543 s    5 runs
> > 
> >     Benchmark #2: HEAD: git-fetch
> >       Time (mean ± σ):     24.922 s ±  0.110 s    [User: 22.404 s, System: 4.476 s]
> >       Range (min … max):   24.825 s … 25.081 s    5 runs
> > 
> >     Summary
> >       'HEAD: git-fetch' ran
> >         1.02 ± 0.01 times faster than 'HEAD~: git-fetch'
> > 
> > Signed-off-by: Patrick Steinhardt <ps@kps.im>
> > ---
> > 
> > Note that while this topic applies on top of "master", I've done the
> > benchmark on top of my other optimizations for fetches. It's cheating a
> > bit, but it's easier to see that the optimization does something when
> > the remaining constant part is lower.
> 
> I don't mind demonstrating an optimization using the other work.
> 
> Perhaps this would be better grouped with those other changes?
> I know that the text is independent and merges cleanly without it,
> but it can be helpful to think about the effort as one unified
> topic instead of juggling multiple, especially because I don't
> see the other one needing many revisions.

I don't know. I just happen to revisit this topic every few days, and
every time I stumble upon some more performance improvements. It would
feel wrong to shift the goalposts of the other series every time I find
something new, so I instead opt for separate patch series.

If this proves to be annoying for reviewers, then feel free to shout at
me and I'll change my approach.

Thanks for your review!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ls-refs: reuse buffer when sending refs
  2021-08-25 15:43   ` Patrick Steinhardt
@ 2021-08-25 17:23     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-08-25 17:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Derrick Stolee, git

Patrick Steinhardt <ps@pks.im> writes:

> I don't know. I just happen to revisit this topic every few days, and
> every time I stumble upon some more performance improvements. It would
> feel wrong to shift the goalposts of the other series every time I find
> something new, so I instead opt for separate patch series.
>
> If this proves to be annoying for reviewers, then feel free to shout at
> me and I'll change my approach.

The easiest would be to keep them independent and to justify them
independently.  There is no shifting the goalposts involved if you
did so.  Of course, you wouldn't be able to include the improvements
the follow-on topics make as part of the "advertising material" to
sell the benefit of the base series, though.  It can only work when
the follow-on topic is truly independent and is a good idea by
itself.  Another is to keep these follow-on topics unpublished
before the base topic graduates, and pretend that you came up with
them much later than you originally did.  Nobody would notice and
mind, as the base topic would be cast in stone by that time.

At the receiving end, what is most irritating is a series of topics
that pretends to be about different things but depend on each other
to function well [*].  I would imagine that it would be a lot more
trivial and pleasant to handle if any of the patches involved did
not come before these follow-on topics are all already thought of by
the author and instead came in a well-structured single topic, but
we do not live in a perfect world ;-).

In the case of this particular patch, I think the logic behind the
change makes sense by itself, so if I were doing it, I'd probably
choose to sell it as an independent change unrelated to the other
topic.

Thanks.


[Footnote]

* Any time the base topic gets rerolled, I'd be the one who ends up
  having to remember which other topics that did not rerolled depend
  on it in what order and rebase them correctly, and then I have to
  replace the follow-on topics that have been rerolled.  Even a
  single missed subtopic will cause the day's integration work
  redone, and all that robs time and concentration that the topics
  by other contributors needs from me, which would make me grumpy
  X-<.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-25 17:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-25 13:49 [PATCH] ls-refs: reuse buffer when sending refs Patrick Steinhardt
2021-08-25 14:10 ` Patrick Steinhardt
2021-08-25 14:50 ` Derrick Stolee
2021-08-25 15:43   ` Patrick Steinhardt
2021-08-25 17:23     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).