git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Jacob Vosmaer <jacob@gitlab.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Thu, 21 Jan 2021 11:26:13 +0100	[thread overview]
Message-ID: <874kja8u2i.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YAiKQ0M0/14Q13Ee@coredump.intra.peff.net>


On Wed, Jan 20 2021, Jeff King wrote:

> On Wed, Jan 20, 2021 at 09:50:19AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> To elaborate on other things that aren't really your problem & Taylor's
>> E-Mail downthread we originally added this because:
>> 
>>     If new annotated tags have been introduced then we can also include
>>     them in the packfile, saving the client from needing to request them
>>     through a second connection.
>> 
>> I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
>> --prune-tags option and fetch.pruneTags config, 2018-02-09) but really
>> not this specific area.
>> 
>> I wonder if longer term simply moving this to be work the client does
>> wouldn't make more sense. I.e. if we just delete this for_each_ref()
>> loop.
>>     
>> We're just saving a client from saying they "want" a tag. I.e. with the
>> whole thing removed we need this to make t5503-tagfollow.sh pass (see
>> [1] at the end for the dialog, the tag is 3253df4d...):
>
> Isn't this an ordering problem, though? The client cannot mention
> auto-followed tags in a "want", because they first need to "want" the
> main history, receive the pack, and then realize they want the others.
>
> So we are not just saving the client from sending a "want", but making a
> second full connection to do so. That seems to be an optimization worth
> continuing to have.

Correct. Suppose a history COMMIT(TAG) history like:

    mainline: A(X) -> B(Y) -> C
    topic:    A(X) -> B(Y) -> D(Z)

You only want the "mainline" history and its tags in your fetch. Now a
server will give you tags X & Y for free, ignoring Z.

The note in protocol-capabilities.txt supposes that you'll need to only
get A/B/C in one fetch, then do another one where you see from the OIDs
you have and advertised (peeled) OIDs for the tags and know you just
need X/Y, not Z.

So we could also just fetch Z, then do our own walk on the client to see
if it's reachable, and throw it away if not. Although I suppose that'll
need a list of "bad" tags on the client so we don't repeat the process
the next time around, hrm...

>>     diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
>>     index 6041a4dd32..1ddc430aef 100755
>>     --- a/t/t5503-tagfollow.sh
>>     +++ b/t/t5503-tagfollow.sh
>>     @@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
>>      test_expect_success 'setup expect' '
>>      cat - <<EOF >expect
>>      want $B
>>     +want $T
>>      want $S
>>      EOF
>>      '
>>     @@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
>>                     cd clone2 &&
>>                     git init &&
>>                     git remote add origin .. &&
>>     +               git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
>>                     GIT_TRACE_PACKET=$UPATH git fetch &&
>>                     test $B = $(git rev-parse --verify origin/master) &&
>>                     test $S = $(git rev-parse --verify tag2) &&
>> 
>> We're also saving the client the work of having to go through
>> refs/tags/* and figure out whether there are tags there that aren't on
>> our main history.
>
> You seem to be against auto-following at all. And certainly I can see an
> argument that it is not worth the trouble it causes. But it is the
> default behavior, and I suspect many people are relying on it. Fetching
> every tag indiscriminately is going to grab a bunch of extra unwanted
> objects in some repos. An obvious case is any time "clone
> --single-branch --depth" is used.

I wonder if the use-cases for --depth in the wild aren't 99.9%
--depth=1, in which case the peeled commit on the main branch being
advertised by a tag would alredy give you this information.

I.e. in the common case you don't get a tag, but if you happen to land
on a release you can see that the commit at the tip is tagged. I wonder
if doing that shortcut already in the client (so not send include-tag)
is a useful micro-optimization.

> Maybe I'm not quite understanding what you're proposing.

Not much really, just that looking deeper into this area might be a
useful exercise. I.e. it's a case of server<->client cooperation where
we offlod the work to one or the other based on an old design decision,
maybe that trade-off is not optimal in most cases anymore.

  reply	other threads:[~2021-01-21 10:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:33 [PATCH 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-19 14:33 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 23:08   ` Taylor Blau
2021-01-19 23:33     ` Jeff King
2021-01-19 23:54       ` Taylor Blau
2021-01-19 23:15   ` Jeff King
2021-01-20  8:50   ` Ævar Arnfjörð Bjarmason
2021-01-20 15:02     ` Taylor Blau
2021-01-20 16:32       ` Ævar Arnfjörð Bjarmason
2021-01-20 19:53     ` Jeff King
2021-01-21 10:26       ` Ævar Arnfjörð Bjarmason [this message]
2021-01-21 15:34         ` Jeff King
2021-01-19 23:26 ` [PATCH 0/1] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kja8u2i.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).