git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Derrick Stolee <stolee@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, jonathantanmy@google.com, me@ttaylorr.com,
	Derrick Stolee <dstolee@microsoft.com>,
	git@jeffhostetler.com
Subject: Re: [PATCH] connected.c: reprepare packs for corner cases
Date: Thu, 12 Mar 2020 22:30:34 -0400	[thread overview]
Message-ID: <20200313023034.GA900007@coredump.intra.peff.net> (raw)
In-Reply-To: <7378a863-7e2a-455e-4635-e07938ef3381@gmail.com>

On Thu, Mar 12, 2020 at 08:54:34PM -0400, Derrick Stolee wrote:

> On 3/12/2020 5:26 PM, Jeff King wrote:
> > On Thu, Mar 12, 2020 at 05:16:38PM -0400, Jeff King wrote:
> > 
> >> There we see that same reprepare happen in 882839, which is the child
> >> fetch-pack. The parent fetch probably needs to reprepare itself after
> >> fetch-pack completes.
> 
> I agree with you and Junio that where I put the reprepare was non-
> optimal. The initial reason to put it there was that I found where
> the error was happening, and thought that placing the reprepare there
> was the best way to prevent this error from popping up in another case.

To be fair to you, the place you put it is almost certainly fine in
practice. The connectivity check is likely the first time we'll actually
look at the objects in the parent process, and once we've re-scanned,
all the code after us is protected.

But I do still think it's cleaner to put it closer to the responsible
code.

Although there was one thing that puzzled me when writing the previous
email. Why isn't this a problem for non-partial clones? And the answer
is that in a normal repo, as soon as we try to look up a missing object,
we'd trigger the usual re-scan. But we _don't_ do object lookups in
check_connected(). We do this funky loop over get_all_packs() ourselves,
looking only at promisor packs. And if we didn't find the object, then
we immediately complain in that loop with no fallback to re-scan.

So that would actually argue that your patch is putting it in the right
place. It's _not_ fetch's responsibility to reprepare_packed_git(). It's
the loop in check_connected() that is skipping the usual reprepare
logic, and shouldn't.

And one fix (which you did) is to just preemptively reprepare right
above that loop. Some other solutions are:

  - teach the loop to reprepare() when we don't find an object and see
    if we picked up a new promisor pack

  - reverse the loop logic: look up each object in the usual way (in all
    packs), and see if the resulting pack is a promisor. I guess that
    produces slightly different results though if an object is in the
    new promisor pack _and_ available elsewhere; but isn't the point of
    that check_refs_are_promisor_objects_only flag that we're doing a
    clone?

> I wonder if we could do something more complicated in the long-term,
> which was recommended to me by Jeff Hostetler: add the pack to the
> packed_git list once we've indexed it. That way, we don't reprepare
> and scan the packs one-by-one, but instead we insert to the list
> a single pack that we already know about.

I don't think the parent git-fetch process even knows about the pack,
though. It just asked the remote-helper to somehow make the objects
arrive, and it doesn't know what form that took.

reprepare_packed_git() should be pretty cheap, though. If we already
have a pack in the list, we won't re-open it. Finding out if we already
had one used to be O(n), making the whole re-scan quadratic. But that
was fixed recently in ec48540fe8 (packfile.c: speed up loading lots of
packfiles, 2019-11-27), where we switched to keeping a hashmap of
loaded pack names.

-Peff

  parent reply	other threads:[~2020-03-13  2:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12 15:36 [PATCH] connected.c: reprepare packs for corner cases Derrick Stolee via GitGitGadget
2020-03-12 16:39 ` Jonathan Tan
2020-03-12 17:34   ` Derrick Stolee
2020-03-12 20:42 ` Junio C Hamano
2020-03-12 21:16   ` Jeff King
2020-03-12 21:26     ` Jeff King
2020-03-13  0:54       ` Derrick Stolee
2020-03-13  1:14         ` Junio C Hamano
2020-03-13  2:30         ` Jeff King [this message]
2020-03-13  2:34           ` Jeff King
2020-03-13 12:43             ` Derrick Stolee
2020-03-13 21:11 ` [PATCH v2] " Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200313023034.GA900007@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).