git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: Johannes.Schindelin@gmx.de
Cc: jonathantanmy@google.com, git@vger.kernel.org
Subject: Re: [PATCH] cache-tree: avoid needless promisor object fetch
Date: Mon, 26 Apr 2021 12:49:28 -0700
Message-ID: <20210426194928.326338-1-jonathantanmy@google.com>
In-Reply-To: <nycvar.QRO.7.76.6.2104231409500.54@tvgsbejvaqbjf.bet>

> > In update_one() (used only by cache_tree_update()), there is an object
> > existence check that, if it fails, will automatically trigger a lazy
> > fetch in a partial clone. But the fetch is not necessary - the object is
> > not actually being used.
> 
> I find it curious, though, that the `ce_missing_ok` variable is defined
> thusly (sadly, the context of your diff is too small to show it):
> 
>                 ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
>                         (has_promisor_remote() &&
>                          ce_skip_worktree(ce));
> 
> Which means that the `has_object_file()` function is only called if the
> entry is not marked with the `skip-worktree` bit, i.e. if it is _not_
> excluded from the sparse checkout.
> 
> Wouldn't that mean that the object _should_ be there?

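In a partial clone, not necessarily: when blobs are filtered out, a blob
can be missing locally even if its entry is not excluded from the sparse
checkout.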
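To make the quoted condition concrete, here is a minimal, self-contained C model of when the existence check in update_one() is reached. The struct and function names (`entry_model`, `ce_missing_ok_model()`, `needs_existence_check()`) are hypothetical stand-ins, not Git's API; `has_promisor` stands in for `has_promisor_remote()` and `skip_worktree` for `ce_skip_worktree(ce)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the index-entry properties the quoted
 * condition looks at; this is not Git's struct cache_entry. */
struct entry_model {
	bool is_gitlink;    /* models mode == S_IFGITLINK */
	bool skip_worktree; /* models ce_skip_worktree(ce) */
};

/* Mirrors the quoted expression:
 *   ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
 *           (has_promisor_remote() && ce_skip_worktree(ce));
 */
static bool ce_missing_ok_model(const struct entry_model *e,
				bool missing_ok, bool has_promisor)
{
	return e->is_gitlink || missing_ok ||
	       (has_promisor && e->skip_worktree);
}

/* The object existence check only runs when a missing object is NOT ok. */
static bool needs_existence_check(const struct entry_model *e,
				  bool missing_ok, bool has_promisor)
{
	return !ce_missing_ok_model(e, missing_ok, has_promisor);
}
```

In a blob-filtered partial clone without sparse checkout, a regular file entry has `skip_worktree == false`, so in this model `needs_existence_check()` returns true even though the blob may legitimately be absent; that is the situation in which the check can trigger a lazy fetch.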

> I guess what I am saying is that while the commit message focuses on the
> "What?" of the patch, I would love to hear more about the "Why?". And
> maybe the "When?" as in: when does this actually matter?

In this case, I'd like help figuring that out too. Normally, this code
path (unpack_trees()) prefetches the objects it needs through a call to
check_updates(), but here the update flag is somehow not set, so no
prefetching happens.
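The difference between the prefetching path and the lazy path can be pictured as a count of round trips to the promisor remote. The C sketch below is a toy model under a stated assumption: the prefetch issues one batched request covering every missing object, while the lazy path fetches each missing object separately when it is first looked up. `count_round_trips()` and its parameters are hypothetical; this is not how Git's check_updates() is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: present[i] says whether object i is already in the local
 * object store. Returns the number of requests sent to the remote. */
static size_t count_round_trips(const bool *present, size_t n, bool prefetch)
{
	size_t missing = 0;

	for (size_t i = 0; i < n; i++)
		if (!present[i])
			missing++;

	if (prefetch)
		/* One batched request for all missing objects, or none. */
		return missing ? 1 : 0;

	/* Lazy path: every missing object triggers its own fetch. */
	return missing;
}
```

With the update flag unset, the model falls into the lazy branch, which corresponds to the "multiple lazy fetches" symptom the patch describes.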

> And since the bug was critical enough for you to spend time on crafting
> it, maybe it would make sense to add a regression test to ensure that this
> bug does not creep in again?

OK.

> > Replace that check with two checks: an object existence check that does
> > not fetch, and then a check that that object is a promisor object.
> 
> This essentially repeats what the diff says, but it might make more sense
> to explain why the post-image of this diff is more correct (and maybe
> discuss performance implications).

OK. I think this is the "why" and "when" you asked about above.
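The "why" can be sketched by comparing the old and new checks. In this minimal C model, `exists_with_fetch()` plays the role of an existence check that lazily fetches, while `new_check_ok()` combines a local-only existence check with a promisor-object check. The `obj_state` struct, `fetch_count`, and all function names are hypothetical stand-ins, not Git's functions; "promised" is assumed to mean the object is known to be obtainable from the promisor remote.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one object's state in a partial clone. */
struct obj_state {
	bool local;    /* present in the local object store */
	bool promised; /* known to be available from the promisor remote */
};

static int fetch_count; /* counts simulated lazy fetches */

/* Models an existence check that may lazily fetch: a missing but
 * promised object is fetched, after which it exists locally. */
static bool exists_with_fetch(struct obj_state *o)
{
	if (!o->local && o->promised) {
		fetch_count++;
		o->local = true;
	}
	return o->local;
}

/* Old check: the entry is acceptable if the object can be found,
 * fetching it on demand. */
static bool old_check_ok(struct obj_state *o)
{
	return exists_with_fetch(o);
}

/* New check: look locally without fetching, then accept promisor
 * objects without downloading them. */
static bool new_check_ok(const struct obj_state *o)
{
	return o->local || o->promised;
}
```

In this model both checks accept and reject the same entries; the difference is that the new one never contacts the remote, which is the point of the patch since the object is not actually used.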

> > Doing this avoids multiple lazy fetches when merging two trees in a
> > partial clone, as noticed at $DAYJOB.
> 
> Ah. But where are those trees fetched, then?
> 
> Maybe lead with the description of the bug?

This was a partial clone that excluded only blobs, so the trees were
already present; the lazy fetches were for blobs. I'll update the commit
message to mention this detail.

> > Another alternative is to think about whether the object existence check
> > here is needed in the first place.
> >
> > There might also be other places we can make a similar change in
> > update_one(), but I limited myself to what's needed to solve the
> > specific case we discovered at $DAYJOB.
> 
> I only see another `has_object_file()` call site at the very beginning,
> and I think this needs to fetch. Or maybe it is more efficient to
> construct the cache tree from scratch than fetch it?

Good point - if we can construct it, we probably shouldn't fetch it.
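The policy in this reply ("if we can construct it, we probably shouldn't fetch it") could be written as a small decision function. This is a hypothetical sketch only: the enum and `choose_tree_source()` do not exist in Git, and the two boolean inputs are simplifications of what update_one() actually knows about a cache-tree entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for obtaining a cached tree object. */
enum tree_source {
	TREE_REUSE_LOCAL, /* object already present: nothing to do */
	TREE_CONSTRUCT,   /* rebuild from index entries, no network */
	TREE_FETCH        /* last resort: lazily fetch from the promisor */
};

/* Prefer local data, then local computation, and only then the network. */
static enum tree_source choose_tree_source(bool present_locally,
					   bool can_construct)
{
	if (present_locally)
		return TREE_REUSE_LOCAL;
	if (can_construct)
		return TREE_CONSTRUCT;
	return TREE_FETCH;
}
```

Ordering the checks this way means a network round trip happens only when neither local option applies.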

> There is also `cache_tree_fully_valid_1()`, where I think the same
> handling could potentially make sense. (Or, if you target `seen`,
> `cache_tree_fully_valid()`.)

True.


Thread overview: 3+ messages
2021-04-23  3:16 [PATCH] cache-tree: avoid needless promisor object fetch Jonathan Tan
2021-04-23 12:28 ` Johannes Schindelin
2021-04-26 19:49   ` Jonathan Tan [this message]
