* [PATCH] cache-tree: avoid needless promisor object fetch
@ 2021-04-23 3:16 Jonathan Tan
2021-04-23 12:28 ` Johannes Schindelin
0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Tan @ 2021-04-23 3:16 UTC (permalink / raw)
To: git; +Cc: Jonathan Tan
In update_one() (used only by cache_tree_update()), there is an object
existence check that, if it fails, will automatically trigger a lazy
fetch in a partial clone. But the fetch is not necessary - the object is
not actually being used.
Replace that check with two checks: an object existence check that does
not fetch, and then a check that that object is a promisor object.
Doing this avoids multiple lazy fetches when merging two trees in a
partial clone, as noticed at $DAYJOB.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Another alternative is to think about whether the object existence check
here is needed in the first place.
There might also be other places we can make a similar change in
update_one(), but I limited myself to what's needed to solve the
specific case we discovered at $DAYJOB.
---
cache-tree.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/cache-tree.c b/cache-tree.c
index add1f07713..6728722597 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -6,6 +6,7 @@
#include "object-store.h"
#include "replace-object.h"
#include "promisor-remote.h"
+#include "packfile.h"
#ifndef DEBUG_CACHE_TREE
#define DEBUG_CACHE_TREE 0
@@ -362,7 +363,9 @@ static int update_one(struct cache_tree *it,
(has_promisor_remote() &&
ce_skip_worktree(ce));
if (is_null_oid(oid) ||
- (!ce_missing_ok && !has_object_file(oid))) {
+ (!ce_missing_ok &&
+ !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT) &&
+ !is_promisor_object(oid))) {
strbuf_release(&buffer);
if (expected_missing)
return -1;
--
2.31.1.498.g6c1eba8ee3d-goog
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] cache-tree: avoid needless promisor object fetch
2021-04-23 3:16 [PATCH] cache-tree: avoid needless promisor object fetch Jonathan Tan
@ 2021-04-23 12:28 ` Johannes Schindelin
2021-04-26 19:49 ` Jonathan Tan
0 siblings, 1 reply; 3+ messages in thread
From: Johannes Schindelin @ 2021-04-23 12:28 UTC (permalink / raw)
To: Jonathan Tan; +Cc: git
Hi Jonathan,
On Thu, 22 Apr 2021, Jonathan Tan wrote:
> In update_one() (used only by cache_tree_update()), there is an object
> existence check that, if it fails, will automatically trigger a lazy
> fetch in a partial clone. But the fetch is not necessary - the object is
> not actually being used.
I find it curious, though, that the `ce_missing_ok` variable is defined
thusly (sadly, the context of your diff is too small to show it):
ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
(has_promisor_remote() &&
ce_skip_worktree(ce));
Which means that the `has_object_file()` function is only called if the
entry is not marked with the `skip-worktree` bit, i.e. if it is _not_
excluded from the sparse checkout.
Wouldn't that mean that the object _should_ be there?
I guess what I am saying is that while the commit message focuses on the
"What?" of the patch, I would love to hear more about the "Why?". And
maybe the "When?" as in: when does this actually matter?
And since the bug was critical enough for you to spend time on crafting
it, maybe it would make sense to add a regression test to ensure that this
bug does not creep in again?
>
> Replace that check with two checks: an object existence check that does
> not fetch, and then a check that that object is a promisor object.
This essentially repeats what the diff says, but it might make more sense
to explain why the post-image of this diff is more correct (and maybe
discuss performance implications).
>
> Doing this avoids multiple lazy fetches when merging two trees in a
> partial clone, as noticed at $DAYJOB.
Ah. But where are those trees fetched, then?
Maybe lead with the description of the bug?
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Another alternative is to think about whether the object existence check
> here is needed in the first place.
>
> There might also be other places we can make a similar change in
> update_one(), but I limited myself to what's needed to solve the
> specific case we discovered at $DAYJOB.
I only see another `has_object_file()` call site at the very beginning,
and I think this needs to fetch. Or maybe it is more efficient to
construct the cache tree from scratch than fetch it?
There is also `cache_tree_fully_valid_1()`, where I think the same
handling could potentially make sense. (Or, if you target `seen`,
`cache_tree_fully_valid()`.
Ciao,
Johannes
> ---
> cache-tree.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index add1f07713..6728722597 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -6,6 +6,7 @@
> #include "object-store.h"
> #include "replace-object.h"
> #include "promisor-remote.h"
> +#include "packfile.h"
>
> #ifndef DEBUG_CACHE_TREE
> #define DEBUG_CACHE_TREE 0
> @@ -362,7 +363,9 @@ static int update_one(struct cache_tree *it,
> (has_promisor_remote() &&
> ce_skip_worktree(ce));
> if (is_null_oid(oid) ||
> - (!ce_missing_ok && !has_object_file(oid))) {
> + (!ce_missing_ok &&
> + !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT) &&
> + !is_promisor_object(oid))) {
> strbuf_release(&buffer);
> if (expected_missing)
> return -1;
> --
> 2.31.1.498.g6c1eba8ee3d-goog
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] cache-tree: avoid needless promisor object fetch
2021-04-23 12:28 ` Johannes Schindelin
@ 2021-04-26 19:49 ` Jonathan Tan
0 siblings, 0 replies; 3+ messages in thread
From: Jonathan Tan @ 2021-04-26 19:49 UTC (permalink / raw)
To: Johannes.Schindelin; +Cc: jonathantanmy, git
> > In update_one() (used only by cache_tree_update()), there is an object
> > existence check that, if it fails, will automatically trigger a lazy
> > fetch in a partial clone. But the fetch is not necessary - the object is
> > not actually being used.
>
> I find it curious, though, that the `ce_missing_ok` variable is defined
> thusly (sadly, the context of your diff is too small to show it):
>
> ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> (has_promisor_remote() &&
> ce_skip_worktree(ce));
>
> Which means that the `has_object_file()` function is only called if the
> entry is not marked with the `skip-worktree` bit, i.e. if it is _not_
> excluded from the sparse checkout.
>
> Wouldn't that mean that the object _should_ be there?
In a partial clone, probably not?
> I guess what I am saying is that while the commit message focuses on the
> "What?" of the patch, I would love to hear more about the "Why?". And
> maybe the "When?" as in: when does this actually matter?
In this case, that's something I'd like help in figuring out too.
Normally this code path (unpack_trees()) prefetches everything through a
call to check_updates(), but the update flag is somehow not set so there
is no prefetching happening.
> And since the bug was critical enough for you to spend time on crafting
> it, maybe it would make sense to add a regression test to ensure that this
> bug does not creep in again?
OK.
> > Replace that check with two checks: an object existence check that does
> > not fetch, and then a check that that object is a promisor object.
>
> This essentially repeats what the diff says, but it might make more sense
> to explain why the post-image of this diff is more correct (and maybe
> discuss performance implications).
OK - I think this is the "why" and "when" you described above.
> > Doing this avoids multiple lazy fetches when merging two trees in a
> > partial clone, as noticed at $DAYJOB.
>
> Ah. But where are those trees fetched, then?
>
> Maybe lead with the description of the bug?
This was a partial clone excluding blobs only. I'll update the commit
message to mention this detail.
> > Another alternative is to think about whether the object existence check
> > here is needed in the first place.
> >
> > There might also be other places we can make a similar change in
> > update_one(), but I limited myself to what's needed to solve the
> > specific case we discovered at $DAYJOB.
>
> I only see another `has_object_file()` call site at the very beginning,
> and I think this needs to fetch. Or maybe it is more efficient to
> construct the cache tree from scratch than fetch it?
Good point - if we can construct it, we probably shouldn't fetch it.
> There is also `cache_tree_fully_valid_1()`, where I think the same
> handling could potentially make sense. (Or, if you target `seen`,
> `cache_tree_fully_valid()`.
True.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-04-26 19:49 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-23 3:16 [PATCH] cache-tree: avoid needless promisor object fetch Jonathan Tan
2021-04-23 12:28 ` Johannes Schindelin
2021-04-26 19:49 ` Jonathan Tan
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).