git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Han Xin <hanxin.hx@bytedance.com>
Cc: chiyutianyi@gmail.com, derrickstolee@github.com,
	git@vger.kernel.org, haiyangtand@gmail.com,
	jonathantanmy@google.com, me@ttaylorr.com, ps@pks.im
Subject: Re: [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
Date: Fri, 24 Jun 2022 09:56:16 -0700	[thread overview]
Message-ID: <xmqqpmiyuhjj.fsf@gitster.g> (raw)
In-Reply-To: <d3a99a5c5ae538b626e04d7069dd2fc316605dfc.1656044659.git.hanxin.hx@bytedance.com> (Han Xin's message of "Fri, 24 Jun 2022 13:27:57 +0800")

Han Xin <hanxin.hx@bytedance.com> writes:

> If a commit is in the commit graph, we would expect the commit to also
> be present.

Hmph, is that a fundamental requirement, or is that a limitation of
the current implementation?  Naïvely, I do not quite see why we
cannot first partially clone from a remote, access objects that
locally do not exist and lazily fetch them from the promissor, and
then build a reachability graph.  I expect that the resulting commit
graph records the lazily fetched objects at that point.  And then we
should be able to "lose" the lazily fetched objects that we know we
can fetch from the promissor again when we need them in the future.
And we would be in a situation where commits are pruned away, not
locally available in our object store, but can be (re)fetched from
the promisor, no?

> So we should use has_object() instead of
> repo_has_object_file(), which will help us avoid getting into an endless
> loop of lazy fetch.

It all depends on the reason we call lookup_commit_in_graph(), I
think.  Is there an easy way to remember the fact that we are
checking if object X is here with repo_has_object_file(X), so that
an on-demand fetch that happens when X does not locally exist would
not bother checking with lookup_commit_in_graph()?  IOW, temporarily
disable the use of commit-graph when we are lazily fetching?

> When we found the commit in the graph in lookup_commit_in_graph(),
> but the commit is missing from the repository, we will try
> promisor_remote_get_direct() and then enter another loop.  While
> sometimes it will finally succeed because it cannot fork
> subprocess,

Is that a mode of "succeed"-ing?  Or merely a way to exit an endless
loop that does not make any progress with a failure?

> it has exhausted the local process resources and can be harmful to the
> remote service.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---

I think the single-liner change in the patch is a good one, but I am
having a hard time to agree with the reasoning above that explains
why it is a good change.

Here is an attempt to express a reasoning I can understand, can
agree with, and (I think) better describes why the change is a good
one.  Does my understanding of the problem and the solution totally
misses the mark?

	The commit-graph is used to opportunistically optimize
	accesses to certain pieces of information on commit objects,
	and lookup_commit_in_graph() tries to say "no" when the
	requested commit does not locally exist by returning NULL,
	in which case the caller can ask for (which may result in
	on-demand fetching from a promisor remote) and parse the
	commit object itself.

	However, it uses a wrong helper, repo_has_object_file(), to
	do so.  This helper not only checks if an object is
	immediately available in the local object store, but also
	tries to fetch from a promisor remote.  But the fetch
	machinery calls lookup_commit_in_graph(), thus causing an
	infinite loop.

	We should make lookup_commit_in_graph() expect that a commit
	given to it can be legitimately missing from the local
	object store, by using the has_object_file() helper instead.
	
> diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..4d25d2c950
> --- /dev/null
> +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh

Hmph, does this short-test need a completely new file?

> @@ -0,0 +1,47 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +	git init with-commit &&
> +	test_commit -C with-commit the-commit &&
> +	oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +	git init with-commit-graph &&
> +	echo "$(pwd)/with-commit/.git/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	# create a ref that points to the commit in alternates
> +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +	# prepare some other objects to commit-graph
> +	test_commit -C with-commit-graph somthing &&

somthing? something?

> +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +	git init --bare without-commit &&
> +	echo "$(pwd)/without-commit/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&

Doesn't this deliberately _corrupt_ the with-commit-graph repository
that depended on the object whose name is $oid in with-commit
repository?  Do we require a corrupt repository to trigger the "bug"?

> +	test_must_fail git -C with-commit-graph cat-file -e $oid
> +'
> +
> +test_expect_success 'setup: prepare another commit to fetch' '
> +	test_commit -C with-commit another-commit &&
> +	anycommit=$(git -C with-commit rev-parse HEAD)

anycommit?  another_commit?  Be consistent in naming.

> +'
> +
> +test_expect_success ULIMIT_PROCESSES 'fetch any commit from promisor with the usage of the commit graph' '

So we did all of the above set-up sequences only to skip the most
interesting test, if we were testing with "dash"?  I suspect that it
may be cleaner to put the prerequisite to the whole file with the
"early test_done" trick like t0051 and t3008.

> +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +	git -C with-commit-graph config remote.origin.promisor true &&
> +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +	GIT_TRACE="$(pwd)/trace" run_with_limited_processses \
> +		git -C with-commit-graph fetch origin $anycommit 2>err &&
> +	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> +	test $(grep "fetch origin" trace | wc -l) -eq 1
> +'
> +
> +test_done

Thanks.

  reply	other threads:[~2022-06-24 16:58 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14  7:25 An endless loop fetching issue with partial clone, alternates and commit graph Haiyng Tan
2022-06-15  2:18 ` Taylor Blau
2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
2022-06-16  3:38     ` [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" " Han Xin
2022-06-17 21:47     ` [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph Jonathan Tan
2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-20  7:07       ` Patrick Steinhardt
2022-06-20  8:53         ` [External] " 欣韩
2022-06-20  9:05           ` Patrick Steinhardt
2022-06-21 18:23       ` Jonathan Tan
2022-06-22  3:17         ` Han Xin
2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-24 16:03           ` Junio C Hamano
2022-06-25  1:35             ` Han Xin
2022-06-27 12:22               ` Junio C Hamano
2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-24 16:56           ` Junio C Hamano [this message]
2022-06-25  2:25             ` Han Xin
2022-06-25  2:31               ` Han Xin
2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
2022-06-28  2:02           ` [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
2022-06-28 17:36               ` Junio C Hamano
2022-06-30 12:21                 ` Johannes Schindelin
2022-06-30 13:43                   ` Ævar Arnfjörð Bjarmason
2022-06-30 15:40                     ` Junio C Hamano
2022-06-30 18:47                       ` Ævar Arnfjörð Bjarmason
2022-07-01 19:31                       ` Johannes Schindelin
2022-07-01 20:47                         ` Junio C Hamano
2022-06-29  2:08               ` Han Xin
2022-06-30 17:37           ` test name conflict + js/ci-github-workflow-markup regression (was: [PATCH v3 0/2] no lazy fetch in lookup_commit_in_graph()) Ævar Arnfjörð Bjarmason
2022-07-01  1:34           ` [PATCH v4 0/1] no lazy fetch in lookup_commit_in_graph() Han Xin
2022-07-01  1:34             ` [PATCH v4 1/1] commit-graph.c: " Han Xin
2022-07-09 12:23               ` Michael J Gruber
2022-07-11 15:09                 ` Jeff King
2022-07-11 20:17                   ` Junio C Hamano
2022-07-12  1:52                     ` [External] " Han Xin
2022-07-12  5:23                       ` Junio C Hamano
2022-07-12  5:32                         ` Han Xin
2022-07-12  6:37                         ` [External] " Jeff King
2022-07-12 14:19                           ` Junio C Hamano
2022-07-12  6:50             ` [PATCH v5 0/1] " Han Xin
2022-07-12  6:50               ` [PATCH v5 1/1] commit-graph.c: " Han Xin
2022-07-12  9:50                 ` Ævar Arnfjörð Bjarmason
2022-07-13  1:26                   ` Han Xin
2022-07-12  6:58               ` [PATCH v5 0/1] " Jeff King
2022-07-12  8:01             ` [PATCH v1] t5330: remove run_with_limited_processses() Han Xin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqpmiyuhjj.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=chiyutianyi@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=haiyangtand@gmail.com \
    --cc=hanxin.hx@bytedance.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).