git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Re: An endless loop fetching issue with partial clone, alternates and commit graph
@ 2022-06-14  7:25 Haiyng Tan
  2022-06-15  2:18 ` Taylor Blau
  0 siblings, 1 reply; 27+ messages in thread
From: Haiyng Tan @ 2022-06-14  7:25 UTC (permalink / raw)
  To: git, chiyutianyi; +Cc: ps, jonathantanmy

On Mon, 13 Jun 2022 00:17:07 +0800, Han Xin wrote:
> We found an issue that could create an endless loop where alternates
> objects are used improperly.
>
> While do fetching in a partial cloned repository with a commit graph,
> deref_without_lazy_fetch_extended() will call lookup_commit_in_graph()
> to find the commit object. We can found the code in commit-graph.c:
>
>      struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
>      {
>           …
>           if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>                return NULL;
>           if (!repo_has_object_file(repo, id))
>                return NULL;

> If we found the object in the commit graph, but missing it in the repository,
> we will go into an endless loop:
>      git fetch -> deref_without_lazy_fetch_extended() ->
>           lookup_commit_in_graph() -> repo_has_object_file() ->
>                promisor_remote_get_direct() -> fetch_objects() ->
>                     git fetch
>
> I know that the reason for this issue is due to improper use of
> alternates, we can ensure that objects will not be lost by maintaining
> all the references. But shouldn't we do something about this unusual
> usage, it will cause a fetch bombardment of the remote git service.
>
> We can reproduce this issue with the following test case, it will
> generate a lot of git processes, please be careful to stop it.
> ———————————————————————————
> #!/bin/sh
>
> test_description='test for an endless loop fetching’
>
> GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> . ./test-lib.sh
>
> test_expect_success 'setup’ ‘
>     git init --bare dest.git &&
>     test_commit one &&
>    git checkout -b testbranch &&
>    test_commit two &&
>    git push dest.git --all
> '
>
> test_expect_success 'prepare a alternates repository without testbranch' '
>    git clone -b $GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME dest.git alternates &&
>    oid=$(git -C alternates rev-parse refs/remotes/origin/testbranch) &&
>    git -C alternates update-ref -d refs/remotes/origin/testbranch &&
>    git -C alternates gc --prune=now
> '
>
> test_expect_success 'prepare a repository with commit-graph' '
>    git init source &&
>    echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
>    git -C source remote add origin "$(pwd)/dest.git" &&
>    git -C source config remote.origin.promisor true &&
>    git -C source config remote.origin.partialclonefilter blob:none &&
>    git -C source fetch origin &&
>    (
>        cd source &&
>        test_commit three &&
>        git -c gc.writeCommitGraph=true gc
>    )
> '
>
> test_expect_success 'change alternates' '
>    echo "$(pwd)/alternates/.git/objects" >source/.git/objects/info/alternates &&
>    # this will bring an endless loop fetching
>    git -C source fetch origin $oid
> '
>
> test_done
>
> ------------------------------------------------------
>
> Thanks
> -Han Xin

I think it's caused by using lazy-fetch in deref_without_lazy_fetch_extended().
In lookup_commit_in_graph(), lazy-fetch is initiated by
repo_has_object_file() used.
has_object() should be used, it's no-lazy-fetch.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: An endless loop fetching issue with partial clone, alternates and commit graph
  2022-06-14  7:25 An endless loop fetching issue with partial clone, alternates and commit graph Haiyng Tan
@ 2022-06-15  2:18 ` Taylor Blau
  2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
  0 siblings, 1 reply; 27+ messages in thread
From: Taylor Blau @ 2022-06-15  2:18 UTC (permalink / raw)
  To: Haiyng Tan; +Cc: git, chiyutianyi, ps, jonathantanmy, Derrick Stolee

[+cc Stolee]

On Tue, Jun 14, 2022 at 03:25:13PM +0800, Haiyng Tan wrote:
> I think it's caused by using lazy-fetch in
> deref_without_lazy_fetch_extended().  In lookup_commit_in_graph(),
> lazy-fetch is initiated by repo_has_object_file() used.  has_object()
> should be used, it's no-lazy-fetch.

Hmm. Are there cases where lookup_commit_in_graph() is expected to
lazily fetch missing objects from promisor remotes? If so, then this
wouldn't quite work. If not, then this seems like an appropriate fix to
me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph
  2022-06-15  2:18 ` Taylor Blau
@ 2022-06-16  3:38   ` Han Xin
  2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
                       ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Han Xin @ 2022-06-16  3:38 UTC (permalink / raw)
  To: me
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, ps, Han Xin

On Wed, Jun 15, 2022 at 10:18 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> [+cc Stolee]
>
> On Tue, Jun 14, 2022 at 03:25:13PM +0800, Haiyng Tan wrote:
> > I think it's caused by using lazy-fetch in
> > deref_without_lazy_fetch_extended().  In lookup_commit_in_graph(),
> > lazy-fetch is initiated by repo_has_object_file() used.  has_object()
> > should be used, it's no-lazy-fetch.
>
> Hmm. Are there cases where lookup_commit_in_graph() is expected to
> lazily fetch missing objects from promisor remotes? If so, then this
> wouldn't quite work. If not, then this seems like an appropriate fix to
> me.
>
> Thanks,
> Taylor

We can see the use of has_object() in RelNotes/2.29.0.txt[1]:
   * A new helper function has_object() has been introduced to make it
     easier to mark object existence checks that do and don't want to
     trigger lazy fetches, and a few such checks are converted using it.

Let's see the difference between has_object() and repo_has_object_file():
    int has_object(struct repository *r, const struct object_id *oid,
            unsigned flags)
    {
        int quick = !(flags & HAS_OBJECT_RECHECK_PACKED);
        unsigned object_info_flags = OBJECT_INFO_SKIP_FETCH_OBJECT |
            (quick ? OBJECT_INFO_QUICK : 0);

        if (!startup_info->have_repository)
            return 0;
        return oid_object_info_extended(r, oid, NULL, object_info_flags) >= 0;
    }

    int repo_has_object_file_with_flags(struct repository *r,
                        const struct object_id *oid, int flags)
    {
        if (!startup_info->have_repository)
            return 0;
        return oid_object_info_extended(r, oid, NULL, flags) >= 0;
    }

    int repo_has_object_file(struct repository *r,
                const struct object_id *oid)
    {
        return repo_has_object_file_with_flags(r, oid, 0);
    }

Now we kown that has_object() add OBJECT_INFO_SKIP_FETCH_OBJECT to skip
fetch object.

I found that Ævar Arnfjörð Bjarmason added deref_without_lazy_fetch()
4 weeks ago[2]:
    static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
                            int mark_tags_complete)
    {
        enum object_type type;
        unsigned flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
        return deref_without_lazy_fetch_extended(oid, mark_tags_complete,
                            &type, flags);
    }

But oi_flags is only used by oid_object_info_extended() and is missed by
lookup_commit_in_graph():
    static struct commit *deref_without_lazy_fetch_extended(const struct object_id *oid,
                                int mark_tags_complete,
                                enum object_type *type,
                                unsigned int oi_flags)
    {
        struct object_info info = { .typep = type };
        struct commit *commit;

        commit = lookup_commit_in_graph(the_repository, oid);
        if (commit)
            return commit;

        while (1) {
            if (oid_object_info_extended(the_repository, oid, &info,
                            oi_flags))

So, an appropriate fix can be that let lookup_commit_in_graph() pickup
oi_flags and pass it to oid_object_info_extended(), then the fetching
loop will be prevent by the given flag OBJECT_INFO_SKIP_FETCH_OBJECT.

1. https://github.com/git/git/blob/master/Documentation/RelNotes/2.29.0.txt
2. https://lore.kernel.org/git/2a563b5f18cc9c42cb71a9547344a5435f6bc058.1652731865.git.gitgitgadget@gmail.com/

Thanks
-Han Xin

Han Xin (2):
  commit-graph.c: add "flags" to lookup_commit_in_graph()
  fetch-pack.c: pass "oi_flags" to lookup_commit_in_graph()

 builtin/fetch.c                    |  4 ++-
 commit-graph.c                     |  5 ++--
 commit-graph.h                     |  3 +-
 fetch-pack.c                       | 10 +++----
 revision.c                         |  2 +-
 t/t5583-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++++++++++
 upload-pack.c                      |  5 ++--
 7 files changed, 64 insertions(+), 12 deletions(-)
 create mode 100644 t/t5583-fetch-with-commit-graph.sh

-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph()
  2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
@ 2022-06-16  3:38     ` Han Xin
  2022-06-16  3:38     ` [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" " Han Xin
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-16  3:38 UTC (permalink / raw)
  To: me
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, ps, Han Xin

When try to do deref_without_lazy_fetch_extended(), "oi_flags" will
be missed by lookup_commit_in_graph(), then repo_has_object_file()
may start a new round objects fetching.
So let's add "flags" to lookup_commit_in_graph() and use
repo_has_object_file_with_flags() to pass the flags.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 builtin/fetch.c | 4 +++-
 commit-graph.c  | 5 +++--
 commit-graph.h  | 3 ++-
 fetch-pack.c    | 4 ++--
 revision.c      | 2 +-
 upload-pack.c   | 5 +++--
 6 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac29c2b1ae..44285d5318 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1179,7 +1179,9 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 				 * annotated tags.
 				 */
 				if (!starts_with(rm->name, "refs/tags/"))
-					commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
+					commit = lookup_commit_in_graph(
+						the_repository, &rm->old_oid,
+						0);
 				if (!commit) {
 					commit = lookup_commit_reference_gently(the_repository,
 										&rm->old_oid,
diff --git a/commit-graph.c b/commit-graph.c
index 92d4503336..b09f454bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,7 +889,8 @@ static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g,
 	}
 }
 
-struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
+struct commit *lookup_commit_in_graph(struct repository *repo,
+				      const struct object_id *id, int flags)
 {
 	struct commit *commit;
 	uint32_t pos;
@@ -898,7 +899,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!repo_has_object_file_with_flags(repo, id, flags))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/commit-graph.h b/commit-graph.h
index 2e3ac35237..747a67c0ee 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -46,7 +46,8 @@ int parse_commit_in_graph(struct repository *r, struct commit *item);
  * that we don't return commits whose object has been pruned. Otherwise, this
  * function returns `NULL`.
  */
-struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id);
+struct commit *lookup_commit_in_graph(struct repository *repo,
+				      const struct object_id *id, int flags);
 
 /*
  * It is possible that we loaded commit contents from the commit buffer,
diff --git a/fetch-pack.c b/fetch-pack.c
index cb6647d657..4a62fb182e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -123,7 +123,7 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 	struct object_info info = { .typep = type };
 	struct commit *commit;
 
-	commit = lookup_commit_in_graph(the_repository, oid);
+	commit = lookup_commit_in_graph(the_repository, oid, 0);
 	if (commit)
 		return commit;
 
@@ -714,7 +714,7 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	for (ref = *refs; ref; ref = ref->next) {
 		struct commit *commit;
 
-		commit = lookup_commit_in_graph(the_repository, &ref->old_oid);
+		commit = lookup_commit_in_graph(the_repository, &ref->old_oid, 0);
 		if (!commit) {
 			struct object *o;
 
diff --git a/revision.c b/revision.c
index 211352795c..df5db51f98 100644
--- a/revision.c
+++ b/revision.c
@@ -379,7 +379,7 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 	 * look up the object ID in those graphs. Like this, we can avoid
 	 * parsing commit data from disk.
 	 */
-	commit = lookup_commit_in_graph(revs->repo, oid);
+	commit = lookup_commit_in_graph(revs->repo, oid, 0);
 	if (commit)
 		object = &commit->object;
 	else
diff --git a/upload-pack.c b/upload-pack.c
index 3a851b3606..0fa9c3cf3f 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1407,7 +1407,7 @@ static int parse_want(struct packet_writer *writer, const char *line,
 			die("git upload-pack: protocol error, "
 			    "expected to get oid, not '%s'", line);
 
-		commit = lookup_commit_in_graph(the_repository, &oid);
+		commit = lookup_commit_in_graph(the_repository, &oid, 0);
 		if (commit)
 			o = &commit->object;
 		else
@@ -1455,7 +1455,8 @@ static int parse_want_ref(struct packet_writer *writer, const char *line,
 		item->util = oiddup(&oid);
 
 		if (!starts_with(refname_nons, "refs/tags/")) {
-			struct commit *commit = lookup_commit_in_graph(the_repository, &oid);
+			struct commit *commit =
+				lookup_commit_in_graph(the_repository, &oid, 0);
 			if (commit)
 				o = &commit->object;
 		}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" to lookup_commit_in_graph()
  2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
  2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
@ 2022-06-16  3:38     ` Han Xin
  2022-06-17 21:47     ` [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph Jonathan Tan
  2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  3 siblings, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-16  3:38 UTC (permalink / raw)
  To: me
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, ps, Han Xin

As the custom "oi_flags" is missed by lookup_commit_in_graph(), we will
get another lazy fetch round if we found the commit in commit graph but
miss it in the local object repository.

We can see the issue via[1].

1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 fetch-pack.c                       | 10 +++----
 t/t5583-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 5 deletions(-)
 create mode 100644 t/t5583-fetch-with-commit-graph.sh

diff --git a/fetch-pack.c b/fetch-pack.c
index 4a62fb182e..ca1234e456 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -123,7 +123,7 @@ static struct commit *deref_without_lazy_fetch_extended(const struct object_id *
 	struct object_info info = { .typep = type };
 	struct commit *commit;
 
-	commit = lookup_commit_in_graph(the_repository, oid, 0);
+	commit = lookup_commit_in_graph(the_repository, oid, oi_flags);
 	if (commit)
 		return commit;
 
@@ -704,6 +704,7 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	struct ref *ref;
 	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
+	int oi_flags = OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK;
 
 	if (args->refetch)
 		return;
@@ -714,13 +715,12 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	for (ref = *refs; ref; ref = ref->next) {
 		struct commit *commit;
 
-		commit = lookup_commit_in_graph(the_repository, &ref->old_oid, 0);
+		commit = lookup_commit_in_graph(the_repository, &ref->old_oid,
+						oi_flags);
 		if (!commit) {
 			struct object *o;
 
-			if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK |
-						OBJECT_INFO_SKIP_FETCH_OBJECT))
+			if (!has_object_file_with_flags(&ref->old_oid, oi_flags))
 				continue;
 			o = parse_object(the_repository, &ref->old_oid);
 			if (!o || o->type != OBJ_COMMIT)
diff --git a/t/t5583-fetch-with-commit-graph.sh b/t/t5583-fetch-with-commit-graph.sh
new file mode 100644
index 0000000000..cb2beafa8d
--- /dev/null
+++ b/t/t5583-fetch-with-commit-graph.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='test for fetching missing object with a full commit-graph'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init --bare dest.git &&
+	test_commit one &&
+	git checkout -b testbranch &&
+	test_commit two &&
+	git push dest.git --all
+'
+
+test_expect_success 'prepare a alternates repository without testbranch' '
+	git clone -b $GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME dest.git alternates &&
+	oid=$(git -C alternates rev-parse refs/remotes/origin/testbranch) &&
+	git -C alternates update-ref -d refs/remotes/origin/testbranch &&
+	git -C alternates gc --prune=now
+'
+
+test_expect_success 'prepare a repository with a full commit-graph' '
+	git init source &&
+	echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
+	git -C source remote add origin "$(pwd)/dest.git" &&
+	git -C source config remote.origin.promisor true &&
+	git -C source config remote.origin.partialclonefilter blob:none &&
+	git -C source fetch origin &&
+	(
+		cd source &&
+		test_commit three &&
+		git -c gc.writeCommitGraph=true gc
+	)
+'
+
+test_expect_success 'change the alternates to that without commit two' '
+	echo "$(pwd)/alternates/.git/objects" >source/.git/objects/info/alternates
+'
+
+test_expect_success 'fetch the missing object' '
+	git -C source fetch origin $oid
+'
+
+test_done
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph
  2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
  2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
  2022-06-16  3:38     ` [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" " Han Xin
@ 2022-06-17 21:47     ` Jonathan Tan
  2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  3 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2022-06-17 21:47 UTC (permalink / raw)
  To: Han Xin
  Cc: Jonathan Tan, me, chiyutianyi, derrickstolee, git, haiyangtand, ps

Han Xin <hanxin.hx@bytedance.com> writes:
> On Wed, Jun 15, 2022 at 10:18 AM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > [+cc Stolee]
> >
> > On Tue, Jun 14, 2022 at 03:25:13PM +0800, Haiyng Tan wrote:
> > > I think it's caused by using lazy-fetch in
> > > deref_without_lazy_fetch_extended().  In lookup_commit_in_graph(),
> > > lazy-fetch is initiated by repo_has_object_file() used.  has_object()
> > > should be used, it's no-lazy-fetch.
> >
> > Hmm. Are there cases where lookup_commit_in_graph() is expected to
> > lazily fetch missing objects from promisor remotes? If so, then this
> > wouldn't quite work. If not, then this seems like an appropriate fix to
> > me.
> >
> > Thanks,
> > Taylor

I think that if a commit is in the commit graph, we would expect the
commit to also be present. So changing to has_object() makes sense.

> We can see the use of has_object() in RelNotes/2.29.0.txt[1]:
>    * A new helper function has_object() has been introduced to make it
>      easier to mark object existence checks that do and don't want to
>      trigger lazy fetches, and a few such checks are converted using it.

Also relevant is the comment on repo_has_object_file() in
object-store.h.

> So, an appropriate fix can be that let lookup_commit_in_graph() pickup
> oi_flags and pass it to oid_object_info_extended(), then the fetching
> loop will be prevent by the given flag OBJECT_INFO_SKIP_FETCH_OBJECT.

Hmm...why not change it to has_object_file() instead, as Haiyng Tan
mentioned?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
                       ` (2 preceding siblings ...)
  2022-06-17 21:47     ` [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph Jonathan Tan
@ 2022-06-18  3:01     ` Han Xin
  2022-06-20  7:07       ` Patrick Steinhardt
                         ` (2 more replies)
  3 siblings, 3 replies; 27+ messages in thread
From: Han Xin @ 2022-06-18  3:01 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

If a commit is in the commit graph, we would expect the commit to also
be present. So we should use has_object() instead of
repo_has_object_file(), which will help us avoid getting into an endless
loop of lazy fetch.

We can see the endless loop issue via this[1].

1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 50 ++++++++++++++++++++++
 2 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

diff --git a/commit-graph.c b/commit-graph.c
index 2b52818731..2dd9bcc7ea 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
new file mode 100755
index 0000000000..ea5940b9f1
--- /dev/null
+++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='test for no lazy fetch with the commit-graph'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init --bare dest.git &&
+	test_commit one &&
+	git checkout -b tmp &&
+	test_commit two &&
+	git push dest.git --all
+'
+
+test_expect_success 'prepare a alternates repository without commit two' '
+	git clone --bare dest.git alternates &&
+	oid=$(git -C alternates rev-parse refs/heads/tmp) &&
+	git -C alternates update-ref -d refs/heads/tmp &&
+	git -C alternates gc --prune=now &&
+	pack=$(echo alternates/objects/pack/*.pack) &&
+	git verify-pack -v "$pack" >have &&
+	! grep "$oid" have
+'
+
+test_expect_success 'prepare a repository with a commit-graph contains commit two' '
+	git init source &&
+	echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
+	git -C source remote add origin "$(pwd)/dest.git" &&
+	git -C source config remote.origin.promisor true &&
+	git -C source config remote.origin.partialclonefilter blob:none &&
+	# the source repository has the whole refs contains refs/heads/tmp
+	git -C source fetch origin &&
+	(
+		cd source &&
+		test_commit three &&
+		git -c gc.writeCommitGraph=true gc
+	)
+'
+
+test_expect_success 'change the alternates of source to that without commit two' '
+	# now we have a commit-graph in the source repository but without the commit two
+	echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
+'
+
+test_expect_success 'fetch the missing commit' '
+	git -C source fetch origin $oid 2>fetch.out &&
+	grep "$oid" fetch.out
+'
+
+test_done
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
@ 2022-06-20  7:07       ` Patrick Steinhardt
  2022-06-20  8:53         ` [External] " 欣韩
  2022-06-21 18:23       ` Jonathan Tan
  2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
  2 siblings, 1 reply; 27+ messages in thread
From: Patrick Steinhardt @ 2022-06-20  7:07 UTC (permalink / raw)
  To: Han Xin; +Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me

[-- Attachment #1: Type: text/plain, Size: 3810 bytes --]

On Sat, Jun 18, 2022 at 11:01:30AM +0800, Han Xin wrote:
> If a commit is in the commit graph, we would expect the commit to also
> be present. So we should use has_object() instead of
> repo_has_object_file(), which will help us avoid getting into an endless
> loop of lazy fetch.
> 
> We can see the endless loop issue via this[1].
> 
> 1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/
> 
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>

Thanks a lot for working on this issue!

> ---
>  commit-graph.c                             |  2 +-
>  t/t5329-no-lazy-fetch-with-commit-graph.sh | 50 ++++++++++++++++++++++
>  2 files changed, 51 insertions(+), 1 deletion(-)
>  create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 2b52818731..2dd9bcc7ea 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
>  		return NULL;
>  	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>  		return NULL;
> -	if (!repo_has_object_file(repo, id))
> +	if (!has_object(repo, id, 0))
>  		return NULL;

Agreed, this change makes sense to me.

>  	commit = lookup_commit(repo, id);
> diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..ea5940b9f1
> --- /dev/null
> +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> @@ -0,0 +1,50 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '

Nit: I find it a bit confusing that you only call the first test case
`setup`, while all the other tests except for the last one are also only
setting up the necessary state.

> +	git init --bare dest.git &&
> +	test_commit one &&
> +	git checkout -b tmp &&
> +	test_commit two &&
> +	git push dest.git --all
> +'
> +
> +test_expect_success 'prepare a alternates repository without commit two' '
> 
> +	git clone --bare dest.git alternates &&
> 
> +	oid=$(git -C alternates rev-parse refs/heads/tmp) &&
> +	git -C alternates update-ref -d refs/heads/tmp &&
> +	git -C alternates gc --prune=now &&
> +	pack=$(echo alternates/objects/pack/*.pack) &&
> +	git verify-pack -v "$pack" >have &&
> +	! grep "$oid" have
> +'

Instead of going into the low-level details you could just verify that
`git cat-file -e $oid` returns `1`.

> +
> +test_expect_success 'prepare a repository with a commit-graph contains commit two' '
> +	git init source &&
> +	echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
> +	git -C source remote add origin "$(pwd)/dest.git" &&
> +	git -C source config remote.origin.promisor true &&
> +	git -C source config remote.origin.partialclonefilter blob:none &&
> +	# the source repository has the whole refs contains refs/heads/tmp
> +	git -C source fetch origin &&
> +	(
> +		cd source &&
> +		test_commit three &&
> +		git -c gc.writeCommitGraph=true gc
> +	)
> +'
> +
> +test_expect_success 'change the alternates of source to that without commit two' '
> +	# now we have a commit-graph in the source repository but without the commit two
> +	echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
> +'
> +
> +test_expect_success 'fetch the missing commit' '
> +	git -C source fetch origin $oid 2>fetch.out &&
> +	grep "$oid" fetch.out
> +'

This test passes even without your fix, albeit a lot slower compared
to with it. Can we somehow cause it to fail reliably so that the test
becomes effective in catching a regression here?

> +test_done
> -- 
> 2.36.1
> 

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [External] Re: [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-20  7:07       ` Patrick Steinhardt
@ 2022-06-20  8:53         ` 欣韩
  2022-06-20  9:05           ` Patrick Steinhardt
  0 siblings, 1 reply; 27+ messages in thread
From: 欣韩 @ 2022-06-20  8:53 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me

On Mon, Jun 20, 2022 at 3:34 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Sat, Jun 18, 2022 at 11:01:30AM +0800, Han Xin wrote:
> > If a commit is in the commit graph, we would expect the commit to also
> > be present. So we should use has_object() instead of
> > repo_has_object_file(), which will help us avoid getting into an endless
> > loop of lazy fetch.
> >
> > We can see the endless loop issue via this[1].
> >
> > 1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/
> >
> > Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
>
> Thanks a lot for working on this issue!
>
> > ---
> >  commit-graph.c                             |  2 +-
> >  t/t5329-no-lazy-fetch-with-commit-graph.sh | 50 ++++++++++++++++++++++
> >  2 files changed, 51 insertions(+), 1 deletion(-)
> >  create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 2b52818731..2dd9bcc7ea 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
> >               return NULL;
> >       if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
> >               return NULL;
> > -     if (!repo_has_object_file(repo, id))
> > +     if (!has_object(repo, id, 0))
> >               return NULL;
>
> Agreed, this change makes sense to me.
>
> >       commit = lookup_commit(repo, id);
> > diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> > new file mode 100755
> > index 0000000000..ea5940b9f1
> > --- /dev/null
> > +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> > @@ -0,0 +1,50 @@
> > +#!/bin/sh
> > +
> > +test_description='test for no lazy fetch with the commit-graph'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
>
> Nit: I find it a bit confusing that you only call the first test case
> `setup`, while all the other tests except for the last one are also only
> setting up the necessary state.

Nod.
There is indeed some misleading information here.
I will adjust it in the next patch.

>
> > +     git init --bare dest.git &&
> > +     test_commit one &&
> > +     git checkout -b tmp &&
> > +     test_commit two &&
> > +     git push dest.git --all
> > +'
> > +
> > +test_expect_success 'prepare a alternates repository without commit two' '
> >
> > +     git clone --bare dest.git alternates &&
> >
> > +     oid=$(git -C alternates rev-parse refs/heads/tmp) &&
> > +     git -C alternates update-ref -d refs/heads/tmp &&
> > +     git -C alternates gc --prune=now &&
> > +     pack=$(echo alternates/objects/pack/*.pack) &&
> > +     git verify-pack -v "$pack" >have &&
> > +     ! grep "$oid" have
> > +'
>
> Instead of going into the low-level details you could just verify that
> `git cat-file -e $oid` returns `1`.
>

Nod.

> > +
> > +test_expect_success 'prepare a repository with a commit-graph contains commit two' '
> > +     git init source &&
> > +     echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
> > +     git -C source remote add origin "$(pwd)/dest.git" &&
> > +     git -C source config remote.origin.promisor true &&
> > +     git -C source config remote.origin.partialclonefilter blob:none &&
> > +     # the source repository has the whole refs contains refs/heads/tmp
> > +     git -C source fetch origin &&
> > +     (
> > +             cd source &&
> > +             test_commit three &&
> > +             git -c gc.writeCommitGraph=true gc
> > +     )
> > +'
> > +
> > +test_expect_success 'change the alternates of source to that without commit two' '
> > +     # now we have a commit-graph in the source repository but without the commit two
> > +     echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
> > +'
> > +
> > +test_expect_success 'fetch the missing commit' '
> > +     git -C source fetch origin $oid 2>fetch.out &&
> > +     grep "$oid" fetch.out
> > +'
>
> This test passes even without your fix, albeit a lot slower compared
> to with it. Can we somehow cause it to fail reliably so that the test
> becomes effective in catching a regression here?
>

Could you help me find the reason why this testcase passes even
without the fix.

From the execution of Github Action, it seems that the problem always exist:
https://github.com/chiyutianyi/git/actions/runs/2527421443.

Thanks.
-Han Xin

> > +test_done
> > --
> > 2.36.1
> >
>
> Patrick

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [External] Re: [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-20  8:53         ` [External] " 欣韩
@ 2022-06-20  9:05           ` Patrick Steinhardt
  0 siblings, 0 replies; 27+ messages in thread
From: Patrick Steinhardt @ 2022-06-20  9:05 UTC (permalink / raw)
  To: 欣韩
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

On Mon, Jun 20, 2022 at 04:53:47PM +0800, 欣韩 wrote:
> On Mon, Jun 20, 2022 at 3:34 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Sat, Jun 18, 2022 at 11:01:30AM +0800, Han Xin wrote:
[snip]
> > > +test_expect_success 'prepare a repository with a commit-graph contains commit two' '
> > > +     git init source &&
> > > +     echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
> > > +     git -C source remote add origin "$(pwd)/dest.git" &&
> > > +     git -C source config remote.origin.promisor true &&
> > > +     git -C source config remote.origin.partialclonefilter blob:none &&
> > > +     # the source repository has the whole refs contains refs/heads/tmp
> > > +     git -C source fetch origin &&
> > > +     (
> > > +             cd source &&
> > > +             test_commit three &&
> > > +             git -c gc.writeCommitGraph=true gc
> > > +     )
> > > +'
> > > +
> > > +test_expect_success 'change the alternates of source to that without commit two' '
> > > +     # now we have a commit-graph in the source repository but without the commit two
> > > +     echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
> > > +'
> > > +
> > > +test_expect_success 'fetch the missing commit' '
> > > +     git -C source fetch origin $oid 2>fetch.out &&
> > > +     grep "$oid" fetch.out
> > > +'
> >
> > This test passes even without your fix, albeit a lot slower compared
> > to with it. Can we somehow cause it to fail reliably so that the test
> > becomes effective in catching a regression here?
> >
> 
> Could you help me find the reason why this testcase passes even
> without the fix.
> 
> From the execution of Github Action, it seems that the problem always exist:
> https://github.com/chiyutianyi/git/actions/runs/2527421443.
> 
> Thanks.
> -Han Xin

Hard to say, I'm not sure either. One thing I noticed though is that in
your CI run there's failure in e.g. linux-gcc, but the test run for
linux-musl succeeds. Personally I'm using musl libc on my system, as
well, so maybe it's a discrepancy between musl- and glibc-based systems?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  2022-06-20  7:07       ` Patrick Steinhardt
@ 2022-06-21 18:23       ` Jonathan Tan
  2022-06-22  3:17         ` Han Xin
  2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
  2 siblings, 1 reply; 27+ messages in thread
From: Jonathan Tan @ 2022-06-21 18:23 UTC (permalink / raw)
  To: Han Xin
  Cc: Jonathan Tan, chiyutianyi, derrickstolee, git, haiyangtand, me, ps

Han Xin <hanxin.hx@bytedance.com> writes:
> If a commit is in the commit graph, we would expect the commit to also
> be present. So we should use has_object() instead of
> repo_has_object_file(), which will help us avoid getting into an endless
> loop of lazy fetch.
> 
> We can see the endless loop issue via this[1].
> 
> 1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/

As described in SubmittingPatches:

  Try to make sure your explanation can be understood   
  without external resources. Instead of giving a URL to a mailing list
  archive, summarize the relevant points of the discussion.

> +test_expect_success 'setup' '
> +	git init --bare dest.git &&
> +	test_commit one &&
> +	git checkout -b tmp &&
> +	test_commit two &&
> +	git push dest.git --all
> +'

You can commit directly to the repo by using "test_commit -C dest.git".
Also, can the repositories be better named? I see a "dest.git" (which
seems to contain all the objects), "alternates" (which seems to contain
everything except refs/heads/tmp), "source" (which only contains the
commit graph), and the current directory. It would probably be better to
name them e.g. "with-commit", "without-commit", "only-commit-graph", and
omit using the current directory altogether.

> +test_expect_success 'prepare a alternates repository without commit two' '
> +	git clone --bare dest.git alternates &&
> +	oid=$(git -C alternates rev-parse refs/heads/tmp) &&
> +	git -C alternates update-ref -d refs/heads/tmp &&
> +	git -C alternates gc --prune=now &&
> +	pack=$(echo alternates/objects/pack/*.pack) &&
> +	git verify-pack -v "$pack" >have &&
> +	! grep "$oid" have
> +'

OK, except refs/heads/tmp could probably have a better name.

> +test_expect_success 'prepare a repository with a commit-graph contains commit two' '
> +	git init source &&
> +	echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
> +	git -C source remote add origin "$(pwd)/dest.git" &&
> +	git -C source config remote.origin.promisor true &&
> +	git -C source config remote.origin.partialclonefilter blob:none &&
> +	# the source repository has the whole refs contains refs/heads/tmp
> +	git -C source fetch origin &&
> +	(
> +		cd source &&
> +		test_commit three &&
> +		git -c gc.writeCommitGraph=true gc
> +	)
> +'

Is the purpose of the fetch only to add a ref? If yes, it's clearer just
to create that branch instead of fetching.

> +test_expect_success 'change the alternates of source to that without commit two' '
> +	# now we have a commit-graph in the source repository but without the commit two
> +	echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
> +'

OK.

> +test_expect_success 'fetch the missing commit' '
> +	git -C source fetch origin $oid 2>fetch.out &&
> +	grep "$oid" fetch.out
> +'

Is the bug triggered by fetching the missing commit or by fetching any
commit (which triggers the usage of the commit graph)? If any commit,
then it's clearer to create an arbitrary commit and then fetch it.

Also, I thought that the issue was an infinite loop, and the thing being
tested here looks different from that. If you want to ensure that
nothing is being fetched, you can use GIT_TRACE="$(pwd)/trace" to
observe a fetch-pack command being invoked or
GIT_TRACE_PACKET="$(pwd)/trace" to observe the packet being sent. If
you're worried about an infinite loop, you can set origin to a directory
that does not exist (so that the fetch immediately fails).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-21 18:23       ` Jonathan Tan
@ 2022-06-22  3:17         ` Han Xin
  0 siblings, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-22  3:17 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Taylor Blau,
	Patrick Steinhardt

On Wed, Jun 22, 2022 at 2:23 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Han Xin <hanxin.hx@bytedance.com> writes:
> > If a commit is in the commit graph, we would expect the commit to also
> > be present. So we should use has_object() instead of
> > repo_has_object_file(), which will help us avoid getting into an endless
> > loop of lazy fetch.
> >
> > We can see the endless loop issue via this[1].
> >
> > 1. https://lore.kernel.org/git/20220612161707.21807-1-chiyutianyi@gmail.com/
>
> As described in SubmittingPatches:
>
>   Try to make sure your explanation can be understood
>   without external resources. Instead of giving a URL to a mailing list
>   archive, summarize the relevant points of the discussion.
>

Nod.

> > +test_expect_success 'setup' '
> > +     git init --bare dest.git &&
> > +     test_commit one &&
> > +     git checkout -b tmp &&
> > +     test_commit two &&
> > +     git push dest.git --all
> > +'
>
> You can commit directly to the repo by using "test_commit -C dest.git".
> Also, can the repositories be better named? I see a "dest.git" (which
> seems to contain all the objects), "alternates" (which seems to contain
> everything except refs/heads/tmp), "source" (which only contains the
> commit graph), and the current directory. It would probably be better to
> name them e.g. "with-commit", "without-commit", "only-commit-graph", and
> omit using the current directory altogether.

Yes, it makes sense to me.

>
> > +test_expect_success 'prepare a alternates repository without commit two' '
> > +     git clone --bare dest.git alternates &&
> > +     oid=$(git -C alternates rev-parse refs/heads/tmp) &&
> > +     git -C alternates update-ref -d refs/heads/tmp &&
> > +     git -C alternates gc --prune=now &&
> > +     pack=$(echo alternates/objects/pack/*.pack) &&
> > +     git verify-pack -v "$pack" >have &&
> > +     ! grep "$oid" have
> > +'
>
> OK, except refs/heads/tmp could probably have a better name.

I'll rethink the naming here.

>
> > +test_expect_success 'prepare a repository with a commit-graph contains commit two' '
> > +     git init source &&
> > +     echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
> > +     git -C source remote add origin "$(pwd)/dest.git" &&
> > +     git -C source config remote.origin.promisor true &&
> > +     git -C source config remote.origin.partialclonefilter blob:none &&
> > +     # the source repository has the whole refs contains refs/heads/tmp
> > +     git -C source fetch origin &&
> > +     (
> > +             cd source &&
> > +             test_commit three &&
> > +             git -c gc.writeCommitGraph=true gc
> > +     )
> > +'
>
> Is the purpose of the fetch only to add a ref? If yes, it's clearer just
> to create that branch instead of fetching.

Nod.

>
> > +test_expect_success 'change the alternates of source to that without commit two' '
> > +     # now we have a commit-graph in the source repository but without the commit two
> > +     echo "$(pwd)/alternates/objects" >source/.git/objects/info/alternates
> > +'
>
> OK.
>
> > +test_expect_success 'fetch the missing commit' '
> > +     git -C source fetch origin $oid 2>fetch.out &&
> > +     grep "$oid" fetch.out
> > +'
>
> Is the bug triggered by fetching the missing commit or by fetching any
> commit (which triggers the usage of the commit graph)? If any commit,
> then it's clearer to create an arbitrary commit and then fetch it.
>

Yes, using the missing object in the commit-graph seems to be a little
misleading.

> Also, I thought that the issue was an infinite loop, and the thing being
> tested here looks different from that. If you want to ensure that
> nothing is being fetched, you can use GIT_TRACE="$(pwd)/trace" to
> observe a fetch-pack command being invoked or
> GIT_TRACE_PACKET="$(pwd)/trace" to observe the packet being sent. If
> you're worried about an infinite loop, you can set origin to a directory
> that does not exist (so that the fetch immediately fails).

Maybe we can use "ulimit" ?

Then the test case can be:

    test_expect_success 'fetch the missing commit once' '
        ulimit -u 512 &&
        GIT_TRACE="$(pwd)/trace" git -C source fetch origin $oid 2>err &&
        ! grep "error: cannot fork" err &&
        test $(grep "fetch origin" trace | wc -l) -eq 1
     '

Without this fix, "git fetch" would finally succeed because we didn't
check the return value of promise_remote_get_direct(). We can find
the err output like this:

    error: cannot fork() for -c: Resource temporarily unavailable
    fatal: promisor-remote: unable to fork off fetch subprocess

And we can see a lot of trace logs as follows:

    trace: run_command: git -c fetch.negotiationAlgorithm=noop fetch
origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin
    trace: built-in: git fetch origin --no-tags --no-write-fetch-head
--recurse-submodules=no --filter=blob:none --stdin

Thanks.
-Han Xin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 0/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  2022-06-20  7:07       ` Patrick Steinhardt
  2022-06-21 18:23       ` Jonathan Tan
@ 2022-06-24  5:27       ` Han Xin
  2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
                           ` (2 more replies)
  2 siblings, 3 replies; 27+ messages in thread
From: Han Xin @ 2022-06-24  5:27 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

This patch fixes the following issue:
When we found the commit in the graph in lookup_commit_in_graph(), but
the commit is missing from the repository, we will try
promisor_remote_get_direct() and then enter another loop.

Then we will go into an endless loop:
  git fetch -> deref_without_lazy_fetch() ->
    lookup_commit_in_graph() -> repo_has_object_file() ->
      promisor_remote_get_direct() -> fetch_objects() ->
        git fetch (a new loop round)

Changes since v1:

* add run_with_limited_processses() to test-lib so that we can use it to
  limit forking subprocesses. As we didn't check the return value of
  promise_remote_get_direct(), "git fetch" would finally succeed due to:

    error: cannot fork() for -c: Resource temporarily unavailable
    fatal: promisor-remote: unable to fork off fetch subprocess

* Rename test repositories, reference name and use GIT_TRACE to observe
  the fetch process.

Han Xin (2):
  test-lib.sh: add limited processes to test-lib
  commit-graph.c: no lazy fetch in lookup_commit_in_graph()

 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++
 t/test-lib.sh                              |  9 +++++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

Range-diff against v1:
1:  ebc14bfd5e < -:  ---------- commit-graph.c: no lazy fetch in lookup_commit_in_graph()
-:  ---------- > 1:  442a4c351d test-lib.sh: add limited processes to test-lib
-:  ---------- > 2:  d3a99a5c5a commit-graph.c: no lazy fetch in lookup_commit_in_graph()
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib
  2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
@ 2022-06-24  5:27         ` Han Xin
  2022-06-24 16:03           ` Junio C Hamano
  2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
  2 siblings, 1 reply; 27+ messages in thread
From: Han Xin @ 2022-06-24  5:27 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

We will use the lazy prerequisite ULIMIT_PROCESSES in a follow-up
commit.

With run_with_limited_processses() we can limit forking subprocesses and
fail reliably in some test cases.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 t/test-lib.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8ba5ca1534..f920e3b0ae 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1816,6 +1816,15 @@ test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
 	run_with_limited_open_files true
 '
 
+run_with_limited_processses () {
+	(ulimit -u 512 && "$@")
+}
+
+test_lazy_prereq ULIMIT_PROCESSES '
+	test_have_prereq !HPPA,!MINGW,!CYGWIN &&
+	run_with_limited_processses true
+'
+
 build_option () {
 	git version --build-options |
 	sed -ne "s/^$1: //p"
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
  2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
@ 2022-06-24  5:27         ` Han Xin
  2022-06-24 16:56           ` Junio C Hamano
  2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
  2 siblings, 1 reply; 27+ messages in thread
From: Han Xin @ 2022-06-24  5:27 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

If a commit is in the commit graph, we would expect the commit to also
be present. So we should use has_object() instead of
repo_has_object_file(), which will help us avoid getting into an endless
loop of lazy fetch.

When we found the commit in the graph in lookup_commit_in_graph(), but
the commit is missing from the repository, we will try
promisor_remote_get_direct() and then enter another loop. While
sometimes it will finally succeed because it cannot fork subprocess,
it has exhausted the local process resources and can be harmful to the
remote service.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 47 ++++++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

diff --git a/commit-graph.c b/commit-graph.c
index 2b52818731..2dd9bcc7ea 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
new file mode 100755
index 0000000000..4d25d2c950
--- /dev/null
+++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description='test for no lazy fetch with the commit-graph'
+
+. ./test-lib.sh
+
+test_expect_success 'setup: prepare a repository with a commit' '
+	git init with-commit &&
+	test_commit -C with-commit the-commit &&
+	oid=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
+	git init with-commit-graph &&
+	echo "$(pwd)/with-commit/.git/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	# create a ref that points to the commit in alternates
+	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
+	# prepare some other objects to commit-graph
+	test_commit -C with-commit-graph somthing &&
+	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
+	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
+'
+
+test_expect_success 'setup: change the alternates to what without the commit' '
+	git init --bare without-commit &&
+	echo "$(pwd)/without-commit/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	test_must_fail git -C with-commit-graph cat-file -e $oid
+'
+
+test_expect_success 'setup: prepare another commit to fetch' '
+	test_commit -C with-commit another-commit &&
+	anycommit=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success ULIMIT_PROCESSES 'fetch any commit from promisor with the usage of the commit graph' '
+	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
+	git -C with-commit-graph config remote.origin.promisor true &&
+	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
+	GIT_TRACE="$(pwd)/trace" run_with_limited_processses \
+		git -C with-commit-graph fetch origin $anycommit 2>err &&
+	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
+	test $(grep "fetch origin" trace | wc -l) -eq 1
+'
+
+test_done
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib
  2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
@ 2022-06-24 16:03           ` Junio C Hamano
  2022-06-25  1:35             ` Han Xin
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2022-06-24 16:03 UTC (permalink / raw)
  To: Han Xin
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

Han Xin <hanxin.hx@bytedance.com> writes:

> We will use the lazy prerequisite ULIMIT_PROCESSES in a follow-up
> commit.
>
> With run_with_limited_processses() we can limit forking subprocesses and
> fail reliably in some test cases.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  t/test-lib.sh | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 8ba5ca1534..f920e3b0ae 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1816,6 +1816,15 @@ test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
>  	run_with_limited_open_files true
>  '
>  
> +run_with_limited_processses () {
> +	(ulimit -u 512 && "$@")

The "-u" presumably is a way to say that the current user can have
only 512 processes at once that is supported by bash and ksh?  dash
seems to use "-p" for this but "-p" of course means something
completely different to other shells (and is read-only), which is a
mess X-<.

I suspect that it is OK to make it practically bash-only, but then ...

> +}
> +
> +test_lazy_prereq ULIMIT_PROCESSES '
> +	test_have_prereq !HPPA,!MINGW,!CYGWIN &&
> +	run_with_limited_processses true

... as this lazy-prereq makes a trial run that would fail when the
system does not allow "ulimit -u 512", do we need the platform
specific prereq check?  I am wondering if the second line alone is
sufficient.

Also, 512 is not a number I would exactly call "limit forking".
Does it have to be so high, I wonder.  Of course it cannot be so low
like 3 or 8 or even 32, as per-user limitation counts your window
manager and shells running in other windows.

What you ideally want is an option that lets you limit the number of
processes the shell that issued the ulimit call can spawn
simultaneously, but I didn't find it in "man bash/dash/ksh".

> +'
> +
>  build_option () {
>  	git version --build-options |
>  	sed -ne "s/^$1: //p"

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
@ 2022-06-24 16:56           ` Junio C Hamano
  2022-06-25  2:25             ` Han Xin
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2022-06-24 16:56 UTC (permalink / raw)
  To: Han Xin
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me, ps

Han Xin <hanxin.hx@bytedance.com> writes:

> If a commit is in the commit graph, we would expect the commit to also
> be present.

Hmph, is that a fundamental requirement, or is that a limitation of
the current implementation?  Naïvely, I do not quite see why we
cannot first partially clone from a remote, access objects that
locally do not exist and lazily fetch them from the promissor, and
then build a reachability graph.  I expect that the resulting commit
graph records the lazily fetched objects at that point.  And then we
should be able to "lose" the lazily fetched objects that we know we
can fetch from the promissor again when we need them in the future.
And we would be in a situation where commits are pruned away, not
locally available in our object store, but can be (re)fetched from
the promisor, no?

> So we should use has_object() instead of
> repo_has_object_file(), which will help us avoid getting into an endless
> loop of lazy fetch.

It all depends on the reason we call lookup_commit_in_graph(), I
think.  Is there an easy way to remember the fact that we are
checking if object X is here with repo_has_object_file(X), so that
an on-demand fetch that happens when X does not locally exist would
not bother checking with lookup_commit_in_graph()?  IOW, temporarily
disable the use of commit-graph when we are lazily fetching?

> When we found the commit in the graph in lookup_commit_in_graph(),
> but the commit is missing from the repository, we will try
> promisor_remote_get_direct() and then enter another loop.  While
> sometimes it will finally succeed because it cannot fork
> subprocess,

Is that a mode of "succeed"-ing?  Or merely a way to exit an endless
loop that does not make any progress with a failure?

> it has exhausted the local process resources and can be harmful to the
> remote service.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---

I think the single-liner change in the patch is a good one, but I am
having a hard time to agree with the reasoning above that explains
why it is a good change.

Here is an attempt to express a reasoning I can understand, can
agree with, and (I think) better describes why the change is a good
one.  Does my understanding of the problem and the solution totally
misses the mark?

	The commit-graph is used to opportunistically optimize
	accesses to certain pieces of information on commit objects,
	and lookup_commit_in_graph() tries to say "no" when the
	requested commit does not locally exist by returning NULL,
	in which case the caller can ask for (which may result in
	on-demand fetching from a promisor remote) and parse the
	commit object itself.

	However, it uses a wrong helper, repo_has_object_file(), to
	do so.  This helper not only checks if an object is
	immediately available in the local object store, but also
	tries to fetch from a promisor remote.  But the fetch
	machinery calls lookup_commit_in_graph(), thus causing an
	infinite loop.

	We should make lookup_commit_in_graph() expect that a commit
	given to it can be legitimately missing from the local
	object store, by using the has_object_file() helper instead.
	
> diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..4d25d2c950
> --- /dev/null
> +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh

Hmph, does this short-test need a completely new file?

> @@ -0,0 +1,47 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +	git init with-commit &&
> +	test_commit -C with-commit the-commit &&
> +	oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +	git init with-commit-graph &&
> +	echo "$(pwd)/with-commit/.git/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	# create a ref that points to the commit in alternates
> +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +	# prepare some other objects to commit-graph
> +	test_commit -C with-commit-graph somthing &&

somthing? something?

> +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +	git init --bare without-commit &&
> +	echo "$(pwd)/without-commit/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&

Doesn't this deliberately _corrupt_ the with-commit-graph repository
that depended on the object whose name is $oid in with-commit
repository?  Do we require a corrupt repository to trigger the "bug"?

> +	test_must_fail git -C with-commit-graph cat-file -e $oid
> +'
> +
> +test_expect_success 'setup: prepare another commit to fetch' '
> +	test_commit -C with-commit another-commit &&
> +	anycommit=$(git -C with-commit rev-parse HEAD)

anycommit?  another_commit?  Be consistent in naming.

> +'
> +
> +test_expect_success ULIMIT_PROCESSES 'fetch any commit from promisor with the usage of the commit graph' '

So we did all of the above set-up sequences only to skip the most
interesting test, if we were testing with "dash"?  I suspect that it
may be cleaner to put the prerequisite to the whole file with the
"early test_done" trick like t0051 and t3008.

> +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +	git -C with-commit-graph config remote.origin.promisor true &&
> +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +	GIT_TRACE="$(pwd)/trace" run_with_limited_processses \
> +		git -C with-commit-graph fetch origin $anycommit 2>err &&
> +	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> +	test $(grep "fetch origin" trace | wc -l) -eq 1
> +'
> +
> +test_done

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib
  2022-06-24 16:03           ` Junio C Hamano
@ 2022-06-25  1:35             ` Han Xin
  2022-06-27 12:22               ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Han Xin @ 2022-06-25  1:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Jonathan Tan,
	Taylor Blau, Patrick Steinhardt

On Sat, Jun 25, 2022 at 12:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han Xin <hanxin.hx@bytedance.com> writes:
>
> > We will use the lazy prerequisite ULIMIT_PROCESSES in a follow-up
> > commit.
> >
> > With run_with_limited_processses() we can limit forking subprocesses and
> > fail reliably in some test cases.
> >
> > Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> > ---
> >  t/test-lib.sh | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/t/test-lib.sh b/t/test-lib.sh
> > index 8ba5ca1534..f920e3b0ae 100644
> > --- a/t/test-lib.sh
> > +++ b/t/test-lib.sh
> > @@ -1816,6 +1816,15 @@ test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
> >       run_with_limited_open_files true
> >  '
> >
> > +run_with_limited_processses () {
> > +     (ulimit -u 512 && "$@")
>
> The "-u" presumably is a way to say that the current user can have
> only 512 processes at once that is supported by bash and ksh?  dash
> seems to use "-p" for this but "-p" of course means something
> completely different to other shells (and is read-only), which is a
> mess X-<.
>
> I suspect that it is OK to make it practically bash-only, but then ...
>
> > +}
> > +
> > +test_lazy_prereq ULIMIT_PROCESSES '
> > +     test_have_prereq !HPPA,!MINGW,!CYGWIN &&
> > +     run_with_limited_processses true
>
> ... as this lazy-prereq makes a trial run that would fail when the
> system does not allow "ulimit -u 512", do we need the platform
> specific prereq check?  I am wondering if the second line alone is
> sufficient.
>

Yes,the second line alone is sufficient.

> Also, 512 is not a number I would exactly call "limit forking".
> Does it have to be so high, I wonder.  Of course it cannot be so low
> like 3 or 8 or even 32, as per-user limitation counts your window
> manager and shells running in other windows.
>

It's hard to say.
I've tried adjusting it to 256, but the test cases in next patch will always
fail with the following "err":

    ./test-lib.sh: fork: Resource temporarily unavailable

> What you ideally want is an option that lets you limit the number of
> processes the shell that issued the ulimit call can spawn
> simultaneously, but I didn't find it in "man bash/dash/ksh".
>

Maybe I should use "lib-bash.sh" instead of "test-lib.sh" just like t9902
and t9903?
The different meanings of "-p" in bash and dash really make this tricky.

Thanks.
-Han Xin

> > +'
> > +
> >  build_option () {
> >       git version --build-options |
> >       sed -ne "s/^$1: //p"

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-24 16:56           ` Junio C Hamano
@ 2022-06-25  2:25             ` Han Xin
  2022-06-25  2:31               ` Han Xin
  0 siblings, 1 reply; 27+ messages in thread
From: Han Xin @ 2022-06-25  2:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Jonathan Tan,
	Taylor Blau, Patrick Steinhardt

On Sat, Jun 25, 2022 at 12:56 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han Xin <hanxin.hx@bytedance.com> writes:
>
> > If a commit is in the commit graph, we would expect the commit to also
> > be present.
>
> > When we found the commit in the graph in lookup_commit_in_graph(),
> > but the commit is missing from the repository, we will try
> > promisor_remote_get_direct() and then enter another loop.  While
> > sometimes it will finally succeed because it cannot fork
> > subprocess,
>
> Is that a mode of "succeed"-ing?  Or merely a way to exit an endless
> loop that does not make any progress with a failure?

For the user, "fetch-pack" does succeed, because in
deref_without_lazy_fetch(), even if lookup_commit_in_graph() fails to
lazy fetch the lost commit, the following oid_object_info_extended()
will help us complete the previous work.

In a sense, this infinite loop is based on the fact that infinite processes
can be created.

However, your attempt to express the reasoning bellow is clearer.

>
> > it has exhausted the local process resources and can be harmful to the
> > remote service.
> >
> > Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> > ---
>
> I think the single-liner change in the patch is a good one, but I am
> having a hard time to agree with the reasoning above that explains
> why it is a good change.
>
> Here is an attempt to express a reasoning I can understand, can
> agree with, and (I think) better describes why the change is a good
> one.  Does my understanding of the problem and the solution totally
> misses the mark?
>
>         The commit-graph is used to opportunistically optimize
>         accesses to certain pieces of information on commit objects,
>         and lookup_commit_in_graph() tries to say "no" when the
>         requested commit does not locally exist by returning NULL,
>         in which case the caller can ask for (which may result in
>         on-demand fetching from a promisor remote) and parse the
>         commit object itself.
>
>         However, it uses a wrong helper, repo_has_object_file(), to
>         do so.  This helper not only checks if an object is
>         immediately available in the local object store, but also
>         tries to fetch from a promisor remote.  But the fetch
>         machinery calls lookup_commit_in_graph(), thus causing an
>         infinite loop.
>
>         We should make lookup_commit_in_graph() expect that a commit
>         given to it can be legitimately missing from the local
>         object store, by using the has_object_file() helper instead.
>
> > diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> > new file mode 100755
> > index 0000000000..4d25d2c950
> > --- /dev/null
> > +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
>
> Hmph, does this short-test need a completely new file?
>
> > @@ -0,0 +1,47 @@
> > +#!/bin/sh
> > +
> > +test_description='test for no lazy fetch with the commit-graph'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup: prepare a repository with a commit' '
> > +     git init with-commit &&
> > +     test_commit -C with-commit the-commit &&
> > +     oid=$(git -C with-commit rev-parse HEAD)
> > +'
> > +
> > +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> > +     git init with-commit-graph &&
> > +     echo "$(pwd)/with-commit/.git/objects" \
> > +             >with-commit-graph/.git/objects/info/alternates &&
> > +     # create a ref that points to the commit in alternates
> > +     git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> > +     # prepare some other objects to commit-graph
> > +     test_commit -C with-commit-graph somthing &&
>
> somthing? something?

Nod.

>
> > +     git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> > +     test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> > +'
> > +
> > +test_expect_success 'setup: change the alternates to what without the commit' '
> > +     git init --bare without-commit &&
> > +     echo "$(pwd)/without-commit/objects" \
> > +             >with-commit-graph/.git/objects/info/alternates &&
>
> Doesn't this deliberately _corrupt_ the with-commit-graph repository
> that depended on the object whose name is $oid in with-commit
> repository?  Do we require a corrupt repository to trigger the "bug"?
>

The "bug" depends on the commit exist in the commit-graph but
missing in the repository.

I didn't find a better way to make this kind of scene.

This bug was first found when alternates and commit-graph were
both used. Since the promise did not maintain all the references,
I suspect that the "auto gc" during the update process of the promise
caused the loss of the unreachable commits in the promise.

> > +     test_must_fail git -C with-commit-graph cat-file -e $oid
> > +'
> > +
> > +test_expect_success 'setup: prepare another commit to fetch' '
> > +     test_commit -C with-commit another-commit &&
> > +     anycommit=$(git -C with-commit rev-parse HEAD)
>
> anycommit?  another_commit?  Be consistent in naming.
>

Nod.

> > +'
> > +
> > +test_expect_success ULIMIT_PROCESSES 'fetch any commit from promisor with the usage of the commit graph' '
>
> So we did all of the above set-up sequences only to skip the most
> interesting test, if we were testing with "dash"?  I suspect that it
> may be cleaner to put the prerequisite to the whole file with the
> "early test_done" trick like t0051 and t3008.
>

It make sense to me.

Thanks.
-Han Xin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-25  2:25             ` Han Xin
@ 2022-06-25  2:31               ` Han Xin
  0 siblings, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-25  2:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Jonathan Tan,
	Taylor Blau, Patrick Steinhardt

On Sat, Jun 25, 2022 at 10:25 AM Han Xin <hanxin.hx@bytedance.com> wrote:
>
> The "bug" depends on the commit exist in the commit-graph but
> missing in the repository.
>
> I didn't find a better way to make this kind of scene.
>
> This bug was first found when alternates and commit-graph were
> both used. Since the promise did not maintain all the references,
> I suspect that the "auto gc" during the update process of the promise
> caused the loss of the unreachable commits in the promise.
>

Some mistakes here:
This bug was first found when alternates and commit-graph were
both used. Since the promise did not maintain all the references,
I suspect that the "auto gc" during the update of the alternates
caused the loss of the unreachable commits.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib
  2022-06-25  1:35             ` Han Xin
@ 2022-06-27 12:22               ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2022-06-27 12:22 UTC (permalink / raw)
  To: Han Xin
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Jonathan Tan,
	Taylor Blau, Patrick Steinhardt

Han Xin <hanxin.hx@bytedance.com> writes:

> Maybe I should use "lib-bash.sh" instead of "test-lib.sh" just like t9902
> and t9903?
> The different meanings of "-p" in bash and dash really make this tricky.

I do not think lib-bash.sh is appropriate for this.

t9902/9903 are about command line prompt and completion support
_FOR_ bash users.  By including lib-bash.sh, even if your "usual"
shell is not bash, you can run these two scripts under bash, as long
as you have it installed.

For the purpose of testing these bash specific features, that
framework makes quite a lot of sense.  Those who are happy to have
dash on their system without having to install bash would have no
reason to see these two tests to pass, as they do not care about
bash at all.

What the test under discussion is doing is quite different.  Instead
of forcing to re-spawn bash when the user's shell is not bash, you'd
want to adjust how you invoke "ulimit" if it is not bash, something
like

run_with_limited_processes () {
	(ulimit ${ulimit_max_process-"-p"} 512 && "$@")
}

test_lazy_prereq ULIMIT_PROCESSES '
	# bash and ksh use "ulimit -u", dash uses "ulimit -p"
	if test -n "$BASH_VERSION"
	then
		ulimit_max_process="-u"
	elif test -n "$KSH_VERSION"
	then
		ulimit_max_process="-u"
	fi
        run_with_limited_processes true
'

perhaps?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 0/2] no lazy fetch in lookup_commit_in_graph()
  2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
  2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
  2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
@ 2022-06-28  2:02         ` Han Xin
  2022-06-28  2:02           ` [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib Han Xin
  2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  2 siblings, 2 replies; 27+ messages in thread
From: Han Xin @ 2022-06-28  2:02 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me,
	Junio C Hamano, ps

This patch fixes the following issue:
When we found the commit in the graph in lookup_commit_in_graph(), but
the commit is missing from the repository, we will try
promisor_remote_get_direct() and then enter another loop.

Then we will go into an endless loop:
  git fetch -> deref_without_lazy_fetch() ->
    lookup_commit_in_graph() -> repo_has_object_file() ->
      promisor_remote_get_direct() -> fetch_objects() ->
        git fetch (a new loop round)

Changes since v2:

* Remove test_have_prereq() from ULIMIT_PROCESSES as
  "run_with_limited_processses true" is enough.

* Teach run_with_limited_processses() to support dash and zsh.

* Skip the whole test file if ulimit is not avaliable.

* Minor grammar/comment etc. fixes throughout.

Han Xin (2):
  test-lib.sh: add limited processes to test-lib
  commit-graph.c: no lazy fetch in lookup_commit_in_graph()

 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 53 ++++++++++++++++++++++
 t/test-lib.sh                              | 16 +++++++
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

Range-diff against v2:
1:  442a4c351d ! 1:  ad0a539759 test-lib.sh: add limited processes to test-lib
    @@ t/test-lib.sh: test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
      '
      
     +run_with_limited_processses () {
    -+	(ulimit -u 512 && "$@")
    ++	# bash and ksh use "ulimit -u", dash uses "ulimit -p"
    ++	if test -n "$BASH_VERSION"
    ++	then
    ++		ulimit_max_process="-u"
    ++	elif test -n "$KSH_VERSION"
    ++	then
    ++		ulimit_max_process="-u"
    ++	fi
    ++	(ulimit ${ulimit_max_process-"-p"} 512 && "$@")
     +}
     +
     +test_lazy_prereq ULIMIT_PROCESSES '
    -+	test_have_prereq !HPPA,!MINGW,!CYGWIN &&
     +	run_with_limited_processses true
     +'
     +
2:  a7d456db9b ! 2:  3cdb1abd43 commit-graph.c: no lazy fetch in lookup_commit_in_graph()
    @@ Metadata
      ## Commit message ##
         commit-graph.c: no lazy fetch in lookup_commit_in_graph()
     
    -    If a commit is in the commit graph, we would expect the commit to also
    -    be present. So we should use has_object() instead of
    -    repo_has_object_file(), which will help us avoid getting into an endless
    -    loop of lazy fetch.
    +    The commit-graph is used to opportunistically optimize accesses to
    +    certain pieces of information on commit objects, and
    +    lookup_commit_in_graph() tries to say "no" when the requested commit
    +    does not locally exist by returning NULL, in which case the caller
    +    can ask for (which may result in on-demand fetching from a promisor
    +    remote) and parse the commit object itself.
     
    -    When we found the commit in the graph in lookup_commit_in_graph(), but
    -    the commit is missing from the repository, we will try
    -    promisor_remote_get_direct() and then enter another loop. While
    -    sometimes it will finally succeed because it cannot fork subprocess,
    -    it has exhausted the local process resources and can be harmful to the
    -    remote service.
    +    However, it uses a wrong helper, repo_has_object_file(), to do so.
    +    This helper not only checks if an object is mmediately available in
    +    the local object store, but also tries to fetch from a promisor remote.
    +    But the fetch machinery calls lookup_commit_in_graph(), thus causing an
    +    infinite loop.
    +
    +    We should make lookup_commit_in_graph() expect that a commit given to it
    +    can be legitimately missing from the local object store, by using the
    +    has_object_file() helper instead.
     
         Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
     
    @@ t/t5329-no-lazy-fetch-with-commit-graph.sh (new)
     +
     +. ./test-lib.sh
     +
    ++if ! test_have_prereq ULIMIT_PROCESSES
    ++then
    ++	skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
    ++	test_done
    ++fi
    ++
     +test_expect_success 'setup: prepare a repository with a commit' '
     +	git init with-commit &&
     +	test_commit -C with-commit the-commit &&
    @@ t/t5329-no-lazy-fetch-with-commit-graph.sh (new)
     +	# create a ref that points to the commit in alternates
     +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
     +	# prepare some other objects to commit-graph
    -+	test_commit -C with-commit-graph somthing &&
    ++	test_commit -C with-commit-graph something &&
     +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
     +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
     +'
    @@ t/t5329-no-lazy-fetch-with-commit-graph.sh (new)
     +	test_must_fail git -C with-commit-graph cat-file -e $oid
     +'
     +
    -+test_expect_success 'setup: prepare another commit to fetch' '
    -+	test_commit -C with-commit another-commit &&
    ++test_expect_success 'setup: prepare any commit to fetch' '
    ++	test_commit -C with-commit any-commit &&
     +	anycommit=$(git -C with-commit rev-parse HEAD)
     +'
     +
    -+test_expect_success ULIMIT_PROCESSES 'fetch any commit from promisor with the usage of the commit graph' '
    ++test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
     +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
     +	git -C with-commit-graph config remote.origin.promisor true &&
     +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib
  2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
@ 2022-06-28  2:02           ` Han Xin
  2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
  1 sibling, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-28  2:02 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me,
	Junio C Hamano, ps

We will use the lazy prerequisite ULIMIT_PROCESSES in a follow-up
commit.

With run_with_limited_processses() we can limit forking subprocesses and
fail reliably in some test cases.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 t/test-lib.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8ba5ca1534..655d6d543f 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1816,6 +1816,22 @@ test_lazy_prereq ULIMIT_FILE_DESCRIPTORS '
 	run_with_limited_open_files true
 '
 
+run_with_limited_processses () {
+	# bash and ksh use "ulimit -u", dash uses "ulimit -p"
+	if test -n "$BASH_VERSION"
+	then
+		ulimit_max_process="-u"
+	elif test -n "$KSH_VERSION"
+	then
+		ulimit_max_process="-u"
+	fi
+	(ulimit ${ulimit_max_process-"-p"} 512 && "$@")
+}
+
+test_lazy_prereq ULIMIT_PROCESSES '
+	run_with_limited_processses true
+'
+
 build_option () {
 	git version --build-options |
 	sed -ne "s/^$1: //p"
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
  2022-06-28  2:02           ` [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib Han Xin
@ 2022-06-28  2:02           ` Han Xin
  2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 27+ messages in thread
From: Han Xin @ 2022-06-28  2:02 UTC (permalink / raw)
  To: hanxin.hx
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me,
	Junio C Hamano, ps

The commit-graph is used to opportunistically optimize accesses to
certain pieces of information on commit objects, and
lookup_commit_in_graph() tries to say "no" when the requested commit
does not locally exist by returning NULL, in which case the caller
can ask for (which may result in on-demand fetching from a promisor
remote) and parse the commit object itself.

However, it uses a wrong helper, repo_has_object_file(), to do so.
This helper not only checks if an object is mmediately available in
the local object store, but also tries to fetch from a promisor remote.
But the fetch machinery calls lookup_commit_in_graph(), thus causing an
infinite loop.

We should make lookup_commit_in_graph() expect that a commit given to it
can be legitimately missing from the local object store, by using the
has_object_file() helper instead.

Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
---
 commit-graph.c                             |  2 +-
 t/t5329-no-lazy-fetch-with-commit-graph.sh | 53 ++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh

diff --git a/commit-graph.c b/commit-graph.c
index 2b52818731..2dd9bcc7ea 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!repo_has_object_file(repo, id))
+	if (!has_object(repo, id, 0))
 		return NULL;
 
 	commit = lookup_commit(repo, id);
diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
new file mode 100755
index 0000000000..d7877a5758
--- /dev/null
+++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
@@ -0,0 +1,53 @@
+#!/bin/sh
+
+test_description='test for no lazy fetch with the commit-graph'
+
+. ./test-lib.sh
+
+if ! test_have_prereq ULIMIT_PROCESSES
+then
+	skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
+	test_done
+fi
+
+test_expect_success 'setup: prepare a repository with a commit' '
+	git init with-commit &&
+	test_commit -C with-commit the-commit &&
+	oid=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
+	git init with-commit-graph &&
+	echo "$(pwd)/with-commit/.git/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	# create a ref that points to the commit in alternates
+	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
+	# prepare some other objects to commit-graph
+	test_commit -C with-commit-graph something &&
+	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
+	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
+'
+
+test_expect_success 'setup: change the alternates to what without the commit' '
+	git init --bare without-commit &&
+	echo "$(pwd)/without-commit/objects" \
+		>with-commit-graph/.git/objects/info/alternates &&
+	test_must_fail git -C with-commit-graph cat-file -e $oid
+'
+
+test_expect_success 'setup: prepare any commit to fetch' '
+	test_commit -C with-commit any-commit &&
+	anycommit=$(git -C with-commit rev-parse HEAD)
+'
+
+test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
+	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
+	git -C with-commit-graph config remote.origin.promisor true &&
+	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
+	run_with_limited_processses env GIT_TRACE="$(pwd)/trace" \
+		git -C with-commit-graph fetch origin $anycommit 2>err &&
+	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
+	test $(grep "fetch origin" trace | wc -l) -eq 1
+'
+
+test_done
-- 
2.36.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
@ 2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
  2022-06-28 17:36               ` Junio C Hamano
  2022-06-29  2:08               ` Han Xin
  0 siblings, 2 replies; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-28  7:49 UTC (permalink / raw)
  To: Han Xin
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, jonathantanmy, me,
	Junio C Hamano, ps


On Tue, Jun 28 2022, Han Xin wrote:

> The commit-graph is used to opportunistically optimize accesses to
> certain pieces of information on commit objects, and
> lookup_commit_in_graph() tries to say "no" when the requested commit
> does not locally exist by returning NULL, in which case the caller
> can ask for (which may result in on-demand fetching from a promisor
> remote) and parse the commit object itself.
>
> However, it uses a wrong helper, repo_has_object_file(), to do so.
> This helper not only checks if an object is mmediately available in
> the local object store, but also tries to fetch from a promisor remote.
> But the fetch machinery calls lookup_commit_in_graph(), thus causing an
> infinite loop.
>
> We should make lookup_commit_in_graph() expect that a commit given to it
> can be legitimately missing from the local object store, by using the
> has_object_file() helper instead.
>
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  commit-graph.c                             |  2 +-
>  t/t5329-no-lazy-fetch-with-commit-graph.sh | 53 ++++++++++++++++++++++
>  2 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100755 t/t5329-no-lazy-fetch-with-commit-graph.sh
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 2b52818731..2dd9bcc7ea 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -907,7 +907,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
>  		return NULL;
>  	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>  		return NULL;
> -	if (!repo_has_object_file(repo, id))
> +	if (!has_object(repo, id, 0))
>  		return NULL;
>  
>  	commit = lookup_commit(repo, id);
> diff --git a/t/t5329-no-lazy-fetch-with-commit-graph.sh b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..d7877a5758
> --- /dev/null
> +++ b/t/t5329-no-lazy-fetch-with-commit-graph.sh
> @@ -0,0 +1,53 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +if ! test_have_prereq ULIMIT_PROCESSES

I think the prereq in 1/2 would be better off squashed into this commit.

> +then
> +	skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
> +	test_done
> +fi
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +	git init with-commit &&
> +	test_commit -C with-commit the-commit &&
> +	oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +	git init with-commit-graph &&
> +	echo "$(pwd)/with-commit/.git/objects" \

nit: you can use $PWD instead of $(pwd).

> +		>with-commit-graph/.git/objects/info/alternates &&
> +	# create a ref that points to the commit in alternates
> +	git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +	# prepare some other objects to commit-graph
> +	test_commit -C with-commit-graph something &&
> +	git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +	test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +	git init --bare without-commit &&

Maybe run a successful cat-file here...
> +	echo "$(pwd)/without-commit/objects" \
> +		>with-commit-graph/.git/objects/info/alternates &&
> +	test_must_fail git -C with-commit-graph cat-file -e $oid
...to show that it fails after the "echo" above?
> +'
> +
> +test_expect_success 'setup: prepare any commit to fetch' '
> +	test_commit -C with-commit any-commit &&
> +	anycommit=$(git -C with-commit rev-parse HEAD)

I think this would be better just added before a \n\n in the next test.
> +'
> +
> +test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
> +	git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +	git -C with-commit-graph config remote.origin.promisor true &&
> +	git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +	run_with_limited_processses env GIT_TRACE="$(pwd)/trace" \
> +		git -C with-commit-graph fetch origin $anycommit 2>err &&


> +	test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> +	test $(grep "fetch origin" trace | wc -l) -eq 1


Use "grep", not "test_i18ngrep", and this should use "test_line_count".

But actually better yet: this whole thing looks like it could use
"test_subcommand" instead, couldn't it?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
@ 2022-06-28 17:36               ` Junio C Hamano
  2022-06-29  2:08               ` Han Xin
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2022-06-28 17:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Han Xin, chiyutianyi, derrickstolee, git, haiyangtand,
	jonathantanmy, me, ps

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +test_description='test for no lazy fetch with the commit-graph'
>> +
>> +. ./test-lib.sh
>> +
>> +if ! test_have_prereq ULIMIT_PROCESSES
>
> I think the prereq in 1/2 would be better off squashed into this commit.

Good thinking.  It also may make sense to implement it in this file,
without touching test-lib.sh at all.

>> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
>> +	git init with-commit-graph &&
>> +	echo "$(pwd)/with-commit/.git/objects" \
>> +		>with-commit-graph/.git/objects/info/alternates &&
>
> nit: you can use $PWD instead of $(pwd).

We can, and it would not make any difference on non-Windows.  

But which one should we use to cater to Windows?  $(pwd) is a full
path in Windows notation "C:\Program Files\Git\..." while $PWD is
MSYS style "/C/Program Files/Git/..." or something like that, IIRC?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Re: [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
  2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
  2022-06-28 17:36               ` Junio C Hamano
@ 2022-06-29  2:08               ` Han Xin
  1 sibling, 0 replies; 27+ messages in thread
From: Han Xin @ 2022-06-29  2:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: chiyutianyi, derrickstolee, git, haiyangtand, Jonathan Tan,
	Taylor Blau, Junio C Hamano, Patrick Steinhardt

On Tue, Jun 28, 2022 at 3:53 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > +     test_i18ngrep ! "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> > +     test $(grep "fetch origin" trace | wc -l) -eq 1
>
>
> Use "grep", not "test_i18ngrep", and this should use "test_line_count".
>
> But actually better yet: this whole thing looks like it could use
> "test_subcommand" instead, couldn't it?

When using test_subcommand() we should give all the args,
if we remove or add any args later, this test case will always
pass even without this fix. So, is this test case still strict?

    run_with_limited_processses env GIT_TRACE2_EVENT="$(PWD)/trace.txt" \
        git -C with-commit-graph fetch origin $anycommit &&
    test_subcommand ! git -c fetch.negotiationAlgorithm=noop \
        fetch origin --no-tags --no-write-fetch-head \
        --recurse-submodules=no --filter=blob:none \
        --stdin <trace.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-06-29  2:10 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14  7:25 An endless loop fetching issue with partial clone, alternates and commit graph Haiyng Tan
2022-06-15  2:18 ` Taylor Blau
2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
2022-06-16  3:38     ` [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" " Han Xin
2022-06-17 21:47     ` [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph Jonathan Tan
2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-20  7:07       ` Patrick Steinhardt
2022-06-20  8:53         ` [External] " 欣韩
2022-06-20  9:05           ` Patrick Steinhardt
2022-06-21 18:23       ` Jonathan Tan
2022-06-22  3:17         ` Han Xin
2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-24 16:03           ` Junio C Hamano
2022-06-25  1:35             ` Han Xin
2022-06-27 12:22               ` Junio C Hamano
2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-24 16:56           ` Junio C Hamano
2022-06-25  2:25             ` Han Xin
2022-06-25  2:31               ` Han Xin
2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
2022-06-28  2:02           ` [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
2022-06-28 17:36               ` Junio C Hamano
2022-06-29  2:08               ` Han Xin

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).