git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* An endless loop fetching issue with partial clone, alternates and commit graph
@ 2022-06-12 16:17 Han Xin
  0 siblings, 0 replies; 3+ messages in thread
From: Han Xin @ 2022-06-12 16:17 UTC (permalink / raw)
  To: git; +Cc: Han Xin, Patrick Steinhardt

We found an issue that could create an endless loop where alternates
objects are used improperly.

While do fetching in a partial cloned repository with a commit graph,
deref_without_lazy_fetch_extended() will call lookup_commit_in_graph()
to find the commit object. We can found the code in commit-graph.c:

     struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
     {
          ...
          if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
               return NULL;
          if (!repo_has_object_file(repo, id))
               return NULL;

If we found the object in the commit graph, but missing it in the repository,
we will go into an endless loop:
     git fetch -> deref_without_lazy_fetch_extended() -> 
          lookup_commit_in_graph() -> repo_has_object_file() -> 
               promisor_remote_get_direct() -> fetch_objects() ->
                    git fetch

I know that the reason for this issue is due to improper use of
alternates, we can ensure that objects will not be lost by maintaining
all the references. But shouldn't we do something about this unusual
usage, it will cause a fetch bombardment of the remote git service.

We can reproduce this issue with the following test case, it will
generate a lot of git processes, please be careful to stop it.
------------------------------------------------------
#!/bin/sh

test_description='test for an endless loop fetching'

GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

. ./test-lib.sh

test_expect_success 'setup' '
    git init --bare dest.git &&
    test_commit one &&
    git checkout -b testbranch &&
    test_commit two &&
    git push dest.git --all
'

test_expect_success 'prepare a alternates repository without testbranch' '
    git clone -b $GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME dest.git alternates &&
    oid=$(git -C alternates rev-parse refs/remotes/origin/testbranch) &&
    git -C alternates update-ref -d refs/remotes/origin/testbranch &&
    git -C alternates gc --prune=now
'

test_expect_success 'prepare a repository with commit-graph' '
    git init source &&
    echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
    git -C source remote add origin "$(pwd)/dest.git" &&
    git -C source config remote.origin.promisor true &&
    git -C source config remote.origin.partialclonefilter blob:none &&
    git -C source fetch origin &&
    (
        cd source &&
        test_commit three &&
        git -c gc.writeCommitGraph=true gc
    )
'

test_expect_success 'change alternates' '
    echo "$(pwd)/alternates/.git/objects" >source/.git/objects/info/alternates &&
    # this will bring an endless loop fetching
    git -C source fetch origin $oid
'

test_done

------------------------------------------------------

Thanks
-Han Xin

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: An endless loop fetching issue with partial clone, alternates and commit graph
@ 2022-06-14  7:25 Haiyng Tan
  2022-06-15  2:18 ` Taylor Blau
  0 siblings, 1 reply; 3+ messages in thread
From: Haiyng Tan @ 2022-06-14  7:25 UTC (permalink / raw)
  To: git, chiyutianyi; +Cc: ps, jonathantanmy

On Mon, 13 Jun 2022 00:17:07 +0800, Han Xin wrote:
> We found an issue that could create an endless loop where alternates
> objects are used improperly.
>
> While do fetching in a partial cloned repository with a commit graph,
> deref_without_lazy_fetch_extended() will call lookup_commit_in_graph()
> to find the commit object. We can found the code in commit-graph.c:
>
>      struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
>      {
>           …
>           if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>                return NULL;
>           if (!repo_has_object_file(repo, id))
>                return NULL;

> If we found the object in the commit graph, but missing it in the repository,
> we will go into an endless loop:
>      git fetch -> deref_without_lazy_fetch_extended() ->
>           lookup_commit_in_graph() -> repo_has_object_file() ->
>                promisor_remote_get_direct() -> fetch_objects() ->
>                     git fetch
>
> I know that the reason for this issue is due to improper use of
> alternates, we can ensure that objects will not be lost by maintaining
> all the references. But shouldn't we do something about this unusual
> usage, it will cause a fetch bombardment of the remote git service.
>
> We can reproduce this issue with the following test case, it will
> generate a lot of git processes, please be careful to stop it.
> ———————————————————————————
> #!/bin/sh
>
> test_description='test for an endless loop fetching’
>
> GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> . ./test-lib.sh
>
> test_expect_success 'setup’ ‘
>     git init --bare dest.git &&
>     test_commit one &&
>    git checkout -b testbranch &&
>    test_commit two &&
>    git push dest.git --all
> '
>
> test_expect_success 'prepare a alternates repository without testbranch' '
>    git clone -b $GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME dest.git alternates &&
>    oid=$(git -C alternates rev-parse refs/remotes/origin/testbranch) &&
>    git -C alternates update-ref -d refs/remotes/origin/testbranch &&
>    git -C alternates gc --prune=now
> '
>
> test_expect_success 'prepare a repository with commit-graph' '
>    git init source &&
>    echo "$(pwd)/dest.git/objects" >source/.git/objects/info/alternates &&
>    git -C source remote add origin "$(pwd)/dest.git" &&
>    git -C source config remote.origin.promisor true &&
>    git -C source config remote.origin.partialclonefilter blob:none &&
>    git -C source fetch origin &&
>    (
>        cd source &&
>        test_commit three &&
>        git -c gc.writeCommitGraph=true gc
>    )
> '
>
> test_expect_success 'change alternates' '
>    echo "$(pwd)/alternates/.git/objects" >source/.git/objects/info/alternates &&
>    # this will bring an endless loop fetching
>    git -C source fetch origin $oid
> '
>
> test_done
>
> ------------------------------------------------------
>
> Thanks
> -Han Xin

I think it's caused by using lazy-fetch in deref_without_lazy_fetch_extended().
In lookup_commit_in_graph(), lazy-fetch is initiated by
repo_has_object_file() used.
has_object() should be used, it's no-lazy-fetch.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: An endless loop fetching issue with partial clone, alternates and commit graph
  2022-06-14  7:25 Haiyng Tan
@ 2022-06-15  2:18 ` Taylor Blau
  0 siblings, 0 replies; 3+ messages in thread
From: Taylor Blau @ 2022-06-15  2:18 UTC (permalink / raw)
  To: Haiyng Tan; +Cc: git, chiyutianyi, ps, jonathantanmy, Derrick Stolee

[+cc Stolee]

On Tue, Jun 14, 2022 at 03:25:13PM +0800, Haiyng Tan wrote:
> I think it's caused by using lazy-fetch in
> deref_without_lazy_fetch_extended().  In lookup_commit_in_graph(),
> lazy-fetch is initiated by repo_has_object_file() used.  has_object()
> should be used, it's no-lazy-fetch.

Hmm. Are there cases where lookup_commit_in_graph() is expected to
lazily fetch missing objects from promisor remotes? If so, then this
wouldn't quite work. If not, then this seems like an appropriate fix to
me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-06-15  2:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-12 16:17 An endless loop fetching issue with partial clone, alternates and commit graph Han Xin
  -- strict thread matches above, loose matches on Subject: below --
2022-06-14  7:25 Haiyng Tan
2022-06-15  2:18 ` Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).