git@vger.kernel.org mailing list mirror (one of many)
From: Michael J Gruber <git@grubix.eu>
To: hanxin.hx@bytedance.com
Cc: chiyutianyi@gmail.com, derrickstolee@github.com,
	git@vger.kernel.org, haiyangtand@gmail.com,
	jonathantanmy@google.com, me@ttaylorr.com,
	Junio C Hamano <gitster@pobox.com>,
	ps@pks.im
Subject: Re: [PATCH v4 1/1] commit-graph.c: no lazy fetch in lookup_commit_in_graph()
Date: Sat, 09 Jul 2022 14:23:36 +0200
Message-ID: <165736941632.704481.18414237954289110814.git@grubix.eu>
In-Reply-To: <96d4bb71505d87ed501c058bbd89bfc13d08b24a.1656593279.git.hanxin.hx@bytedance.com>

Han Xin came, saw, and said on 2022-07-01 03:34:30:
> The commit-graph is used to opportunistically optimize accesses to
> certain pieces of information on commit objects, and
> lookup_commit_in_graph() tries to say "no" when the requested commit
> does not locally exist by returning NULL, in which case the caller
> can ask for (which may result in on-demand fetching from a promisor
> remote) and parse the commit object itself.
> 
> However, it uses a wrong helper, repo_has_object_file(), to do so.
> This helper not only checks if an object is immediately available in
> the local object store, but also tries to fetch from a promisor remote.
> But the fetch machinery calls lookup_commit_in_graph(), thus causing an
> infinite loop.
> 
> We should make lookup_commit_in_graph() expect that a commit given to it
> can be legitimately missing from the local object store, by using the
> has_object() helper instead.
> 
> Signed-off-by: Han Xin <hanxin.hx@bytedance.com>
> ---
>  commit-graph.c                             |  2 +-
>  t/t5330-no-lazy-fetch-with-commit-graph.sh | 70 ++++++++++++++++++++++
>  2 files changed, 71 insertions(+), 1 deletion(-)
>  create mode 100755 t/t5330-no-lazy-fetch-with-commit-graph.sh
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 92d4503336..2b04ef072d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -898,7 +898,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje
>                 return NULL;
>         if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
>                 return NULL;
> -       if (!repo_has_object_file(repo, id))
> +       if (!has_object(repo, id, 0))
>                 return NULL;
>  
>         commit = lookup_commit(repo, id);
> diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> new file mode 100755
> index 0000000000..be33334229
> --- /dev/null
> +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh
> @@ -0,0 +1,70 @@
> +#!/bin/sh
> +
> +test_description='test for no lazy fetch with the commit-graph'
> +
> +. ./test-lib.sh
> +
> +run_with_limited_processses () {
> +       # bash and ksh use "ulimit -u", dash uses "ulimit -p"
> +       if test -n "$BASH_VERSION"
> +       then
> +               ulimit_max_process="-u"
> +       elif test -n "$KSH_VERSION"
> +       then
> +               ulimit_max_process="-u"
> +       fi
> +       (ulimit ${ulimit_max_process-"-p"} 512 && "$@")
> +}

This new test fails for me unless I raise the process limit above 512. 1024 works.

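I haven't bisected the number of processes ... This is highly system
dependent, presumably because the limit counts all of my processes (on
Linux, even threads), not just the ones the test spawns. Even though I
run a slim environment (i3wm), having Chrome or similar running
probably makes quite a difference.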

512 is probably OK in CI in an isolated environment but is too low on a
typical "What you mean I'm not working? I'm waiting for the test run!"
developer workstation.

Conversely, which number would be too high to catch what the test is
supposed to catch? Does it incur a big performance penalty to go as high
as possible?
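One way to make the cap independent of how busy the machine is would be to
measure how many processes the user already has and add a fixed headroom on
top. Below is an untested sketch, not part of the patch. The function name
run_with_process_headroom and the headroom of 256 are hypothetical. It
assumes that the limit (RLIMIT_NPROC) counts every process of the user,
which is how Linux behaves, and that ps(1) accepts the POSIX "-u user -o
pid=" options.

```shell
#!/bin/sh
# Hypothetical alternative to run_with_limited_processses(): instead of a
# fixed cap of 512, allow the processes the user already has plus a fixed
# headroom, so the cap does not depend on what else is running.

run_with_process_headroom () {
	headroom=256	# assumed value: plenty for one "git fetch"

	# bash and ksh use "ulimit -u", dash uses "ulimit -p" (as in the patch)
	if test -n "$BASH_VERSION" || test -n "$KSH_VERSION"
	then
		opt=-u
	else
		opt=-p
	fi

	# The limit applies to all of the user's processes, not only to the
	# children of this shell, so start from the current total.
	current=$(ps -u "$(id -u)" -o pid= | wc -l)
	limit=$(( $current + $headroom ))

	# Never request more than the hard limit, or ulimit itself fails.
	hard=$(ulimit -H $opt)
	if test "$hard" != unlimited && test "$limit" -gt "$hard"
	then
		limit=$hard
	fi

	# Apply the limit in a subshell so that it only affects "$@".
	(ulimit $opt "$limit" && "$@")
}
```

It would be used the same way as the existing helper, e.g.
run_with_process_headroom git -C with-commit-graph fetch origin "$anycommit".
Since the bug being tested is an unbounded chain of fetch subprocesses, it
should exhaust any finite headroom. A larger number would then mainly cost
extra time when the bug is present, although other processes started by
the user during the test still eat into the headroom.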

> +
> +test_lazy_prereq ULIMIT_PROCESSES '
> +       run_with_limited_processses true
> +'
> +
> +if ! test_have_prereq ULIMIT_PROCESSES
> +then
> +       skip_all='skipping tests for no lazy fetch with the commit-graph, ulimit processes not available'
> +       test_done
> +fi
> +
> +test_expect_success 'setup: prepare a repository with a commit' '
> +       git init with-commit &&
> +       test_commit -C with-commit the-commit &&
> +       oid=$(git -C with-commit rev-parse HEAD)
> +'
> +
> +test_expect_success 'setup: prepare a repository with commit-graph contains the commit' '
> +       git init with-commit-graph &&
> +       echo "$(pwd)/with-commit/.git/objects" \
> +               >with-commit-graph/.git/objects/info/alternates &&
> +       # create a ref that points to the commit in alternates
> +       git -C with-commit-graph update-ref refs/ref_to_the_commit "$oid" &&
> +       # prepare some other objects to commit-graph
> +       test_commit -C with-commit-graph something &&
> +       git -c gc.writeCommitGraph=true -C with-commit-graph gc &&
> +       test_path_is_file with-commit-graph/.git/objects/info/commit-graph
> +'
> +
> +test_expect_success 'setup: change the alternates to what without the commit' '
> +       git init --bare without-commit &&
> +       git -C with-commit-graph cat-file -e $oid &&
> +       echo "$(pwd)/without-commit/objects" \
> +               >with-commit-graph/.git/objects/info/alternates &&
> +       test_must_fail git -C with-commit-graph cat-file -e $oid
> +'
> +
> +test_expect_success 'fetch any commit from promisor with the usage of the commit graph' '
> +       # setup promisor and prepare any commit to fetch
> +       git -C with-commit-graph remote add origin "$(pwd)/with-commit" &&
> +       git -C with-commit-graph config remote.origin.promisor true &&
> +       git -C with-commit-graph config remote.origin.partialclonefilter blob:none &&
> +       test_commit -C with-commit any-commit &&
> +       anycommit=$(git -C with-commit rev-parse HEAD) &&
> +
> +       run_with_limited_processses env GIT_TRACE="$(pwd)/trace.txt" \
> +               git -C with-commit-graph fetch origin $anycommit 2>err &&

That empty line above makes me nervous, especially when a test fails for
very unclear reasons like here. Is it necessary?

If the answer is "to separate setup and test" then the solution is to
separate setup and test ...

> +       ! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err &&
> +       grep "git fetch origin" trace.txt >actual &&
> +       test_line_count = 1 actual
> +'
> +
> +test_done
> -- 
> 2.36.1
> 
>


Thread overview: 50+ messages
2022-06-14  7:25 An endless loop fetching issue with partial clone, alternates and commit graph Haiyng Tan
2022-06-15  2:18 ` Taylor Blau
2022-06-16  3:38   ` [RFC PATCH 0/2] " Han Xin
2022-06-16  3:38     ` [RFC PATCH 1/2] commit-graph.c: add "flags" to lookup_commit_in_graph() Han Xin
2022-06-16  3:38     ` [RFC PATCH 2/2] fetch-pack.c: pass "oi_flags" " Han Xin
2022-06-17 21:47     ` [RFC PATCH 0/2] Re: An endless loop fetching issue with partial clone, alternates and commit graph Jonathan Tan
2022-06-18  3:01     ` [PATCH v1] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-20  7:07       ` Patrick Steinhardt
2022-06-20  8:53         ` [External] " 欣韩
2022-06-20  9:05           ` Patrick Steinhardt
2022-06-21 18:23       ` Jonathan Tan
2022-06-22  3:17         ` Han Xin
2022-06-24  5:27       ` [PATCH v2 0/2] " Han Xin
2022-06-24  5:27         ` [PATCH v2 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-24 16:03           ` Junio C Hamano
2022-06-25  1:35             ` Han Xin
2022-06-27 12:22               ` Junio C Hamano
2022-06-24  5:27         ` [PATCH v2 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-24 16:56           ` Junio C Hamano
2022-06-25  2:25             ` Han Xin
2022-06-25  2:31               ` Han Xin
2022-06-28  2:02         ` [PATCH v3 0/2] " Han Xin
2022-06-28  2:02           ` [PATCH v3 1/2] test-lib.sh: add limited processes to test-lib Han Xin
2022-06-28  2:02           ` [PATCH v3 2/2] commit-graph.c: no lazy fetch in lookup_commit_in_graph() Han Xin
2022-06-28  7:49             ` Ævar Arnfjörð Bjarmason
2022-06-28 17:36               ` Junio C Hamano
2022-06-30 12:21                 ` Johannes Schindelin
2022-06-30 13:43                   ` Ævar Arnfjörð Bjarmason
2022-06-30 15:40                     ` Junio C Hamano
2022-06-30 18:47                       ` Ævar Arnfjörð Bjarmason
2022-07-01 19:31                       ` Johannes Schindelin
2022-07-01 20:47                         ` Junio C Hamano
2022-06-29  2:08               ` Han Xin
2022-06-30 17:37           ` test name conflict + js/ci-github-workflow-markup regression (was: [PATCH v3 0/2] no lazy fetch in lookup_commit_in_graph()) Ævar Arnfjörð Bjarmason
2022-07-01  1:34           ` [PATCH v4 0/1] no lazy fetch in lookup_commit_in_graph() Han Xin
2022-07-01  1:34             ` [PATCH v4 1/1] commit-graph.c: " Han Xin
2022-07-09 12:23               ` Michael J Gruber [this message]
2022-07-11 15:09                 ` Jeff King
2022-07-11 20:17                   ` Junio C Hamano
2022-07-12  1:52                     ` [External] " Han Xin
2022-07-12  5:23                       ` Junio C Hamano
2022-07-12  5:32                         ` Han Xin
2022-07-12  6:37                         ` [External] " Jeff King
2022-07-12 14:19                           ` Junio C Hamano
2022-07-12  6:50             ` [PATCH v5 0/1] " Han Xin
2022-07-12  6:50               ` [PATCH v5 1/1] commit-graph.c: " Han Xin
2022-07-12  9:50                 ` Ævar Arnfjörð Bjarmason
2022-07-13  1:26                   ` Han Xin
2022-07-12  6:58               ` [PATCH v5 0/1] " Jeff King
2022-07-12  8:01             ` [PATCH v1] t5330: remove run_with_limited_processses() Han Xin
