git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Kousik Sanagavarapu <five231003@gmail.com>
To: jonathantanmy@google.com
Cc: git@vger.kernel.org, five231003@gmail.com
Subject: Re: [PATCH] index-pack: remove fetch_if_missing=0
Date: Tue, 28 Feb 2023 09:24:48 +0530	[thread overview]
Message-ID: <20230228035448.10700-1-five231003@gmail.com> (raw)
In-Reply-To: <20230227221451.2433306-1-jonathantanmy@google.com>

On Tue, 28 Feb 2023 at 03:44, Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Kousik Sanagavarapu <five231003@gmail.com> writes:
> > A collision test is triggered in sha1_object(), whenever there is an
> > object file in our repo. If our repo is a partial clone, then checking
> > for this file existence has the behavior of lazy-fetching the object
> > because we have one or more promisor remotes.
>
> Hmm...this is not true, because (as you said)...
>
> > This behavior is controlled by setting fetch_if_missing to 0,
>
> ...this makes it so that we don't fetch in this situation.

Yes, that statement is false if fetch_if_missing is set to 0. But my original
thought in writing it was so that the anyone who is reading the commit message
understands the motivation as to why we are setting fetch_if_missing to 0.

> [...]
>
> > @@ -1728,14 +1727,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
> >       int report_end_of_input = 0;
> >       int hash_algo = 0;
> >
> > -     /*
> > -      * index-pack never needs to fetch missing objects except when
> > -      * REF_DELTA bases are missing (which are explicitly handled). It only
> > -      * accesses the repo to do hash collision checks and to check which
> > -      * REF_DELTA bases need to be fetched.
> > -      */
> > -     fetch_if_missing = 0;
>
> I think that the author of such a commit (you) should also independently
> verify that this comment is true (and if it is, then yes, all the
> remaining cases are handled and we can remove this assignment to
> fetch_if_missing). I believe this comment to be true, but I haven't
> checked the code in a while so I'm not sure myself.

It seems indeed that this is the only place where lazy-fetching is possible.
I checked this by looking up the calls for oid_object_info_extended() or
any other function in object-file.c which depends on it.

In builtin/index-pack.c, we have (in the order that these functions appear)

- check_object()
    Call to oid_object_info(), but we return early with
    0 if we don't have an object.

- sha1_object()
    Call to has_object_file_with_flags() (which this patch replaces with
    has_object()), where lazy-fetching is possible.
    
    Calls to oid_object_info() and read_object_file(), which trigger only
    when the above has_object_file_with_flags() succeeds.

- fix_unresolved_deltas()
    Call to oid_object_info_extended(), we prefetch delta bases.

    Call to read_object_file(), but we only read data from ref_delta_entry.
    In case it was a delta base, we already prefetched it.

There are cases where we fsck objects, but lazy-fetching is already handled
in fsck (although by setting fetch_if_missing to 0).

Do we need to be explicit about this in the commit message? That sha1_object()
is the only place where there is a chance to lazy-fetch if it is a partial clone?

> > +test_expect_success 'index-pack does not lazy-fetch when checking for sha1 collisions' '
> > +     rm -rf server promisor-remote client &&
> > +     rm -rf object-count &&
> > +
> > +     git init server &&
> > +     for i in 1 2 3 4
> > +     do
> > +             echo $i >$(pwd)/server/file$i &&
> > +             git -C server add file$i &&
> > +             git -C server commit -am "Commit $i" || return 1
> > +     done &&
> > +     git -C server config --local uploadpack.allowFilter 1 &&
> > +     git -C server config --local uploadpack.allowAnySha1InWant 1 &&
> > +     HASH=$(git -C server hash-object file3) &&
> > +
> > +     git init promisor-remote &&
> > +     git -C promisor-remote fetch --keep "file://$(pwd)/server" $HASH &&
> > +
> > +     git clone --no-checkout --filter=blob:none "file://$(pwd)/server" client &&
> > +     git -C client remote set-url origin "file://$(pwd)/promisor-remote" &&
> > +     git -C client config extensions.partialClone 1 &&
> > +     git -C client config remote.origin.promisor 1 &&
> > +
> > +     # make sure that index-pack is run from within the repository
> > +     git -C client index-pack $(pwd)/client/.git/objects/pack/*.pack &&
> > +     test_path_is_missing $(pwd)/client/file3
> > +'
>
> How does this check that no lazy fetch has occurred? It seems to me
> that you're just checking the existence of a file in the worktree,
> which does not indicate the presence or absence of a lazy fetch.

What I had in mind was if the file was lazy-fetched (because of the failure
of has_object_file_with_flags() and fetch_if_missing not set to 0), then
it would be unpacked and we would find it in the worktree. Since, we
prevent this exact behavior by using has_object(), we should not find
such a file in our repo.

> I think the way to test needs to be more complicated: you need
> to create a partial clone, fetch into it from another repo, and
> then verify that no fetches were made to the original partial
> clone.

So, after the fetch, during the pack indexing phase, we look for
any additional fetches made. This makes more sense and it would
be way more clear, to anyone reading, than what I wrote.

Will do a reroll. If there needs to be a change in the commit message
as well, please let me know.

Thanks for the review

  reply	other threads:[~2023-02-28  3:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-25  5:24 [PATCH] index-pack: remove fetch_if_missing=0 Kousik Sanagavarapu
2023-02-27 16:56 ` Kousik Sanagavarapu
2023-02-27 22:14 ` Jonathan Tan
2023-02-28  3:54   ` Kousik Sanagavarapu [this message]
2023-03-10 18:30 ` [PATCH v2] " Kousik Sanagavarapu
2023-03-10 20:30   ` Junio C Hamano
2023-03-10 21:13     ` Jonathan Tan
2023-03-10 21:41       ` Junio C Hamano
2023-03-11  2:59         ` Jonathan Tan
2023-03-12 17:16         ` Kousik Sanagavarapu
2023-03-11  6:22       ` [PATCH] " Kousik Sanagavarapu
2023-03-11  6:00     ` Kousik Sanagavarapu
2023-03-13 18:15   ` [PATCH v3] " Kousik Sanagavarapu
2023-03-13 19:17     ` Junio C Hamano
2023-03-13 19:18     ` Junio C Hamano
2023-03-17 17:56     ` [PATCH v4] " Kousik Sanagavarapu
2023-03-17 22:58       ` Junio C Hamano
2023-03-19  6:17         ` Kousik Sanagavarapu
2023-03-11 20:01 ` [PATCH] " Sean Allred
2023-03-11 20:37   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230228035448.10700-1-five231003@gmail.com \
    --to=five231003@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).