From: Duy Nguyen <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] fixup! diff: batch fetching of missing blobs
Date: Mon, 8 Apr 2019 16:58:35 +0700
Message-ID: <CACsJy8BHqaqOHVbwtONU5=RiG7Q8WNNAN5EGV_nm7NyNWeyuiQ@mail.gmail.com>
In-Reply-To: <xmqqmul1b0pg.fsf@gitster-ct.c.googlers.com>
On Mon, Apr 8, 2019 at 11:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> >> Avoid a usage of the_repository by propagating the configured repository
> >> to add_if_missing(). Also, prefetch only if the repository being diffed
> >> is the_repository (because we do not support lazy fetching for any other
> >> repository anyway).
>
> If we are willing to stay limited to the default repository anyway,
> allowing add_if_missing() to take an arbitrary repository does not
> really matter, but before the caller of add_if_missing() becomes
> ready to work on an arbitrary repository, this change has to happen.
>
> To update the caller, it seems to me that fetch_objects() must learn
> to take an arbitrary repository, but is that the only thing needed?
> After that, the function that the caller resides in and callchain
> upwards can learn to take a repository instance if we want to be
> able to diff inside an arbitrary repository.
>
> But. Such a change still would not allow us to compare a tree in
> one repository against a tree in another repository.

I feel lost (and the answer "go read the partial clone code!" is
perfectly acceptable), but why would we need to diff trees from two
different repositories?

> It is likely
> that a caller with such a need would simply make sure that objects
> in both repositories are available by using the in-core alternate
> object store mechanism, making it a more-or-less moot point to be
> able to pass a repository instance through the callchain X-<. We
> probably should make it, and spell it out somewhere in a long term
> vision shared among the developers, an explicit goal to get rid of
> the internal (ab)use of the alternate object store mechanism.

I think the submodule code currently does it this way, though I don't
see any reason it needs to: objects are not supposed to be shared
between the super- and the sub-repo.

>
> With squashing the fix-up commit in, the 2/2 patch has become like
> so.
>
> Thanks, both.
>
> -- >8 --
> From: Jonathan Tan <jonathantanmy@google.com>
> Date: Fri, 5 Apr 2019 10:09:34 -0700
> Subject: [PATCH] diff: batch fetching of missing blobs
>
> When running a command like "git show" or "git diff" in a partial clone,
> batch all missing blobs to be fetched as one request.
>
> This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
> blobs", 2017-12-08), but for another command.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff.c | 33 +++++++++++
> t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
> 2 files changed, 136 insertions(+)
> create mode 100755 t/t4067-diff-partial-clone.sh
>
> diff --git a/diff.c b/diff.c
> index ec5c095199..811afbdfb1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,6 +25,7 @@
> #include "packfile.h"
> #include "parse-options.h"
> #include "help.h"
> +#include "fetch-object.h"
>
> #ifdef NO_FAST_WORKING_DIRECTORY
> #define FAST_WORKING_DIRECTORY 0
> @@ -6366,8 +6367,40 @@ void diffcore_fix_diff_index(void)
> QSORT(q->queue, q->nr, diffnamecmp);
> }
>
> +static void add_if_missing(struct oid_array *to_fetch, struct repository *r,
> + const struct diff_filespec *filespec)
> +{
> + if (filespec && filespec->oid_valid &&
> + oid_object_info_extended(r, &filespec->oid, NULL,
> + OBJECT_INFO_FOR_PREFETCH))
> + oid_array_append(to_fetch, &filespec->oid);
> +}
> +
> void diffcore_std(struct diff_options *options)
> {
> + if (options->repo == the_repository &&
> + repository_format_partial_clone) {
> + /*
> + * Prefetch the diff pairs that are about to be flushed.
> + */
> + int i;
> + struct diff_queue_struct *q = &diff_queued_diff;
> + struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> + for (i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + add_if_missing(&to_fetch, options->repo, p->one);
> + add_if_missing(&to_fetch, options->repo, p->two);
> + }
> + if (to_fetch.nr)
> + /*
> + * NEEDSWORK: Consider deduplicating the OIDs sent.
> + */
> + fetch_objects(repository_format_partial_clone,
> + to_fetch.oid, to_fetch.nr);
> + oid_array_clear(&to_fetch);
> + }
> +
> /* NOTE please keep the following in sync with diff_tree_combined() */
> if (options->skip_stat_unmatch)
> diffcore_skip_stat_unmatch(options);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755
> index 0000000000..90c8fb2901
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> + test_when_finished "rm -rf server client trace" &&
> +
> + test_create_repo server &&
> + echo a >server/a &&
> + echo b >server/b &&
> + git -C server add a b &&
> + git -C server commit -m x &&
> +
> + test_config -C server uploadpack.allowfilter 1 &&
> + test_config -C server uploadpack.allowanysha1inwant 1 &&
> + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> + # Ensure that there is exactly 1 negotiation by checking that there is
> + # only 1 "done" line sent. ("done" marks the end of negotiation.)
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> + grep "git> done" trace >done_lines &&
> + test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff batches blobs' '
> + test_when_finished "rm -rf server client trace" &&
> +
> + test_create_repo server &&
> + echo a >server/a &&
> + echo b >server/b &&
> + git -C server add a b &&
> + git -C server commit -m x &&
> + echo c >server/c &&
> + echo d >server/d &&
> + git -C server add c d &&
> + git -C server commit -m x &&
> +
> + test_config -C server uploadpack.allowfilter 1 &&
> + test_config -C server uploadpack.allowanysha1inwant 1 &&
> + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> + # Ensure that there is exactly 1 negotiation by checking that there is
> + # only 1 "done" line sent. ("done" marks the end of negotiation.)
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> + grep "git> done" trace >done_lines &&
> + test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> + test_when_finished "rm -rf server client trace" &&
> +
> + test_create_repo server &&
> + echo a >server/a &&
> + echo b >server/b &&
> + git -C server add a b &&
> + git -C server commit -m x &&
> + echo another-a >server/a &&
> + git -C server add a &&
> + git -C server commit -m x &&
> +
> + test_config -C server uploadpack.allowfilter 1 &&
> + test_config -C server uploadpack.allowanysha1inwant 1 &&
> + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> + echo a | git hash-object --stdin >hash-old-a &&
> + echo another-a | git hash-object --stdin >hash-new-a &&
> + echo b | git hash-object --stdin >hash-b &&
> +
> + # Ensure that only a and another-a are fetched.
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> + grep "want $(cat hash-old-a)" trace &&
> + grep "want $(cat hash-new-a)" trace &&
> + ! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> + test_when_finished "rm -rf server client trace" &&
> +
> + test_create_repo server &&
> + echo a >server/a &&
> + printf "b\nb\nb\nb\nb\n" >server/b &&
> + git -C server add a b &&
> + git -C server commit -m x &&
> + rm server/b &&
> + printf "b\nb\nb\nb\nbX\n" >server/c &&
> + git -C server add c &&
> + git -C server commit -a -m x &&
> +
> + test_config -C server uploadpack.allowfilter 1 &&
> + test_config -C server uploadpack.allowanysha1inwant 1 &&
> + git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> + # Ensure that there is exactly 1 negotiation by checking that there is
> + # only 1 "done" line sent. ("done" marks the end of negotiation.)
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> + grep "similarity index" out &&
> + grep "git> done" trace >done_lines &&
> + test_line_count = 1 done_lines
> +'
> +
> +test_done
> --
> 2.21.0-196-g041f5ea1cf
>
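
On the NEEDSWORK in diffcore_std() about deduplicating the OIDs sent:
a minimal standalone sketch, assuming a toy 20-byte ID type, could
sort the collected IDs and squeeze out adjacent duplicates before the
single fetch. Everything named toy_* below is hypothetical and is not
git's API; a real change would work on struct oid_array instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OID_RAWSZ 20

/* Toy stand-in for an object ID; NOT git's struct object_id. */
struct toy_oid {
	unsigned char hash[TOY_OID_RAWSZ];
};

/* qsort() comparator: byte-wise ordering of the raw hash. */
static int toy_oid_cmp(const void *a, const void *b)
{
	const struct toy_oid *x = a, *y = b;

	return memcmp(x->hash, y->hash, TOY_OID_RAWSZ);
}

/*
 * Sort oids[0..nr) and drop duplicates in place.
 * Returns the number of unique entries left at the front of the array.
 */
static size_t toy_oid_dedup(struct toy_oid *oids, size_t nr)
{
	size_t i, out = 0;

	assert(oids || !nr);
	if (!nr)
		return 0;

	qsort(oids, nr, sizeof(*oids), toy_oid_cmp);
	for (i = 1; i < nr; i++) {
		/* Keep an entry only if it differs from the last one kept. */
		if (toy_oid_cmp(&oids[out], &oids[i]))
			oids[++out] = oids[i];
	}
	return out + 1;
}
```

Sorting changes the order in which the IDs would be requested; the
sketch assumes that order does not matter for a batched fetch.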
--
Duy
Thread overview: 29+ messages
2019-03-26 22:09 [PATCH] diff: batch fetching of missing blobs Jonathan Tan
2019-03-27 10:10 ` SZEDER Gábor
2019-03-27 22:02 ` Johannes Schindelin
2019-03-28 6:52 ` Jeff King
2019-03-29 21:39 ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jonathan Tan
2019-03-29 21:39 ` [PATCH v2 1/2] sha1-file: support OBJECT_INFO_FOR_PREFETCH Jonathan Tan
2019-04-05 14:13 ` Johannes Schindelin
2019-04-05 22:00 ` Jeff King
2019-03-29 21:39 ` [PATCH v2 2/2] diff: batch fetching of missing blobs Jonathan Tan
2019-04-04 2:47 ` SZEDER Gábor
2019-04-05 13:38 ` Johannes Schindelin
2019-04-07 6:00 ` Christian Couder
2019-04-08 2:36 ` Junio C Hamano
2019-04-08 5:51 ` Junio C Hamano
2019-04-08 6:03 ` Junio C Hamano
2019-04-08 6:45 ` Christian Couder
2019-04-08 6:40 ` Christian Couder
2019-04-08 7:59 ` Junio C Hamano
2019-04-08 9:56 ` Christian Couder
2019-04-05 9:39 ` Duy Nguyen
2019-04-05 17:09 ` [PATCH] fixup! " Jonathan Tan
2019-04-05 20:16 ` Johannes Schindelin
2019-04-06 4:17 ` Duy Nguyen
2019-04-08 3:46 ` Junio C Hamano
2019-04-08 4:06 ` Junio C Hamano
2019-04-08 9:58 ` Duy Nguyen [this message]
2019-04-09 6:36 ` Junio C Hamano
2019-04-05 14:17 ` [PATCH v2 2/2] " Johannes Schindelin
2019-04-05 22:12 ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jeff King