git@vger.kernel.org mailing list mirror (one of many)
From: Duy Nguyen <pclouds@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v2 2/2] diff: batch fetching of missing blobs
Date: Fri, 5 Apr 2019 16:39:34 +0700
Message-ID: <CACsJy8CgXLZxqab4vcP1jh3OMCGh1i=easb5BpCs1J8Uf_jsxw@mail.gmail.com>
In-Reply-To: <44de02e584f449481e6fb00cf35d74adf0192e9d.1553895166.git.jonathantanmy@google.com>

On Sat, Mar 30, 2019 at 4:40 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When running a command like "git show" or "git diff" in a partial clone,
> batch all missing blobs to be fetched as one request.
>
> This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
> blobs", 2017-12-08), but for another command.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  diff.c                        |  32 +++++++++++
>  t/t4067-diff-partial-clone.sh | 103 ++++++++++++++++++++++++++++++++++
>  2 files changed, 135 insertions(+)
>  create mode 100755 t/t4067-diff-partial-clone.sh
>
> diff --git a/diff.c b/diff.c
> index ec5c095199..1eccefb4ef 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,6 +25,7 @@
>  #include "packfile.h"
>  #include "parse-options.h"
>  #include "help.h"
> +#include "fetch-object.h"
>
>  #ifdef NO_FAST_WORKING_DIRECTORY
>  #define FAST_WORKING_DIRECTORY 0
> @@ -6366,8 +6367,39 @@ void diffcore_fix_diff_index(void)
>         QSORT(q->queue, q->nr, diffnamecmp);
>  }
>
> +static void add_if_missing(struct oid_array *to_fetch,
> +                          const struct diff_filespec *filespec)
> +{
> +       if (filespec && filespec->oid_valid &&
> +           oid_object_info_extended(the_repository, &filespec->oid, NULL,

I'm quite sure we can pass 'struct repository *' around in diff code
now. I think it's the "repo" field in "struct diff_options". Please
use it and avoid more references to the_repository.
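For illustration, here is a minimal sketch of the suggested change. In it, add_if_missing() receives the repository explicitly instead of using the_repository. It assumes heavily simplified, hypothetical stand-ins for Git's types. The repo_has_object() helper and the struct layouts are illustrative only. Real Git checks presence with oid_object_info_extended() and OBJECT_INFO_FOR_PREFETCH, as the quoted patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, heavily simplified stand-ins for Git's types. The real
 * struct repository and struct diff_options carry far more state.
 */
struct object_id { unsigned char hash[20]; };

struct repository {
	const struct object_id *local;	/* objects present locally (illustrative) */
	size_t local_nr;
};

struct diff_options {
	struct repository *repo;	/* the "repo" field the review refers to */
};

struct diff_filespec {
	struct object_id oid;
	int oid_valid;
};

struct oid_array {
	struct object_id *oid;
	size_t nr, alloc;
};

/* Hypothetical helper: is the object available in this repository? */
static int repo_has_object(const struct repository *r,
			   const struct object_id *oid)
{
	size_t i;
	for (i = 0; i < r->local_nr; i++)
		if (!memcmp(r->local[i].hash, oid->hash, sizeof(oid->hash)))
			return 1;
	return 0;
}

/* Minimal growable-array append; Git has its own oid_array code. */
static void oid_array_append(struct oid_array *a, const struct object_id *oid)
{
	if (a->nr == a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 8;
		a->oid = realloc(a->oid, a->alloc * sizeof(*a->oid));
		if (!a->oid)
			abort();
	}
	a->oid[a->nr++] = *oid;
}

/* The repository is a parameter, so no global is consulted. */
static void add_if_missing(struct repository *r, struct oid_array *to_fetch,
			   const struct diff_filespec *filespec)
{
	assert(r && to_fetch);
	if (filespec && filespec->oid_valid && !repo_has_object(r, &filespec->oid))
		oid_array_append(to_fetch, &filespec->oid);
}
```

At the call site in diffcore_std(), this would read add_if_missing(options->repo, &to_fetch, p->one), and likewise for p->two.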

> +                                    OBJECT_INFO_FOR_PREFETCH))
> +               oid_array_append(to_fetch, &filespec->oid);
> +}
> +
>  void diffcore_std(struct diff_options *options)
>  {
> +       if (repository_format_partial_clone) {
> +               /*
> +                * Prefetch the diff pairs that are about to be flushed.
> +                */
> +               int i;
> +               struct diff_queue_struct *q = &diff_queued_diff;
> +               struct oid_array to_fetch = OID_ARRAY_INIT;
> +
> +               for (i = 0; i < q->nr; i++) {
> +                       struct diff_filepair *p = q->queue[i];
> +                       add_if_missing(&to_fetch, p->one);
> +                       add_if_missing(&to_fetch, p->two);
> +               }
> +               if (to_fetch.nr)
> +                       /*
> +                        * NEEDSWORK: Consider deduplicating the OIDs sent.
> +                        */
> +                       fetch_objects(repository_format_partial_clone,
> +                                     to_fetch.oid, to_fetch.nr);
> +               oid_array_clear(&to_fetch);
> +       }
> +
>         /* NOTE please keep the following in sync with diff_tree_combined() */
>         if (options->skip_stat_unmatch)
>                 diffcore_skip_stat_unmatch(options);
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> new file mode 100755
> index 0000000000..349851be7d
> --- /dev/null
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -0,0 +1,103 @@
> +#!/bin/sh
> +
> +test_description='behavior of diff when reading objects in a partial clone'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git show batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client show HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo c >server/c &&
> +       echo d >server/d &&
> +       git -C server add c d &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_expect_success 'diff skips same-OID blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       echo b >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       echo another-a >server/a &&
> +       git -C server add a &&
> +       git -C server commit -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       echo a | git hash-object --stdin >hash-old-a &&
> +       echo another-a | git hash-object --stdin >hash-new-a &&
> +       echo b | git hash-object --stdin >hash-b &&
> +
> +       # Ensure that only a and another-a are fetched.
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff HEAD^ HEAD &&
> +       grep "want $(cat hash-old-a)" trace &&
> +       grep "want $(cat hash-new-a)" trace &&
> +       ! grep "want $(cat hash-b)" trace
> +'
> +
> +test_expect_success 'diff with rename detection batches blobs' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       test_create_repo server &&
> +       echo a >server/a &&
> +       printf "b\nb\nb\nb\nb\n" >server/b &&
> +       git -C server add a b &&
> +       git -C server commit -m x &&
> +       rm server/b &&
> +       printf "b\nb\nb\nb\nbX\n" >server/c &&
> +       git -C server add c &&
> +       git -C server commit -a -m x &&
> +
> +       test_config -C server uploadpack.allowfilter 1 &&
> +       test_config -C server uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:limit=0 "file://$(pwd)/server" client &&
> +
> +       # Ensure that there is exactly 1 negotiation by checking that there is
> +       # only 1 "done" line sent. ("done" marks the end of negotiation.)
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client diff -M HEAD^ HEAD >out &&
> +       grep "similarity index" out &&
> +       grep "git> done" trace >done_lines &&
> +       test_line_count = 1 done_lines
> +'
> +
> +test_done
> --
> 2.21.0.197.gd478713db0
>
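The NEEDSWORK comment in the quoted patch suggests deduplicating the OIDs sent. Duplicates can arise, for example, when two paths in the diff hold identical content. One way to handle this is to sort the collected OIDs and drop adjacent duplicates before making the single batched fetch. This sketch assumes a simplified struct object_id and a hypothetical dedup_oids() helper; neither is Git's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Git's struct object_id (illustrative only). */
struct object_id { unsigned char hash[20]; };

static int oid_qsort_cmp(const void *a, const void *b)
{
	const struct object_id *x = a, *y = b;
	return memcmp(x->hash, y->hash, sizeof(x->hash));
}

/*
 * Hypothetical helper: sort the OIDs collected from both sides of every
 * diff pair, then squeeze out duplicates in place so that one batched
 * fetch asks for each blob only once. Returns the new count.
 */
static size_t dedup_oids(struct object_id *oids, size_t nr)
{
	size_t i, out = 0;

	assert(oids || !nr);
	if (!nr)
		return 0;
	qsort(oids, nr, sizeof(*oids), oid_qsort_cmp);
	for (i = 1; i < nr; i++)
		if (memcmp(oids[out].hash, oids[i].hash, sizeof(oids[i].hash)))
			oids[++out] = oids[i];
	return out + 1;
}
```

Sorting costs O(n log n) and does not change the one-request property. It only shrinks the list of "want" lines that a fetch_objects() call would send.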


-- 
Duy


Thread overview: 29+ messages
2019-03-26 22:09 [PATCH] diff: batch fetching of missing blobs Jonathan Tan
2019-03-27 10:10 ` SZEDER Gábor
2019-03-27 22:02 ` Johannes Schindelin
2019-03-28  6:52 ` Jeff King
2019-03-29 21:39 ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jonathan Tan
2019-03-29 21:39   ` [PATCH v2 1/2] sha1-file: support OBJECT_INFO_FOR_PREFETCH Jonathan Tan
2019-04-05 14:13     ` Johannes Schindelin
2019-04-05 22:00     ` Jeff King
2019-03-29 21:39   ` [PATCH v2 2/2] diff: batch fetching of missing blobs Jonathan Tan
2019-04-04  2:47     ` SZEDER Gábor
2019-04-05 13:38       ` Johannes Schindelin
2019-04-07  6:00         ` Christian Couder
2019-04-08  2:36           ` Junio C Hamano
2019-04-08  5:51             ` Junio C Hamano
2019-04-08  6:03               ` Junio C Hamano
2019-04-08  6:45                 ` Christian Couder
2019-04-08  6:40             ` Christian Couder
2019-04-08  7:59               ` Junio C Hamano
2019-04-08  9:56                 ` Christian Couder
2019-04-05  9:39     ` Duy Nguyen [this message]
2019-04-05 17:09       ` [PATCH] fixup! " Jonathan Tan
2019-04-05 20:16         ` Johannes Schindelin
2019-04-06  4:17         ` Duy Nguyen
2019-04-08  3:46           ` Junio C Hamano
2019-04-08  4:06           ` Junio C Hamano
2019-04-08  9:58             ` Duy Nguyen
2019-04-09  6:36               ` Junio C Hamano
2019-04-05 14:17     ` [PATCH v2 2/2] " Johannes Schindelin
2019-04-05 22:12   ` [PATCH v2 0/2] Batch fetching of missing blobs in diff and show Jeff King
