git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Victoria Dye" <vdye@github.com>,
	"Emily Shaffer" <emilyshaffer@google.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	"Shaoxuan Yuan" <shaoxuan.yuan02@gmail.com>
Subject: Re: [PATCH] [RFC] diff: introduce scope option
Date: Mon, 14 Nov 2022 17:08:21 +0800	[thread overview]
Message-ID: <CAOLTT8R1K7L5p3+FkUAKMOA5+KcQU3YM8Pq2Oe4u24zi+t9Nrg@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BErUhzm5=c6Tn4nPFKx-Kx0tYwf-J=G+nOFJ2bu-r+Dhg@mail.gmail.com>

Elijah Newren <newren@gmail.com> 于2022年11月6日周日 14:58写道:
>
> Hi,
>
> On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > > >
> > > > From: ZheNing Hu <adlternative@gmail.com>
> > > >
> > > > When we use sparse-checkout, we often want the set of files
> > > > that some commands operate on to be restricted to the
> > > > sparse-checkout specification.
> > >
> > > It seems a bit premature to send this, when the guideline document[*]
> > > detailing how these options should work is still in the "Needs Review"
> > > state.  I know, it's been waiting for a while, but it's a _long_
> > > document.
> > >
> > > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> > >
> >
> > Fine, I just want to start trying to experiment with this feature in
> > git-diff earlier,
> > and I can wait for the sparse-checkout.txt documentation to finish
> > first if needed :)
>
> Note that you may be able to help reduce the wait by reviewing the
> document.  (You commented on v1 -- thanks! -- but no one has commented
> on any of the newer versions, despite being out for over a month and
> still being marked as "Needs Review" in the "What's cooking" reports.)
>

I've skimmed v3 in general, but haven't looked at v4 yet, I'll take a closer
look at it.

> > > > `--no-scope` is the default
> > > > option for now.
> > >
> > > What does --no-scope even mean?  You didn't explain it, and I don't
> > > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > > by allowing the --scope option to take a value.  I don't think there
> > > should be a --no-scope option.
> >
> > I think the “--no-scope” here does nothing, as if it were unaffected by scope
> > (just like correctly "--scope=all", right?)
>
> Reading your patch, I was unable to determine where you made
> --no-scope behave differently than how you implemented --scope=all.
> Maybe there was a difference and I missed it.
>
> Anyway, I don't think there should be a --no-scope option.
>

Fine, I'll get rid of it.

> > > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > > index 3674ac48e92..04bf83e8be1 100644
> > > > --- a/Documentation/diff-options.txt
> > > > +++ b/Documentation/diff-options.txt
> > > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > > >  non-default value and want to use the default one, then you
> > > >  have to use `--diff-algorithm=default` option.
> > > >
> > > > +ifndef::git-format-patch[]
> > > > +ifndef::git-log[]
> > > > +
> > > > +--scope={sparse|all}::
> > > > +       Choose diff scope. The variants are as follows:
> > > > ++
> > > > +--
> > > > +`--sparse`;;
> > > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > > +`--all`;;
> > > > +       Without restriction, diff is performed regardless of whether the path
> > > > +       meets the sparse-checkout specification.
> > > > +--
> > > > ++
> > > > +
> > > > +endif::git-log[]
> > > > +endif::git-format-patch[]
> > >
> > > What about diff-files, diff-index, diff-tree, and show?
> > >
> >
> > diff-options.txt included in their documents, and git-format-patch.txt,
> > git-log.txt, but should not in git-format-patch.txt. I don't know if it
> > should be included in git-diff-files.txt, because git diff-files compare
> > the files in the working tree and the index (so it's the same as
> > "simple" git-diff, which should not be affected by scope?)
>
> Oh, good point.  Yeah, git-diff-files.txt should not get this option
> as it won't have any affect.  git-diff-index and git-diff-tree and
> git-show and git-log should get it, though.
>
> > > > +int diff_path_in_sparse_checkout(const char *path) {
> > > > +       if (core_sparse_checkout_cone)
> > > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > > +       else
> > > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > > +}
> > >
> > > This says we are including the path if it matches the sparsity
> > > patterns.  Thus, we have to be careful when we use this function,
> > > because the relevant paths are ones that match the sparsity
> > > specification.  The sparsity specification will always match the
> > > sparsity patterns when diffing two commits, but when either the index
> > > or the working tree is part of the diff, the sparsity specification
> > > *might* be temporarily expanded.
> >
> > Yes, I may have to look at more code to better understand how and when the
> > "sparsity specification" is extended. Any recommendations for places to read?
>
> Yes, please review the newer version of my sparse-checkout.txt
> directions file; it covers this in more detail.
>

Ok, thanks.

> > > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> > >
> > > do_oneway_diff is for cases where we are diffing against the index...
> > >
> > > >                         return; /* nothing to diff.. */
> > > >         }
> > > >
> > > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > > +               if (idx && tree) {
> > > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > > +                               return;
> > > > +               } else if (idx) {
> > > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > > +                               return;
> > > > +               } else if (tree) {
> > > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > > +                               return;
> > > > +               } else
> > > > +                       return;
> > > > +       }
> > >
> > > ...and you again mistakenly only compare to the sparsity patterns
> > > instead of the sparse specification.
> > >
> >
> > So here we should use ce_skip_worktree(idx) to check if this index entry matches
> > sparse specification.
>
> ce_skip_worktree(idx) only checks for expansions to the sparse
> specification in the working tree.  If dealing with index files, the
> sparse specification is also expanded to handle all files in the index
> that differ from HEAD.
>

I will read more code to understand this detail.

> > > > +               } else
> > > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > > +       }
> > >
> > > As written with no follow-on else clause here, wouldn't this silently
> > > accept "--scope" without an "=<something>" argument without an error
> > > and without properly initializing opt->scope?
> > >
> >
> > Because opt will be initializing with default_diff_options in repo_diff_setup(),
> > and opt->scope should respect config value first. So I don't think there's a
> > mistake here.
>
> Okay, it's good that you've got the variables initialized somehow, but
> that's only half the point here.  The main point is that the user can
> specify something that makes no sense and if they do, we should throw
> an error informing them of their mistake.  --scope should not be
> passed without an argument (either "all" or "sparse", currently), but
> this code allowed it.
>

Ah, it makes sense, I will fix it.

> > > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > > new file mode 100644
> > >
> > > This needs to be fixed.
>
> Since you didn't comment on this but usually do comment on thing, I'll
> just re-ping it to make sure you don't miss the comment about the
> incorrect file mode here.
>

Yes, I've noticed the file permissions issue here, but there may be some other
issues with this test file: it fails when running CI tests, presumably
because of
some shell syntax incompatibility.

> > > > +reset_repo () {
> > > > +       rm -rf repo &&
> > > > +       git clone --no-checkout temp repo
> > >
> > > Why --no-checkout rather than say --sparse?
> > >
> >
> > This is because I accidentally associated it with a
> > partial clone. I often use "git clone -filter=blob:none -no-checkout"
> > to do a partial clone, then "git sparse- checkout set <pattern>"
> > to set sparse-checkout patterns, and "git checkout" to download
> > the missing blobs and checkout to a branch. But in this
> > test file, we only need sparse-checkout, so it's true that I should
> > not do such strange no-checkout thing.
>
> Yeah, no need to involve partial clones here.  (As an aside, though, I
> think even with a partial clone that --sparse makes more sense than
> --no-checkout.)
>

Using --sparse may indeed be better than -no-checkout, as there is no
need to perform an additional checkout after using clone --no-checkout
and sparse-checkout. But it uses sparse-checkout cone mode by default.

A little curious, why can't we specify --no-cone mode when doing git clone
and why can't we specify sparse-checkout patterns here? If such a feature
is available, git clone and git sparse-checkout will be combined in one step.

> > > > +change_worktree_and_index() {
> > > > +       (
> > > > +               cd repo &&
> > > > +               mkdir sub2 sub3 &&
> > > > +               echo sub1/file3 >sub1/file3 &&
> > > > +               echo sub2/file3 >sub2/file3 &&
> > > > +               echo sub3/file3 >sub3/file3 &&
> > > > +               echo file3 >file3 &&
> > > > +               git add --all --sparse &&
> > > > +               echo sub1/file3 >>sub1/file3 &&
> > > > +               echo sub2/file3 >>sub2/file3 &&
> > > > +               echo sub3/file3 >>sub3/file3 &&
> > > > +               echo file3 >>file3
> > > > +       )
> > > > +}
> > >
> > > It would be nice to modify different paths in the working tree and
> > > index, to see if we can handle cases where the sparse specification
> > > does not match the sparsity patterns better.  (So, modify files inside
> > > and outside the sparsity patterns, stage those changes, and then do a
> > > `git sparse-checkout reapply` to make the files outside the sparsity
> > > patterns disappear from the working tree...then modify different files
> > > in the working tree both inside and outside the sparsity patterns.
> > > And also remove some file that matches the sparsity patterns and
> > > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > > coverage.
> > >
> >
> > Nice addition. So I should use git update-index --skip-worktree to set
> > skip_worktree bit for index entries.
>
> Well, I'd say use normal "sparse-checkout" commands to set most of
> them.  However, adding one or two extra SKIP_WORKTREE paths via
> running `git update-index --skip-worktree $PATH` (and removing the
> corresponding local file) would make the testcases more interesting.
>

Get it.

> > > I think I'm going to stop reviewing here.  I'm probably just going to
> > > keep repeating the same issues I identified earlier if I continue.
> >
> > Thank you very much for your review, you have pointed out very many
> > problems with this patch :)
>
> I'm glad it was helpful.  :)

Thanks.

ZheNing Hu

  reply	other threads:[~2022-11-14  9:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-01  1:34 ` Taylor Blau
2022-11-01  2:13   ` ZheNing Hu
2022-11-01  5:18 ` Elijah Newren
2022-11-06  2:11   ` ZheNing Hu
2022-11-06  6:58     ` Elijah Newren
2022-11-14  9:08       ` ZheNing Hu [this message]
2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8R1K7L5p3+FkUAKMOA5+KcQU3YM8Pq2Oe4u24zi+t9Nrg@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).