git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Elijah Newren <newren@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Victoria Dye <vdye@github.com>,
	Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>,
	Matheus Tavares <matheus.bernardino@usp.br>,
	ZheNing Hu <adlternative@gmail.com>
Subject: Re: [PATCH] sparse-checkout.txt: new document with sparse-checkout directions
Date: Thu, 6 Oct 2022 14:27:35 -0400	[thread overview]
Message-ID: <66eaae96-7b6a-ca87-fee5-e185a560744a@github.com> (raw)
In-Reply-To: <CABPp-BGoJqtx_=p+GfqAhgs+4Zic1mcbs6pkMKy7QAnxTwB4AA@mail.gmail.com>

On 10/6/22 3:10 AM, Elijah Newren wrote:
> On Wed, Sep 28, 2022 at 6:22 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 9/28/22 1:38 AM, Elijah Newren wrote:
>>> On Tue, Sep 27, 2022 at 9:36 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>>>
>>>> On 9/24/2022 8:09 PM, Elijah Newren via GitGitGadget wrote:
>>>>> From: Elijah Newren <newren@gmail.com>
>>>>
> [...]
>>>>> +  * Commands whose default for --restrict vs. --no-restrict should vary depending
>>>>> +    on Behavior A or Behavior B
>>>>> +    * diff (with --cached or REVISION arguments)
>>>>> +    * grep (with --cached or REVISION arguments)
>>>>> +    * show (when given commit arguments)
>>>>> +    * bisect
>>>>> +    * blame
>>>>> +      * and annotate
>>>>> +    * log
>>>>> +      * and variants: shortlog, gitk, show-branch, whatchanged
>>>>> +
>>>>> +    For now, we default to behavior B for these, which want a default of
>>>>> +    --no-restrict.
>>>>
>>>> I do feel pretty strongly that we'll want a --no-restrict default here
>>>> because otherwise we will present confusion. I'm not even sure if we would
>>>> want to make this available via a config setting, but likely a config
>>>> setting makes sense in the long term.
>>>
>>> You've got me slightly confused.  You did say the same thing a long time ago:
>>>
>>>     "But I also want to avoid doing this as a default or even behind a
>>> config setting."[A]
>>>
>>> BUT, when Shaoxuan proposed making --restrict/--focus the default for
>>> one of these commands, you seemed to be on board[B].
>>
>> I'm specifically talking about 'git log'. I think that having that be
>> in a restricted mode is extremely dangerous and will only confuse users.
>> This includes 'git show' (with commit arguments) and 'git bisect', I
>> think.
> 
> Thanks, that helps me understand your position better.
> 
> I'm curious if, due to the length of the document and this thread,
> you're just skimming past the idea I mentioned of showing a warning at
> the beginning of `diff`, `log`, or `show` output when restricting
> based on config or defaults.  Without such a warning, I agree that
> restricting might be confusing at times, but I think such a warning
> may be sufficient to address the concerns around partial/incomplete
> results.  The one command that this warning idea doesn't help with is
> `grep` since it cannot safely be applied there, which potentially
> leaves `grep` giving confusing results when users pass either
> `--cached` or revisions, but you seem to not be concerned about that.

I'm not convinced that warnings are enough for some cases, especially
for output that is fed to a pager. Do the warnings stick around in
the pager? I'm not sure.

> I'm also curious if the problem partially stems from the fact that
> with `git log` there is no way to control revision limiting and diff
> generation paths independently.  If there was a way to make `git log
> -p` continue showing the regular list of commits but restrict which
> paths were shown in the diffs, and we made the --scope-sparse handling
> do this so that only diffs were limited but not the revisions
> traversed/printed, would that help address your concerns?

My biggest issue is with the idea of simplifying the commit history
based on the sparse-checkout path definitions. The '-p' option having
a diff scoped to the sparse-checkout paths would be fine.

>> The rest, (diff, grep, blame) are worktree-focused, so having a restrict
>> mode by default makes sense to me.
> 
> I was specifically calling out diff & grep when passed revision
> arguments, which are definitely *not* worktree-focused operations.

You're right. I'm not using the right terminology. They _are_
operations on a single tree, where path scopes make sense.

> Also, blame incorporates a component of changes from the worktree, but
> it's mostly about history (and one or more -C's make it check other
> paths as well).

Since each input is a specific file path, I'm not sure we need
anything here except perhaps a warning that they are requesting
a file outside the sparse-checkout definition (if even that).

>>> Anyway...I will note that without a configurable option to give these
>>> commands a behavior of `--restrict`, I think you make working in
>>> disconnected partial clones practically impossible.  I want to be able
>>> to do "git log -p", "git diff REV1 REV2", and "git grep TERM REV" in
>>> disconnected partial clones, and I've wanted that kind of capability
>>> for well over a decade[H].  So, don't be surprised if I keep bringing
>>> up a config option of some sort for these commands.  :-)
>>
>> Now, if we're talking about "don't download extra objects" as a goal,
>> then we're thinking about things not just related to sparse-checkout
>> but even history within the sparse-checkout. Even if we make the
>> 'backfill' command something that users could run, there isn't a
>> guarantee that users will want to have even that much data downloaded.
>> We would need a way to say "yes, I ran 'git blame' on this path in my
>> sparse-checkout, but please don't just fail if you can't get new objects,
>> instead inform me that the results are incomplete."
>>
>> I think the sparse-checkout boundary is a good way to minimize the
>> number of objects downloaded by these commands, but to actually
>> remove the need for downloads at all we need a way to gracefully
>> return partial results.
> 
> There may be some merits to a partial clone with shallow blob history,
> but I've never really been all that interested in it. ......
> But you've got me curious.  You seem to be suggesting that partial
> results are okay if the user is informed.  I have suggested making
> diff-with-revisions, log -p, etc. show a warning that results may be
> incomplete when restricting them to the sparse checkout based on
> config.  So, aren't you suggesting that my proposal is safe after all?

I think the following things are true:

1. It's really important to keep the current partial clone default of
   only downloading blobs on-demand. Even with a limited sparse-checkout,
   it's rare that users will need every version of every file in that
   sparse-checkout, and they may not want that tax on their local storage.

2. Adding an opt-in backfill for a sparse-checkout definition will
   prevent most on-demand downloads (although it might want to be
   integrated into 'git fetch' behind an option to be really sure that
   state continues in the future).

3. Updating Git features to scope down to sparse-checkout will prevent
   many of the remaining on-demand downloads.

4. To be _absolutely sure_ that on-demand downloads don't happen, we
   need an extra mode for Git and new ways of reporting partial results.
   Without this mode, Git commands fail when triggering an on-demand
   download and the network is unavailable.

So, I'm saying that (4) is a direction that we could go. It also seems
extremely difficult to do, so we should do (2) & (3) first, which will
get us 99% of the way there.

Thanks,
-Stolee

  reply	other threads:[~2022-10-06 18:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-25  0:09 [PATCH] sparse-checkout.txt: new document with sparse-checkout directions Elijah Newren via GitGitGadget
2022-09-26 17:20 ` Junio C Hamano
2022-09-26 17:38 ` Junio C Hamano
2022-09-27  3:05   ` Elijah Newren
2022-09-27  4:30     ` Junio C Hamano
2022-09-26 20:08 ` Victoria Dye
2022-09-26 22:36   ` Junio C Hamano
2022-09-27  7:30     ` Elijah Newren
2022-09-27 16:07       ` Junio C Hamano
2022-09-28  6:13         ` Elijah Newren
2022-09-27  6:09   ` Elijah Newren
2022-09-27 16:42   ` Derrick Stolee
2022-09-28  5:42     ` Elijah Newren
2022-09-27 15:43 ` Junio C Hamano
2022-09-28  7:49   ` Elijah Newren
2022-09-27 16:36 ` Derrick Stolee
2022-09-28  5:38   ` Elijah Newren
2022-09-28 13:22     ` Derrick Stolee
2022-10-06  7:10       ` Elijah Newren
2022-10-06 18:27         ` Derrick Stolee [this message]
2022-10-07  2:56           ` Elijah Newren
2022-09-30  9:54     ` ZheNing Hu
2022-10-06  7:53       ` Elijah Newren
2022-10-15  2:17         ` ZheNing Hu
2022-10-15  4:37           ` Elijah Newren
2022-10-15 14:49             ` ZheNing Hu
2022-09-30  9:09   ` ZheNing Hu
2022-09-28  8:32 ` [PATCH v2] " Elijah Newren via GitGitGadget
2022-10-08 22:52   ` [PATCH v3] " Elijah Newren via GitGitGadget
2022-11-06  6:04     ` [PATCH v4] " Elijah Newren via GitGitGadget
2022-11-07 20:44       ` Derrick Stolee
2022-11-16  4:39         ` Elijah Newren
2022-11-15  4:03       ` ZheNing Hu
2022-11-16  3:18         ` ZheNing Hu
2022-11-16  6:51           ` Elijah Newren
2022-11-16  5:49         ` Elijah Newren
2022-11-16 10:04           ` ZheNing Hu
2022-11-16 10:10             ` ZheNing Hu
2022-11-16 14:33               ` ZheNing Hu
2022-11-19  2:36                 ` Elijah Newren
2022-11-19  2:15             ` Elijah Newren
2022-11-23  9:08               ` ZheNing Hu
2023-01-14 10:18           ` ZheNing Hu
2023-01-20  4:30             ` Elijah Newren
2023-01-23 15:05               ` ZheNing Hu
2023-01-24  3:17                 ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66eaae96-7b6a-ca87-fee5-e185a560744a@github.com \
    --to=derrickstolee@github.com \
    --cc=adlternative@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).