git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Elijah Newren <newren@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Victoria Dye <vdye@github.com>, Derrick Stolee <stolee@gmail.com>,
	Lessley Dennington <lessleydennington@gmail.com>
Subject: Re: [PATCH 4/7] sparse-checkout: error or warn when given individual files
Date: Tue, 15 Feb 2022 10:03:06 -0500	[thread overview]
Message-ID: <2c3b4204-5560-7031-c05e-3b709c85d753@github.com> (raw)
In-Reply-To: <CABPp-BE8aG3R4ASngqwyvRemp5WW=O0UWWSTesJ2hoch_av_kQ@mail.gmail.com>

On 2/14/2022 11:17 PM, Elijah Newren wrote:
> On Mon, Feb 14, 2022 at 7:56 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 2/12/2022 7:39 PM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>>
>>> The set and add subcommands accept multiple positional arguments.
>>> The meaning of these arguments differs slightly in the two modes:
>>>
>>> Cone mode only accepts directories.  If given a file, it would
>>> previously treat it as a directory, causing not just the file itself to
>>> be included but all sibling files as well -- likely against users'
>>> expectations.  Throw an error if the specified path is a file in the
>>> index.  Provide a --skip-checks argument to allow users to override
>>> (e.g. for the case when the given path IS a directory on another
>>> branch).
>>
>> I agree that this is likely to be an improvement for users. The
>> sparse-checkout builtin isn't integrated with the sparse index
>> yet. At least not integrated upstream: we have commits in microsoft/git
>> that we plan to send when other things in flight are merged. This
>> change likely introduces a new opportunity for the index to expand,
>> so I will keep that in mind when upstreaming.
> 
> Actually, I thought about that during development, and my presumption
> was that we would not expand the index.  We've survived a few years
> without reporting any argument errors to the user and folks seem to
> usually get things right, so while I think it adds value to report on
> likely errors, I don't think it's important for us to catch and warn
> on every potential misuse.  I think the probable errors are the ones
> where they specify a <file> that exists in both the working tree and
> index.  The remaining ones are less probable, and also possibly quite
> expensive to catch.  I'm not sure it's worth the cost to try to report
> those.

Since we start with only the files at root, if a user specifies a
directory at least two levels deep, then we don't have enough information
in the index to find that directory without parsing trees. Doing a simple
index_name_pos() call would expand the sparse index to a full one.

The checks of this kind typically follow this pattern:

	int pos = index_name_pos(index, name, strlen(name));
	if (pos >= 0)
		/* we have 'name' in the index! */
	else
		/* we don't have 'name' in the index! */

(Of course, there is some special logic for directories to see if they
exist, since 'pos' is a position within the list of files, unless it
points to a sparse directory entry.)

A different method that checks only for existence could find a sparse
directory and then search trees for the contained path. Returning a
boolean answer provides a mechanism for not expanding the index.

But we are getting sidetracked. I'll worry about this later. The
important thing is that we can solve this problem, so it shouldn't block
your feature.

>>> Non-cone mode accepts general gitignore patterns.  However, it has an
>>> O(N*M) performance baked into its design, where all N index files must
>>> be matched against all M sparse-checkout patterns with EVERY call to
>>> unpack_trees() that updates the working tree.  As such, it is important
>>> to keep the number of patterns small, and thus we should warn users to
>>> prefer passing directories and more generic glob patterns to get the
>>> paths they want instead of listing each individual file.  (The
>>> --skip-checks argument can also be used to bypass this warning.)  Also,
>>> even when users do want to specify individual files, they will often
>>> want to do so by providing a leading '/' (to avoid selecting the same
>>> filename in all subdirectories), in which case this error message would
>>> never trigger anyway.
>>
>> I think the case of "I want only one file from this directory" and "I
>> want files with the given name pattern" are the main reason to still
>> use non-cone-mode. Users with this need usually have a directory full
>> of large files, and they choose which of those large files they need
>> using sparse-checkout. The repository reorganization required to use
>> cone mode for this use is perhaps too great (or they haven't thought
>> about doing it). For this reason, I would prefer that we do not do
>> these checks when not in cone mode.
> 
> If they "only want one file from this directory", isn't the correct
> way to specify that by mentioning the path with a leading slash?
> Otherwise, they'd potentially grab files with similar names from many
> directories, right?  So, even in that usecase, we should still error
> out if they specify a <filename> rather than /<filename>.  Perhaps my
> reasoning should lead with that and I should fix up the warning
> message a bit, but I still think we should probably give a warning
> even for those who are explicitly wanting the usecase you mention.

Checking for the leading slash can be a big help to make sure users
are getting the matches they expect.

> Also, note this is a warning and not an error -- and the warning can
> be suppressed with --skip-checks.

I want to avoid surprising users with new warnings for things they
have been doing for years, especially if these tasks run as part of
a script that checks stderr output as a failure condition.

>>> +test_expect_success 'by default, cone mode will error out when passed files' '
>>> +     git -C repo sparse-checkout reapply --cone &&
>>> +     test_must_fail git -C repo sparse-checkout add .gitignore 2>error &&
>>> +
>>> +     grep ".gitignore.*is not a directory" error
>>> +'
>>> +
>>> +test_expect_success 'by default, non-cone mode will warn on individual files' '
>>> +     git -C repo sparse-checkout reapply --no-cone &&
>>> +     git -C repo sparse-checkout add .gitignore 2>warning &&
>>> +
>>> +     grep "passing directories or less specific patterns is recommended" warning
>>> +'
>>
>> So I would expect this second test to have
>>
>>         test_must_be_empty warning
>>
>> to show that no warning occurs when specifying a file in non-cone-mode.
> 
> or perhaps
> 
> grep "please specify a leading slash to select a single file" warning
> 
> ?

Right. Whatever matches the behavior we land on.

Thanks,
-Stolee

  reply	other threads:[~2022-02-15 15:04 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-13  0:39 [PATCH 0/7] RFC: sparse checkout: make --cone mode the default, and check add/set argument validity Elijah Newren via GitGitGadget
2022-02-13  0:39 ` [PATCH 1/7] sparse-checkout: correct reapply's handling of options Elijah Newren via GitGitGadget
2022-02-13  0:39 ` [PATCH 2/7] sparse-checkout: correctly set non-cone mode when expected Elijah Newren via GitGitGadget
2022-02-14 15:44   ` Derrick Stolee
2022-02-15  3:18     ` Elijah Newren
2022-02-13  0:39 ` [PATCH 3/7] sparse-checkout: pay attention to prefix for {set, add} Elijah Newren via GitGitGadget
2022-02-14 15:49   ` Derrick Stolee
2022-02-15  3:52     ` Elijah Newren
2022-02-15 14:53       ` Derrick Stolee
2022-02-13  0:39 ` [PATCH 4/7] sparse-checkout: error or warn when given individual files Elijah Newren via GitGitGadget
2022-02-14 15:56   ` Derrick Stolee
2022-02-15  4:17     ` Elijah Newren
2022-02-15 15:03       ` Derrick Stolee [this message]
2022-02-13  0:39 ` [PATCH 5/7] sparse-checkout: reject non-cone-mode patterns starting with a '#' Elijah Newren via GitGitGadget
2022-02-14 17:59   ` Junio C Hamano
2022-02-15  4:31     ` Elijah Newren
2022-02-16  1:07       ` Junio C Hamano
2022-02-16  2:23         ` Elijah Newren
2022-02-16  3:05           ` Junio C Hamano
2022-02-13  0:39 ` [PATCH 6/7] sparse-checkout: reject arguments in cone-mode that look like patterns Elijah Newren via GitGitGadget
2022-02-13  0:39 ` [PATCH 7/7] sparse-checkout: make --cone the default and deprecate --no-cone Elijah Newren via GitGitGadget
2022-02-14 16:14   ` Derrick Stolee
2022-02-15  5:01     ` Elijah Newren
2022-02-14 16:19 ` [PATCH 0/7] RFC: sparse checkout: make --cone mode the default, and check add/set argument validity Derrick Stolee
2022-02-15  5:12   ` Elijah Newren
2022-02-15 15:12     ` Derrick Stolee
2022-02-15  8:32 ` [PATCH v2 0/6] sparse checkout: fix a few bugs and check argument validity for set/add Elijah Newren via GitGitGadget
2022-02-15  8:32   ` [PATCH v2 1/6] sparse-checkout: correct reapply's handling of options Elijah Newren via GitGitGadget
2022-02-15  8:32   ` [PATCH v2 2/6] sparse-checkout: correctly set non-cone mode when expected Elijah Newren via GitGitGadget
2022-02-15  8:32   ` [PATCH v2 3/6] sparse-checkout: pay attention to prefix for {set, add} Elijah Newren via GitGitGadget
2022-02-17  9:04     ` Ævar Arnfjörð Bjarmason
2022-02-18  6:04       ` Elijah Newren
2022-02-15  8:32   ` [PATCH v2 4/6] sparse-checkout: error or warn when given individual files Elijah Newren via GitGitGadget
2022-02-17  9:05     ` Ævar Arnfjörð Bjarmason
2022-02-15  8:32   ` [PATCH v2 5/6] sparse-checkout: reject non-cone-mode patterns starting with a '#' Elijah Newren via GitGitGadget
2022-02-15  8:32   ` [PATCH v2 6/6] sparse-checkout: reject arguments in cone-mode that look like patterns Elijah Newren via GitGitGadget
2022-02-15 15:15   ` [PATCH v2 0/6] sparse checkout: fix a few bugs and check argument validity for set/add Derrick Stolee
2022-02-16  4:21   ` [PATCH v3 0/5] " Elijah Newren via GitGitGadget
2022-02-16  4:21     ` [PATCH v3 1/5] sparse-checkout: correct reapply's handling of options Elijah Newren via GitGitGadget
2022-02-16  4:21     ` [PATCH v3 2/5] sparse-checkout: correctly set non-cone mode when expected Elijah Newren via GitGitGadget
2022-02-16  4:21     ` [PATCH v3 3/5] sparse-checkout: pay attention to prefix for {set, add} Elijah Newren via GitGitGadget
2022-02-16  4:21     ` [PATCH v3 4/5] sparse-checkout: error or warn when given individual files Elijah Newren via GitGitGadget
2022-02-16  4:21     ` [PATCH v3 5/5] sparse-checkout: reject arguments in cone-mode that look like patterns Elijah Newren via GitGitGadget
2022-02-16  9:53       ` Ævar Arnfjörð Bjarmason
2022-02-16 16:54         ` Elijah Newren
2022-02-16 17:20           ` Victoria Dye
2022-02-16 18:49             ` Junio C Hamano
2022-02-17  1:46               ` Elijah Newren
2022-02-17 17:34                 ` Junio C Hamano
2022-02-17  1:43             ` Elijah Newren
2022-02-17  2:26           ` Elijah Newren
2022-02-16  7:19     ` [PATCH v3 0/5] sparse checkout: fix a few bugs and check argument validity for set/add Junio C Hamano
2022-02-17  6:54     ` [PATCH v4 " Elijah Newren via GitGitGadget
2022-02-17  6:54       ` [PATCH v4 1/5] sparse-checkout: correct reapply's handling of options Elijah Newren via GitGitGadget
2022-02-17  6:54       ` [PATCH v4 2/5] sparse-checkout: correctly set non-cone mode when expected Elijah Newren via GitGitGadget
2022-02-17  6:54       ` [PATCH v4 3/5] sparse-checkout: pay attention to prefix for {set, add} Elijah Newren via GitGitGadget
2022-02-17 17:53         ` Junio C Hamano
2022-02-17  6:54       ` [PATCH v4 4/5] sparse-checkout: error or warn when given individual files Elijah Newren via GitGitGadget
2022-02-17 18:07         ` Junio C Hamano
2022-02-18  6:11           ` Elijah Newren
2022-02-17  6:54       ` [PATCH v4 5/5] sparse-checkout: reject arguments in cone-mode that look like patterns Elijah Newren via GitGitGadget
2022-02-17  9:13       ` [PATCH v4 0/5] sparse checkout: fix a few bugs and check argument validity for set/add Ævar Arnfjörð Bjarmason
2022-02-19 16:44       ` [PATCH v5 " Elijah Newren via GitGitGadget
2022-02-19 16:44         ` [PATCH v5 1/5] sparse-checkout: correct reapply's handling of options Elijah Newren via GitGitGadget
2022-02-19 16:44         ` [PATCH v5 2/5] sparse-checkout: correctly set non-cone mode when expected Elijah Newren via GitGitGadget
2022-02-19 16:44         ` [PATCH v5 3/5] sparse-checkout: pay attention to prefix for {set, add} Elijah Newren via GitGitGadget
2022-02-19 16:44         ` [PATCH v5 4/5] sparse-checkout: error or warn when given individual files Elijah Newren via GitGitGadget
2022-02-19 16:44         ` [PATCH v5 5/5] sparse-checkout: reject arguments in cone-mode that look like patterns Elijah Newren via GitGitGadget
2022-02-20 19:44         ` [PATCH v5 0/5] sparse checkout: fix a few bugs and check argument validity for set/add Derrick Stolee
2022-02-20 20:13           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c3b4204-5560-7031-c05e-3b709c85d753@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=lessleydennington@gmail.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).