git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, newren@gmaill.com, peff@peff.net,
	me@ttaylorr.com, jrnieder@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
Date: Thu, 07 May 2020 15:58:06 -0700	[thread overview]
Message-ID: <xmqqeerv2w01.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 07 May 2020 13:17:36 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> One of the difficulties of using the sparse-checkout feature is not
> knowing which directories are absolutely needed for working in a portion
> of the repository. Some of this can be documented in README files or
> included in a bootstrapping tool along with the repository. This is done
> in an ad-hoc way by every project that wants to use it.
>
> Let's make this process easier for users by creating a way to define a
> useful sparse-checkout definition inside the Git tree data. This has
> several benefits. In particular, the data is available to anyone who has
> a copy of the repository without needing a different data source.
> Second, the needs of the repository can change over time and Git can
> present a way to automatically update the working directory as these
> sparse-checkout definitions change over time.

And two lines of development can merge them together?

Any time a new "feature" pops up that would eventually affect how
"git clone" and "git checkout" work based on untrusted user data, we
need to make sure there is no negative security implications.  

If it only boils down to "we have files that can record list of
leading directory names and without offering extra 'flexibility'", I
guess there aren't all that much that a malicious sparse definition
can do and we would be safe, though.

> To use this feature, add the "--in-tree" option when setting or adding
> directories to the sparse-checkout definition. For example:
>
>   $ git sparse-checkout set --in-tree .sparse/base
>   $ git sparse-checkout add --in-tree .sparse/extra
>
> These commands add values to the multi-valued config setting
> "sparse.inTree". When updating the sparse-checkout definition, these
> values describe paths in the repository to find the sparse-checkout
> data. After the commands listed earlier, we expect to see the following
> in .git/config.worktree:
>
> 	[sparse]
> 		intree = .sparse/base
> 		intree = .sparse/extra

What does this say in human words?  "These two tracked files specify
which paths should be in the working tree"?  Spelling it out here
would help readers of this commit.

> When applying the sparse-checkout definitions from this config, the
> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 

OK, so end-user edit to the working tree copy or what is added to
the index does not count and only the committed version gets used.

That makes it simple---I was wondering how we would operate when
merging a branch with different contents in the .sparse/* files
until the conflicts are resolved.

> In those
> files, the multi-valued config values "sparse.dir" are considered as
> the directories to construct a cone mode sparse-checkout file. The end
> result is as if these paths were provided to "git sparse-checkout set"
> in cone mode.

OK.

> For example, suppose .sparse/base had the following content:
>
> 	[sparse]
> 		dir = A
> 		dir = B/C
> 		dir = D/E/F
>
> and .sparse/extra had the following content:
>
> 	[sparse]
> 		dir = D
> 		dir = X
>
> Then, the output of "git sparse-checkout list" would be
>
> 	A
> 	B/C
> 	D
> 	X
>
> Note that since "D" contains "D/E/F", that directory replaces the
> position of "D/E/F" in the list.
>
> Since these are parsed using the config library, the parser is robust
> enough to understand comments and complicated string values.
>
> The key benefit to this approach is that it can be extended by defining
> new config values. In a later change, we will introduce "sparse.inherit"
> to point to another file in the tree. This will solve the problem of
> editing many files when core dependencies change.

With only a multi-valued sparse.dir elements allowed in these
in-tree .sparse/* files, I guess there isn't much mischeaf a
malicious .sparse/* file can do.  Can it try to [includeIf] some
paths external to the repository to cause a remote attacker to learn
about the paths on the local system, perhaps? 

> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c1fce87cd33
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,15 @@
> +sparse.inTree::
> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
> +	feature, but if there are any values for the multi-valued
> +	`sparse.inTree` config option, then the sparse-checkout patterns are
> +	defined by parsing the files listed in these values. See
> +	linkgit:git-sparse-checkout[1] for more information.

Does "... but if ... then" mean "by parsing the files and ignoring
all other things that may otherwise define patterns"?

> +sparse.dir::
> +	This config setting is ignored if present in the repository config.
> +	Instead, this multi-valued option is present in the files listed by
> +	`sparse.inTree` and specifies the directories needed in the
> +	working directory. The union of all `sparse.dir` values across all
> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
> +	information.

If this is *not* a config in the usual sense, we probably should not
include it in this document and in the "git config --help" output.
That will allow us to drop the first sentence.

Those .sparse/* in-tree files are like .gitmodules in the sense that
they happen to use the same syntax so that the parser can be shared,
but they are not allowed to affect end-user configuration
(e.g. writing "[diff] external=rm -fr ." in the file has no effect)
at all, right?

And we should describe what we can write in these in-tree files
separately and make it clear that they are _different_ from the
configuration variables.


  reply	other threads:[~2020-05-07 22:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:17 [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 01/10] unpack-trees: avoid array out-of-bounds error Derrick Stolee via GitGitGadget
2020-05-07 22:27   ` Junio C Hamano
2020-05-08 12:19     ` Derrick Stolee
2020-05-08 15:09       ` Junio C Hamano
2020-05-20 16:32     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 02/10] sparse-checkout: move code from builtin Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 03/10] sparse-checkout: move code from unpack-trees.c Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58   ` Junio C Hamano [this message]
2020-05-08 15:40     ` Derrick Stolee
2020-05-20 17:52       ` Elijah Newren
2020-06-17 23:07         ` Elijah Newren
2020-06-18  8:18           ` Son Luong Ngoc
2020-05-07 13:17 ` [PATCH 05/10] sparse-checkout: automatically update in-tree definition Derrick Stolee via GitGitGadget
2020-05-20 16:28   ` Elijah Newren
2020-05-07 13:17 ` [PATCH 06/10] sparse-checkout: use oidset to prevent repeat blobs Derrick Stolee via GitGitGadget
2020-05-20 16:40   ` Elijah Newren
2020-05-21  3:49     ` Elijah Newren
2020-05-21 17:54       ` Derrick Stolee
2020-05-07 13:17 ` [PATCH 07/10] sparse-checkout: define in-tree dependencies Derrick Stolee via GitGitGadget
2020-05-20 18:10   ` Elijah Newren
2020-05-30 17:26     ` Elijah Newren
2020-05-07 13:17 ` [PATCH 08/10] Makefile: skip git-gui if dir is missing Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 09/10] Makefile: disable GETTEXT when 'po' " Derrick Stolee via GitGitGadget
2020-05-07 13:17 ` [PATCH 10/10] .sparse: add in-tree sparse-checkout for Git Derrick Stolee via GitGitGadget
2020-05-20 17:38 ` [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Elijah Newren
2020-06-17 23:14 ` Elijah Newren
2020-06-18  1:42   ` Derrick Stolee
2020-06-18  1:59     ` Elijah Newren
2020-06-18  3:01       ` Derrick Stolee
2020-06-18  5:03         ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqeerv2w01.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmaill.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).