git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore
@ 2020-10-05  7:17 Jeff King
  2020-10-05  7:19 ` [PATCH 1/7] fsck_tree(): fix shadowed variable Jeff King
                   ` (8 more replies)
  0 siblings, 9 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:17 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

About 2 years ago as part of a security release we made it illegal to
have a symlinked .gitmodules file (refusing it both in the index and via
fsck). At the time we discussed (on the security list) outlawing
symlinks for other .git files in the same way, but we decided not to do
so as part of the security release, as it wasn't strictly necessary.

We publicly revisited the topic in:

  https://lore.kernel.org/git/20190114230902.GG162110@google.com/

but there were a few fixes needed, and it got forgotten. So here it is
again, with those fixes:

  [1/7]: fsck_tree(): fix shadowed variable
  [2/7]: fsck_tree(): wrap some long lines

    These first two are actually an unrelated fix and cleanup in the
    nearby code. Could be picked up independently.

  [3/7]: t7415: rename to expand scope
  [4/7]: t7450: test verify_path() handling of gitmodules

    Preparatory test cleanup and improvement for existing features.

  [5/7]: t0060: test obscured .gitattributes and .gitignore matching
  [6/7]: verify_path(): disallow symlinks in .gitattributes and .gitignore
  [7/7]: fsck: complain when .gitattributes or .gitignore is a symlink

    The actual feature, covering the index and fsck.

 fsck.c                                        | 79 ++++++++++++++-----
 read-cache.c                                  | 12 ++-
 t/helper/test-path-utils.c                    | 41 +++++++---
 t/t0060-path-utils.sh                         | 20 +++++
 ...odule-names.sh => t7450-bad-meta-files.sh} | 69 ++++++++++++++--
 5 files changed, 179 insertions(+), 42 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-meta-files.sh} (77%)

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/7] fsck_tree(): fix shadowed variable
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
@ 2020-10-05  7:19 ` Jeff King
  2020-10-05  7:44   ` Jonathan Nieder
  2020-10-05  7:19 ` [PATCH 2/7] fsck_tree(): wrap some long lines Jeff King
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:19 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).

Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.

Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop wee already correct but need touched), but makes sure we catch
all cases and will avoid similar confusion in the future.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                     | 40 +++++++++++++++++++-------------------
 t/t7415-submodule-names.sh |  5 +++--
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..46a108839f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object_id *tree_oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
 	while (desc.size) {
 		unsigned short mode;
 		const char *name, *backslash;
-		const struct object_id *oid;
+		const struct object_id *entry_oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		entry_oid = tree_entry_extract(&desc, &name, &mode);
 
-		has_null_sha1 |= is_null_oid(oid);
+		has_null_sha1 |= is_null_oid(entry_oid);
 		has_full_path |= !!strchr(name, '/');
 		has_empty_name |= !*name;
 		has_dot |= !strcmp(name, ".");
@@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&gitmodules_found, entry_oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&gitmodules_found, entry_oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, tree_oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -703,7 +703,7 @@ static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -751,25 +751,25 @@ static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index f70368bc2e..5c95247180 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -148,13 +148,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} | git mktree &&
+		} >bad-tree &&
+		tree=$(git mktree <bad-tree) &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector; this grep string comes from the config
 		# variable name and will not be translated.
 		test_must_fail git fsck 2>output &&
-		test_i18ngrep gitmodulesSymlink output
+		test_i18ngrep "tree $tree: gitmodulesSymlink" output
 	)
 '
 
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 2/7] fsck_tree(): wrap some long lines
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
  2020-10-05  7:19 ` [PATCH 1/7] fsck_tree(): fix shadowed variable Jeff King
@ 2020-10-05  7:19 ` Jeff King
  2020-10-05  7:46   ` Jonathan Nieder
  2020-10-05  7:19 ` [PATCH 3/7] t7415: rename to expand scope Jeff King
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:19 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Many calls to report() in fsck_tree() are kept on a single line and are
quite long. Most were pretty big to begin with, but have gotten even
longer over the years as we've added more parameters. Let's accept the
churn of wrapping them in order to conform to our usual line limits.

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously not necessary for the rest of the serious, but this has been
bugging me for years. I'm not sure what made me finally break down and
wrap them.

 fsck.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 46a108839f..024810139b 100644
--- a/fsck.c
+++ b/fsck.c
@@ -654,7 +654,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_TREE,
+				 "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -703,7 +705,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE,
+					 FSCK_MSG_BAD_TREE,
+					 "cannot be parsed as a tree");
 			break;
 		}
 
@@ -751,25 +755,45 @@ static int fsck_tree(const struct object_id *tree_oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_NULL_SHA1,
+				 "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_FULL_PATHNAME,
+				 "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_EMPTY_NAME,
+				 "contains empty pathname");
 	if (has_dot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOT,
+				 "contains '.'");
 	if (has_dotdot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTDOT,
+				 "contains '..'");
 	if (has_dotgit)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTGIT,
+				 "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_ZERO_PADDED_FILEMODE,
+				 "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_FILEMODE,
+				 "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_DUPLICATE_ENTRIES,
+				 "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_TREE_NOT_SORTED,
+				 "not properly sorted");
 	return retval;
 }
 
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 3/7] t7415: rename to expand scope
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
  2020-10-05  7:19 ` [PATCH 1/7] fsck_tree(): fix shadowed variable Jeff King
  2020-10-05  7:19 ` [PATCH 2/7] fsck_tree(): wrap some long lines Jeff King
@ 2020-10-05  7:19 ` Jeff King
  2020-10-05  7:50   ` Jonathan Nieder
  2020-10-05  7:20 ` [PATCH 4/7] t7450: test verify_path() handling of gitmodules Jeff King
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:19 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

This script has already expanded beyond its original intent of ".. in
submodule names" to include other malicious submodule bits. Let's update
the name and description to reflect that, as well as the fact that we'll
soon be adding similar tests for other meta-files (.gitattributes, etc).
We'll also renumber it to move it out of the group of submodule-specific
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 ...5-submodule-names.sh => t7450-bad-meta-files.sh} | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-meta-files.sh} (95%)

diff --git a/t/t7415-submodule-names.sh b/t/t7450-bad-meta-files.sh
similarity index 95%
rename from t/t7415-submodule-names.sh
rename to t/t7450-bad-meta-files.sh
index 5c95247180..6b703b12bc 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7450-bad-meta-files.sh
@@ -1,9 +1,16 @@
 #!/bin/sh
 
-test_description='check handling of .. in submodule names
+test_description='check forbidden or malicious patterns in .git* files
 
-Exercise the name-checking function on a variety of names, and then give a
-real-world setup that confirms we catch this in practice.
+Such as:
+
+  - presence of .. in submodule names;
+    Exercise the name-checking function on a variety of names, and then give a
+    real-world setup that confirms we catch this in practice.
+
+  - nested submodule names
+
+  - symlinked .gitmodules, etc
 '
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-pack.sh
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 4/7] t7450: test verify_path() handling of gitmodules
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (2 preceding siblings ...)
  2020-10-05  7:19 ` [PATCH 3/7] t7415: rename to expand scope Jeff King
@ 2020-10-05  7:20 ` Jeff King
  2020-10-05  7:53   ` Jonathan Nieder
  2020-10-05  7:21 ` [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching Jeff King
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) made it impossible to load a symlink .gitmodules file into
the index. However, there are no tests of this behavior. Let's make sure
this case is covered. We can easily reuse the test setup created by
the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
2018-05-04).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-meta-files.sh | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/t/t7450-bad-meta-files.sh b/t/t7450-bad-meta-files.sh
index 6b703b12bc..b73985157f 100755
--- a/t/t7450-bad-meta-files.sh
+++ b/t/t7450-bad-meta-files.sh
@@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'fsck detects symlinked .gitmodules file' '
+test_expect_success 'create repo with symlinked .gitmodules file' '
 	git init symlink &&
 	(
 		cd symlink &&
@@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree &&
-		tree=$(git mktree <bad-tree) &&
+		} >bad-tree
+	) &&
+	tree=$(git -C symlink mktree <symlink/bad-tree)
+'
+
+test_expect_success 'fsck detects symlinked .gitmodules file' '
+	(
+		cd symlink &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector; this grep string comes from the config
@@ -166,6 +172,11 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 	)
 '
 
+test_expect_success 'refuse to load symlinked .gitmodule into index' '
+	test_must_fail git -C symlink read-tree $tree 2>err &&
+	test_i18ngrep "invalid path.*gitmodules" err
+'
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (3 preceding siblings ...)
  2020-10-05  7:20 ` [PATCH 4/7] t7450: test verify_path() handling of gitmodules Jeff King
@ 2020-10-05  7:21 ` Jeff King
  2020-10-05  8:03   ` Jonathan Nieder
  2020-10-05  7:24 ` [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:21 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used, let's make sure they actually work by throwing a few basic
checks at them.

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-path-utils.c | 41 ++++++++++++++++++++++++++------------
 t/t0060-path-utils.sh      | 20 +++++++++++++++++++
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 313a153209..9e253f8058 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
 	{ NULL,              NULL     }
 };
 
-static int is_dotgitmodules(const char *path)
+static int check_dotgitx(const char *x, const char **argv,
+			 int (*is_hfs)(const char *),
+			 int (*is_ntfs)(const char *))
 {
-	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
+	int res = 0, expect = 1;
+	for (; *argv; argv++) {
+		if (!strcmp("--not", *argv))
+			expect = !expect;
+		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
+			 res = error("'%s' is %s.%s", *argv,
+				     expect ? "not " : "", x);
+		else
+			fprintf(stderr, "ok: '%s' is %s.%s\n",
+				*argv, expect ? "" : "not ", x);
+	}
+	return !!res;
 }
 
 static int cmp_by_st_size(const void *a, const void *b)
@@ -382,17 +395,19 @@ int cmd__path_utils(int argc, const char **argv)
 		return test_function(dirname_data, posix_dirname, argv[1]);
 
 	if (argc > 2 && !strcmp(argv[1], "is_dotgitmodules")) {
-		int res = 0, expect = 1, i;
-		for (i = 2; i < argc; i++)
-			if (!strcmp("--not", argv[i]))
-				expect = !expect;
-			else if (expect != is_dotgitmodules(argv[i]))
-				res = error("'%s' is %s.gitmodules", argv[i],
-					    expect ? "not " : "");
-			else
-				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
-					argv[i], expect ? "" : "not ");
-		return !!res;
+		return check_dotgitx("gitmodules", argv + 2,
+				     is_hfs_dotgitmodules,
+				     is_ntfs_dotgitmodules);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
+		return check_dotgitx("gitignore", argv + 2,
+				     is_hfs_dotgitignore,
+				     is_ntfs_dotgitignore);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
+		return check_dotgitx("gitattributes", argv + 2,
+				     is_hfs_dotgitattributes,
+				     is_ntfs_dotgitattributes);
 	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 56db5c8aba..b2e3cf3f4c 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -468,6 +468,26 @@ test_expect_success 'match .gitmodules' '
 		.gitmodules,:\$DATA
 '
 
+test_expect_success 'match .gitattributes' '
+	test-tool path-utils is_dotgitattributes \
+		.gitattributes \
+		.git${u200c}attributes \
+		.Gitattributes \
+		.gitattributeS \
+		GITATT~1 \
+		GI7D29~1
+'
+
+test_expect_success 'match .gitignore' '
+	test-tool path-utils is_dotgitignore \
+		.gitignore \
+		.git${u200c}ignore \
+		.Gitignore \
+		.gitignorE \
+		GITIGN~1 \
+		GI250A~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (4 preceding siblings ...)
  2020-10-05  7:21 ` [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching Jeff King
@ 2020-10-05  7:24 ` Jeff King
  2020-10-05  8:09   ` Jonathan Nieder
  2020-10-05  7:25 ` [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:24 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

In commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) we made it impossible to load a .gitmodules file that's a
symlink into the index. The security reasons for doing so are described
there. We also discussed forbidding symlinks of other .git files as part
of that fix, but the tradeoff was less compelling:

  1. Unlike .gitmodules, the other files don't have content-level fsck
     checks. So an attacker using symlinks to evade those checks isn't a
     problem.

  2. Unlike .gitmodules, Git will never write .gitignore or
     .gitattributes itself, making it much less likely to use them to
     write outside the repo. They could be used for out-of-repo reads,
     however.

  3. The .gitmodules change was part of a critical bug-fix that was
     not publicly disclosed until it was released. Changing the other
     files was not needed for the minimal fix.

However, it's still a reasonable idea to forbid symlinks for these
files:

  - As noted, they can still be used to read out-of-repo files (which is
    fairly restricted, but in some circumstances you can probe file
    content by speculatively creating files and seeing if they get
    ignored)

  - They don't currently behave well in all cases. We sometimes read
    these files from the index, where we _don't_ follow symlinks (we'd
    just treat the symlink target as the .gitignore or .gitattributes
    content, which is actively wrong).

This patch forbids symlinked versions of these files from entering the
index. We already have helpers for obscured forms of the names from
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
were done as part of the series touching .gitmodules.

Signed-off-by: Jeff King <peff@peff.net>
---
I note that neither these new tests nor the existing .gitmodules ones
confirm that we catch the obscured ntfs/hfs forms in the actual code
paths (instead, we feed them to a synthetic test-tool helper in t0060).
I think that's OK, but if we wanted to be super-paranoid we could beef
up these tests with trickier names.

 read-cache.c              | 12 +++++++++---
 t/t7450-bad-meta-files.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index ecf6f68994..63aec6c35d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -947,7 +947,9 @@ static int verify_dotfile(const char *rest, unsigned mode)
 			return 0;
 		if (S_ISLNK(mode)) {
 			rest += 3;
-			if (skip_iprefix(rest, "modules", &rest) &&
+			if ((skip_iprefix(rest, "modules", &rest) ||
+			     skip_iprefix(rest, "ignore", &rest) ||
+			     skip_iprefix(rest, "attributes", &rest)) &&
 			    (*rest == '\0' || is_dir_sep(*rest)))
 				return 0;
 		}
@@ -980,7 +982,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_hfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_hfs_dotgitmodules(path))
+					if (is_hfs_dotgitmodules(path) ||
+					    is_hfs_dotgitignore(path) ||
+					    is_hfs_dotgitattributes(path))
 						return 0;
 				}
 			}
@@ -992,7 +996,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_ntfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_ntfs_dotgitmodules(path))
+					if (is_ntfs_dotgitmodules(path) ||
+					    is_ntfs_dotgitignore(path) ||
+					    is_ntfs_dotgitattributes(path))
 						return 0;
 				}
 			}
diff --git a/t/t7450-bad-meta-files.sh b/t/t7450-bad-meta-files.sh
index b73985157f..6a038ed55b 100755
--- a/t/t7450-bad-meta-files.sh
+++ b/t/t7450-bad-meta-files.sh
@@ -267,4 +267,33 @@ test_expect_success 'git dirs of sibling submodules must not be nested' '
 	test_i18ngrep "is inside git dir" err
 '
 
+test_expect_success 'create repo with symlinked .gitattributes file' '
+	git init symlink-attr &&
+	target=$(echo target | git -C symlink-attr hash-object -w --stdin) &&
+	tree=$(
+		printf "120000 blob $target\t.gitattributes\n" |
+		git -C symlink-attr mktree
+	)
+'
+
+test_expect_success 'refuse to load symlinked .gitattributes into index' '
+	test_must_fail git -C symlink-attr read-tree $tree 2>err &&
+	test_i18ngrep "invalid path.*gitattributes" err
+'
+
+test_expect_success 'create repo with symlinked .gitignore file' '
+	git init symlink-ignore &&
+	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
+	tree=$(
+		printf "120000 blob $target\t.gitignore\n" |
+		git -C symlink-ignore mktree
+	)
+'
+
+test_expect_success 'refuse to load symlinked .gitignore into index' '
+	test_must_fail git -C symlink-ignore read-tree $tree 2>err &&
+	test_i18ngrep "invalid path.*gitignore" err
+'
+
+
 test_done
-- 
2.28.0.1295.g4824feede7


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (5 preceding siblings ...)
  2020-10-05  7:24 ` [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
@ 2020-10-05  7:25 ` Jeff King
  2020-10-05  8:12   ` Jonathan Nieder
  2020-10-05  7:32 ` [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jonathan Nieder
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
  8 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  7:25 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

The previous commit made it impossible to have a symlinked
.gitattributes or .gitignore file via verify_path(). Let's add the same
check to fsck, which matches how we handle .gitmodules symlinks, via
b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).

Note that we won't add these to the existing gitmodules block. Its logic
is a bit more complicated, as we also check the content of non-symlink
instances we find. But for these new files, there is no content check;
we're just looking at the name and mode of the tree entry (and we can
avoid even the complicated name checks in the common case that the mode
doesn't indicate a symlink).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                    | 15 +++++++++++++++
 t/t7450-bad-meta-files.sh |  9 +++++++++
 2 files changed, 24 insertions(+)

diff --git a/fsck.c b/fsck.c
index 024810139b..fcd3f268b1 100644
--- a/fsck.c
+++ b/fsck.c
@@ -67,6 +67,8 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
 	FUNC(GITMODULES_UPDATE, ERROR) \
+	FUNC(GITIGNORE_SYMLINK, ERROR) \
+	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -688,6 +690,19 @@ static int fsck_tree(const struct object_id *tree_oid,
 						 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+		}
+
 		if ((backslash = strchr(name, '\\'))) {
 			while (backslash) {
 				backslash++;
diff --git a/t/t7450-bad-meta-files.sh b/t/t7450-bad-meta-files.sh
index 6a038ed55b..c7201803ec 100755
--- a/t/t7450-bad-meta-files.sh
+++ b/t/t7450-bad-meta-files.sh
@@ -281,6 +281,11 @@ test_expect_success 'refuse to load symlinked .gitattributes into index' '
 	test_i18ngrep "invalid path.*gitattributes" err
 '
 
+test_expect_success 'fsck detects symlinked .gitattributes file' '
+	test_must_fail git -C symlink-attr fsck 2>err &&
+	test_i18ngrep "tree $tree: gitattributesSymlink" err
+'
+
 test_expect_success 'create repo with symlinked .gitignore file' '
 	git init symlink-ignore &&
 	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
@@ -295,5 +300,9 @@ test_expect_success 'refuse to load symlinked .gitignore into index' '
 	test_i18ngrep "invalid path.*gitignore" err
 '
 
+test_expect_success 'fsck detects symlinked .gitignore file' '
+	test_must_fail git -C symlink-ignore fsck 2>err &&
+	test_i18ngrep "tree $tree: gitignoreSymlink" err
+'
 
 test_done
-- 
2.28.0.1295.g4824feede7

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (6 preceding siblings ...)
  2020-10-05  7:25 ` [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
@ 2020-10-05  7:32 ` Jonathan Nieder
  2020-10-05  8:58   ` Jeff King
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
  8 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  7:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> About 2 years ago as part of a security release we made it illegal to
> have a symlinked .gitmodules file (refusing it both in the index and via
> fsck). At the time we discussed (on the security list) outlawing
> symlinks for other .git files in the same way, but we decided not to do
> so as part of the security release, as it wasn't strictly necessary.
>
> We publicly revisited the topic in:
>
>   https://lore.kernel.org/git/20190114230902.GG162110@google.com/
>
> but there were a few fixes needed, and it got forgotten. So here it is
> again, with those fixes:

Oh!  I'm excited --- at $DAYJOB we've been carrying that patch since
then; I'll be happy to drop it. :)

Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] fsck_tree(): fix shadowed variable
  2020-10-05  7:19 ` [PATCH 1/7] fsck_tree(): fix shadowed variable Jeff King
@ 2020-10-05  7:44   ` Jonathan Nieder
  2020-10-05  8:20     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  7:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
> fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
> fsck_tree(), and we pass it to the report() function when we find
> problems. However, that is shadowed within the tree-walking loop by the
> existing "oid" variable which we use to store the oid of each tree
> entry. As a result, we may report the wrong oid for some problems we
> detect within the loop (the entry oid, instead of the tree oid).
>
> Our tests didn't catch this because they checked only that we found the
> expected fsck problem, not that it was attached to the correct object.

Oh, goodness.  Does this mean we should be similarly checking oid in
the rest of the fsck test scripts?  (I'm not saying this patch should
do so, just curious about what you think on the subject.)

> Let's rename both variables in the function to avoid confusion. This
> makes the diff a little noisy (e.g., all of the report() calls outside
> the loop wee already correct but need touched), but makes sure we catch

nit: s/wee/are/, s/need touched/need to be touched/

> all cases and will avoid similar confusion in the future.

Thanks for doing that --- it does make reviewing easier.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fsck.c                     | 40 +++++++++++++++++++-------------------
>  t/t7415-submodule-names.sh |  5 +++--
>  2 files changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/fsck.c b/fsck.c
> index f82e2fe9e3..46a108839f 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
>  	return c1 < c2 ? 0 : TREE_UNORDERED;
>  }
>  
> -static int fsck_tree(const struct object_id *oid,
> +static int fsck_tree(const struct object_id *tree_oid,

optional: we could call it "tree".

>  		     const char *buffer, unsigned long size,
>  		     struct fsck_options *options)
>  {
> @@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid,
>  	struct name_stack df_dup_candidates = { NULL };
>  
>  	if (init_tree_desc_gently(&desc, buffer, size)) {
> -		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
> +		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
>  		return retval;
>  	}
>  
> @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
>  	while (desc.size) {
>  		unsigned short mode;
>  		const char *name, *backslash;
> -		const struct object_id *oid;
> +		const struct object_id *entry_oid;
>  
> -		oid = tree_entry_extract(&desc, &name, &mode);
> +		entry_oid = tree_entry_extract(&desc, &name, &mode);

optional: could call it "child".

>  
> -		has_null_sha1 |= is_null_oid(oid);
> +		has_null_sha1 |= is_null_oid(entry_oid);
>  		has_full_path |= !!strchr(name, '/');
>  		has_empty_name |= !*name;
>  		has_dot |= !strcmp(name, ".");
> @@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid,
>  
>  		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
>  			if (!S_ISLNK(mode))
> -				oidset_insert(&gitmodules_found, oid);
> +				oidset_insert(&gitmodules_found, entry_oid);
>  			else
>  				retval += report(options,
> -						 oid, OBJ_TREE,
> +						 tree_oid, OBJ_TREE,
>  						 FSCK_MSG_GITMODULES_SYMLINK,
>  						 ".gitmodules is a symbolic link");

Right, this is a property of the enclosing tree, not the blob
representing the symlink target.  Good.

not about this patch: Could report() notice when the oid doesn't match
the type passed in, to more easily catch this kind of mistake?

[...]
> @@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid,
>  				has_dotgit |= is_ntfs_dotgit(backslash);
>  				if (is_ntfs_dotgitmodules(backslash)) {
>  					if (!S_ISLNK(mode))
> -						oidset_insert(&gitmodules_found, oid);
> +						oidset_insert(&gitmodules_found, entry_oid);
>  					else
> -						retval += report(options, oid, OBJ_TREE,
> +						retval += report(options, tree_oid, OBJ_TREE,
>  								 FSCK_MSG_GITMODULES_SYMLINK,
>  								 ".gitmodules is a symbolic link");

Likewise, good.

[...]
> --- a/t/t7415-submodule-names.sh
> +++ b/t/t7415-submodule-names.sh
> @@ -148,13 +148,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>  		{
>  			printf "100644 blob $content\t$tricky\n" &&
>  			printf "120000 blob $target\t.gitmodules\n"
> -		} | git mktree &&
> +		} >bad-tree &&
> +		tree=$(git mktree <bad-tree) &&
>  
>  		# Check not only that we fail, but that it is due to the
>  		# symlink detector; this grep string comes from the config
>  		# variable name and will not be translated.
>  		test_must_fail git fsck 2>output &&
> -		test_i18ngrep gitmodulesSymlink output
> +		test_i18ngrep "tree $tree: gitmodulesSymlink" output

Makes sense.

By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
keyword?  Is this just a limitation of GETTEXT_POISON losing
information that's passed in with %s?

With the commit message tweak mentioned above,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/7] fsck_tree(): wrap some long lines
  2020-10-05  7:19 ` [PATCH 2/7] fsck_tree(): wrap some long lines Jeff King
@ 2020-10-05  7:46   ` Jonathan Nieder
  0 siblings, 0 replies; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  7:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> Many calls to report() in fsck_tree() are kept on a single line and are
> quite long. Most were pretty big to begin with, but have gotten even
> longer over the years as we've added more parameters. Let's accept the
> churn of wrapping them in order to conform to our usual line limits.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fsck.c | 48 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)

"git show --word-diff" and "git show --check" show nothing obvious
wrong (which in particular means there's no functional change); eyeing
the patch, it looks good, too.  Thanks.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] t7415: rename to expand scope
  2020-10-05  7:19 ` [PATCH 3/7] t7415: rename to expand scope Jeff King
@ 2020-10-05  7:50   ` Jonathan Nieder
  2020-10-05  8:24     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  7:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> This script has already expanded beyond its original intent of ".. in
> submodule names" to include other malicious submodule bits. Let's update
> the name and description to reflect that, as well as the fact that we'll
> soon be adding similar tests for other meta-files (.gitattributes, etc).
> We'll also renumber it to move it out of the group of submodule-specific
> tests.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ...5-submodule-names.sh => t7450-bad-meta-files.sh} | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>  rename t/{t7415-submodule-names.sh => t7450-bad-meta-files.sh} (95%)

I've never heard of a "meta file" before, but I don't tend to discover
test scripts based on their filename anyway. :)  Thanks for updating the
test_description.

t7* is "the porcelainish commands concerning the working tree".  Should
this go in t1* (basic commands concerning database) instead?

t745* is unused number space so this at least won't hit any conflicts,
so fwiw

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] t7450: test verify_path() handling of gitmodules
  2020-10-05  7:20 ` [PATCH 4/7] t7450: test verify_path() handling of gitmodules Jeff King
@ 2020-10-05  7:53   ` Jonathan Nieder
  2020-10-05  8:30     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  7:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi,

Jeff King wrote:

> Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
> 2018-05-04) made it impossible to load a symlink .gitmodules file into
> the index. However, there are no tests of this behavior. Let's make sure
> this case is covered. We can easily reuse the test setup created by
> the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
> 2018-05-04).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t7450-bad-meta-files.sh | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/t/t7450-bad-meta-files.sh b/t/t7450-bad-meta-files.sh
> index 6b703b12bc..b73985157f 100755
> --- a/t/t7450-bad-meta-files.sh
> +++ b/t/t7450-bad-meta-files.sh
> @@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
>  	grep gitmodulesName output
>  '
>  
> -test_expect_success 'fsck detects symlinked .gitmodules file' '
> +test_expect_success 'create repo with symlinked .gitmodules file' '
>  	git init symlink &&
>  	(
>  		cd symlink &&
> @@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>  		{
>  			printf "100644 blob $content\t$tricky\n" &&
>  			printf "120000 blob $target\t.gitmodules\n"
> -		} >bad-tree &&
> -		tree=$(git mktree <bad-tree) &&
> +		} >bad-tree
> +	) &&
> +	tree=$(git -C symlink mktree <symlink/bad-tree)
> +'

This is super nitpicky, but: test scripts can be hard to maintain when
there's this kind of state carried from assertion to assertion without
it being made obvious.

Can this include "setup" or "set up" in the name to do that?  E.g.

	test_expect_success 'set up repo with symlinked .gitmodules file' '
		...
	'

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05  7:21 ` [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching Jeff King
@ 2020-10-05  8:03   ` Jonathan Nieder
  2020-10-05  8:40     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin

(+cc: Dscho for NTFS savvy)
Jeff King wrote:

> We have tests that cover various filesystem-specific spellings of
> ".gitmodules", because we need to reliably identify that path for some
> security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
> add tests, 2018-05-12), with the actual code coming from e7cb0b4455
> (is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
> (is_hfs_dotgit: match other .git files, 2018-05-02).
>
> Those latter two commits also added similar matching functions for
> .gitattributes and .gitignore. These ended up not being used in the
> final series, and are currently dead code. But in preparation for them
> being used, let's make sure they actually work by throwing a few basic
> checks at them.
>
> I didn't bother with the whole battery of tests that we cover for
> .gitmodules. These functions are all based on the same generic matcher,
> so it's sufficient to test most of the corner cases just once.

Yeah, that's reasonable.

> Note that the ntfs magic prefix names in the tests come from the
> algorithm described in e7cb0b4455 (and are different for each file).

Doesn't block this patch, but I'm curious: how hard would it be to make
a test with an NTFS prerequisite that makes sure we got the magic prefix
right?

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/helper/test-path-utils.c | 41 ++++++++++++++++++++++++++------------
>  t/t0060-path-utils.sh      | 20 +++++++++++++++++++
>  2 files changed, 48 insertions(+), 13 deletions(-)
> 
> diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
> index 313a153209..9e253f8058 100644
> --- a/t/helper/test-path-utils.c
> +++ b/t/helper/test-path-utils.c
> @@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
>  	{ NULL,              NULL     }
>  };
>  
> -static int is_dotgitmodules(const char *path)
> +static int check_dotgitx(const char *x, const char **argv,
> +			 int (*is_hfs)(const char *),
> +			 int (*is_ntfs)(const char *))
>  {
> -	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
> +	int res = 0, expect = 1;
> +	for (; *argv; argv++) {
> +		if (!strcmp("--not", *argv))
> +			expect = !expect;
> +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
> +			 res = error("'%s' is %s.%s", *argv,
> +				     expect ? "not " : "", x);
> +		else
> +			fprintf(stderr, "ok: '%s' is %s.%s\n",
> +				*argv, expect ? "" : "not ", x);

micronit: extra space on the "res" line.

This "if" cascade is a little hard to read, even though it does the
right thing.  Can we make it more explicit?  E.g.

		if (!strcmp("--not", *argv)) {
			expect = !expect;
			continue;
		}

		actual = is_hfs(*argv) || is_ntfs(*argv);

		fprintf(stderr, "%s: '%s' is %s%s",
			expect == actual ? "ok" : "error",
			*argv, actual ? "" : "not ", x);
		if (expect != actual)
			res = -1;

I think it's a little easier to read with either (a) the dot included
in the 'x' parameter or (b) the entire ".git" missing from the 'x'
parameter.

[...]
> index 56db5c8aba..b2e3cf3f4c 100755
> --- a/t/t0060-path-utils.sh
> +++ b/t/t0060-path-utils.sh
> @@ -468,6 +468,26 @@ test_expect_success 'match .gitmodules' '
>  		.gitmodules,:\$DATA
>  '
>  
> +test_expect_success 'match .gitattributes' '
> +	test-tool path-utils is_dotgitattributes \
> +		.gitattributes \
> +		.git${u200c}attributes \
> +		.Gitattributes \
> +		.gitattributeS \
> +		GITATT~1 \
> +		GI7D29~1
> +'
> +
> +test_expect_success 'match .gitignore' '
> +	test-tool path-utils is_dotgitignore \
> +		.gitignore \
> +		.git${u200c}ignore \
> +		.Gitignore \
> +		.gitignorE \
> +		GITIGN~1 \
> +		GI250A~1
> +'
> +
>  test_expect_success MINGW 'is_valid_path() on Windows' '
>  	test-tool path-utils is_valid_path \
>  		win32 \

With whatever subset of the changes above makes sense,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-05  7:24 ` [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
@ 2020-10-05  8:09   ` Jonathan Nieder
  2020-10-05 12:07     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> However, it's still a reasonable idea to forbid symlinks for these
> files:
>
>   - As noted, they can still be used to read out-of-repo files (which is
>     fairly restricted, but in some circumstances you can probe file
>     content by speculatively creating files and seeing if they get
>     ignored)
>
>   - They don't currently behave well in all cases. We sometimes read
>     these files from the index, where we _don't_ follow symlinks (we'd
>     just treat the symlink target as the .gitignore or .gitattributes
>     content, which is actively wrong).
>
> This patch forbids symlinked versions of these files from entering the
> index. We already have helpers for obscured forms of the names from
> e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
> 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
> were done as part of the series touching .gitmodules.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I note that neither these new tests nor the existing .gitmodules ones
> confirm that we catch the obscured ntfs/hfs forms in the actual code
> paths (instead, we feed them to a synthetic test-tool helper in t0060).
> I think that's OK, but if we wanted to be super-paranoid we could beef
> up these tests with trickier names.

I think being exhaustive wouldn't be worth it, but perhaps *one*
example (e.g., ".gitmodules ") would not be a terrible idea.

>  read-cache.c              | 12 +++++++++---
>  t/t7450-bad-meta-files.sh | 29 +++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+), 3 deletions(-)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

[...]
> --- a/t/t7450-bad-meta-files.sh
> +++ b/t/t7450-bad-meta-files.sh
> @@ -267,4 +267,33 @@ test_expect_success 'git dirs of sibling submodules must not be nested' '
>  	test_i18ngrep "is inside git dir" err
>  '
>  
> +test_expect_success 'create repo with symlinked .gitattributes file' '
> +	git init symlink-attr &&
> +	target=$(echo target | git -C symlink-attr hash-object -w --stdin) &&
> +	tree=$(
> +		printf "120000 blob $target\t.gitattributes\n" |
> +		git -C symlink-attr mktree
> +	)
> +'
> +
> +test_expect_success 'refuse to load symlinked .gitattributes into index' '
> +	test_must_fail git -C symlink-attr read-tree $tree 2>err &&
> +	test_i18ngrep "invalid path.*gitattributes" err

This tests that it fails but doesn't test that it had no effect.
Would that be straightforward to check as well (e.g. an "ls-files -s"
before and after)?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink
  2020-10-05  7:25 ` [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
@ 2020-10-05  8:12   ` Jonathan Nieder
  2020-10-05  8:53     ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:

> The previous commit made it impossible to have a symlinked
> .gitattributes or .gitignore file via verify_path(). Let's add the same
> check to fsck, which matches how we handle .gitmodules symlinks, via
> b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).
>
> Note that we won't add these to the existing gitmodules block. Its logic
> is a bit more complicated, as we also check the content of non-symlink
> instances we find. But for these new files, there is no content check;
> we're just looking at the name and mode of the tree entry (and we can
> avoid even the complicated name checks in the common case that the mode
> doesn't indicate a symlink).

On the subject of where the check gets added, the old description said

	It's easier to handle than .gitmodules, because we don't care
	about checking the blob content. This is really just about
	whether the name and mode for the tree entry are valid.

which I think was self-explanatory enough.  The new text is a little
more confusing because I get lost in figuring out what the "it" in
"Its" refers to.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fsck.c                    | 15 +++++++++++++++
>  t/t7450-bad-meta-files.sh |  9 +++++++++
>  2 files changed, 24 insertions(+)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Thanks for a pleasant read.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] fsck_tree(): fix shadowed variable
  2020-10-05  7:44   ` Jonathan Nieder
@ 2020-10-05  8:20     ` Jeff King
  2020-10-05  8:29       ` Jonathan Nieder
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:20 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote:

> > Our tests didn't catch this because they checked only that we found the
> > expected fsck problem, not that it was attached to the correct object.
> 
> Oh, goodness.  Does this mean we should be similarly checking oid in
> the rest of the fsck test scripts?  (I'm not saying this patch should
> do so, just curious about what you think on the subject.)

I don't feel strongly either way. It would be a bigger change than you
might hope, I'd expect, because it requires collecting the oid of the
bad object in the test script as we create it. So I'm not planning to
work on it, but if somebody else wants to, be my guest.

> > Let's rename both variables in the function to avoid confusion. This
> > makes the diff a little noisy (e.g., all of the report() calls outside
> > the loop wee already correct but need touched), but makes sure we catch
> 
> nit: s/wee/are/, s/need touched/need to be touched/

Foiled by my last-minute editing. It was originally "are", but I meant
to change it to "were".

> > -static int fsck_tree(const struct object_id *oid,
> > +static int fsck_tree(const struct object_id *tree_oid,
> 
> optional: we could call it "tree".

Yeah, I started that way but wondered if it was confusing with a "struct
tree" (which we don't have here; the point of the change that introduced
the shadowing was getting rid of that). Being longer doesn't hurt too
much and is quite clear.

> > @@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
> >  	while (desc.size) {
> >  		unsigned short mode;
> >  		const char *name, *backslash;
> > -		const struct object_id *oid;
> > +		const struct object_id *entry_oid;
> >  
> > -		oid = tree_entry_extract(&desc, &name, &mode);
> > +		entry_oid = tree_entry_extract(&desc, &name, &mode);
> 
> optional: could call it "child".

IMHO that's too vague. We have a child name, a child mode, etc.

Another option is to just refer to desc.entry.oid directly, but that's
longer and seemed a bit too intimate with how tree_entry_extract works.

> not about this patch: Could report() notice when the oid doesn't match
> the type passed in, to more easily catch this kind of mistake?

Hmm. It would incur an extra hash lookup, but since we expect to
report() infrequently, I don't think the cost would be too high. I
suspect we could even drop the "type" field and have report() figure it
out from the oid, but that is working in the opposite direction of your
suggestion. ;)

I'm not 100% sure we'd always be able to look up such a struct, though.
This code path also gets called from index-pack as it's checking objects
(so likewise we might not even have something in the object database
yet).

> >  		# Check not only that we fail, but that it is due to the
> >  		# symlink detector; this grep string comes from the config
> >  		# variable name and will not be translated.
> >  		test_must_fail git fsck 2>output &&
> > -		test_i18ngrep gitmodulesSymlink output
> > +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
> 
> Makes sense.
> 
> By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
> keyword?  Is this just a limitation of GETTEXT_POISON losing
> information that's passed in with %s?

Yes, I think so. It comes from 674ba34038 (fsck: mark strings for
translation, 2018-11-10) which is passing through our string. Arguably
that commit made the comment lines rather confusing and pointless.

Though hmm. Looks like the "tree %s: %s" part is in the translated
string. So a translation _could_ change it, though I'd expect it to only
change the words and not the syntax. Maybe an RTL language would. I
dunno.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] t7415: rename to expand scope
  2020-10-05  7:50   ` Jonathan Nieder
@ 2020-10-05  8:24     ` Jeff King
  2020-10-05  8:34       ` Jonathan Nieder
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:24 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 12:50:20AM -0700, Jonathan Nieder wrote:

> >  ...5-submodule-names.sh => t7450-bad-meta-files.sh} | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >  rename t/{t7415-submodule-names.sh => t7450-bad-meta-files.sh} (95%)
> 
> I've never heard of a "meta file" before, but I don't tend to discover
> test scripts based on their filename anyway. :)  Thanks for updating the
> test_description.

I couldn't think of a better name for "files that start with .git". I
almost called it "dot-git", but then I worried about confusion with the
actual ".git" directory.

> t7* is "the porcelainish commands concerning the working tree".  Should
> this go in t1* (basic commands concerning database) instead?
> 
> t745* is unused number space so this at least won't hit any conflicts,
> so fwiw

We've generally tried to order tests so that basic functionality in some
area comes before more advanced. So I tried to put these specialty
.gitmodules tests after the basic submodule tests (and likewise after
any attribute or gitignore tests).

In practice, I doubt it matters that much. We don't tend to run the test
suite serially in order these days anyway, so the notion that finding a
bug in an early test might save you CPU time or time spent reading error
messages likely no longer applies.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/7] fsck_tree(): fix shadowed variable
  2020-10-05  8:20     ` Jeff King
@ 2020-10-05  8:29       ` Jonathan Nieder
  0 siblings, 0 replies; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Oct 05, 2020 at 12:44:04AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> -		test_i18ngrep gitmodulesSymlink output
>>> +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
>>
>> Makes sense.
>>
>> By the way, why does GETTEXT_POISON lose the gitmodulesSymlink
>> keyword?  Is this just a limitation of GETTEXT_POISON losing
>> information that's passed in with %s?
>
> Yes, I think so. It comes from 674ba34038 (fsck: mark strings for
> translation, 2018-11-10) which is passing through our string. Arguably
> that commit made the comment lines rather confusing and pointless.
>
> Though hmm. Looks like the "tree %s: %s" part is in the translated
> string. So a translation _could_ change it, though I'd expect it to only
> change the words and not the syntax. Maybe an RTL language would. I
> dunno.

Ah, right.  I can use $n markers in the format string to reorder
parameters, so we don't want scripts to rely on their order.  In
principle one can imagine GETTEXT_POISON printing a random permutation
of its arguments but this kind of nondeterminism in tests would be
harder to debug than the current state where the args aren't passed
through at all.

And after all, the main point of GETTEXT_POISON is to allow us to test
scripted commands to make sure we are not translating any output that
scripts would want to use.  For that purpose, the current poisoning
does what we need.  (In other words, we don't encourage non-test
scripts to rely on these lines at all, even to grep for
gitmodulesSymlink.)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] t7450: test verify_path() handling of gitmodules
  2020-10-05  7:53   ` Jonathan Nieder
@ 2020-10-05  8:30     ` Jeff King
  2020-10-05  8:38       ` Jonathan Nieder
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:30 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 12:53:11AM -0700, Jonathan Nieder wrote:

> > @@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
> >  		{
> >  			printf "100644 blob $content\t$tricky\n" &&
> >  			printf "120000 blob $target\t.gitmodules\n"
> > -		} >bad-tree &&
> > -		tree=$(git mktree <bad-tree) &&
> > +		} >bad-tree
> > +	) &&
> > +	tree=$(git -C symlink mktree <symlink/bad-tree)
> > +'
> 
> This is super nitpicky, but: test scripts can be hard to maintain when
> there's this kind of state carried from assertion to assertion without
> it being made obvious.
> 
> Can this include "setup" or "set up" in the name to do that?  E.g.
> 
> 	test_expect_success 'set up repo with symlinked .gitmodules file' '
> 		...
> 	'

Hmph. I specifically _tried_ to do that by breaking it into a separate
test with the name "create" in it, which I thought was one of the
code-words for "I'm doing stuff that will be used in another test". But
I guess there's no official rule on that. I dug up:

  https://lore.kernel.org/git/20130826173501.GS4110@google.com/

but I guess I mis-remembered "create" being present there.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] t7415: rename to expand scope
  2020-10-05  8:24     ` Jeff King
@ 2020-10-05  8:34       ` Jonathan Nieder
  2020-10-05  8:49         ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Oct 05, 2020 at 12:50:20AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>>  rename t/{t7415-submodule-names.sh => t7450-bad-meta-files.sh} (95%)
>>
>> I've never heard of a "meta file" before, but I don't tend to discover
>> test scripts based on their filename anyway. :)  Thanks for updating the
>> test_description.
>
> I couldn't think of a better name for "files that start with .git". I
> almost called it "dot-git", but then I worried about confusion with the
> actual ".git" directory.

t7450-dot-gitfoo-files.sh seems clear to me.

[...]
> In practice, I doubt it matters that much. We don't tend to run the test
> suite serially in order these days anyway, so the notion that finding a
> bug in an early test might save you CPU time or time spent reading error
> messages likely no longer applies.

I see --- the point here is that because it's using e.g. "git clone
--recurse-submodules", we want it to be later than the other clone
tests?

I think I'd like us to move away from having the numbers at all some
day (since collisions are very common), but there's probably not much
to discuss there until one of us comes up with a proposal that still
makes it easy to do things like "skip all git-svn tests". :)

Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/7] t7450: test verify_path() handling of gitmodules
  2020-10-05  8:30     ` Jeff King
@ 2020-10-05  8:38       ` Jonathan Nieder
  0 siblings, 0 replies; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-05  8:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Oct 05, 2020 at 12:53:11AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> @@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>>>  		{
>>>  			printf "100644 blob $content\t$tricky\n" &&
>>>  			printf "120000 blob $target\t.gitmodules\n"
>>> -		} >bad-tree &&
>>> -		tree=$(git mktree <bad-tree) &&
>>> +		} >bad-tree
>>> +	) &&
>>> +	tree=$(git -C symlink mktree <symlink/bad-tree)
>>> +'
>>
>> This is super nitpicky, but: test scripts can be hard to maintain when
>> there's this kind of state carried from assertion to assertion without
>> it being made obvious.
>>
>> Can this include "setup" or "set up" in the name to do that?  E.g.
>>
>> 	test_expect_success 'set up repo with symlinked .gitmodules file' '
>> 		...
>> 	'
>
> Hmph. I specifically _tried_ to do that by breaking it into a separate
> test with the name "create" in it, which I thought was one of the
> code-words for "I'm doing stuff that will be used in another test". But
> I guess there's no official rule on that. I dug up:
>
>   https://lore.kernel.org/git/20130826173501.GS4110@google.com/
>
> but I guess I mis-remembered "create" being present there.

I can try to find some time today to introduce a test_setup helper.
Having to figure out and rely on this kind of ad hoc convention is not
something I really want to ask of patch authors and reviewers.

The reason I find "set up" clearer than "create" is that the latter is
something I can easily imagine myself genuinely wanting to test.  "Set
up for a later test" is more explicit about what the commands are
being run for.

Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05  8:03   ` Jonathan Nieder
@ 2020-10-05  8:40     ` Jeff King
  2020-10-05 21:20       ` Johannes Schindelin
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:40 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Johannes Schindelin

On Mon, Oct 05, 2020 at 01:03:53AM -0700, Jonathan Nieder wrote:

> > Note that the ntfs magic prefix names in the tests come from the
> > algorithm described in e7cb0b4455 (and are different for each file).
> 
> Doesn't block this patch, but I'm curious: how hard would it be to make
> a test with an NTFS prerequisite that makes sure we got the magic prefix
> right?

I suspect hard since Dscho punted on it in the original series. :) If I
understand correctly, it would require having an NTFS filesystem, and
generating 10,000+ files with a clashing prefix.

> > +	for (; *argv; argv++) {
> > +		if (!strcmp("--not", *argv))
> > +			expect = !expect;
> > +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
> > +			 res = error("'%s' is %s.%s", *argv,
> > +				     expect ? "not " : "", x);
> > +		else
> > +			fprintf(stderr, "ok: '%s' is %s.%s\n",
> > +				*argv, expect ? "" : "not ", x);
> 
> micronit: extra space on the "res" line.

Thanks, fixed.

> This "if" cascade is a little hard to read, even though it does the
> right thing.  Can we make it more explicit?  E.g.

This is directly moved from the existing code. I'd prefer to keep the
overall structure intact to make that clear.

> I think it's a little easier to read with either (a) the dot included
> in the 'x' parameter or (b) the entire ".git" missing from the 'x'
> parameter.

Yeah, I agree that's worth doing. I took (b), as "dotgitx" implies that
"x" is "modules", etc. I had originally planned to automatically turn
"gitmodules" into "is_ntfs_dotgitmodules", too, but it required
macros and string-pasting. So I decided it was a bit too ugly. :)

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/7] t7415: rename to expand scope
  2020-10-05  8:34       ` Jonathan Nieder
@ 2020-10-05  8:49         ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:49 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 01:34:41AM -0700, Jonathan Nieder wrote:

> > I couldn't think of a better name for "files that start with .git". I
> > almost called it "dot-git", but then I worried about confusion with the
> > actual ".git" directory.
> 
> t7450-dot-gitfoo-files.sh seems clear to me.

Heh, that was actually one of the ones I thought of, but I worried that
"foo" was too confusing (likewise, I almost called the test-tool
function check_dotgitfoo(const char *foo)). I guess dotgitx would follow
the same pattern here.

> > In practice, I doubt it matters that much. We don't tend to run the test
> > suite serially in order these days anyway, so the notion that finding a
> > bug in an early test might save you CPU time or time spent reading error
> > messages likely no longer applies.
> 
> I see --- the point here is that because it's using e.g. "git clone
> --recurse-submodules", we want it to be later than the other clone
> tests?
> 
> I think I'd like us to move away from having the numbers at all some
> day (since collisions are very common), but there's probably not much
> to discuss there until one of us comes up with a proposal that still
> makes it easy to do things like "skip all git-svn tests". :)

I'd be happy to get away from numbers, too. They're a frequent pain when
dealing with duplicates, or when we run out of space in a group (I have
another series to split t9001 into a few separate scripts, but I have to
move either it or the unrelated bits at t9002).

An obvious solution is providing some kind of name hierarchy. E.g.:

  t-svn-commit-diff.sh
  t-svn-dcommit-auto-props.sh
  t-svn-dcommit-clobber-series.sh
  t-svn-dcommit-concurrent.sh
  t-svn-dcommit-crlf.sh
  t-svn-dcommit-funky-renames.sh
  t-svn-dcommit-interactive.sh
  t-svn-dcommit-merge.sh
  t-svn-dcommit-new-file.sh
  t-svn-deep-rmdir.sh

Then you could skip t-svn-*, or just t-svn-dcommit-*, or even
t-svn-commit-diff.12.

The "t-" is ugly, but lets you distinguish test scripts from other shell
scripts in the directory.

We could also use actual directories for the hierarchy. svn/dcommit/*,
etc. The t/ directory is rather big. It does make it a little more work
to assemble the full set of tests (you have to use `find` rather than a
glob).

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink
  2020-10-05  8:12   ` Jonathan Nieder
@ 2020-10-05  8:53     ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:53 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 01:12:33AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > The previous commit made it impossible to have a symlinked
> > .gitattributes or .gitignore file via verify_path(). Let's add the same
> > check to fsck, which matches how we handle .gitmodules symlinks, via
> > b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).
> >
> > Note that we won't add these to the existing gitmodules block. Its logic
> > is a bit more complicated, as we also check the content of non-symlink
> > instances we find. But for these new files, there is no content check;
> > we're just looking at the name and mode of the tree entry (and we can
> > avoid even the complicated name checks in the common case that the mode
> > doesn't indicate a symlink).
> 
> On the subject of where the check gets added, the old description said
> 
> 	It's easier to handle than .gitmodules, because we don't care
> 	about checking the blob content. This is really just about
> 	whether the name and mode for the tree entry are valid.
> 
> which I think was self-explanatory enough.  The new text is a little
> more confusing because I get lost in figuring out what the "it" in
> "Its" refers to.

I rewrote it because I found your original text confusing. ;)

I'll write "The logic for gitmodules is a bit more complicated...".

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore
  2020-10-05  7:32 ` [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jonathan Nieder
@ 2020-10-05  8:58   ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05  8:58 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 12:32:02AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > About 2 years ago as part of a security release we made it illegal to
> > have a symlinked .gitmodules file (refusing it both in the index and via
> > fsck). At the time we discussed (on the security list) outlawing
> > symlinks for other .git files in the same way, but we decided not to do
> > so as part of the security release, as it wasn't strictly necessary.
> >
> > We publicly revisited the topic in:
> >
> >   https://lore.kernel.org/git/20190114230902.GG162110@google.com/
> >
> > but there were a few fixes needed, and it got forgotten. So here it is
> > again, with those fixes:
> 
> Oh!  I'm excited --- at $DAYJOB we've been carrying that patch since
> then; I'll be happy to drop it. :)

Part of my ulterior motive is that we've been carrying a patch at GitHub
to pass O_NOFOLLOW when opening in-tree attributes and ignore files
(which we don't normally do on our servers, but do for things like
GitHub Pages). But I think O_NOFOLLOW isn't perfectly portable (despite
being in POSIX), and the patch is rather invasive.

I also looked at one point into preventing just out-of-tree symlinks.
That helps with out-of-repo reads, but it doesn't change the fact that
in-index reads of symlinks are broken. We _could_ change that with a lot
of work, but I don't think anybody cares enough about the feature to do
so.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-05  8:09   ` Jonathan Nieder
@ 2020-10-05 12:07     ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:07 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 05, 2020 at 01:09:30AM -0700, Jonathan Nieder wrote:

> > I note that neither these new tests nor the existing .gitmodules ones
> > confirm that we catch the obscured ntfs/hfs forms in the actual code
> > paths (instead, we feed them to a synthetic test-tool helper in t0060).
> > I think that's OK, but if we wanted to be super-paranoid we could beef
> > up these tests with trickier names.
> 
> I think being exhaustive wouldn't be worth it, but perhaps *one*
> example (e.g., ".gitmodules ") would not be a terrible idea.

It wasn't _too_ bad to extract these tests into a function, at which
point it was easy to test one filename of each type. That will be in my
v2.

> > +test_expect_success 'refuse to load symlinked .gitattributes into index' '
> > +	test_must_fail git -C symlink-attr read-tree $tree 2>err &&
> > +	test_i18ngrep "invalid path.*gitattributes" err
> 
> This tests that it fails but doesn't test that it had no effect.
> Would that be straightforward to check as well (e.g. an "ls-files -s"
> before and after)?

Yeah, "ls-files" should be empty afterwards (we've never added anything
to the index during the creation steps). That was easy enough to add.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore
  2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
                   ` (7 preceding siblings ...)
  2020-10-05  7:32 ` [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jonathan Nieder
@ 2020-10-05 12:16 ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 1/8] fsck_tree(): fix shadowed variable Jeff King
                     ` (9 more replies)
  8 siblings, 10 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

On Mon, Oct 05, 2020 at 03:17:51AM -0400, Jeff King wrote:

> About 2 years ago as part of a security release we made it illegal to
> have a symlinked .gitmodules file (refusing it both in the index and via
> fsck). At the time we discussed (on the security list) outlawing
> symlinks for other .git files in the same way, but we decided not to do
> so as part of the security release, as it wasn't strictly necessary.
> 
> We publicly revisited the topic in:
> 
>   https://lore.kernel.org/git/20190114230902.GG162110@google.com/
> 
> but there were a few fixes needed, and it got forgotten. So here it is
> again, with those fixes:
> [...]

And here's a v2 incorporating feedback from Jonathan. There are no
substantial changes in the code. Most of the fixes are cosmetic, but the
tests are beefed up a bit, as well:

 - we now test that ntfs and hfs names are matched via fsck and
   verify_path() for all file types. The bulk of this is in a new patch
   5, and the final patches are adjusted to use the new helper.

 - we confirm that read-tree doesn't write the forbidden entry into the
   index (in addition to seeing that it complains)

 - the test script name is now "bad-dotgitx" instead of the vague
   "bad-meta-files"

 - whitespace, typo-fixes, clarity, etc; the range diff is below

  [1/8]: fsck_tree(): fix shadowed variable
  [2/8]: fsck_tree(): wrap some long lines
  [3/8]: t7415: rename to expand scope
  [4/8]: t7450: test verify_path() handling of gitmodules
  [5/8]: t7450: test .gitmodules symlink matching against obscured names
  [6/8]: t0060: test obscured .gitattributes and .gitignore matching
  [7/8]: verify_path(): disallow symlinks in .gitattributes and .gitignore
  [8/8]: fsck: complain when .gitattributes or .gitignore is a symlink

 fsck.c                                        | 79 +++++++++++----
 read-cache.c                                  | 12 ++-
 t/helper/test-path-utils.c                    | 41 +++++---
 t/t0060-path-utils.sh                         | 20 ++++
 ...le-names.sh => t7450-bad-dotgitx-files.sh} | 99 +++++++++++++------
 5 files changed, 187 insertions(+), 64 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-dotgitx-files.sh} (73%)

1:  d4c4b98188 ! 1:  78689a44ba fsck_tree(): fix shadowed variable
    @@ Commit message
     
         Let's rename both variables in the function to avoid confusion. This
         makes the diff a little noisy (e.g., all of the report() calls outside
    -    the loop wee already correct but need touched), but makes sure we catch
    -    all cases and will avoid similar confusion in the future.
    +    the loop were already correct but need to be touched), but makes sure we
    +    catch all cases and will avoid similar confusion in the future.
    +
    +    Note that our test change removes the comment about translation. It was
    +    arguably confusing since 674ba34038 (fsck: mark strings for translation,
    +    2018-11-10); we wouldn't translate gitmodulesSymlink, but it did get
    +    removed by GETTEXT_POISON because that feature eats embedded
    +    %s characters. But certainly after this patch, when we look for the
    +    "tree %s: %s" format, we could get foiled by translation.
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    @@ t/t7415-submodule-names.sh: test_expect_success 'fsck detects symlinked .gitmodu
     +		tree=$(git mktree <bad-tree) &&
      
      		# Check not only that we fail, but that it is due to the
    - 		# symlink detector; this grep string comes from the config
    - 		# variable name and will not be translated.
    +-		# symlink detector; this grep string comes from the config
    +-		# variable name and will not be translated.
    ++		# symlink detector
      		test_must_fail git fsck 2>output &&
     -		test_i18ngrep gitmodulesSymlink output
     +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
2:  29d0d3af44 = 2:  b1f7ec465c fsck_tree(): wrap some long lines
3:  8679f0b2f2 ! 3:  d26ef683fd t7415: rename to expand scope
    @@ Commit message
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    - ## t/t7415-submodule-names.sh => t/t7450-bad-meta-files.sh ##
    + ## t/t7415-submodule-names.sh => t/t7450-bad-dotgitx-files.sh ##
     @@
      #!/bin/sh
      
4:  84e58f7f46 ! 4:  493a4c79a3 t7450: test verify_path() handling of gitmodules
    @@ Commit message
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    - ## t/t7450-bad-meta-files.sh ##
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'index-pack --strict works for non-repo pack' '
    + ## t/t7450-bad-dotgitx-files.sh ##
    +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'index-pack --strict works for non-repo pack' '
      	grep gitmodulesName output
      '
      
     -test_expect_success 'fsck detects symlinked .gitmodules file' '
    -+test_expect_success 'create repo with symlinked .gitmodules file' '
    ++test_expect_success 'set up repo with symlinked .gitmodules file' '
      	git init symlink &&
      	(
      		cd symlink &&
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
    +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
      		{
      			printf "100644 blob $content\t$tricky\n" &&
      			printf "120000 blob $target\t.gitmodules\n"
    @@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodul
     +		cd symlink &&
      
      		# Check not only that we fail, but that it is due to the
    - 		# symlink detector; this grep string comes from the config
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
    + 		# symlink detector
    +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
      	)
      '
      
    -+test_expect_success 'refuse to load symlinked .gitmodule into index' '
    ++test_expect_success 'refuse to load symlinked .gitmodules into index' '
     +	test_must_fail git -C symlink read-tree $tree 2>err &&
    -+	test_i18ngrep "invalid path.*gitmodules" err
    ++	test_i18ngrep "invalid path.*gitmodules" err &&
    ++	git -C symlink ls-files >out &&
    ++	test_must_be_empty out
     +'
     +
      test_expect_success 'fsck detects non-blob .gitmodules' '
-:  ---------- > 5:  db5e78ff5b t7450: test .gitmodules symlink matching against obscured names
5:  e141e49a5b ! 6:  b5962a75a4 t0060: test obscured .gitattributes and .gitignore matching
    @@ t/helper/test-path-utils.c: static struct test_data dirname_data[] = {
     +		if (!strcmp("--not", *argv))
     +			expect = !expect;
     +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
    -+			 res = error("'%s' is %s.%s", *argv,
    -+				     expect ? "not " : "", x);
    ++			res = error("'%s' is %s.git%s", *argv,
    ++				    expect ? "not " : "", x);
     +		else
    -+			fprintf(stderr, "ok: '%s' is %s.%s\n",
    ++			fprintf(stderr, "ok: '%s' is %s.git%s\n",
     +				*argv, expect ? "" : "not ", x);
     +	}
     +	return !!res;
    @@ t/helper/test-path-utils.c: int cmd__path_utils(int argc, const char **argv)
     -				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
     -					argv[i], expect ? "" : "not ");
     -		return !!res;
    -+		return check_dotgitx("gitmodules", argv + 2,
    ++		return check_dotgitx("modules", argv + 2,
     +				     is_hfs_dotgitmodules,
     +				     is_ntfs_dotgitmodules);
     +	}
     +	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
    -+		return check_dotgitx("gitignore", argv + 2,
    ++		return check_dotgitx("ignore", argv + 2,
     +				     is_hfs_dotgitignore,
     +				     is_ntfs_dotgitignore);
     +	}
     +	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
    -+		return check_dotgitx("gitattributes", argv + 2,
    ++		return check_dotgitx("attributes", argv + 2,
     +				     is_hfs_dotgitattributes,
     +				     is_ntfs_dotgitattributes);
      	}
6:  d214bbd8ec ! 7:  e4ec698a5b verify_path(): disallow symlinks in .gitattributes and .gitignore
    @@ Commit message
         0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
         were done as part of the series touching .gitmodules.
     
    +    No tests yet, as we'll add them in a subsequent patch once we have fsck
    +    support, too.
    +
         Signed-off-by: Jeff King <peff@peff.net>
     
      ## read-cache.c ##
    @@ read-cache.c: int verify_path(const char *path, unsigned mode)
      						return 0;
      				}
      			}
    -
    - ## t/t7450-bad-meta-files.sh ##
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
    - 	test_i18ngrep "is inside git dir" err
    - '
    - 
    -+test_expect_success 'create repo with symlinked .gitattributes file' '
    -+	git init symlink-attr &&
    -+	target=$(echo target | git -C symlink-attr hash-object -w --stdin) &&
    -+	tree=$(
    -+		printf "120000 blob $target\t.gitattributes\n" |
    -+		git -C symlink-attr mktree
    -+	)
    -+'
    -+
    -+test_expect_success 'refuse to load symlinked .gitattributes into index' '
    -+	test_must_fail git -C symlink-attr read-tree $tree 2>err &&
    -+	test_i18ngrep "invalid path.*gitattributes" err
    -+'
    -+
    -+test_expect_success 'create repo with symlinked .gitignore file' '
    -+	git init symlink-ignore &&
    -+	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
    -+	tree=$(
    -+		printf "120000 blob $target\t.gitignore\n" |
    -+		git -C symlink-ignore mktree
    -+	)
    -+'
    -+
    -+test_expect_success 'refuse to load symlinked .gitignore into index' '
    -+	test_must_fail git -C symlink-ignore read-tree $tree 2>err &&
    -+	test_i18ngrep "invalid path.*gitignore" err
    -+'
    -+
    -+
    - test_done
7:  49423d03b5 ! 8:  58c9ce0f3c fsck: complain when .gitattributes or .gitignore is a symlink
    @@ Commit message
         check to fsck, which matches how we handle .gitmodules symlinks, via
         b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).
     
    -    Note that we won't add these to the existing gitmodules block. Its logic
    -    is a bit more complicated, as we also check the content of non-symlink
    -    instances we find. But for these new files, there is no content check;
    -    we're just looking at the name and mode of the tree entry (and we can
    -    avoid even the complicated name checks in the common case that the mode
    -    doesn't indicate a symlink).
    +    Note that we won't add these to the existing gitmodules block. The logic
    +    for gitmodules is a bit more complicated, as we also check the content
    +    of non-symlink instances we find. But for these new files, there is no
    +    content check; we're just looking at the name and mode of the tree entry
    +    (and we can avoid even the complicated name checks in the common case
    +    that the mode doesn't indicate a symlink).
    +
    +    We can reuse the test helper function we defined for .gitmodules,
    +    though (and this covers the verify_path() change from the previous
    +    commit, as well).
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    @@ fsck.c: static int fsck_tree(const struct object_id *tree_oid,
      			while (backslash) {
      				backslash++;
     
    - ## t/t7450-bad-meta-files.sh ##
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'refuse to load symlinked .gitattributes into index' '
    - 	test_i18ngrep "invalid path.*gitattributes" err
    - '
    + ## t/t7450-bad-dotgitx-files.sh ##
    +@@ t/t7450-bad-dotgitx-files.sh: check_forbidden_symlink gitmodules vanilla .gitmodules
    + check_forbidden_symlink gitmodules ntfs ".gitmodules ."
    + check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
      
    -+test_expect_success 'fsck detects symlinked .gitattributes file' '
    -+	test_must_fail git -C symlink-attr fsck 2>err &&
    -+	test_i18ngrep "tree $tree: gitattributesSymlink" err
    -+'
    ++check_forbidden_symlink gitattributes vanilla .gitattributes
    ++check_forbidden_symlink gitattributes ntfs ".gitattributes ."
    ++check_forbidden_symlink gitattributes hfs ".${u200c}gitattributes"
     +
    - test_expect_success 'create repo with symlinked .gitignore file' '
    - 	git init symlink-ignore &&
    - 	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
    -@@ t/t7450-bad-meta-files.sh: test_expect_success 'refuse to load symlinked .gitignore into index' '
    - 	test_i18ngrep "invalid path.*gitignore" err
    - '
    - 
    -+test_expect_success 'fsck detects symlinked .gitignore file' '
    -+	test_must_fail git -C symlink-ignore fsck 2>err &&
    -+	test_i18ngrep "tree $tree: gitignoreSymlink" err
    -+'
    - 
    - test_done
    ++check_forbidden_symlink gitignore vanilla .gitignore
    ++check_forbidden_symlink gitignore ntfs ".gitignore ."
    ++check_forbidden_symlink gitignore hfs ".${u200c}gitignore"
    ++
    + test_expect_success 'fsck detects non-blob .gitmodules' '
    + 	git init non-blob &&
    + 	(

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 1/8] fsck_tree(): fix shadowed variable
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 2/8] fsck_tree(): wrap some long lines Jeff King
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).

Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.

Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop were already correct but need to be touched), but makes sure we
catch all cases and will avoid similar confusion in the future.

Note that our test change removes the comment about translation. It was
arguably confusing since 674ba34038 (fsck: mark strings for translation,
2018-11-10); we wouldn't translate gitmodulesSymlink, but it did get
removed by GETTEXT_POISON because that feature eats embedded
%s characters. But certainly after this patch, when we look for the
"tree %s: %s" format, we could get foiled by translation.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                     | 40 +++++++++++++++++++-------------------
 t/t7415-submodule-names.sh |  8 ++++----
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..46a108839f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -633,7 +633,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object_id *tree_oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -654,7 +654,7 @@ static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -664,11 +664,11 @@ static int fsck_tree(const struct object_id *oid,
 	while (desc.size) {
 		unsigned short mode;
 		const char *name, *backslash;
-		const struct object_id *oid;
+		const struct object_id *entry_oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		entry_oid = tree_entry_extract(&desc, &name, &mode);
 
-		has_null_sha1 |= is_null_oid(oid);
+		has_null_sha1 |= is_null_oid(entry_oid);
 		has_full_path |= !!strchr(name, '/');
 		has_empty_name |= !*name;
 		has_dot |= !strcmp(name, ".");
@@ -678,10 +678,10 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&gitmodules_found, entry_oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -692,9 +692,9 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&gitmodules_found, entry_oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, tree_oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -703,7 +703,7 @@ static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -751,25 +751,25 @@ static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index f70368bc2e..d1781ef10c 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -148,13 +148,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} | git mktree &&
+		} >bad-tree &&
+		tree=$(git mktree <bad-tree) &&
 
 		# Check not only that we fail, but that it is due to the
-		# symlink detector; this grep string comes from the config
-		# variable name and will not be translated.
+		# symlink detector
 		test_must_fail git fsck 2>output &&
-		test_i18ngrep gitmodulesSymlink output
+		test_i18ngrep "tree $tree: gitmodulesSymlink" output
 	)
 '
 
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 2/8] fsck_tree(): wrap some long lines
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
  2020-10-05 12:16   ` [PATCH v2 1/8] fsck_tree(): fix shadowed variable Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 3/8] t7415: rename to expand scope Jeff King
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Many calls to report() in fsck_tree() are kept on a single line and are
quite long. Most were pretty big to begin with, but have gotten even
longer over the years as we've added more parameters. Let's accept the
churn of wrapping them in order to conform to our usual line limits.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 46a108839f..024810139b 100644
--- a/fsck.c
+++ b/fsck.c
@@ -654,7 +654,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_TREE,
+				 "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -703,7 +705,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE,
+					 FSCK_MSG_BAD_TREE,
+					 "cannot be parsed as a tree");
 			break;
 		}
 
@@ -751,25 +755,45 @@ static int fsck_tree(const struct object_id *tree_oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_NULL_SHA1,
+				 "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_FULL_PATHNAME,
+				 "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_EMPTY_NAME,
+				 "contains empty pathname");
 	if (has_dot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOT,
+				 "contains '.'");
 	if (has_dotdot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTDOT,
+				 "contains '..'");
 	if (has_dotgit)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTGIT,
+				 "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_ZERO_PADDED_FILEMODE,
+				 "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_FILEMODE,
+				 "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_DUPLICATE_ENTRIES,
+				 "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_TREE_NOT_SORTED,
+				 "not properly sorted");
 	return retval;
 }
 
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 3/8] t7415: rename to expand scope
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
  2020-10-05 12:16   ` [PATCH v2 1/8] fsck_tree(): fix shadowed variable Jeff King
  2020-10-05 12:16   ` [PATCH v2 2/8] fsck_tree(): wrap some long lines Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 4/8] t7450: test verify_path() handling of gitmodules Jeff King
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

This script has already expanded beyond its original intent of ".. in
submodule names" to include other malicious submodule bits. Let's update
the name and description to reflect that, as well as the fact that we'll
soon be adding similar tests for other meta-files (.gitattributes, etc).
We'll also renumber it to move it out of the group of submodule-specific
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 ...ubmodule-names.sh => t7450-bad-dotgitx-files.sh} | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-dotgitx-files.sh} (95%)

diff --git a/t/t7415-submodule-names.sh b/t/t7450-bad-dotgitx-files.sh
similarity index 95%
rename from t/t7415-submodule-names.sh
rename to t/t7450-bad-dotgitx-files.sh
index d1781ef10c..fb28539948 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -1,9 +1,16 @@
 #!/bin/sh
 
-test_description='check handling of .. in submodule names
+test_description='check forbidden or malicious patterns in .git* files
 
-Exercise the name-checking function on a variety of names, and then give a
-real-world setup that confirms we catch this in practice.
+Such as:
+
+  - presence of .. in submodule names;
+    Exercise the name-checking function on a variety of names, and then give a
+    real-world setup that confirms we catch this in practice.
+
+  - nested submodule names
+
+  - symlinked .gitmodules, etc
 '
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-pack.sh
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 4/8] t7450: test verify_path() handling of gitmodules
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (2 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 3/8] t7415: rename to expand scope Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 5/8] t7450: test .gitmodules symlink matching against obscured names Jeff King
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) made it impossible to load a symlink .gitmodules file into
the index. However, there are no tests of this behavior. Let's make sure
this case is covered. We can easily reuse the test setup created by
the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
2018-05-04).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-dotgitx-files.sh | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index fb28539948..8bfd32b10a 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'fsck detects symlinked .gitmodules file' '
+test_expect_success 'set up repo with symlinked .gitmodules file' '
 	git init symlink &&
 	(
 		cd symlink &&
@@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree &&
-		tree=$(git mktree <bad-tree) &&
+		} >bad-tree
+	) &&
+	tree=$(git -C symlink mktree <symlink/bad-tree)
+'
+
+test_expect_success 'fsck detects symlinked .gitmodules file' '
+	(
+		cd symlink &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector
@@ -165,6 +171,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 	)
 '
 
+test_expect_success 'refuse to load symlinked .gitmodules into index' '
+	test_must_fail git -C symlink read-tree $tree 2>err &&
+	test_i18ngrep "invalid path.*gitmodules" err &&
+	git -C symlink ls-files >out &&
+	test_must_be_empty out
+'
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 5/8] t7450: test .gitmodules symlink matching against obscured names
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (3 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 4/8] t7450: test verify_path() handling of gitmodules Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 6/8] t0060: test obscured .gitattributes and .gitignore matching Jeff King
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

In t7450 we check that both verify_path() and fsck catch malformed
.gitmodules entries in trees. However, we don't check that we catch
filesystem-equivalent forms of these (e.g., ".GITMOD~1" on Windows).
Our name-matching functions are exercised well in t0060, but there's
nothing to test that we correctly call the matching functions from the
actual fsck and verify_path() code.

So instead of testing just .gitmodules, let's repeat our tests for a few
basic cases. We don't need to be exhaustive here (t0060 handles that),
but just make sure we hit one name of each type.

Besides pushing the tests into a function that takes the path as a
parameter, we'll need to do a few things:

  - adjust the directory name to accommodate the tests running multiple
    times

  - set core.protecthfs for index checks. Fsck always protects all types
    by default, but we want to be able to exercise the HFS routines on
    every system. Note that core.protectntfs is already the default
    these days, but it doesn't hurt to explicitly label our need for it.

  - we'll also take the filename ("gitmodules") as a parameter. All
    calls use the same name for now, but a future patch will extend this
    to handle other .gitfoo files. Note that our fake-content symlink
    destination is somewhat .gitmodules specific. But it isn't necessary
    for other files (which don't do a content check). And it happens to
    be a valid attribute and ignore file anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-dotgitx-files.sh | 91 +++++++++++++++++++++---------------
 1 file changed, 53 insertions(+), 38 deletions(-)

diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index 8bfd32b10a..0cd0f71c39 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -139,44 +139,59 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'set up repo with symlinked .gitmodules file' '
-	git init symlink &&
-	(
-		cd symlink &&
-
-		# Make the tree directly to avoid index restrictions.
-		#
-		# Because symlinks store the target as a blob, choose
-		# a pathname that could be parsed as a .gitmodules file
-		# to trick naive non-symlink-aware checking.
-		tricky="[foo]bar=true" &&
-		content=$(git hash-object -w ../.gitmodules) &&
-		target=$(printf "$tricky" | git hash-object -w --stdin) &&
-		{
-			printf "100644 blob $content\t$tricky\n" &&
-			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree
-	) &&
-	tree=$(git -C symlink mktree <symlink/bad-tree)
-'
-
-test_expect_success 'fsck detects symlinked .gitmodules file' '
-	(
-		cd symlink &&
-
-		# Check not only that we fail, but that it is due to the
-		# symlink detector
-		test_must_fail git fsck 2>output &&
-		test_i18ngrep "tree $tree: gitmodulesSymlink" output
-	)
-'
-
-test_expect_success 'refuse to load symlinked .gitmodules into index' '
-	test_must_fail git -C symlink read-tree $tree 2>err &&
-	test_i18ngrep "invalid path.*gitmodules" err &&
-	git -C symlink ls-files >out &&
-	test_must_be_empty out
-'
+check_forbidden_symlink () {
+	name=$1
+	type=$2
+	path=$3
+	dir=symlink-$name-$type
+
+	test_expect_success "set up repo with symlinked $name ($type)" '
+		git init $dir &&
+		(
+			cd $dir &&
+
+			# Make the tree directly to avoid index restrictions.
+			#
+			# Because symlinks store the target as a blob, choose
+			# a pathname that could be parsed as a .gitmodules file
+			# to trick naive non-symlink-aware checking.
+			tricky="[foo]bar=true" &&
+			content=$(git hash-object -w ../.gitmodules) &&
+			target=$(printf "$tricky" | git hash-object -w --stdin) &&
+			{
+				printf "100644 blob $content\t$tricky\n" &&
+				printf "120000 blob $target\t$path\n"
+			} >bad-tree
+		) &&
+		tree=$(git -C $dir mktree <$dir/bad-tree)
+	'
+
+	test_expect_success "fsck detects symlinked $name ($type)" '
+		(
+			cd $dir &&
+
+			# Check not only that we fail, but that it is due to the
+			# symlink detector
+			test_must_fail git fsck 2>output &&
+			test_i18ngrep "tree $tree: ${name}Symlink" output
+		)
+	'
+
+	test_expect_success "refuse to load symlinked $name into index ($type)" '
+		test_must_fail \
+			git -C $dir \
+			    -c core.protectntfs \
+			    -c core.protecthfs \
+			    read-tree $tree 2>err &&
+		test_i18ngrep "invalid path.*$name" err &&
+		git -C $dir ls-files -s >out &&
+		test_must_be_empty out
+	'
+}
+
+check_forbidden_symlink gitmodules vanilla .gitmodules
+check_forbidden_symlink gitmodules ntfs ".gitmodules ."
+check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
 
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 6/8] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (4 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 5/8] t7450: test .gitmodules symlink matching against obscured names Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-05 12:16   ` [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used, let's make sure they actually work by throwing a few basic
checks at them.

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-path-utils.c | 41 ++++++++++++++++++++++++++------------
 t/t0060-path-utils.sh      | 20 +++++++++++++++++++
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 313a153209..732372b97f 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
 	{ NULL,              NULL     }
 };
 
-static int is_dotgitmodules(const char *path)
+static int check_dotgitx(const char *x, const char **argv,
+			 int (*is_hfs)(const char *),
+			 int (*is_ntfs)(const char *))
 {
-	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
+	int res = 0, expect = 1;
+	for (; *argv; argv++) {
+		if (!strcmp("--not", *argv))
+			expect = !expect;
+		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
+			res = error("'%s' is %s.git%s", *argv,
+				    expect ? "not " : "", x);
+		else
+			fprintf(stderr, "ok: '%s' is %s.git%s\n",
+				*argv, expect ? "" : "not ", x);
+	}
+	return !!res;
 }
 
 static int cmp_by_st_size(const void *a, const void *b)
@@ -382,17 +395,19 @@ int cmd__path_utils(int argc, const char **argv)
 		return test_function(dirname_data, posix_dirname, argv[1]);
 
 	if (argc > 2 && !strcmp(argv[1], "is_dotgitmodules")) {
-		int res = 0, expect = 1, i;
-		for (i = 2; i < argc; i++)
-			if (!strcmp("--not", argv[i]))
-				expect = !expect;
-			else if (expect != is_dotgitmodules(argv[i]))
-				res = error("'%s' is %s.gitmodules", argv[i],
-					    expect ? "not " : "");
-			else
-				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
-					argv[i], expect ? "" : "not ");
-		return !!res;
+		return check_dotgitx("modules", argv + 2,
+				     is_hfs_dotgitmodules,
+				     is_ntfs_dotgitmodules);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
+		return check_dotgitx("ignore", argv + 2,
+				     is_hfs_dotgitignore,
+				     is_ntfs_dotgitignore);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
+		return check_dotgitx("attributes", argv + 2,
+				     is_hfs_dotgitattributes,
+				     is_ntfs_dotgitattributes);
 	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 56db5c8aba..b2e3cf3f4c 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -468,6 +468,26 @@ test_expect_success 'match .gitmodules' '
 		.gitmodules,:\$DATA
 '
 
+test_expect_success 'match .gitattributes' '
+	test-tool path-utils is_dotgitattributes \
+		.gitattributes \
+		.git${u200c}attributes \
+		.Gitattributes \
+		.gitattributeS \
+		GITATT~1 \
+		GI7D29~1
+'
+
+test_expect_success 'match .gitignore' '
+	test-tool path-utils is_dotgitignore \
+		.gitignore \
+		.git${u200c}ignore \
+		.Gitignore \
+		.gitignorE \
+		GITIGN~1 \
+		GI250A~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (5 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 6/8] t0060: test obscured .gitattributes and .gitignore matching Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-27  3:35     ` Jonathan Nieder
  2020-10-05 12:16   ` [PATCH v2 8/8] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

In commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) we made it impossible to load a .gitmodules file that's a
symlink into the index. The security reasons for doing so are described
there. We also discussed forbidding symlinks of other .git files as part
of that fix, but the tradeoff was less compelling:

  1. Unlike .gitmodules, the other files don't have content-level fsck
     checks. So an attacker using symlinks to evade those checks isn't a
     problem.

  2. Unlike .gitmodules, Git will never write .gitignore or
     .gitattributes itself, making it much less likely to use them to
     write outside the repo. They could be used for out-of-repo reads,
     however.

  3. The .gitmodules change was part of a critical bug-fix that was
     not publicly disclosed until it was released. Changing the other
     files was not needed for the minimal fix.

However, it's still a reasonable idea to forbid symlinks for these
files:

  - As noted, they can still be used to read out-of-repo files (which is
    fairly restricted, but in some circumstances you can probe file
    content by speculatively creating files and seeing if they get
    ignored)

  - They don't currently behave well in all cases. We sometimes read
    these files from the index, where we _don't_ follow symlinks (we'd
    just treat the symlink target as the .gitignore or .gitattributes
    content, which is actively wrong).

This patch forbids symlinked versions of these files from entering the
index. We already have helpers for obscured forms of the names from
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
were done as part of the series touching .gitmodules.

No tests yet, as we'll add them in a subsequent patch once we have fsck
support, too.

Signed-off-by: Jeff King <peff@peff.net>
---
 read-cache.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index ecf6f68994..63aec6c35d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -947,7 +947,9 @@ static int verify_dotfile(const char *rest, unsigned mode)
 			return 0;
 		if (S_ISLNK(mode)) {
 			rest += 3;
-			if (skip_iprefix(rest, "modules", &rest) &&
+			if ((skip_iprefix(rest, "modules", &rest) ||
+			     skip_iprefix(rest, "ignore", &rest) ||
+			     skip_iprefix(rest, "attributes", &rest)) &&
 			    (*rest == '\0' || is_dir_sep(*rest)))
 				return 0;
 		}
@@ -980,7 +982,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_hfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_hfs_dotgitmodules(path))
+					if (is_hfs_dotgitmodules(path) ||
+					    is_hfs_dotgitignore(path) ||
+					    is_hfs_dotgitattributes(path))
 						return 0;
 				}
 			}
@@ -992,7 +996,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_ntfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_ntfs_dotgitmodules(path))
+					if (is_ntfs_dotgitmodules(path) ||
+					    is_ntfs_dotgitignore(path) ||
+					    is_ntfs_dotgitattributes(path))
 						return 0;
 				}
 			}
-- 
2.28.0.1295.gf70bcb366f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 8/8] fsck: complain when .gitattributes or .gitignore is a symlink
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (6 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
@ 2020-10-05 12:16   ` Jeff King
  2020-10-06 20:41   ` [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore Junio C Hamano
  2020-10-20 23:19   ` Philip Oakley
  9 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-05 12:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder

The previous commit made it impossible to have a symlinked
.gitattributes or .gitignore file via verify_path(). Let's add the same
check to fsck, which matches how we handle .gitmodules symlinks, via
b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).

Note that we won't add these to the existing gitmodules block. The logic
for gitmodules is a bit more complicated, as we also check the content
of non-symlink instances we find. But for these new files, there is no
content check; we're just looking at the name and mode of the tree entry
(and we can avoid even the complicated name checks in the common case
that the mode doesn't indicate a symlink).

We can reuse the test helper function we defined for .gitmodules,
though (and this covers the verify_path() change from the previous
commit, as well).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                       | 15 +++++++++++++++
 t/t7450-bad-dotgitx-files.sh |  8 ++++++++
 2 files changed, 23 insertions(+)

diff --git a/fsck.c b/fsck.c
index 024810139b..fcd3f268b1 100644
--- a/fsck.c
+++ b/fsck.c
@@ -67,6 +67,8 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
 	FUNC(GITMODULES_UPDATE, ERROR) \
+	FUNC(GITIGNORE_SYMLINK, ERROR) \
+	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -688,6 +690,19 @@ static int fsck_tree(const struct object_id *tree_oid,
 						 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+		}
+
 		if ((backslash = strchr(name, '\\'))) {
 			while (backslash) {
 				backslash++;
diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index 0cd0f71c39..326b34e167 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -193,6 +193,14 @@ check_forbidden_symlink gitmodules vanilla .gitmodules
 check_forbidden_symlink gitmodules ntfs ".gitmodules ."
 check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
 
+check_forbidden_symlink gitattributes vanilla .gitattributes
+check_forbidden_symlink gitattributes ntfs ".gitattributes ."
+check_forbidden_symlink gitattributes hfs ".${u200c}gitattributes"
+
+check_forbidden_symlink gitignore vanilla .gitignore
+check_forbidden_symlink gitignore ntfs ".gitignore ."
+check_forbidden_symlink gitignore hfs ".${u200c}gitignore"
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.28.0.1295.gf70bcb366f

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05  8:40     ` Jeff King
@ 2020-10-05 21:20       ` Johannes Schindelin
  2020-10-06 14:01         ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Johannes Schindelin @ 2020-10-05 21:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git

Hi Peff & Jonathan N,

On Mon, 5 Oct 2020, Jeff King wrote:

> On Mon, Oct 05, 2020 at 01:03:53AM -0700, Jonathan Nieder wrote:
>
> > > Note that the ntfs magic prefix names in the tests come from the
> > > algorithm described in e7cb0b4455 (and are different for each file).
> >
> > Doesn't block this patch, but I'm curious: how hard would it be to make
> > a test with an NTFS prerequisite that makes sure we got the magic prefix
> > right?
>
> I suspect hard since Dscho punted on it in the original series. :) If I
> understand correctly, it would require having an NTFS filesystem, and
> generating 10,000+ files with a clashing prefix.

It's not quite _as_ bad: you only need to generate 4 files with a clashing
prefix and then the real one:

-- snip --
me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes1

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes2

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes3

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes4

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitattributes

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore1

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore2

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore3

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore4

me@work MINGW64 ~/repros/ntfs-short-names
$ touch .gitignore

me@work MINGW64 ~/repros/ntfs-short-names
$ cmd //c dir //x
 Volume in drive C is OSDisk
 Volume Serial Number is 5E6B-4E77

 Directory of C:\Users\me\repros\ntfs-short-names

10/05/2020  11:11 PM    <DIR>                       .
10/05/2020  11:11 PM    <DIR>                       ..
10/05/2020  11:08 PM                 0 GI7D29~1     .gitattributes
10/05/2020  11:08 PM                 0 GITATT~1     .gitattributes1
10/05/2020  11:08 PM                 0 GITATT~2     .gitattributes2
10/05/2020  11:08 PM                 0 GITATT~3     .gitattributes3
10/05/2020  11:08 PM                 0 GITATT~4     .gitattributes4
10/05/2020  11:11 PM                 0 GI250A~1     .gitignore
10/05/2020  11:11 PM                 0 GITIGN~1     .gitignore1
10/05/2020  11:11 PM                 0 GITIGN~2     .gitignore2
10/05/2020  11:11 PM                 0 GITIGN~3     .gitignore3
10/05/2020  11:11 PM                 0 GITIGN~4     .gitignore4
              10 File(s)              0 bytes
               2 Dir(s)  314,658,705,408 bytes free
-- snap --

But I don't necessarily think that it would make sense to add that test:
it adds churn _every_ time the regression test is run, and by deity, it
sure takes way too long on Windows _already_, and the test would be for a
regression _in the NTFS driver_.

At this stage, I also highly doubt that the algorithm will change ever
again (the last time it changed was several Windows versions ago, I want
to say in Windows XP, but it could have been all the way back to NT).

In light of that, I'd say that the bang is rather small and the buck would
be not small at all, and would have to be paid by developers on Windows
who already pay a disproportionately high price when running the test
suite, so...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching
  2020-10-05 21:20       ` Johannes Schindelin
@ 2020-10-06 14:01         ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-06 14:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jonathan Nieder, git

On Mon, Oct 05, 2020 at 11:20:48PM +0200, Johannes Schindelin wrote:

> > > > Note that the ntfs magic prefix names in the tests come from the
> > > > algorithm described in e7cb0b4455 (and are different for each file).
> > >
> > > Doesn't block this patch, but I'm curious: how hard would it be to make
> > > a test with an NTFS prerequisite that makes sure we got the magic prefix
> > > right?
> >
> > I suspect hard since Dscho punted on it in the original series. :) If I
> > understand correctly, it would require having an NTFS filesystem, and
> > generating 10,000+ files with a clashing prefix.
> 
> It's not quite _as_ bad: you only need to generate 4 files with a clashing
> prefix and then the real one:

Ah, that really isn't that bad, then. Still, I don't mind leaving this
as-is under the notion that if the algorithm does change, it would
likely make it onto your radar anyway (or the radar of _anybody_ who
would raise the issue).

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (7 preceding siblings ...)
  2020-10-05 12:16   ` [PATCH v2 8/8] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
@ 2020-10-06 20:41   ` Junio C Hamano
  2020-10-20 23:19   ` Philip Oakley
  9 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2020-10-06 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> On Mon, Oct 05, 2020 at 03:17:51AM -0400, Jeff King wrote:
>
>> About 2 years ago as part of a security release we made it illegal to
>> have a symlinked .gitmodules file (refusing it both in the index and via
>> fsck). At the time we discussed (on the security list) outlawing
>> symlinks for other .git files in the same way, but we decided not to do
>> so as part of the security release, as it wasn't strictly necessary.
>> 
>> We publicly revisited the topic in:
>> 
>>   https://lore.kernel.org/git/20190114230902.GG162110@google.com/
>> 
>> but there were a few fixes needed, and it got forgotten. So here it is
>> again, with those fixes:
>> [...]
>
> And here's a v2 incorporating feedback from Jonathan. There are no
> substantial changes in the code. Most of the fixes are cosmetic, but the
> tests are beefed up a bit, as well:
>
>  - we now test that ntfs and hfs names are matched via fsck and
>    verify_path() for all file types. The bulk of this is in a new patch
>    5, and the final patches are adjusted to use the new helper.
>
>  - we confirm that read-tree doesn't write the forbidden entry into the
>    index (in addition to seeing that it complains)
>
>  - the test script name is now "bad-dotgitx" instead of the vague
>    "bad-meta-files"
>
>  - whitespace, typo-fixes, clarity, etc; the range diff is below

Thanks for a pleasant read.  Will queue.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore
  2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
                     ` (8 preceding siblings ...)
  2020-10-06 20:41   ` [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore Junio C Hamano
@ 2020-10-20 23:19   ` Philip Oakley
  2020-10-23  8:17     ` [PATCH] documentation symlink restrictions for .git* files Jeff King
  9 siblings, 1 reply; 55+ messages in thread
From: Philip Oakley @ 2020-10-20 23:19 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Jonathan Nieder

A rather late comment,

On 05/10/2020 13:16, Jeff King wrote:
> On Mon, Oct 05, 2020 at 03:17:51AM -0400, Jeff King wrote:
>
>> About 2 years ago as part of a security release we made it illegal to
>> have a symlinked .gitmodules file (refusing it both in the index and via
>> fsck). At the time we discussed (on the security list) outlawing
>> symlinks for other .git files in the same way, but we decided not to do
>> so as part of the security release, as it wasn't strictly necessary.

Is this something that should be recorded in the documentation, either as a
simple (sensible) limitation, or explicitly as a security related safety
measure?

I didn't see any changes to the .txt docs in the change list below.

Philip
>>
>> We publicly revisited the topic in:
>>
>>   https://lore.kernel.org/git/20190114230902.GG162110@google.com/
>>
>> but there were a few fixes needed, and it got forgotten. So here it is
>> again, with those fixes:
>> [...]
> And here's a v2 incorporating feedback from Jonathan. There are no
> substantial changes in the code. Most of the fixes are cosmetic, but the
> tests are beefed up a bit, as well:
>
>  - we now test that ntfs and hfs names are matched via fsck and
>    verify_path() for all file types. The bulk of this is in a new patch
>    5, and the final patches are adjusted to use the new helper.
>
>  - we confirm that read-tree doesn't write the forbidden entry into the
>    index (in addition to seeing that it complains)
>
>  - the test script name is now "bad-dotgitx" instead of the vague
>    "bad-meta-files"
>
>  - whitespace, typo-fixes, clarity, etc; the range diff is below
>
>   [1/8]: fsck_tree(): fix shadowed variable
>   [2/8]: fsck_tree(): wrap some long lines
>   [3/8]: t7415: rename to expand scope
>   [4/8]: t7450: test verify_path() handling of gitmodules
>   [5/8]: t7450: test .gitmodules symlink matching against obscured names
>   [6/8]: t0060: test obscured .gitattributes and .gitignore matching
>   [7/8]: verify_path(): disallow symlinks in .gitattributes and .gitignore
>   [8/8]: fsck: complain when .gitattributes or .gitignore is a symlink
>
>  fsck.c                                        | 79 +++++++++++----
>  read-cache.c                                  | 12 ++-
>  t/helper/test-path-utils.c                    | 41 +++++---
>  t/t0060-path-utils.sh                         | 20 ++++
>  ...le-names.sh => t7450-bad-dotgitx-files.sh} | 99 +++++++++++++------
>  5 files changed, 187 insertions(+), 64 deletions(-)
>  rename t/{t7415-submodule-names.sh => t7450-bad-dotgitx-files.sh} (73%)
>
> 1:  d4c4b98188 ! 1:  78689a44ba fsck_tree(): fix shadowed variable
>     @@ Commit message
>      
>          Let's rename both variables in the function to avoid confusion. This
>          makes the diff a little noisy (e.g., all of the report() calls outside
>     -    the loop wee already correct but need touched), but makes sure we catch
>     -    all cases and will avoid similar confusion in the future.
>     +    the loop were already correct but need to be touched), but makes sure we
>     +    catch all cases and will avoid similar confusion in the future.
>     +
>     +    Note that our test change removes the comment about translation. It was
>     +    arguably confusing since 674ba34038 (fsck: mark strings for translation,
>     +    2018-11-10); we wouldn't translate gitmodulesSymlink, but it did get
>     +    removed by GETTEXT_POISON because that feature eats embedded
>     +    %s characters. But certainly after this patch, when we look for the
>     +    "tree %s: %s" format, we could get foiled by translation.
>      
>          Signed-off-by: Jeff King <peff@peff.net>
>      
>     @@ t/t7415-submodule-names.sh: test_expect_success 'fsck detects symlinked .gitmodu
>      +		tree=$(git mktree <bad-tree) &&
>       
>       		# Check not only that we fail, but that it is due to the
>     - 		# symlink detector; this grep string comes from the config
>     - 		# variable name and will not be translated.
>     +-		# symlink detector; this grep string comes from the config
>     +-		# variable name and will not be translated.
>     ++		# symlink detector
>       		test_must_fail git fsck 2>output &&
>      -		test_i18ngrep gitmodulesSymlink output
>      +		test_i18ngrep "tree $tree: gitmodulesSymlink" output
> 2:  29d0d3af44 = 2:  b1f7ec465c fsck_tree(): wrap some long lines
> 3:  8679f0b2f2 ! 3:  d26ef683fd t7415: rename to expand scope
>     @@ Commit message
>      
>          Signed-off-by: Jeff King <peff@peff.net>
>      
>     - ## t/t7415-submodule-names.sh => t/t7450-bad-meta-files.sh ##
>     + ## t/t7415-submodule-names.sh => t/t7450-bad-dotgitx-files.sh ##
>      @@
>       #!/bin/sh
>       
> 4:  84e58f7f46 ! 4:  493a4c79a3 t7450: test verify_path() handling of gitmodules
>     @@ Commit message
>      
>          Signed-off-by: Jeff King <peff@peff.net>
>      
>     - ## t/t7450-bad-meta-files.sh ##
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'index-pack --strict works for non-repo pack' '
>     + ## t/t7450-bad-dotgitx-files.sh ##
>     +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'index-pack --strict works for non-repo pack' '
>       	grep gitmodulesName output
>       '
>       
>      -test_expect_success 'fsck detects symlinked .gitmodules file' '
>     -+test_expect_success 'create repo with symlinked .gitmodules file' '
>     ++test_expect_success 'set up repo with symlinked .gitmodules file' '
>       	git init symlink &&
>       	(
>       		cd symlink &&
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
>     +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
>       		{
>       			printf "100644 blob $content\t$tricky\n" &&
>       			printf "120000 blob $target\t.gitmodules\n"
>     @@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodul
>      +		cd symlink &&
>       
>       		# Check not only that we fail, but that it is due to the
>     - 		# symlink detector; this grep string comes from the config
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
>     + 		# symlink detector
>     +@@ t/t7450-bad-dotgitx-files.sh: test_expect_success 'fsck detects symlinked .gitmodules file' '
>       	)
>       '
>       
>     -+test_expect_success 'refuse to load symlinked .gitmodule into index' '
>     ++test_expect_success 'refuse to load symlinked .gitmodules into index' '
>      +	test_must_fail git -C symlink read-tree $tree 2>err &&
>     -+	test_i18ngrep "invalid path.*gitmodules" err
>     ++	test_i18ngrep "invalid path.*gitmodules" err &&
>     ++	git -C symlink ls-files >out &&
>     ++	test_must_be_empty out
>      +'
>      +
>       test_expect_success 'fsck detects non-blob .gitmodules' '
> -:  ---------- > 5:  db5e78ff5b t7450: test .gitmodules symlink matching against obscured names
> 5:  e141e49a5b ! 6:  b5962a75a4 t0060: test obscured .gitattributes and .gitignore matching
>     @@ t/helper/test-path-utils.c: static struct test_data dirname_data[] = {
>      +		if (!strcmp("--not", *argv))
>      +			expect = !expect;
>      +		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
>     -+			 res = error("'%s' is %s.%s", *argv,
>     -+				     expect ? "not " : "", x);
>     ++			res = error("'%s' is %s.git%s", *argv,
>     ++				    expect ? "not " : "", x);
>      +		else
>     -+			fprintf(stderr, "ok: '%s' is %s.%s\n",
>     ++			fprintf(stderr, "ok: '%s' is %s.git%s\n",
>      +				*argv, expect ? "" : "not ", x);
>      +	}
>      +	return !!res;
>     @@ t/helper/test-path-utils.c: int cmd__path_utils(int argc, const char **argv)
>      -				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
>      -					argv[i], expect ? "" : "not ");
>      -		return !!res;
>     -+		return check_dotgitx("gitmodules", argv + 2,
>     ++		return check_dotgitx("modules", argv + 2,
>      +				     is_hfs_dotgitmodules,
>      +				     is_ntfs_dotgitmodules);
>      +	}
>      +	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
>     -+		return check_dotgitx("gitignore", argv + 2,
>     ++		return check_dotgitx("ignore", argv + 2,
>      +				     is_hfs_dotgitignore,
>      +				     is_ntfs_dotgitignore);
>      +	}
>      +	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
>     -+		return check_dotgitx("gitattributes", argv + 2,
>     ++		return check_dotgitx("attributes", argv + 2,
>      +				     is_hfs_dotgitattributes,
>      +				     is_ntfs_dotgitattributes);
>       	}
> 6:  d214bbd8ec ! 7:  e4ec698a5b verify_path(): disallow symlinks in .gitattributes and .gitignore
>     @@ Commit message
>          0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
>          were done as part of the series touching .gitmodules.
>      
>     +    No tests yet, as we'll add them in a subsequent patch once we have fsck
>     +    support, too.
>     +
>          Signed-off-by: Jeff King <peff@peff.net>
>      
>       ## read-cache.c ##
>     @@ read-cache.c: int verify_path(const char *path, unsigned mode)
>       						return 0;
>       				}
>       			}
>     -
>     - ## t/t7450-bad-meta-files.sh ##
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
>     - 	test_i18ngrep "is inside git dir" err
>     - '
>     - 
>     -+test_expect_success 'create repo with symlinked .gitattributes file' '
>     -+	git init symlink-attr &&
>     -+	target=$(echo target | git -C symlink-attr hash-object -w --stdin) &&
>     -+	tree=$(
>     -+		printf "120000 blob $target\t.gitattributes\n" |
>     -+		git -C symlink-attr mktree
>     -+	)
>     -+'
>     -+
>     -+test_expect_success 'refuse to load symlinked .gitattributes into index' '
>     -+	test_must_fail git -C symlink-attr read-tree $tree 2>err &&
>     -+	test_i18ngrep "invalid path.*gitattributes" err
>     -+'
>     -+
>     -+test_expect_success 'create repo with symlinked .gitignore file' '
>     -+	git init symlink-ignore &&
>     -+	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
>     -+	tree=$(
>     -+		printf "120000 blob $target\t.gitignore\n" |
>     -+		git -C symlink-ignore mktree
>     -+	)
>     -+'
>     -+
>     -+test_expect_success 'refuse to load symlinked .gitignore into index' '
>     -+	test_must_fail git -C symlink-ignore read-tree $tree 2>err &&
>     -+	test_i18ngrep "invalid path.*gitignore" err
>     -+'
>     -+
>     -+
>     - test_done
> 7:  49423d03b5 ! 8:  58c9ce0f3c fsck: complain when .gitattributes or .gitignore is a symlink
>     @@ Commit message
>          check to fsck, which matches how we handle .gitmodules symlinks, via
>          b7b1fca175 (fsck: complain when .gitmodules is a symlink, 2018-05-04).
>      
>     -    Note that we won't add these to the existing gitmodules block. Its logic
>     -    is a bit more complicated, as we also check the content of non-symlink
>     -    instances we find. But for these new files, there is no content check;
>     -    we're just looking at the name and mode of the tree entry (and we can
>     -    avoid even the complicated name checks in the common case that the mode
>     -    doesn't indicate a symlink).
>     +    Note that we won't add these to the existing gitmodules block. The logic
>     +    for gitmodules is a bit more complicated, as we also check the content
>     +    of non-symlink instances we find. But for these new files, there is no
>     +    content check; we're just looking at the name and mode of the tree entry
>     +    (and we can avoid even the complicated name checks in the common case
>     +    that the mode doesn't indicate a symlink).
>     +
>     +    We can reuse the test helper function we defined for .gitmodules,
>     +    though (and this covers the verify_path() change from the previous
>     +    commit, as well).
>      
>          Signed-off-by: Jeff King <peff@peff.net>
>      
>     @@ fsck.c: static int fsck_tree(const struct object_id *tree_oid,
>       			while (backslash) {
>       				backslash++;
>      
>     - ## t/t7450-bad-meta-files.sh ##
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'refuse to load symlinked .gitattributes into index' '
>     - 	test_i18ngrep "invalid path.*gitattributes" err
>     - '
>     + ## t/t7450-bad-dotgitx-files.sh ##
>     +@@ t/t7450-bad-dotgitx-files.sh: check_forbidden_symlink gitmodules vanilla .gitmodules
>     + check_forbidden_symlink gitmodules ntfs ".gitmodules ."
>     + check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
>       
>     -+test_expect_success 'fsck detects symlinked .gitattributes file' '
>     -+	test_must_fail git -C symlink-attr fsck 2>err &&
>     -+	test_i18ngrep "tree $tree: gitattributesSymlink" err
>     -+'
>     ++check_forbidden_symlink gitattributes vanilla .gitattributes
>     ++check_forbidden_symlink gitattributes ntfs ".gitattributes ."
>     ++check_forbidden_symlink gitattributes hfs ".${u200c}gitattributes"
>      +
>     - test_expect_success 'create repo with symlinked .gitignore file' '
>     - 	git init symlink-ignore &&
>     - 	target=$(echo target | git -C symlink-ignore hash-object -w --stdin) &&
>     -@@ t/t7450-bad-meta-files.sh: test_expect_success 'refuse to load symlinked .gitignore into index' '
>     - 	test_i18ngrep "invalid path.*gitignore" err
>     - '
>     - 
>     -+test_expect_success 'fsck detects symlinked .gitignore file' '
>     -+	test_must_fail git -C symlink-ignore fsck 2>err &&
>     -+	test_i18ngrep "tree $tree: gitignoreSymlink" err
>     -+'
>     - 
>     - test_done
>     ++check_forbidden_symlink gitignore vanilla .gitignore
>     ++check_forbidden_symlink gitignore ntfs ".gitignore ."
>     ++check_forbidden_symlink gitignore hfs ".${u200c}gitignore"
>     ++
>     + test_expect_success 'fsck detects non-blob .gitmodules' '
>     + 	git init non-blob &&
>     + 	(


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] documentation symlink restrictions for .git* files
  2020-10-20 23:19   ` Philip Oakley
@ 2020-10-23  8:17     ` Jeff King
  2020-10-23  8:27       ` Jeff King
  2020-10-26 22:18       ` Philip Oakley
  0 siblings, 2 replies; 55+ messages in thread
From: Jeff King @ 2020-10-23  8:17 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Jonathan Nieder

On Wed, Oct 21, 2020 at 12:19:25AM +0100, Philip Oakley wrote:

> On 05/10/2020 13:16, Jeff King wrote:
> > On Mon, Oct 05, 2020 at 03:17:51AM -0400, Jeff King wrote:
> >
> >> About 2 years ago as part of a security release we made it illegal to
> >> have a symlinked .gitmodules file (refusing it both in the index and via
> >> fsck). At the time we discussed (on the security list) outlawing
> >> symlinks for other .git files in the same way, but we decided not to do
> >> so as part of the security release, as it wasn't strictly necessary.
> 
> Is this something that should be recorded in the documentation, either as a
> simple (sensible) limitation, or explicitly as a security related safety
> measure?
> 
> I didn't see any changes to the .txt docs in the change list below.

Yeah, that's a good point.

How about this (on top of jk/symlinked-dotgitx-files)?

-- >8 --
Subject: [PATCH] documentation symlink restrictions for .git* files

We outlawed symbolic link versions of various .git files in 10ecfa7649
(verify_path: disallow symlinks in .gitmodules, 2018-05-04) and
dd4c2fe66b (verify_path(): disallow symlinks in .gitattributes and
.gitignore, 2020-10-05). The reasons are discussed in detail there, but
we never adjusted the documentation to let users know.

This hasn't been a big deal since the point is that such setups were
mildly broken and thought to be unusual anyway. But it certainly doesn't
hurt to be clear and explicit about it.

Suggested-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/gitattributes.txt | 7 +++++++
 Documentation/gitignore.txt     | 5 +++++
 Documentation/gitmodules.txt    | 8 ++++++++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 2d0a03715b..9a2ce4f1ea 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1241,6 +1241,13 @@ to:
 [attr]binary -diff -merge -text
 ------------
 
+NOTES
+-----
+
+Note that Git does not allow a `.gitattributes` file within the working
+tree to be a symbolic link, and will refuse to check out such a tree
+entry.  This keeps behavior consistent when the file is accessed from
+the index or a tree versus from the filesystem.
 
 EXAMPLES
 --------
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index d47b1ae296..7e9a1d49d6 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -149,6 +149,11 @@ not tracked by Git remain untracked.
 To stop tracking a file that is currently tracked, use
 'git rm --cached'.
 
+Note that Git does not allow a `.gitignore` file within the working tree
+to be a symbolic link, and will refuse to check out such a tree entry.
+This keeps behavior consistent when the file is accessed from the index
+or a tree versus from the filesystem.
+
 EXAMPLES
 --------
 
diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
index 539b4e1997..2b884be3c7 100644
--- a/Documentation/gitmodules.txt
+++ b/Documentation/gitmodules.txt
@@ -98,6 +98,14 @@ submodule.<name>.shallow::
 	shallow clone (with a history depth of 1) unless the user explicitly
 	asks for a non-shallow clone.
 
+NOTES
+-----
+
+Note that Git does not allow the `.gitmodules` file within a working
+tree to be a symbolic link, and will refuse to check out such a tree
+entry. This keeps behavior consistent when the file is accessed from the
+index or a tree versus from the filesystem, and helps Git reliably
+enforce security checks of the file contents.
 
 EXAMPLES
 --------
-- 
2.29.0.583.g8e3ac41d8f


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-23  8:17     ` [PATCH] documentation symlink restrictions for .git* files Jeff King
@ 2020-10-23  8:27       ` Jeff King
  2020-10-26 22:18       ` Philip Oakley
  1 sibling, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-23  8:27 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Jonathan Nieder

On Fri, Oct 23, 2020 at 04:17:11AM -0400, Jeff King wrote:

> Subject: [PATCH] documentation symlink restrictions for .git* files

Oops, that should be "document symlink" of course.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-23  8:17     ` [PATCH] documentation symlink restrictions for .git* files Jeff King
  2020-10-23  8:27       ` Jeff King
@ 2020-10-26 22:18       ` Philip Oakley
  2020-10-26 22:53         ` Jeff King
  1 sibling, 1 reply; 55+ messages in thread
From: Philip Oakley @ 2020-10-26 22:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jonathan Nieder

On 23/10/2020 09:17, Jeff King wrote:
> On Wed, Oct 21, 2020 at 12:19:25AM +0100, Philip Oakley wrote:
>
>> On 05/10/2020 13:16, Jeff King wrote:
>>> On Mon, Oct 05, 2020 at 03:17:51AM -0400, Jeff King wrote:
>>>
>>>> About 2 years ago as part of a security release we made it illegal to
>>>> have a symlinked .gitmodules file (refusing it both in the index and via
>>>> fsck). At the time we discussed (on the security list) outlawing
>>>> symlinks for other .git files in the same way, but we decided not to do
>>>> so as part of the security release, as it wasn't strictly necessary.
>> Is this something that should be recorded in the documentation, either as a
>> simple (sensible) limitation, or explicitly as a security related safety
>> measure?
>>
>> I didn't see any changes to the .txt docs in the change list below.
> Yeah, that's a good point.
>
> How about this (on top of jk/symlinked-dotgitx-files)?
>
> -- >8 --
> Subject: [PATCH] documentation symlink restrictions for .git* files
>
> We outlawed symbolic link versions of various .git files in 10ecfa7649
> (verify_path: disallow symlinks in .gitmodules, 2018-05-04) and
> dd4c2fe66b (verify_path(): disallow symlinks in .gitattributes and
> .gitignore, 2020-10-05). The reasons are discussed in detail there, but
> we never adjusted the documentation to let users know.
>
> This hasn't been a big deal since the point is that such setups were
> mildly broken and thought to be unusual anyway. But it certainly doesn't
> hurt to be clear and explicit about it.
>
> Suggested-by: Philip Oakley <philipoakley@iee.email>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Documentation/gitattributes.txt | 7 +++++++
>  Documentation/gitignore.txt     | 5 +++++
>  Documentation/gitmodules.txt    | 8 ++++++++
>  3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 2d0a03715b..9a2ce4f1ea 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1241,6 +1241,13 @@ to:
>  [attr]binary -diff -merge -text
>  ------------
>  
> +NOTES
> +-----
> +
> +Note that Git does not allow a `.gitattributes` file within the working
> +tree to be a symbolic link, and will refuse to check out such a tree
> +entry.  This keeps behavior consistent when the file is accessed from
> +the index or a tree versus from the filesystem.
>  
>  EXAMPLES
>  --------
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d47b1ae296..7e9a1d49d6 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -149,6 +149,11 @@ not tracked by Git remain untracked.
>  To stop tracking a file that is currently tracked, use
>  'git rm --cached'.
>  
> +Note that Git does not allow a `.gitignore` file within the working tree
> +to be a symbolic link, and will refuse to check out such a tree entry.
> +This keeps behavior consistent when the file is accessed from the index
> +or a tree versus from the filesystem.
> +
>  EXAMPLES
>  --------
>  
> diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
> index 539b4e1997..2b884be3c7 100644
> --- a/Documentation/gitmodules.txt
> +++ b/Documentation/gitmodules.txt
> @@ -98,6 +98,14 @@ submodule.<name>.shallow::
>  	shallow clone (with a history depth of 1) unless the user explicitly
>  	asks for a non-shallow clone.
>  
> +NOTES
> +-----
> +
> +Note that Git does not allow the `.gitmodules` file within a working
> +tree to be a symbolic link, and will refuse to check out such a tree
> +entry. This keeps behavior consistent when the file is accessed from the
> +index or a tree versus from the filesystem, and helps Git reliably
> +enforce security checks of the file contents.
>  
>  EXAMPLES
>  --------
The text looks good to me, with security point explicitly mentioned just
for .gitmodules file.

However, is placing the Note so far down appropriate (.gitattributes and
.gitignore), given that there is within the descriptions a discussion of
the priority order for finding those files?

Philip

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-26 22:18       ` Philip Oakley
@ 2020-10-26 22:53         ` Jeff King
  2020-10-26 23:32           ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-26 22:53 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Jonathan Nieder

On Mon, Oct 26, 2020 at 10:18:18PM +0000, Philip Oakley wrote:

> > +NOTES
> > +-----
> > +
> > +Note that Git does not allow the `.gitmodules` file within a working
> > +tree to be a symbolic link, and will refuse to check out such a tree
> > +entry. This keeps behavior consistent when the file is accessed from the
> > +index or a tree versus from the filesystem, and helps Git reliably
> > +enforce security checks of the file contents.
> >  
> >  EXAMPLES
> >  --------
> The text looks good to me, with security point explicitly mentioned just
> for .gitmodules file.
> 
> However, is placing the Note so far down appropriate (.gitattributes and
> .gitignore), given that there is within the descriptions a discussion of
> the priority order for finding those files?

Definitely a "NOTES" section should go in that spot, but possibly the
text should be in the "DESCRIPTION" section. I was worried about
cluttering that early part with a detail that most people wouldn't care
too much about.

Looks like my patch is in 'next'; do you want to propose a patch moving
it around on top?

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-26 22:53         ` Jeff King
@ 2020-10-26 23:32           ` Junio C Hamano
  2020-10-27  7:26             ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2020-10-26 23:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Philip Oakley, git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

> Definitely a "NOTES" section should go in that spot, but possibly the
> text should be in the "DESCRIPTION" section. I was worried about
> cluttering that early part with a detail that most people wouldn't care
> too much about.
>
> Looks like my patch is in 'next'; do you want to propose a patch moving
> it around on top?

It probably is possible to tweak the introductory text this way
without being unnecessarily loud and keep the NOTES section where it
is.

I personally found that the placement of new text was OK, but this
may be overly compensated by my tendency to discount things that we
have discussed recently, which our mind often consider (only for
relative a short moment) as more important than they are, relative
to other things in the same document.

 Documentation/gitattributes.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git i/Documentation/gitattributes.txt w/Documentation/gitattributes.txt
index 2d0a03715b..fd2de7344a 100644
--- i/Documentation/gitattributes.txt
+++ w/Documentation/gitattributes.txt
@@ -13,8 +13,8 @@ $GIT_DIR/info/attributes, .gitattributes
 DESCRIPTION
 -----------
 
-A `gitattributes` file is a simple text file that gives
-`attributes` to pathnames.
+A `gitattributes` file is a simple text file (it cannot be a
+symbolic link to anything) that gives `attributes` to pathnames.
 
 Each line in `gitattributes` file is of form:
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-05 12:16   ` [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
@ 2020-10-27  3:35     ` Jonathan Nieder
  2020-10-27  7:58       ` Jeff King
  0 siblings, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-27  3:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi,

Jeff King wrote:

> However, it's still a reasonable idea to forbid symlinks for these
> files:
>
>   - As noted, they can still be used to read out-of-repo files (which is
>     fairly restricted, but in some circumstances you can probe file
>     content by speculatively creating files and seeing if they get
>     ignored)
>
>   - They don't currently behave well in all cases. We sometimes read
>     these files from the index, where we _don't_ follow symlinks (we'd
>     just treat the symlink target as the .gitignore or .gitattributes
>     content, which is actively wrong).
>
> This patch forbids symlinked versions of these files from entering the
> index. We already have helpers for obscured forms of the names from
> e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
> 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
> were done as part of the series touching .gitmodules.

Thanks again for this.  Since this patch has been in "next", we've
gotten a little experience of it at Google.

We've been running with the fsck check for a while (more than a year),
but not the verify_dotfile check.  The verify_dotfile check didn't
trigger for anyone with .gitattributes, but it hit several people for
.gitignore.  Some examples that users have mentioned:

Before https://android-review.googlesource.com/c/platform/tools/test/connectivity/+/1462771/
Android used a .gitignore symlink for two directories that had similar
gitignore requirements.  Diagnosing the error was confusing for them,
especially because the "repo" wrapper tool produced messages like

	error.GitError: Cannot initialize work tree for platform/tools/test/connectivity

Eventually someone manually ran

	$ git add --a
	error: invalid path 'acts_tests/.gitignore'
	error: unable to add 'acts_tests/.gitignore' to index
	fatal: adding files failed

which helped them realize it was a git issue and helped me point them
to the need to replace the symlink with a plain file.

As another example, a user working with the
https://github.com/bakerstu/openmrn.git repository noticed "git
checkout" commands failing.  In this user's case, the checkout failed
part-way through, producing confusing behavior ("git status" showing
entries missing from the index).  When I tried to reproduce this, I
wasn't able to clone the repository at all because it failed fsck;
after disabling transfer.fsckObjects, I still wasn't able to check out
HEAD.

Observations:

- since some widely used repositories have .gitignore symlinks, I
  think we can't forbid it in fsck, alas

- it would be useful to be able to check whether these symlinks would
  not escape the worktree, for a more targeted check.  It might be
  nice to even respect these settings when they would not escape the
  worktree, but not necessarily

- we could use a clearer error message than "invalid path".  There's
  some room for improvement in "git checkout"'s error handling, too
  --- I think my ideal would be if the operation would fail entirely,
  with an advice message describing a checkout command that would
  succeed (But how do I checkout another commit while excluding some
  files? Should it suggest a sparse checkout?).

That's all I have time for today --- will revisit again tomorrow.

Thoughts?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-26 23:32           ` Junio C Hamano
@ 2020-10-27  7:26             ` Jeff King
  2020-10-27 18:45               ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2020-10-27  7:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, git, Jonathan Nieder

On Mon, Oct 26, 2020 at 04:32:27PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Definitely a "NOTES" section should go in that spot, but possibly the
> > text should be in the "DESCRIPTION" section. I was worried about
> > cluttering that early part with a detail that most people wouldn't care
> > too much about.
> >
> > Looks like my patch is in 'next'; do you want to propose a patch moving
> > it around on top?
> 
> It probably is possible to tweak the introductory text this way
> without being unnecessarily loud and keep the NOTES section where it
> is.
> [...]
>  DESCRIPTION
>  -----------
>  
> -A `gitattributes` file is a simple text file that gives
> -`attributes` to pathnames.
> +A `gitattributes` file is a simple text file (it cannot be a
> +symbolic link to anything) that gives `attributes` to pathnames.

I worried that even a short mention like this would be distracting. Not
because it's so long, but because it's right there in the very first
sentence, and I really think this is a corner case that most people
would not even think about.

So it is helpful if you are looking for info on symlinks and these
files, but probably clutter if you are looking for something else.

I have to admit I don't feel all that strongly either way, though.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-27  3:35     ` Jonathan Nieder
@ 2020-10-27  7:58       ` Jeff King
  2020-10-27 22:00         ` Junio C Hamano
  2020-10-27 23:43         ` Jonathan Nieder
  0 siblings, 2 replies; 55+ messages in thread
From: Jeff King @ 2020-10-27  7:58 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Mon, Oct 26, 2020 at 08:35:18PM -0700, Jonathan Nieder wrote:

> As another example, a user working with the
> https://github.com/bakerstu/openmrn.git repository noticed "git
> checkout" commands failing.  In this user's case, the checkout failed
> part-way through, producing confusing behavior ("git status" showing
> entries missing from the index).  When I tried to reproduce this, I
> wasn't able to clone the repository at all because it failed fsck;
> after disabling transfer.fsckObjects, I still wasn't able to check out
> HEAD.

I wouldn't be surprised if there are code paths which die() early
instead of checking out as much as they can. But I think fixing that is
somewhat orthogonal (here we're just adding new reasons that checking
out may fail).

> Observations:
> 
> - since some widely used repositories have .gitignore symlinks, I
>   think we can't forbid it in fsck, alas

I am a little puzzled here. You said you had the fsck checks for the
last year, so why did they just come up now? I guess nobody sets
transfer.fsckObjects, and because you were testing only with clients,
your server implementation didn't reject pushes?

I agree it's annoying for them if they fail fsck, but it's not entirely
a show-stopper. There are config options for fine-tuning what you're
willing to enforce or ignore. But they of course are also annoying to
use, because every receiver of a transfer needs to set them (on GitHub,
for example, you have to email Support).

So I won't be too devastated to remove the symlink checks, or possibly
downgrade them to purely warnings (or "info"; the naming in fsck.c is
confusing, because the transfer operations take even warnings as fatal.
I suspect we could do with some cleanup there).

> - it would be useful to be able to check whether these symlinks would
>   not escape the worktree, for a more targeted check.  It might be
>   nice to even respect these settings when they would not escape the
>   worktree, but not necessarily

I actually wrote a patch several years ago for checking symlinks (not
just these ones, but _any_ symlinks in the repo, but of course it would
be easy to limit it more). It's included at the end of this mail. It's
been part of my daily build for many years, so I'm confident it doesn't
crash or have other bad behavior. But it's possible the logic for what
it catches is faulty.

> - we could use a clearer error message than "invalid path".  

That part is tricky. The "invalid path" error comes from the caller of
verify_path(), and we have no way to pass back an intelligent error
there. We can call error() ourselves, of course. That adds an extra line
of output, but it's rare enough for verify_path() to fail that it's
likely OK. However, I would worry that some callers might be surprised
by it producing output at all.

An alternative is letting the caller pass in a strbuf that we fill out
with an extra error string.

> There's some room for improvement in "git checkout"'s error handling,
> too --- I think my ideal would be if the operation would fail
> entirely, with an advice message describing a checkout command that
> would succeed (But how do I checkout another commit while excluding
> some files? Should it suggest a sparse checkout?).

I suspect it's too late for "fail entirely". We may have already written
to the filesystem, and rolling back is difficult and error-prone. In
general I'd expect to checkout what we can, produce errors for the rest,
and let the user work from there with "git status".

But I may be wrong. The problem is loading the value into the index, not
writing it to the filesystem. So perhaps the relevant code paths load
the index fully before writing out anything to the filesystem, and it's
easy to rollback. But I imagine they are using unpack-trees' flag to
update the filesystem, and I assume that checks out as it loads entries
(but I didn't confirm).

-Peff

Here's the patch. The tests need the checkout-index fix from:

  https://lore.kernel.org/git/20201027073000.GA3651896@coredump.intra.peff.net/

but it should otherwise be stand-alone. I don't recall why I never sent
it. One obvious downside is that it's difficult to have fsck checks for
it, since the full path of an entry is not a property of a single tree
object.

-- >8 --
Date: Wed, 9 Nov 2016 23:24:09 -0500
Subject: [PATCH] optionally block out-of-repo symlinks

---
 apply.c                              |   2 +-
 cache.h                              |   3 +
 config.c                             |   5 ++
 entry.c                              |   2 +-
 environment.c                        |   1 +
 merge-recursive.c                    |   2 +-
 path.c                               |  73 ++++++++++++++++++
 t/t2031-checkout-symlink-external.sh | 107 +++++++++++++++++++++++++++
 8 files changed, 192 insertions(+), 3 deletions(-)
 create mode 100755 t/t2031-checkout-symlink-external.sh

diff --git a/apply.c b/apply.c
index 76dba93c97..2ac940dc05 100644
--- a/apply.c
+++ b/apply.c
@@ -4360,7 +4360,7 @@ static int try_create_file(struct apply_state *state, const char *path,
 		/* Although buf:size is counted string, it also is NUL
 		 * terminated.
 		 */
-		return !!symlink(buf, path);
+		return !!safe_symlink(buf, path);
 
 	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666);
 	if (fd < 0)
diff --git a/cache.h b/cache.h
index c0072d43b1..9b56e2327a 100644
--- a/cache.h
+++ b/cache.h
@@ -959,6 +959,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern const char *core_fsmonitor;
+extern int allow_external_symlinks;
 
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
@@ -1979,4 +1980,6 @@ int print_sha1_ellipsis(void);
 /* Return 1 if the file is empty or does not exists, 0 otherwise. */
 int is_empty_or_missing_file(const char *filename);
 
+int safe_symlink(const char *target, const char *linkpath);
+
 #endif /* CACHE_H */
diff --git a/config.c b/config.c
index 2bdff4457b..6699881c6e 100644
--- a/config.c
+++ b/config.c
@@ -1404,6 +1404,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.allowexternalsymlinks")) {
+		allow_external_symlinks = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value, cb);
 }
diff --git a/entry.c b/entry.c
index a0532f1f00..f2d3456df3 100644
--- a/entry.c
+++ b/entry.c
@@ -291,7 +291,7 @@ static int write_entry(struct cache_entry *ce,
 		if (!has_symlinks || to_tempfile)
 			goto write_file_entry;
 
-		ret = symlink(new_blob, path);
+		ret = safe_symlink(new_blob, path);
 		free(new_blob);
 		if (ret)
 			return error_errno("unable to create symlink %s", path);
diff --git a/environment.c b/environment.c
index bb518c61cd..7c233e0e0e 100644
--- a/environment.c
+++ b/environment.c
@@ -73,6 +73,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
+int allow_external_symlinks = 1;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..61208a8b43 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -986,7 +986,7 @@ static int update_file_flags(struct merge_options *opt,
 			char *lnk = xmemdupz(buf, size);
 			safe_create_leading_directories_const(path);
 			unlink(path);
-			if (symlink(lnk, path))
+			if (safe_symlink(lnk, path))
 				ret = err(opt, _("failed to symlink '%s': %s"),
 					  path, strerror(errno));
 			free(lnk);
diff --git a/path.c b/path.c
index 7b385e5eb2..251dd70ff5 100644
--- a/path.c
+++ b/path.c
@@ -1498,6 +1498,79 @@ int looks_like_command_line_option(const char *str)
 	return str && str[0] == '-';
 }
 
+static int symlink_leaves_repo(const char *target, const char *linkpath)
+{
+	/*
+	 * Absolute paths are always considered to leave the repository (even
+	 * if they happen to point to the working tree path).
+	 */
+	if (is_absolute_path(target))
+		return 1;
+
+	/*
+	 * Allow relative paths that start with a sequence of "../",
+	 * as long as they do not break out of the symlink's root.
+	 * This loop will detect break-out cases and return; otherwise, at the
+	 * end of the loop "target" will point to the first non-".." component.
+	 *
+	 * We count the depth of linkpath by eating up directory components left
+	 * to right. Technically the symlink would resolve right-to-left, but
+	 * we don't care about the actual values, only the number.
+	 */
+	while (target[0] == '.') {
+		if (!target[1]) {
+			/* trailing "." -- ignore */
+			target++;
+		} else if (is_dir_sep(target[1])) {
+			/* "./" -- ignore */
+			target += 2;
+		} else if (target[1] == '.' &&
+			   (!target[2] || is_dir_sep(target[2]))) {
+			/* ".." or "../" -- drop one from linkpath depth */
+			while (!is_dir_sep(*linkpath)) {
+				/* end-of-string; target exceeded our depth */
+				if (!*linkpath)
+					return 1;
+				linkpath++;
+			}
+			/* skip final "/" */
+			linkpath++;
+
+			/* skip past ".." */
+			target += 2;
+			/* and "/" if present */
+			if (is_dir_sep(*target))
+				target++;
+		}
+	}
+
+	/*
+	 * Now we have a path in "target" that only go down into the tree.
+	 * Disallow any interior "../", like "foo/../bar". These might be
+	 * OK, but we cannot know unless we know whether "foo" is itself a
+	 * symlink. So err on the side of caution.
+	 */
+	while (*target) {
+		const char *v;
+		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
+			return 1;
+		target++;
+	}
+
+	return 0;
+}
+
+int safe_symlink(const char *target, const char *linkpath)
+{
+	if (!allow_external_symlinks &&
+	    symlink_leaves_repo(target, linkpath)) {
+		errno = EPERM;
+		return -1;
+	}
+
+	return symlink(target, linkpath);
+}
+
 char *xdg_config_home(const char *filename)
 {
 	const char *home, *config_home;
diff --git a/t/t2031-checkout-symlink-external.sh b/t/t2031-checkout-symlink-external.sh
new file mode 100755
index 0000000000..c799f8158e
--- /dev/null
+++ b/t/t2031-checkout-symlink-external.sh
@@ -0,0 +1,107 @@
+#!/bin/sh
+
+test_description='detection and prevention of out-of-tree symlinks'
+. ./test-lib.sh
+
+if ! test_have_prereq SYMLINKS
+then
+	skip_all='skipping external symlink tests (missing SYMLINKS)'
+	test_done
+fi
+
+create_symlink() {
+	symlink=$1
+	target=$2
+	test_expect_success "create symlink ($symlink)" '
+		sha1=$(printf "%s" "$target" | git hash-object -w --stdin) &&
+		git update-index --add --cacheinfo "120000,$sha1,$symlink"
+	'
+}
+
+check_symlink () {
+	symlink=$1
+	config=$2
+	outcome=$3
+	expect=$4
+
+	if test "$outcome" = "allow"
+	then
+		fail=
+		: ${expect:=test_cmp ../target}
+	else
+		fail=test_must_fail
+		: ${expect:=! cat}
+	fi
+
+	test_expect_success " check symlink ($symlink, $config -> $outcome)" "
+		rm -f $symlink &&
+		$fail git -c core.allowExternalSymlinks=$config \\
+			checkout-index -- $symlink &&
+		$expect $symlink
+	"
+}
+
+# we want to try breaking out of the repository,
+# so let's work inside a sub-repository, and break
+# out to the top-level trash directory
+test_expect_success 'set up repository' '
+	echo content >target &&
+	git init subrepo &&
+	cd subrepo &&
+	test_commit base &&
+	echo content >in-repo-target
+'
+
+create_symlink in-repo in-repo-target
+check_symlink in-repo false allow
+
+create_symlink subdir/in-repo ../in-repo-target
+check_symlink subdir/in-repo false allow
+
+create_symlink absolute "$TRASH_DIRECTORY/target"
+check_symlink absolute true allow
+check_symlink absolute false forbid
+
+create_symlink relative "../target"
+check_symlink relative true allow
+check_symlink relative false forbid
+
+create_symlink curdir .
+check_symlink curdir false allow test_path_is_dir
+create_symlink sneaky curdir/../target
+check_symlink sneaky true allow
+check_symlink sneaky false forbid
+
+test_expect_success 'applying a patch checks symlink config' '
+	git diff-index -p --cached HEAD -- relative >patch &&
+	rm -f relative &&
+	git -c core.allowExternalSymlinks=true apply <patch &&
+	test_cmp ../target relative &&
+	rm -f relative &&
+	test_must_fail git -c core.allowExternalSymlinks=false apply <patch
+'
+
+test_expect_success 'merge-recursive checks symlinks config' '
+	git reset --hard &&
+
+	# create rename situation which requires processing
+	# outside of unpack_trees()
+	ln -s ../foo one &&
+	git add one &&
+	git commit -m base &&
+
+	ln -sf ../target one &&
+	git commit -am modify &&
+
+	git checkout -b side HEAD^ &&
+	git mv one two &&
+	git commit -am rename &&
+
+	git -c core.allowExternalSymlinks=true merge master &&
+	test_cmp ../target two &&
+
+	git reset --hard HEAD^ &&
+	test_must_fail git -c core.allowExternalSymlinks=false merge master
+'
+
+test_done
-- 
2.29.1.634.g9e41dc1bf2


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-27  7:26             ` Jeff King
@ 2020-10-27 18:45               ` Junio C Hamano
  2020-10-27 21:00                 ` Philip Oakley
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2020-10-27 18:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Philip Oakley, git, Jonathan Nieder

Jeff King <peff@peff.net> writes:

>>  DESCRIPTION
>>  -----------
>>  
>> -A `gitattributes` file is a simple text file that gives
>> -`attributes` to pathnames.
>> +A `gitattributes` file is a simple text file (it cannot be a
>> +symbolic link to anything) that gives `attributes` to pathnames.
>
> I worried that even a short mention like this would be distracting. Not
> because it's so long, but because it's right there in the very first
> sentence, and I really think this is a corner case that most people
> would not even think about.
>
> So it is helpful if you are looking for info on symlinks and these
> files, but probably clutter if you are looking for something else.
>
> I have to admit I don't feel all that strongly either way, though.

I don't, either, and as I said, I found that the placement of new
text was OK.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-27 18:45               ` Junio C Hamano
@ 2020-10-27 21:00                 ` Philip Oakley
  2020-10-28 19:14                   ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Philip Oakley @ 2020-10-27 21:00 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: git, Jonathan Nieder

On 27/10/2020 18:45, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>>>  DESCRIPTION
>>>  -----------
>>>  
>>> -A `gitattributes` file is a simple text file that gives
>>> -`attributes` to pathnames.
>>> +A `gitattributes` file is a simple text file (it cannot be a
>>> +symbolic link to anything) that gives `attributes` to pathnames.
>> I worried that even a short mention like this would be distracting. Not
>> because it's so long, but because it's right there in the very first
>> sentence, and I really think this is a corner case that most people
>> would not even think about.
>>
>> So it is helpful if you are looking for info on symlinks and these
>> files, but probably clutter if you are looking for something else.
>>
>> I have to admit I don't feel all that strongly either way, though.
> I don't, either, and as I said, I found that the placement of new
> text was OK.
>
>
I do think that the extra text above is the right thing to do.

We should be informing readers early about things that are expressly
prohibited.

Leaving just the note till nearly the end of the rather long
attributes/ignore man pages makes it very hard to discover for
frustrated users, which would accidentally reinforce the idea of the
docs being poor (rather than being focussed).

Philip

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-27  7:58       ` Jeff King
@ 2020-10-27 22:00         ` Junio C Hamano
  2020-10-28  9:41           ` Jeff King
  2020-10-27 23:43         ` Jonathan Nieder
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2020-10-27 22:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git

Jeff King <peff@peff.net> writes:

> diff --git a/environment.c b/environment.c
> index bb518c61cd..7c233e0e0e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -73,6 +73,7 @@ int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
>  enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
> +int allow_external_symlinks = 1;

OK, so by default it is not blocked...

> +static int symlink_leaves_repo(const char *target, const char *linkpath)
> +{
> +	/*
> +	 * Absolute paths are always considered to leave the repository (even
> +	 * if they happen to point to the working tree path).
> +	 */
> +	if (is_absolute_path(target))
> +		return 1;

Very sensible.

> +	/*
> +	 * Allow relative paths that start with a sequence of "../",
> +	 * as long as they do not break out of the symlink's root.
> +	 * This loop will detect break-out cases and return; otherwise, at the
> +	 * end of the loop "target" will point to the first non-".." component.
> +	 *
> +	 * We count the depth of linkpath by eating up directory components left
> +	 * to right. Technically the symlink would resolve right-to-left, but
> +	 * we don't care about the actual values, only the number.
> +	 */
> +	while (target[0] == '.') {
> +		if (!target[1]) {
> +			/* trailing "." -- ignore */
> +			target++;
> +		} else if (is_dir_sep(target[1])) {
> +			/* "./" -- ignore */
> +			target += 2;
> +		} else if (target[1] == '.' &&
> +			   (!target[2] || is_dir_sep(target[2]))) {
> +			/* ".." or "../" -- drop one from linkpath depth */
> +			while (!is_dir_sep(*linkpath)) {
> +				/* end-of-string; target exceeded our depth */
> +				if (!*linkpath)
> +					return 1;
> +				linkpath++;
> +			}
> +			/* skip final "/" */
> +			linkpath++;
> +
> +			/* skip past ".." */
> +			target += 2;
> +			/* and "/" if present */
> +			if (is_dir_sep(*target))
> +				target++;
> +		}
> +	}
> +
> +	/*
> +	 * Now we have a path in "target" that only go down into the tree.
> +	 * Disallow any interior "../", like "foo/../bar". These might be
> +	 * OK, but we cannot know unless we know whether "foo" is itself a
> +	 * symlink. So err on the side of caution.
> +	 */
> +	while (*target) {
> +		const char *v;
> +		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
> +			return 1;
> +		target++;
> +	}
> +
> +	return 0;
> +}
> +
> +int safe_symlink(const char *target, const char *linkpath)
> +{
> +	if (!allow_external_symlinks &&
> +	    symlink_leaves_repo(target, linkpath)) {
> +		errno = EPERM;
> +		return -1;
> +	}
> +
> +	return symlink(target, linkpath);
> +}

OK.  This is only about blocking creation of new symbolic links that
goes outside the working tree.  It obviously is a good thing to do.

We have some "symlink safety" in various parts of the system [*1*],
and I wonder if we can somehow consolidate the support to a more
central place.

Thanks.


[Footnote]

*1* For example, apply tries to be careful not to take the "path"
    recorded in the incoming patch blindly, and instead checks if
    any path component in it is a symbolic link before touching.
    Similarly, callers of has_symlink_leading_path() all try to be
    careful when the "path" they want to use to access a filesystem
    entity has a symbolic link in the middle on the filesystem.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-27  7:58       ` Jeff King
  2020-10-27 22:00         ` Junio C Hamano
@ 2020-10-27 23:43         ` Jonathan Nieder
  2020-10-28 19:18           ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Jonathan Nieder @ 2020-10-27 23:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
> On Mon, Oct 26, 2020 at 08:35:18PM -0700, Jonathan Nieder wrote:

>> Observations:
>>
>> - since some widely used repositories have .gitignore symlinks, I
>>   think we can't forbid it in fsck, alas
>
> I am a little puzzled here. You said you had the fsck checks for the
> last year, so why did they just come up now? I guess nobody sets
> transfer.fsckObjects, and because you were testing only with clients,
> your server implementation didn't reject pushes?
>
> I agree it's annoying for them if they fail fsck, but it's not entirely
> a show-stopper. There are config options for fine-tuning what you're
> willing to enforce or ignore. But they of course are also annoying to
> use, because every receiver of a transfer needs to set them (on GitHub,
> for example, you have to email Support).

Yes.  And to reiterate the point a little: the reason nobody sets
transfer.fsckObjects is that we haven't made it easy to distinguish
between "hard error, should never be overridden" checks (like
BAD_PARENT_SHA1), "new tools shouldn't write these but they exist in
important repos like perl.git and anything consuming Git repositories
needs to cope with them" (like MISSING_SPACE_BEFORE_DATE from some
commits' concatenated authors), and so on.

> So I won't be too devastated to remove the symlink checks, or possibly
> downgrade them to purely warnings (or "info"; the naming in fsck.c is
> confusing, because the transfer operations take even warnings as fatal.
> I suspect we could do with some cleanup there).

Downgrading the .gitignore check to warning sounds okay.  .gitmodules
would still want to be an error, of course.

>> - it would be useful to be able to check whether these symlinks would
>>   not escape the worktree, for a more targeted check.  It might be
>>   nice to even respect these settings when they would not escape the
>>   worktree, but not necessarily
>
> I actually wrote a patch several years ago for checking symlinks (not
> just these ones, but _any_ symlinks in the repo, but of course it would
> be easy to limit it more). It's included at the end of this mail. It's
> been part of my daily build for many years, so I'm confident it doesn't
> crash or have other bad behavior. But it's possible the logic for what
> it catches is faulty.
[...]
> Here's the patch.

Nice!  I'll try to find time to experiment with this.

[...]
>> - we could use a clearer error message than "invalid path".
>
> That part is tricky. The "invalid path" error comes from the caller of
> verify_path(), and we have no way to pass back an intelligent error
> there. We can call error() ourselves, of course. That adds an extra line
> of output, but it's rare enough for verify_path() to fail that it's
> likely OK. However, I would worry that some callers might be surprised
> by it producing output at all.
>
> An alternative is letting the caller pass in a strbuf that we fill out
> with an extra error string.

I think passing in a struct to govern error handling sounds nice.
Alternatively, this might suggest that verify_path is not the right
place for the user-facing check: it's useful because it's exhaustive,
but it may be that we can also catch the same issue earlier and
produce a nicer diagnostic.

>> There's some room for improvement in "git checkout"'s error handling,
>> too --- I think my ideal would be if the operation would fail
>> entirely, with an advice message describing a checkout command that
>> would succeed (But how do I checkout another commit while excluding
>> some files? Should it suggest a sparse checkout?).
>
> I suspect it's too late for "fail entirely". We may have already written
> to the filesystem, and rolling back is difficult and error-prone. In
> general I'd expect to checkout what we can, produce errors for the rest,
> and let the user work from there with "git status".
>
> But I may be wrong. The problem is loading the value into the index, not
> writing it to the filesystem. So perhaps the relevant code paths load
> the index fully before writing out anything to the filesystem, and it's
> easy to rollback. But I imagine they are using unpack-trees' flag to
> update the filesystem, and I assume that checks out as it loads entries
> (but I didn't confirm).

Right, this would require taking two passes.  I think we do already
perform two passes (first deletions, then additions); I'll have to
look a little more closely to see whether this is straightforward to
do.

I do still like the idea of not respecting .gitignore and
.gitattributes symlinks, by the way.  What the experiences upthread
tell me is just that users need better facilities for repairing
existing repositories.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-27 22:00         ` Junio C Hamano
@ 2020-10-28  9:41           ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2020-10-28  9:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git

On Tue, Oct 27, 2020 at 03:00:36PM -0700, Junio C Hamano wrote:

> > +	/*
> > +	 * Now we have a path in "target" that only go down into the tree.
> > +	 * Disallow any interior "../", like "foo/../bar". These might be
> > +	 * OK, but we cannot know unless we know whether "foo" is itself a
> > +	 * symlink. So err on the side of caution.
> > +	 */
> > +	while (*target) {
> > +		const char *v;
> > +		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
> > +			return 1;
> > +		target++;
> > +	}

Just reading over the patch again, I suspect this is overly eager to
find ".." that is not bordered by a directory separator on the left-hand
side (e.g., I think it will match "foo../").

> > +int safe_symlink(const char *target, const char *linkpath)
> > +{
> > +	if (!allow_external_symlinks &&
> > +	    symlink_leaves_repo(target, linkpath)) {
> > +		errno = EPERM;
> > +		return -1;
> > +	}
> > +
> > +	return symlink(target, linkpath);
> > +}
> 
> OK.  This is only about blocking creation of new symbolic links that
> goes outside the working tree.  It obviously is a good thing to do.
> 
> We have some "symlink safety" in various parts of the system [*1*],
> and I wonder if we can somehow consolidate the support to a more
> central place.

Note that it is overly conservative, as described in the comment quoted
at the top of this email. It's possible that other code might be able to
be more accurate because it can see "/foo/" in the middle of a symlink
target and actually _look_ at "foo".  Does it exist, is it a symlink
itself, etc.

Whereas here we're taking a purely textual look at the target. We have
to, I think, because we don't know if or when that "foo" will get
updated. And maybe that same restriction applies to other parts of the
system, or maybe not.

At any rate, I don't really have plans to push this particular topic
forward, at least not in the near term. I was mostly sharing it in case
anybody found it useful. If somebody wants to run with it, please be my
guest.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] documentation symlink restrictions for .git* files
  2020-10-27 21:00                 ` Philip Oakley
@ 2020-10-28 19:14                   ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2020-10-28 19:14 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Jeff King, git, Jonathan Nieder

Philip Oakley <philipoakley@iee.email> writes:

> On 27/10/2020 18:45, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>>
>>>>  DESCRIPTION
>>>>  -----------
>>>>  
>>>> -A `gitattributes` file is a simple text file that gives
>>>> -`attributes` to pathnames.
>>>> +A `gitattributes` file is a simple text file (it cannot be a
>>>> +symbolic link to anything) that gives `attributes` to pathnames.
>>> I worried that even a short mention like this would be distracting. Not
>>> because it's so long, but because it's right there in the very first
>>> sentence, and I really think this is a corner case that most people
>>> would not even think about.
>>>
>>> So it is helpful if you are looking for info on symlinks and these
>>> files, but probably clutter if you are looking for something else.
>>>
>>> I have to admit I don't feel all that strongly either way, though.
>> I don't, either, and as I said, I found that the placement of new
>> text was OK.
>>
>>
> I do think that the extra text above is the right thing to do.
>
> We should be informing readers early about things that are expressly
> prohibited.

It needs to be balanced, though.  If it is so common a temptation
among readers to make a symbolic link out of the tracked contents,
perhaps it is worth telling them that they shouldn't do so, but I
just do not get the feeling that it is the case.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore
  2020-10-27 23:43         ` Jonathan Nieder
@ 2020-10-28 19:18           ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2020-10-28 19:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, git

Jonathan Nieder <jrnieder@gmail.com> writes:

> Yes.  And to reiterate the point a little: the reason nobody sets
> transfer.fsckObjects is that we haven't made it easy to distinguish
> between "hard error, should never be overridden" checks (like
> BAD_PARENT_SHA1), "new tools shouldn't write these but they exist in
> important repos like perl.git and anything consuming Git repositories
> needs to cope with them" (like MISSING_SPACE_BEFORE_DATE from some
> commits' concatenated authors), and so on.

Hmph, don't we "distinguish" them by setting appropriate default
levels, though?  Perhaps some classes of errors are set too strict?

>> So I won't be too devastated to remove the symlink checks, or possibly
>> downgrade them to purely warnings (or "info"; the naming in fsck.c is
>> confusing, because the transfer operations take even warnings as fatal.
>> I suspect we could do with some cleanup there).
>
> Downgrading the .gitignore check to warning sounds okay.  .gitmodules
> would still want to be an error, of course.

.gitattributes (and any other .git<thing> we may have in the
future), too.

Thanks.


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2020-10-28 23:08 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-05  7:17 [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jeff King
2020-10-05  7:19 ` [PATCH 1/7] fsck_tree(): fix shadowed variable Jeff King
2020-10-05  7:44   ` Jonathan Nieder
2020-10-05  8:20     ` Jeff King
2020-10-05  8:29       ` Jonathan Nieder
2020-10-05  7:19 ` [PATCH 2/7] fsck_tree(): wrap some long lines Jeff King
2020-10-05  7:46   ` Jonathan Nieder
2020-10-05  7:19 ` [PATCH 3/7] t7415: rename to expand scope Jeff King
2020-10-05  7:50   ` Jonathan Nieder
2020-10-05  8:24     ` Jeff King
2020-10-05  8:34       ` Jonathan Nieder
2020-10-05  8:49         ` Jeff King
2020-10-05  7:20 ` [PATCH 4/7] t7450: test verify_path() handling of gitmodules Jeff King
2020-10-05  7:53   ` Jonathan Nieder
2020-10-05  8:30     ` Jeff King
2020-10-05  8:38       ` Jonathan Nieder
2020-10-05  7:21 ` [PATCH 5/7] t0060: test obscured .gitattributes and .gitignore matching Jeff King
2020-10-05  8:03   ` Jonathan Nieder
2020-10-05  8:40     ` Jeff King
2020-10-05 21:20       ` Johannes Schindelin
2020-10-06 14:01         ` Jeff King
2020-10-05  7:24 ` [PATCH 6/7] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
2020-10-05  8:09   ` Jonathan Nieder
2020-10-05 12:07     ` Jeff King
2020-10-05  7:25 ` [PATCH 7/7] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
2020-10-05  8:12   ` Jonathan Nieder
2020-10-05  8:53     ` Jeff King
2020-10-05  7:32 ` [PATCH 0/7] forbidding symlinked .gitattributes and .gitignore Jonathan Nieder
2020-10-05  8:58   ` Jeff King
2020-10-05 12:16 ` [PATCH v2 0/8] " Jeff King
2020-10-05 12:16   ` [PATCH v2 1/8] fsck_tree(): fix shadowed variable Jeff King
2020-10-05 12:16   ` [PATCH v2 2/8] fsck_tree(): wrap some long lines Jeff King
2020-10-05 12:16   ` [PATCH v2 3/8] t7415: rename to expand scope Jeff King
2020-10-05 12:16   ` [PATCH v2 4/8] t7450: test verify_path() handling of gitmodules Jeff King
2020-10-05 12:16   ` [PATCH v2 5/8] t7450: test .gitmodules symlink matching against obscured names Jeff King
2020-10-05 12:16   ` [PATCH v2 6/8] t0060: test obscured .gitattributes and .gitignore matching Jeff King
2020-10-05 12:16   ` [PATCH v2 7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore Jeff King
2020-10-27  3:35     ` Jonathan Nieder
2020-10-27  7:58       ` Jeff King
2020-10-27 22:00         ` Junio C Hamano
2020-10-28  9:41           ` Jeff King
2020-10-27 23:43         ` Jonathan Nieder
2020-10-28 19:18           ` Junio C Hamano
2020-10-05 12:16   ` [PATCH v2 8/8] fsck: complain when .gitattributes or .gitignore is a symlink Jeff King
2020-10-06 20:41   ` [PATCH v2 0/8] forbidding symlinked .gitattributes and .gitignore Junio C Hamano
2020-10-20 23:19   ` Philip Oakley
2020-10-23  8:17     ` [PATCH] documentation symlink restrictions for .git* files Jeff King
2020-10-23  8:27       ` Jeff King
2020-10-26 22:18       ` Philip Oakley
2020-10-26 22:53         ` Jeff King
2020-10-26 23:32           ` Junio C Hamano
2020-10-27  7:26             ` Jeff King
2020-10-27 18:45               ` Junio C Hamano
2020-10-27 21:00                 ` Philip Oakley
2020-10-28 19:14                   ` Junio C Hamano

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).