git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics
@ 2021-05-01 15:40 Jeff King
  2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:40 UTC (permalink / raw)
  To: git

A while back, I had a topic[1] that treated symlinked .gitattributes
(and .gitignore and .mailmap) the same as .gitmodules: forbidding them
in the index, complaining about them via fsck, etc.

In the end, we decided not to do that[2], and instead just open the
files with O_NOFOLLOW instead. As I said in that thread, we could
salvage some of the cleanups, fsck checks, and docs from the original
topic. So here that is. (The new topic is in master but not yet
released; so while this is not strictly a bug-fix for an existing topic,
it would be good to get especially the doc improvements into the same
release).

[1] https://lore.kernel.org/git/20201005071751.GA2290770@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/YDiWs6yyv3U9YvC2@coredump.intra.peff.net/

The patches are:

  [1/9]: t7415: remove out-dated comment about translation
  [2/9]: fsck_tree(): fix shadowed variable
  [3/9]: fsck_tree(): wrap some long lines

    These three are really independent fixes and cleanups, that could be
    taken separately.

  [4/9]: t7415: rename to expand scope
  [5/9]: t7450: test verify_path() handling of gitmodules
  [6/9]: t7450: test .gitmodules symlink matching against obscured names

    These three are just cleaning up and improving the tests for the
    existing handling of the .gitmodules symlinks.

  [7/9]: t0060: test ntfs/hfs-obscured dotfiles
  [8/9]: fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW

    These two extend fsck checks (warnings, not errors) to the new
    files. I think this is an improvement, but I could be persuaded
    otherwise (and these two could be dropped independent of the rest).

  [9/9]: docs: document symlink restrictions for dot-files

    And this covers documentation for all of the files (including
    .gitmodules).

 Documentation/gitattributes.txt               |   7 ++
 Documentation/gitignore.txt                   |   5 +
 Documentation/gitmailmap.txt                  |   7 ++
 Documentation/gitmodules.txt                  |   8 ++
 cache.h                                       |   1 +
 fsck.c                                        |  84 ++++++++++---
 fsck.h                                        |   3 +
 path.c                                        |   5 +
 t/helper/test-path-utils.c                    |  46 +++++--
 t/t0060-path-utils.sh                         |  30 +++++
 ...ule-names.sh => t7450-bad-git-dotfiles.sh} | 116 +++++++++++++-----
 utf8.c                                        |   5 +
 utf8.h                                        |   1 +
 13 files changed, 257 insertions(+), 61 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-git-dotfiles.sh} (70%)

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/9] t7415: remove out-dated comment about translation
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
@ 2021-05-01 15:41 ` Jeff King
  2021-05-03  9:46   ` Ævar Arnfjörð Bjarmason
  2021-05-01 15:41 ` [PATCH 2/9] fsck_tree(): fix shadowed variable Jeff King
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:41 UTC (permalink / raw)
  To: git

Since GETTEXT_POISON does not exist anymore, there is no point warning
people about whether we should use test_i18ngrep. This is doubly
confusing because the comment was describing why it was OK to use grep,
but it got caught up in the mass conversion of 674ba34038 (fsck: mark
strings for translation, 2018-11-10).

Note there are other uses of test_i18ngrep in this script which are now
obsolete; I'll save those for a mass-cleanup. My goal here was just to
fix the confusing comment in code I'm about to refactor.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7415-submodule-names.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index f70368bc2e..fef6561d80 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -151,10 +151,9 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		} | git mktree &&
 
 		# Check not only that we fail, but that it is due to the
-		# symlink detector; this grep string comes from the config
-		# variable name and will not be translated.
+		# symlink detector
 		test_must_fail git fsck 2>output &&
-		test_i18ngrep gitmodulesSymlink output
+		grep gitmodulesSymlink output
 	)
 '
 
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/9] fsck_tree(): fix shadowed variable
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
  2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
@ 2021-05-01 15:41 ` Jeff King
  2021-05-03 11:15   ` Ævar Arnfjörð Bjarmason
  2021-05-01 15:41 ` [PATCH 3/9] fsck_tree(): wrap some long lines Jeff King
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:41 UTC (permalink / raw)
  To: git

Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).

Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.

Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop were already correct but need to be touched), but makes sure we
catch all cases and will avoid similar confusion in the future.

And we can update the test to be a bit more specific and catch this
problem.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                     | 42 ++++++++++++++++++++------------------
 t/t7415-submodule-names.sh |  5 +++--
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index f5ed6a2635..dd31ed1413 100644
--- a/fsck.c
+++ b/fsck.c
@@ -558,7 +558,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object_id *tree_oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -579,7 +579,7 @@ static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -589,11 +589,11 @@ static int fsck_tree(const struct object_id *oid,
 	while (desc.size) {
 		unsigned short mode;
 		const char *name, *backslash;
-		const struct object_id *oid;
+		const struct object_id *entry_oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		entry_oid = tree_entry_extract(&desc, &name, &mode);
 
-		has_null_sha1 |= is_null_oid(oid);
+		has_null_sha1 |= is_null_oid(entry_oid);
 		has_full_path |= !!strchr(name, '/');
 		has_empty_name |= !*name;
 		has_dot |= !strcmp(name, ".");
@@ -603,10 +603,11 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&options->gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found,
+					      entry_oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -617,9 +618,10 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&options->gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found,
+							      entry_oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, tree_oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -628,7 +630,7 @@ static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -676,25 +678,25 @@ static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index fef6561d80..6a8cf3f47b 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -148,12 +148,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} | git mktree &&
+		} >bad-tree &&
+		tree=$(git mktree <bad-tree) &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector
 		test_must_fail git fsck 2>output &&
-		grep gitmodulesSymlink output
+		grep "tree $tree: gitmodulesSymlink" output
 	)
 '
 
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/9] fsck_tree(): wrap some long lines
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
  2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
  2021-05-01 15:41 ` [PATCH 2/9] fsck_tree(): fix shadowed variable Jeff King
@ 2021-05-01 15:41 ` Jeff King
  2021-05-03 11:22   ` Ævar Arnfjörð Bjarmason
  2021-05-01 15:42 ` [PATCH 4/9] t7415: rename to expand scope Jeff King
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:41 UTC (permalink / raw)
  To: git

Many calls to report() in fsck_tree() are kept on a single line and are
quite long. Most were pretty big to begin with, but have gotten even
longer over the years as we've added more parameters. Let's accept the
churn of wrapping them in order to conform to our usual line limits.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index dd31ed1413..db94817898 100644
--- a/fsck.c
+++ b/fsck.c
@@ -579,7 +579,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_TREE,
+				 "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -630,7 +632,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE,
+					 FSCK_MSG_BAD_TREE,
+					 "cannot be parsed as a tree");
 			break;
 		}
 
@@ -678,25 +682,45 @@ static int fsck_tree(const struct object_id *tree_oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_NULL_SHA1,
+				 "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_FULL_PATHNAME,
+				 "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_EMPTY_NAME,
+				 "contains empty pathname");
 	if (has_dot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOT,
+				 "contains '.'");
 	if (has_dotdot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTDOT,
+				 "contains '..'");
 	if (has_dotgit)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTGIT,
+				 "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_ZERO_PADDED_FILEMODE,
+				 "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_FILEMODE,
+				 "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_DUPLICATE_ENTRIES,
+				 "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_TREE_NOT_SORTED,
+				 "not properly sorted");
 	return retval;
 }
 
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/9] t7415: rename to expand scope
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (2 preceding siblings ...)
  2021-05-01 15:41 ` [PATCH 3/9] fsck_tree(): wrap some long lines Jeff King
@ 2021-05-01 15:42 ` Jeff King
  2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:42 UTC (permalink / raw)
  To: git

This script has already expanded beyond its original intent of ".. in
submodule names" to include other malicious submodule bits. Let's update
the name and description to reflect that, as well as the fact that we'll
soon be adding similar tests for other dotfiles (.gitattributes, etc).
We'll also renumber it to move it out of the group of submodule-specific
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 ...submodule-names.sh => t7450-bad-git-dotfiles.sh} | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-git-dotfiles.sh} (95%)

diff --git a/t/t7415-submodule-names.sh b/t/t7450-bad-git-dotfiles.sh
similarity index 95%
rename from t/t7415-submodule-names.sh
rename to t/t7450-bad-git-dotfiles.sh
index 6a8cf3f47b..34d4dc6def 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -1,9 +1,16 @@
 #!/bin/sh
 
-test_description='check handling of .. in submodule names
+test_description='check broken or malicious patterns in .git* files
 
-Exercise the name-checking function on a variety of names, and then give a
-real-world setup that confirms we catch this in practice.
+Such as:
+
+  - presence of .. in submodule names;
+    Exercise the name-checking function on a variety of names, and then give a
+    real-world setup that confirms we catch this in practice.
+
+  - nested submodule names
+
+  - symlinked .gitmodules, etc
 '
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-pack.sh
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (3 preceding siblings ...)
  2021-05-01 15:42 ` [PATCH 4/9] t7415: rename to expand scope Jeff King
@ 2021-05-01 15:42 ` Jeff King
  2021-05-01 18:55   ` Eric Sunshine
  2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
  2021-05-01 15:42 ` [PATCH 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:42 UTC (permalink / raw)
  To: git

Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) made it impossible to load a symlink .gitmodules file into
the index. However, there are no tests of this behavior. Let's make sure
this case is covered. We can easily reuse the test setup created by
the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
2018-05-04).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-git-dotfiles.sh | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 34d4dc6def..c021d4e752 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'fsck detects symlinked .gitmodules file' '
+test_expect_success 'set up repo with symlinked .gitmodules file' '
 	git init symlink &&
 	(
 		cd symlink &&
@@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree &&
-		tree=$(git mktree <bad-tree) &&
+		} >bad-tree
+	) &&
+	tree=$(git -C symlink mktree <symlink/bad-tree)
+'
+
+test_expect_success 'fsck detects symlinked .gitmodules file' '
+	(
+		cd symlink &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector
@@ -165,6 +171,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 	)
 '
 
+test_expect_success 'refuse to load symlinked .gitmodules into index' '
+	test_must_fail git -C symlink read-tree $tree 2>err &&
+	test_i18ngrep "invalid path.*gitmodules" err &&
+	git -C symlink ls-files >out &&
+	test_must_be_empty out
+'
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/9] t7450: test .gitmodules symlink matching against obscured names
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (4 preceding siblings ...)
  2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
@ 2021-05-01 15:42 ` Jeff King
  2021-05-01 15:42 ` [PATCH 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:42 UTC (permalink / raw)
  To: git

In t7450 we check that both verify_path() and fsck catch malformed
.gitmodules entries in trees. However, we don't check that we catch
filesystem-equivalent forms of these (e.g., ".GITMOD~1" on Windows).
Our name-matching functions are exercised well in t0060, but there's
nothing to test that we correctly call the matching functions from the
actual fsck and verify_path() code.

So instead of testing just .gitmodules, let's repeat our tests for a few
basic cases. We don't need to be exhaustive here (t0060 handles that),
but just make sure we hit one name of each type.

Besides pushing the tests into a function that takes the path as a
parameter, we'll need to do a few things:

  - adjust the directory name to accommodate the tests running multiple
    times

  - set core.protecthfs for index checks. Fsck always protects all types
    by default, but we want to be able to exercise the HFS routines on
    every system. Note that core.protectntfs is already the default
    these days, but it doesn't hurt to explicitly label our need for it.

  - we'll also take the filename ("gitmodules") as a parameter. All
    calls use the same name for now, but a future patch will extend this
    to handle other .gitfoo files. Note that our fake-content symlink
    destination is somewhat .gitmodules specific. But it isn't necessary
    for other files (which don't do a content check). And it happens to
    be a valid attribute and ignore file anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-git-dotfiles.sh | 91 +++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 38 deletions(-)

diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index c021d4e752..4e142c798f 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -139,44 +139,59 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'set up repo with symlinked .gitmodules file' '
-	git init symlink &&
-	(
-		cd symlink &&
-
-		# Make the tree directly to avoid index restrictions.
-		#
-		# Because symlinks store the target as a blob, choose
-		# a pathname that could be parsed as a .gitmodules file
-		# to trick naive non-symlink-aware checking.
-		tricky="[foo]bar=true" &&
-		content=$(git hash-object -w ../.gitmodules) &&
-		target=$(printf "$tricky" | git hash-object -w --stdin) &&
-		{
-			printf "100644 blob $content\t$tricky\n" &&
-			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree
-	) &&
-	tree=$(git -C symlink mktree <symlink/bad-tree)
-'
-
-test_expect_success 'fsck detects symlinked .gitmodules file' '
-	(
-		cd symlink &&
-
-		# Check not only that we fail, but that it is due to the
-		# symlink detector
-		test_must_fail git fsck 2>output &&
-		grep "tree $tree: gitmodulesSymlink" output
-	)
-'
-
-test_expect_success 'refuse to load symlinked .gitmodules into index' '
-	test_must_fail git -C symlink read-tree $tree 2>err &&
-	test_i18ngrep "invalid path.*gitmodules" err &&
-	git -C symlink ls-files >out &&
-	test_must_be_empty out
-'
+check_dotx_symlink () {
+	name=$1
+	type=$2
+	path=$3
+	dir=symlink-$name-$type
+
+	test_expect_success "set up repo with symlinked $name ($type)" '
+		git init $dir &&
+		(
+			cd $dir &&
+
+			# Make the tree directly to avoid index restrictions.
+			#
+			# Because symlinks store the target as a blob, choose
+			# a pathname that could be parsed as a .gitmodules file
+			# to trick naive non-symlink-aware checking.
+			tricky="[foo]bar=true" &&
+			content=$(git hash-object -w ../.gitmodules) &&
+			target=$(printf "$tricky" | git hash-object -w --stdin) &&
+			{
+				printf "100644 blob $content\t$tricky\n" &&
+				printf "120000 blob $target\t$path\n"
+			} >bad-tree
+		) &&
+		tree=$(git -C $dir mktree <$dir/bad-tree)
+	'
+
+	test_expect_success "fsck detects symlinked $name ($type)" '
+		(
+			cd $dir &&
+
+			# Check not only that we fail, but that it is due to the
+			# symlink detector
+			test_must_fail git fsck 2>output &&
+			grep "tree $tree: ${name}Symlink" output
+		)
+	'
+
+	test_expect_success "refuse to load symlinked $name into index ($type)" '
+		test_must_fail \
+			git -C $dir \
+			    -c core.protectntfs \
+			    -c core.protecthfs \
+			    read-tree $tree 2>err &&
+		test_i18ngrep "invalid path.*$name" err &&
+		git -C $dir ls-files -s >out &&
+		test_must_be_empty out
+	'
+}
+
+check_dotx_symlink gitmodules vanilla .gitmodules
+check_dotx_symlink gitmodules ntfs ".gitmodules ."
+check_dotx_symlink gitmodules hfs ".${u200c}gitmodules"
 
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/9] t0060: test ntfs/hfs-obscured dotfiles
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (5 preceding siblings ...)
  2021-05-01 15:42 ` [PATCH 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
@ 2021-05-01 15:42 ` Jeff King
  2021-05-01 15:43 ` [PATCH 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:42 UTC (permalink / raw)
  To: git

We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                    |  1 +
 path.c                     |  5 +++++
 t/helper/test-path-utils.c | 46 +++++++++++++++++++++++++++-----------
 t/t0060-path-utils.sh      | 30 +++++++++++++++++++++++++
 utf8.c                     |  5 +++++
 utf8.h                     |  1 +
 6 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index b785ffb383..e6dda88fb0 100644
--- a/cache.h
+++ b/cache.h
@@ -1271,6 +1271,7 @@ int is_ntfs_dotgit(const char *name);
 int is_ntfs_dotgitmodules(const char *name);
 int is_ntfs_dotgitignore(const char *name);
 int is_ntfs_dotgitattributes(const char *name);
+int is_ntfs_dotmailmap(const char *name);
 
 /*
  * Returns true iff "str" could be confused as a command-line option when
diff --git a/path.c b/path.c
index 9e883eb524..7bccd830e9 100644
--- a/path.c
+++ b/path.c
@@ -1493,6 +1493,11 @@ int is_ntfs_dotgitattributes(const char *name)
 	return is_ntfs_dot_str(name, "gitattributes", "gi7d29");
 }
 
+int is_ntfs_dotmailmap(const char *name)
+{
+	return is_ntfs_dot_str(name, "mailmap", "maba30");
+}
+
 int looks_like_command_line_option(const char *str)
 {
 	return str && str[0] == '-';
diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 313a153209..229ed416b0 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
 	{ NULL,              NULL     }
 };
 
-static int is_dotgitmodules(const char *path)
+static int check_dotfile(const char *x, const char **argv,
+			 int (*is_hfs)(const char *),
+			 int (*is_ntfs)(const char *))
 {
-	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
+	int res = 0, expect = 1;
+	for (; *argv; argv++) {
+		if (!strcmp("--not", *argv))
+			expect = !expect;
+		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
+			res = error("'%s' is %s.git%s", *argv,
+				    expect ? "not " : "", x);
+		else
+			fprintf(stderr, "ok: '%s' is %s.git%s\n",
+				*argv, expect ? "" : "not ", x);
+	}
+	return !!res;
 }
 
 static int cmp_by_st_size(const void *a, const void *b)
@@ -382,17 +395,24 @@ int cmd__path_utils(int argc, const char **argv)
 		return test_function(dirname_data, posix_dirname, argv[1]);
 
 	if (argc > 2 && !strcmp(argv[1], "is_dotgitmodules")) {
-		int res = 0, expect = 1, i;
-		for (i = 2; i < argc; i++)
-			if (!strcmp("--not", argv[i]))
-				expect = !expect;
-			else if (expect != is_dotgitmodules(argv[i]))
-				res = error("'%s' is %s.gitmodules", argv[i],
-					    expect ? "not " : "");
-			else
-				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
-					argv[i], expect ? "" : "not ");
-		return !!res;
+		return check_dotfile("modules", argv + 2,
+				     is_hfs_dotgitmodules,
+				     is_ntfs_dotgitmodules);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
+		return check_dotfile("ignore", argv + 2,
+				     is_hfs_dotgitignore,
+				     is_ntfs_dotgitignore);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
+		return check_dotfile("attributes", argv + 2,
+				     is_hfs_dotgitattributes,
+				     is_ntfs_dotgitattributes);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotmailmap")) {
+		return check_dotfile("mailmap", argv + 2,
+				     is_hfs_dotmailmap,
+				     is_ntfs_dotmailmap);
 	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 0ff06b5d1b..de4960783f 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -468,6 +468,36 @@ test_expect_success 'match .gitmodules' '
 		.gitmodules,:\$DATA
 '
 
+test_expect_success 'match .gitattributes' '
+	test-tool path-utils is_dotgitattributes \
+		.gitattributes \
+		.git${u200c}attributes \
+		.Gitattributes \
+		.gitattributeS \
+		GITATT~1 \
+		GI7D29~1
+'
+
+test_expect_success 'match .gitignore' '
+	test-tool path-utils is_dotgitignore \
+		.gitignore \
+		.git${u200c}ignore \
+		.Gitignore \
+		.gitignorE \
+		GITIGN~1 \
+		GI250A~1
+'
+
+test_expect_success 'match .mailmap' '
+	test-tool path-utils is_dotmailmap \
+		.mailmap \
+		.mail${u200c}map \
+		.Mailmap \
+		.mailmaP \
+		MAILMA~1 \
+		MABA30~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \
diff --git a/utf8.c b/utf8.c
index 5b39361ada..de4ce5c0e6 100644
--- a/utf8.c
+++ b/utf8.c
@@ -777,6 +777,11 @@ int is_hfs_dotgitattributes(const char *path)
 	return is_hfs_dot_str(path, "gitattributes");
 }
 
+int is_hfs_dotmailmap(const char *path)
+{
+	return is_hfs_dot_str(path, "mailmap");
+}
+
 const char utf8_bom[] = "\357\273\277";
 
 int skip_utf8_bom(char **text, size_t len)
diff --git a/utf8.h b/utf8.h
index fcd5167baf..9a16c8679d 100644
--- a/utf8.h
+++ b/utf8.h
@@ -61,6 +61,7 @@ int is_hfs_dotgit(const char *path);
 int is_hfs_dotgitmodules(const char *path);
 int is_hfs_dotgitignore(const char *path);
 int is_hfs_dotgitattributes(const char *path);
+int is_hfs_dotmailmap(const char *path);
 
 typedef enum {
 	ALIGN_LEFT,
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (6 preceding siblings ...)
  2021-05-01 15:42 ` [PATCH 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
@ 2021-05-01 15:43 ` Jeff King
  2021-05-01 15:43 ` [PATCH 9/9] docs: document symlink restrictions for dot-files Jeff King
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:43 UTC (permalink / raw)
  To: git

In the commits merged in via 204333b015 (Merge branch
'jk/open-dotgitx-with-nofollow', 2021-03-22), we stopped following
symbolic links for .gitattributes, .gitignore, and .mailmap files.

Let's teach fsck to warn that these symlinks are not going to do
anything. Note that this is just a warning, and won't block the objects
via transfer.fsckObjects, since there are reported to be cases of this
in the wild (and even once fixed, they will continue to exist in the
commit history of those projects, but are not particularly dangerous).

Note that we won't add these to the existing gitmodules block in the
fsck code. The logic for gitmodules is a bit more complicated, as we
also check the content of non-symlink instances we find. But for these
new files, there is no content check; we're just looking at the name and
mode of the tree entry (and we can avoid even the complicated name
checks in the common case that the mode doesn't indicate a symlink).

We can reuse the test helper function we defined for .gitmodules, though
(it needs some slight adjustments for the fsck error code, and because
we don't block these symlinks via verify_path()).

Note that I didn't explicitly test the transfer.fsckObjects case here
(nor does the existing .gitmodules test that it blocks a push). The
translation of fsck severities to outcomes is covered in general in
t5504.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                      | 18 ++++++++++++++++++
 fsck.h                      |  3 +++
 t/t7450-bad-git-dotfiles.sh | 29 +++++++++++++++++++++++++++--
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index db94817898..3ec500d707 100644
--- a/fsck.c
+++ b/fsck.c
@@ -614,6 +614,24 @@ static int fsck_tree(const struct object_id *tree_oid,
 						 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+			if (is_hfs_dotmailmap(name) ||
+			    is_ntfs_dotmailmap(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_MAILMAP_SYMLINK,
+						 ".mailmap is a symlink");
+		}
+
 		if ((backslash = strchr(name, '\\'))) {
 			while (backslash) {
 				backslash++;
diff --git a/fsck.h b/fsck.h
index 7202c3c87e..d07f7a2459 100644
--- a/fsck.h
+++ b/fsck.h
@@ -67,6 +67,9 @@ enum fsck_msg_type {
 	FUNC(NUL_IN_COMMIT, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(GITIGNORE_SYMLINK, INFO) \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	/* ignored (elevated when requested) */ \
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 4e142c798f..164282ab8f 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -140,6 +140,18 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 '
 
 check_dotx_symlink () {
+	fsck_must_fail=test_must_fail
+	fsck_prefix=error
+	refuse_index=t
+	case "$1" in
+	--warning)
+		fsck_must_fail=
+		fsck_prefix=warning
+		refuse_index=
+		shift
+		;;
+	esac
+
 	name=$1
 	type=$2
 	path=$3
@@ -172,11 +184,12 @@ check_dotx_symlink () {
 
 			# Check not only that we fail, but that it is due to the
 			# symlink detector
-			test_must_fail git fsck 2>output &&
-			grep "tree $tree: ${name}Symlink" output
+			$fsck_must_fail git fsck 2>output &&
+			grep "$fsck_prefix.*tree $tree: ${name}Symlink" output
 		)
 	'
 
+	test -n "$refuse_index" &&
 	test_expect_success "refuse to load symlinked $name into index ($type)" '
 		test_must_fail \
 			git -C $dir \
@@ -193,6 +206,18 @@ check_dotx_symlink gitmodules vanilla .gitmodules
 check_dotx_symlink gitmodules ntfs ".gitmodules ."
 check_dotx_symlink gitmodules hfs ".${u200c}gitmodules"
 
+check_dotx_symlink --warning gitattributes vanilla .gitattributes
+check_dotx_symlink --warning gitattributes ntfs ".gitattributes ."
+check_dotx_symlink --warning gitattributes hfs ".${u200c}gitattributes"
+
+check_dotx_symlink --warning gitignore vanilla .gitignore
+check_dotx_symlink --warning gitignore ntfs ".gitignore ."
+check_dotx_symlink --warning gitignore hfs ".${u200c}gitignore"
+
+check_dotx_symlink --warning mailmap vanilla .mailmap
+check_dotx_symlink --warning mailmap ntfs ".mailmap ."
+check_dotx_symlink --warning mailmap hfs ".${u200c}mailmap"
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 9/9] docs: document symlink restrictions for dot-files
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (7 preceding siblings ...)
  2021-05-01 15:43 ` [PATCH 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
@ 2021-05-01 15:43 ` Jeff King
  2021-05-01 19:16   ` Eric Sunshine
  2021-05-03  5:36 ` [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Junio C Hamano
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
  10 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2021-05-01 15:43 UTC (permalink / raw)
  To: git

We stopped allowing symlinks for .gitmodules files in 10ecfa7649
(verify_path: disallow symlinks in .gitmodules, 2018-05-04), and we
stopped following symlinks for .gitattributes, .gitignore, and .mailmap
in the commits from 204333b015 (Merge branch 'jk/open-dotgitx-with-nofollow',
2021-03-22). The reasons are discussed in detail there, but we never
adjusted the documentation to let users know.

This hasn't been a big deal since the point is that such setups were
mildly broken and thought to be unusual anyway. But it certainly doesn't
hurt to be clear and explicit about it.

Suggested-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/gitattributes.txt | 7 +++++++
 Documentation/gitignore.txt     | 5 +++++
 Documentation/gitmailmap.txt    | 7 +++++++
 Documentation/gitmodules.txt    | 8 ++++++++
 4 files changed, 27 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index cfcfa800c2..dfda94d996 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1247,6 +1247,13 @@ to:
 [attr]binary -diff -merge -text
 ------------
 
+NOTES
+-----
+
+Note that Git does not follow symbolic links when accessing a
+`.gitattributes` file in the working tree. This keeps behavior
+consistent when the file is accessed from the index or a tree versus
+from the filesystem.
 
 EXAMPLES
 --------
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index 5751603b13..4b6fd8d2cd 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -149,6 +149,11 @@ not tracked by Git remain untracked.
 To stop tracking a file that is currently tracked, use
 'git rm --cached'.
 
+Note that Git does not follow symbolic links when accessing a
+`.gitignore` file in the working tree. This keeps behavior consistent
+when the file is accessed from the index or a tree versus from the
+filesystem.
+
 EXAMPLES
 --------
 
diff --git a/Documentation/gitmailmap.txt b/Documentation/gitmailmap.txt
index 3fb39f801f..eb65eeb37f 100644
--- a/Documentation/gitmailmap.txt
+++ b/Documentation/gitmailmap.txt
@@ -55,6 +55,13 @@ this would also match the 'Commit Name <commit&#64;email.xx>' above:
 	Proper Name <proper@email.xx> CoMmIt NaMe <CoMmIt@EmAiL.xX>
 --
 
+NOTES
+-----
+
+Note that Git does not follow symbolic links when accessing a `.mailmap`
+file in the working tree. This keeps behavior consistent when the file
+is accessed from the index or a tree versus from the filesystem.
+
 EXAMPLES
 --------
 
diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
index 8e333dde1b..ca1c42b405 100644
--- a/Documentation/gitmodules.txt
+++ b/Documentation/gitmodules.txt
@@ -98,6 +98,14 @@ submodule.<name>.shallow::
 	shallow clone (with a history depth of 1) unless the user explicitly
 	asks for a non-shallow clone.
 
+NOTES
+-----
+
+Note that Git does not allow the `.gitmodules` file within a working
+tree to be a symbolic link, and will refuse to check out such a tree
+entry. This keeps behavior consistent when the file is accessed from the
+index or a tree versus from the filesystem, and helps Git reliably
+enforce security checks of the file contents.
 
 EXAMPLES
 --------
-- 
2.31.1.875.g5dccece0aa

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
@ 2021-05-01 18:55   ` Eric Sunshine
  2021-05-01 19:03     ` Eric Sunshine
  2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Sunshine @ 2021-05-01 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Sat, May 1, 2021 at 11:42 AM Jeff King <peff@peff.net> wrote:
> Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
> 2018-05-04) made it impossible to load a symlink .gitmodules file into
> the index. However, there are no tests of this behavior. Let's make sure
> this case is covered. We can easily reuse the test setup created by
> the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
> 2018-05-04).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
> @@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
> -test_expect_success 'fsck detects symlinked .gitmodules file' '
> +test_expect_success 'set up repo with symlinked .gitmodules file' '
>         git init symlink &&
>         (
>                 cd symlink &&
> @@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>                 {
>                         printf "100644 blob $content\t$tricky\n" &&
>                         printf "120000 blob $target\t.gitmodules\n"
> -               } >bad-tree &&
> -               tree=$(git mktree <bad-tree) &&
> +               } >bad-tree
> +       ) &&
> +       tree=$(git -C symlink mktree <symlink/bad-tree)
> +'

`tree` is an unusually generic name for this now-global variable. One
can easily imagine it being re-used by some unrelated test arbitrarily
inserted into this script, thus potentially breaking the following
tests which depend upon it. I wonder if a name such as `BAD_TREE`
would be more appropriate.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-01 18:55   ` Eric Sunshine
@ 2021-05-01 19:03     ` Eric Sunshine
  2021-05-03 19:39       ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Sunshine @ 2021-05-01 19:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Sat, May 1, 2021 at 2:55 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Sat, May 1, 2021 at 11:42 AM Jeff King <peff@peff.net> wrote:
> > +       tree=$(git -C symlink mktree <symlink/bad-tree)
>
> `tree` is an unusually generic name for this now-global variable. One
> can easily imagine it being re-used by some unrelated test arbitrarily
> inserted into this script, thus potentially breaking the following
> tests which depend upon it. I wonder if a name such as `BAD_TREE`
> would be more appropriate.

I see that all `$tree` references get encapsulated into a shell
function by the next patch, so perhaps the generic name `tree` isn't a
big deal after all.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] docs: document symlink restrictions for dot-files
  2021-05-01 15:43 ` [PATCH 9/9] docs: document symlink restrictions for dot-files Jeff King
@ 2021-05-01 19:16   ` Eric Sunshine
  2021-05-03 20:33     ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Sunshine @ 2021-05-01 19:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Sat, May 1, 2021 at 11:43 AM Jeff King <peff@peff.net> wrote:
> We stopped allowing symlinks for .gitmodules files in 10ecfa7649
> (verify_path: disallow symlinks in .gitmodules, 2018-05-04), and we
> stopped following symlinks for .gitattributes, .gitignore, and .mailmap
> in the commits from 204333b015 (Merge branch 'jk/open-dotgitx-with-nofollow',
> 2021-03-22). The reasons are discussed in detail there, but we never
> adjusted the documentation to let users know.
>
> This hasn't been a big deal since the point is that such setups were
> mildly broken and thought to be unusual anyway. But it certainly doesn't
> hurt to be clear and explicit about it.

Just a really microscopic nit... feel free to ignore...

> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> @@ -1247,6 +1247,13 @@ to:
> +NOTES
> +-----
> +
> +Note that Git does not follow symbolic links when accessing a
> +`.gitattributes` file in the working tree. This keeps behavior
> +consistent when the file is accessed from the index or a tree versus
> +from the filesystem.

We're in the "NOTES" section, so it feels redundant to begin the
sentence with "Note that".

> diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
> @@ -98,6 +98,14 @@ submodule.<name>.shallow::
> +NOTES
> +-----
> +
> +Note that Git does not allow the `.gitmodules` file within a working
> +tree to be a symbolic link, and will refuse to check out such a tree
> +entry. This keeps behavior consistent when the file is accessed from the
> +index or a tree versus from the filesystem, and helps Git reliably
> +enforce security checks of the file contents.

Ditto.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (8 preceding siblings ...)
  2021-05-01 15:43 ` [PATCH 9/9] docs: document symlink restrictions for dot-files Jeff King
@ 2021-05-03  5:36 ` Junio C Hamano
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
  10 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2021-05-03  5:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> A while back, I had a topic[1] that treated symlinked .gitattributes
> (and .gitignore and .mailmap) the same as .gitmodules: forbidding them
> in the index, complaining about them via fsck, etc.
>
> In the end, we decided not to do that[2], and instead just open the
> files with O_NOFOLLOW instead. As I said in that thread, we could
> salvage some of the cleanups, fsck checks, and docs from the original
> topic. So here that is. (The new topic is in master but not yet
> released; so while this is not strictly a bug-fix for an existing topic,
> it would be good to get especially the doc improvements into the same
> release).

Thanks.


> [1] https://lore.kernel.org/git/20201005071751.GA2290770@coredump.intra.peff.net/
> [2] https://lore.kernel.org/git/YDiWs6yyv3U9YvC2@coredump.intra.peff.net/
>
> The patches are:
>
>   [1/9]: t7415: remove out-dated comment about translation
>   [2/9]: fsck_tree(): fix shadowed variable
>   [3/9]: fsck_tree(): wrap some long lines
>
>     These three are really independent fixes and cleanups, that could be
>     taken separately.
>
>   [4/9]: t7415: rename to expand scope
>   [5/9]: t7450: test verify_path() handling of gitmodules
>   [6/9]: t7450: test .gitmodules symlink matching against obscured names
>
>     These three are just cleaning up and improving the tests for the
>     existing handling of the .gitmodules symlinks.
>
>   [7/9]: t0060: test ntfs/hfs-obscured dotfiles
>   [8/9]: fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
>
>     These two extend fsck checks (warnings, not errors) to the new
>     files. I think this is an improvement, but I could be persuaded
>     otherwise (and these two could be dropped independent of the rest).
>
>   [9/9]: docs: document symlink restrictions for dot-files
>
>     And this covers documentation for all of the files (including
>     .gitmodules).
>
>  Documentation/gitattributes.txt               |   7 ++
>  Documentation/gitignore.txt                   |   5 +
>  Documentation/gitmailmap.txt                  |   7 ++
>  Documentation/gitmodules.txt                  |   8 ++
>  cache.h                                       |   1 +
>  fsck.c                                        |  84 ++++++++++---
>  fsck.h                                        |   3 +
>  path.c                                        |   5 +
>  t/helper/test-path-utils.c                    |  46 +++++--
>  t/t0060-path-utils.sh                         |  30 +++++
>  ...ule-names.sh => t7450-bad-git-dotfiles.sh} | 116 +++++++++++++-----
>  utf8.c                                        |   5 +
>  utf8.h                                        |   1 +
>  13 files changed, 257 insertions(+), 61 deletions(-)
>  rename t/{t7415-submodule-names.sh => t7450-bad-git-dotfiles.sh} (70%)
>
> -Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] t7415: remove out-dated comment about translation
  2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
@ 2021-05-03  9:46   ` Ævar Arnfjörð Bjarmason
  2021-05-03 20:29     ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-03  9:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git, SZEDER Gábor, Junio C Hamano


On Sat, May 01 2021, Jeff King wrote:

> Since GETTEXT_POISON does not exist anymore, there is no point warning
> people about whether we should use test_i18ngrep. This is doubly
> confusing because the comment was describing why it was OK to use grep,
> but it got caught up in the mass conversion of 674ba34038 (fsck: mark
> strings for translation, 2018-11-10).
>
> Note there are other uses of test_i18ngrep in this script which are now
> obsolete; I'll save those for a mass-cleanup. My goal here was just to
> fix the confusing comment in code I'm about to refactor.

For what it's worth between [1] and [2] I'm not sure what to do about
the test_i18ngrep cleanup. I think your patch below is fine, but the
"test_i18ngrep" has mutated into a "grep with debugging", not just
something needed for GETTEXT_POISON.

So that part of your patch right now is making it less friendly for
debugging. I don't care, and think if we want that we'd be better of
scraping the trace ouput for such common cases and/or use "verbose grep
[...]" and teach the "verbose" wrapper about these common cases, but
knowing of that objection + having other outstanding things has been why
I haven't sent s/test_i18ngrep/grep/g patches.

1. https://lore.kernel.org/git/20210120152132.GC8396@szeder.dev/
2. https://lore.kernel.org/git/20210120182759.31102-1-avarab@gmail.com/

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t7415-submodule-names.sh | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
> index f70368bc2e..fef6561d80 100755
> --- a/t/t7415-submodule-names.sh
> +++ b/t/t7415-submodule-names.sh
> @@ -151,10 +151,9 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
>  		} | git mktree &&
>  
>  		# Check not only that we fail, but that it is due to the
> -		# symlink detector; this grep string comes from the config
> -		# variable name and will not be translated.
> +		# symlink detector
>  		test_must_fail git fsck 2>output &&
> -		test_i18ngrep gitmodulesSymlink output
> +		grep gitmodulesSymlink output
>  	)
>  '


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
  2021-05-01 18:55   ` Eric Sunshine
@ 2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
  2021-05-03 20:32     ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-03 10:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Sat, May 01 2021, Jeff King wrote:

> +test_expect_success 'refuse to load symlinked .gitmodules into index' '
> +	test_must_fail git -C symlink read-tree $tree 2>err &&
> +	test_i18ngrep "invalid path.*gitmodules" err &&
> +	git -C symlink ls-files >out &&
> +	test_must_be_empty out
> +'
> +

In 1/9 a test_i18ngrep is removed, but here's a new one.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] fsck_tree(): fix shadowed variable
  2021-05-01 15:41 ` [PATCH 2/9] fsck_tree(): fix shadowed variable Jeff King
@ 2021-05-03 11:15   ` Ævar Arnfjörð Bjarmason
  2021-05-03 20:13     ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-03 11:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Sat, May 01 2021, Jeff King wrote:

> Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
> fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
> fsck_tree(), and we pass it to the report() function when we find
> problems. However, that is shadowed within the tree-walking loop by the
> existing "oid" variable which we use to store the oid of each tree
> entry. As a result, we may report the wrong oid for some problems we
> detect within the loop (the entry oid, instead of the tree oid).
>
> Our tests didn't catch this because they checked only that we found the
> expected fsck problem, not that it was attached to the correct object.
>
> Let's rename both variables in the function to avoid confusion. This
> makes the diff a little noisy (e.g., all of the report() calls outside
> the loop were already correct but need to be touched), but makes sure we
> catch all cases and will avoid similar confusion in the future.
>
> And we can update the test to be a bit more specific and catch this
> problem.

Have you considered just passing down the "obj" instead of the "oid"?
It's a bit of a bigger change, but seems to be worth it in this case as
the below diff (on top of master) shows. We spend quite a bit of effort
peeling an "obj" just to pass oid/type further down the stack.

If you're interested in picking that up consider it to have by SOB, but
I don't think me submitting it while you have this in-flight is worth
the conflict.

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 87a99b0108e..5ce8d4792d6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -51,18 +51,18 @@ static int name_objects;
 #define ERROR_COMMIT_GRAPH 020
 #define ERROR_MULTI_PACK_INDEX 040
 
-static const char *describe_object(const struct object_id *oid)
+static const char *describe_object(const struct object *obj)
 {
-	return fsck_describe_object(&fsck_walk_options, oid);
+	return fsck_describe_object(&fsck_walk_options, obj);
 }
 
-static const char *printable_type(const struct object_id *oid,
-				  enum object_type type)
+static const char *printable_type(const struct object *obj)
 {
 	const char *ret;
+	enum object_type type = obj->type;
 
 	if (type == OBJ_NONE)
-		type = oid_object_info(the_repository, oid, NULL);
+		type = oid_object_info(the_repository, &obj->oid, NULL);
 
 	ret = type_name(type);
 	if (!ret)
@@ -76,14 +76,13 @@ static int objerror(struct object *obj, const char *err)
 	errors_found |= ERROR_OBJECT;
 	/* TRANSLATORS: e.g. error in tree 01bfda: <more explanation> */
 	fprintf_ln(stderr, _("error in %s %s: %s"),
-		   printable_type(&obj->oid, obj->type),
-		   describe_object(&obj->oid), err);
+		   printable_type(obj),
+		   describe_object(obj), err);
 	return -1;
 }
 
 static int fsck_error_func(struct fsck_options *o,
-			   const struct object_id *oid,
-			   enum object_type object_type,
+			   const struct object *obj,
 			   enum fsck_msg_type msg_type,
 			   enum fsck_msg_id msg_id,
 			   const char *message)
@@ -92,14 +91,14 @@ static int fsck_error_func(struct fsck_options *o,
 	case FSCK_WARN:
 		/* TRANSLATORS: e.g. warning in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("warning in %s %s: %s"),
-			   printable_type(oid, object_type),
-			   describe_object(oid), message);
+			   printable_type(obj),
+			   describe_object(obj), message);
 		return 0;
 	case FSCK_ERROR:
 		/* TRANSLATORS: e.g. error in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("error in %s %s: %s"),
-			   printable_type(oid, object_type),
-			   describe_object(oid), message);
+			   printable_type(obj),
+			   describe_object(obj), message);
 		return 1;
 	default:
 		BUG("%d (FSCK_IGNORE?) should never trigger this callback",
@@ -121,8 +120,8 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 	if (!obj) {
 		/* ... these references to parent->fld are safe here */
 		printf_ln(_("broken link from %7s %s"),
-			  printable_type(&parent->oid, parent->type),
-			  describe_object(&parent->oid));
+			  printable_type(parent),
+			  describe_object(parent));
 		printf_ln(_("broken link from %7s %s"),
 			  (type == OBJ_ANY ? _("unknown") : type_name(type)),
 			  _("unknown"));
@@ -150,10 +149,10 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 		if (parent && !has_object(the_repository, &obj->oid, 1)) {
 			printf_ln(_("broken link from %7s %s\n"
 				    "              to %7s %s"),
-				  printable_type(&parent->oid, parent->type),
-				  describe_object(&parent->oid),
-				  printable_type(&obj->oid, obj->type),
-				  describe_object(&obj->oid));
+				  printable_type(parent),
+				  describe_object(parent),
+				  printable_type(obj),
+				  describe_object(obj));
 			errors_found |= ERROR_REACHABLE;
 		}
 		return 1;
@@ -261,8 +260,8 @@ static void check_reachable_object(struct object *obj)
 		if (has_object_pack(&obj->oid))
 			return; /* it is in pack - forget about it */
 		printf_ln(_("missing %s %s"),
-			  printable_type(&obj->oid, obj->type),
-			  describe_object(&obj->oid));
+			  printable_type(obj),
+			  describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
 		return;
 	}
@@ -288,8 +287,8 @@ static void check_unreachable_object(struct object *obj)
 	 */
 	if (show_unreachable) {
 		printf_ln(_("unreachable %s %s"),
-			  printable_type(&obj->oid, obj->type),
-			  describe_object(&obj->oid));
+			  printable_type(obj),
+			  describe_object(obj));
 		return;
 	}
 
@@ -308,12 +307,12 @@ static void check_unreachable_object(struct object *obj)
 	if (!(obj->flags & USED)) {
 		if (show_dangling)
 			printf_ln(_("dangling %s %s"),
-				  printable_type(&obj->oid, obj->type),
-				  describe_object(&obj->oid));
+				  printable_type(obj),
+				  describe_object(obj));
 		if (write_lost_and_found) {
 			char *filename = git_pathdup("lost-found/%s/%s",
 				obj->type == OBJ_COMMIT ? "commit" : "other",
-				describe_object(&obj->oid));
+				describe_object(obj));
 			FILE *f;
 
 			if (safe_create_leading_directories_const(filename)) {
@@ -326,7 +325,7 @@ static void check_unreachable_object(struct object *obj)
 				if (stream_blob_to_fd(fileno(f), &obj->oid, NULL, 1))
 					die_errno(_("could not write '%s'"), filename);
 			} else
-				fprintf(f, "%s\n", describe_object(&obj->oid));
+				fprintf(f, "%s\n", describe_object(obj));
 			if (fclose(f))
 				die_errno(_("could not finish '%s'"),
 					  filename);
@@ -345,7 +344,7 @@ static void check_unreachable_object(struct object *obj)
 static void check_object(struct object *obj)
 {
 	if (verbose)
-		fprintf_ln(stderr, _("Checking %s"), describe_object(&obj->oid));
+		fprintf_ln(stderr, _("Checking %s"), describe_object(obj));
 
 	if (obj->flags & REACHABLE)
 		check_reachable_object(obj);
@@ -403,8 +402,8 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 	if (verbose)
 		fprintf_ln(stderr, _("Checking %s %s"),
-			   printable_type(&obj->oid, obj->type),
-			   describe_object(&obj->oid));
+			   printable_type(obj),
+			   describe_object(obj));
 
 	if (fsck_walk(obj, NULL, &fsck_obj_options))
 		objerror(obj, _("broken links"));
@@ -417,7 +416,7 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 		if (!commit->parents && show_root)
 			printf_ln(_("root %s"),
-				  describe_object(&commit->object.oid));
+				  describe_object(&commit->object));
 	}
 
 	if (obj->type == OBJ_TAG) {
@@ -425,10 +424,10 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 		if (show_tags && tag->tagged) {
 			printf_ln(_("tagged %s %s (%s) in %s"),
-				  printable_type(&tag->tagged->oid, tag->tagged->type),
-				  describe_object(&tag->tagged->oid),
+				  printable_type(tag->tagged),
+				  describe_object(tag->tagged),
 				  tag->tag,
-				  describe_object(&tag->object.oid));
+				  describe_object(&tag->object));
 		}
 	}
 
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..4ea768d6cc1 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -15,8 +15,7 @@ static int option_strict = 1;
 static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
 static int mktag_fsck_error_func(struct fsck_options *o,
-				 const struct object_id *oid,
-				 enum object_type object_type,
+				 const struct object *obj,
 				 enum fsck_msg_type msg_type,
 				 enum fsck_msg_id msg_id,
 				 const char *message)
diff --git a/fsck.c b/fsck.c
index f5ed6a26358..4ded78b5af9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -199,14 +199,14 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 }
 
 static int object_on_skiplist(struct fsck_options *opts,
-			      const struct object_id *oid)
+			      const struct object *obj)
 {
-	return opts && oid && oidset_contains(&opts->skiplist, oid);
+	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
 }
 
-__attribute__((format (printf, 5, 6)))
+__attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options,
-		  const struct object_id *oid, enum object_type object_type,
+		  const struct object *obj,
 		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
@@ -217,7 +217,7 @@ static int report(struct fsck_options *options,
 	if (msg_type == FSCK_IGNORE)
 		return 0;
 
-	if (object_on_skiplist(options, oid))
+	if (object_on_skiplist(options, obj))
 		return 0;
 
 	if (msg_type == FSCK_FATAL)
@@ -230,7 +230,7 @@ static int report(struct fsck_options *options,
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-	result = options->error_func(options, oid, object_type,
+	result = options->error_func(options, obj,
 				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
@@ -278,8 +278,9 @@ void fsck_put_object_name(struct fsck_options *options,
 }
 
 const char *fsck_describe_object(struct fsck_options *options,
-				 const struct object_id *oid)
+				 const struct object *obj)
 {
+	const struct object_id *oid = &obj->oid;
 	static struct strbuf bufs[] = {
 		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
 	};
@@ -333,7 +334,7 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		}
 		else {
 			result = error("in tree %s: entry %s has bad mode %.6o",
-				       fsck_describe_object(options, &tree->object.oid),
+				       fsck_describe_object(options, &tree->object),
 				       entry.path, entry.mode);
 		}
 		if (result < 0)
@@ -443,7 +444,7 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options)
 		return fsck_walk_tag((struct tag *)obj, data, options);
 	default:
 		error("Unknown object type for %s",
-		      fsck_describe_object(options, &obj->oid));
+		      fsck_describe_object(options, obj));
 		return -1;
 	}
 }
@@ -558,7 +559,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object *obj,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -579,7 +580,7 @@ static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, obj, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -606,7 +607,7 @@ static int fsck_tree(const struct object_id *oid,
 				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 obj,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -619,7 +620,7 @@ static int fsck_tree(const struct object_id *oid,
 					if (!S_ISLNK(mode))
 						oidset_insert(&options->gitmodules_found, oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, obj,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -628,7 +629,7 @@ static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, obj, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -676,30 +677,30 @@ static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, obj, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, obj, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, obj, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, obj, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, obj, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, obj, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, obj, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, obj, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, obj, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, obj, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
 static int verify_headers(const void *data, unsigned long size,
-			  const struct object_id *oid, enum object_type type,
+			  const struct object *obj,
 			  struct fsck_options *options)
 {
 	const char *buffer = (const char *)data;
@@ -708,7 +709,7 @@ static int verify_headers(const void *data, unsigned long size,
 	for (i = 0; i < size; i++) {
 		switch (buffer[i]) {
 		case '\0':
-			return report(options, oid, type,
+			return report(options, obj,
 				FSCK_MSG_NUL_IN_HEADER,
 				"unterminated header: NUL at offset %ld", i);
 		case '\n':
@@ -726,12 +727,12 @@ static int verify_headers(const void *data, unsigned long size,
 	if (size && buffer[size - 1] == '\n')
 		return 0;
 
-	return report(options, oid, type,
-		FSCK_MSG_UNTERMINATED_HEADER, "unterminated header");
+	return report(options, obj,
+		      FSCK_MSG_UNTERMINATED_HEADER, "unterminated header");
 }
 
 static int fsck_ident(const char **ident,
-		      const struct object_id *oid, enum object_type type,
+		      const struct object *obj,
 		      struct fsck_options *options)
 {
 	const char *p = *ident;
@@ -742,28 +743,28 @@ static int fsck_ident(const char **ident,
 		(*ident)++;
 
 	if (*p == '<')
-		return report(options, oid, type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p += strcspn(p, "<>\n");
 	if (*p == '>')
-		return report(options, oid, type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
+		return report(options, obj, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
 	if (*p != '<')
-		return report(options, oid, type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
+		return report(options, obj, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
 	if (p[-1] != ' ')
-		return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p++;
 	p += strcspn(p, "<>\n");
 	if (*p != '>')
-		return report(options, oid, type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
+		return report(options, obj, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
 	p++;
 	if (*p != ' ')
-		return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
+		return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
 	p++;
 	if (*p == '0' && p[1] != ' ')
-		return report(options, oid, type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
+		return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
 	if (date_overflows(parse_timestamp(p, &end, 10)))
-		return report(options, oid, type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
+		return report(options, obj, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
 	if ((end == p || *end != ' '))
-		return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
+		return report(options, obj, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
 	p = end + 1;
 	if ((*p != '+' && *p != '-') ||
 	    !isdigit(p[1]) ||
@@ -771,12 +772,12 @@ static int fsck_ident(const char **ident,
 	    !isdigit(p[3]) ||
 	    !isdigit(p[4]) ||
 	    (p[5] != '\n'))
-		return report(options, oid, type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
+		return report(options, obj, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
 	p += 6;
 	return 0;
 }
 
-static int fsck_commit(const struct object_id *oid,
+static int fsck_commit(const struct object *obj,
 		       const char *buffer, unsigned long size,
 		       struct fsck_options *options)
 {
@@ -786,20 +787,20 @@ static int fsck_commit(const struct object_id *oid,
 	const char *buffer_begin = buffer;
 	const char *p;
 
-	if (verify_headers(buffer, size, oid, OBJ_COMMIT, options))
+	if (verify_headers(buffer, size, obj, options))
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
-		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
+		return report(options, obj, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
 	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n') {
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
+		err = report(options, obj, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
 		if (err)
 			return err;
 	}
 	buffer = p + 1;
 	while (skip_prefix(buffer, "parent ", &buffer)) {
 		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
-			err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
+			err = report(options, obj, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
 			if (err)
 				return err;
 		}
@@ -808,23 +809,23 @@ static int fsck_commit(const struct object_id *oid,
 	author_count = 0;
 	while (skip_prefix(buffer, "author ", &buffer)) {
 		author_count++;
-		err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
+		err = fsck_ident(&buffer, obj, options);
 		if (err)
 			return err;
 	}
 	if (author_count < 1)
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
+		err = report(options, obj, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
 	else if (author_count > 1)
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
+		err = report(options, obj, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
 	if (err)
 		return err;
 	if (!skip_prefix(buffer, "committer ", &buffer))
-		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
-	err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
+		return report(options, obj, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
+	err = fsck_ident(&buffer, obj, options);
 	if (err)
 		return err;
 	if (memchr(buffer_begin, '\0', size)) {
-		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT,
+		err = report(options, obj, FSCK_MSG_NUL_IN_COMMIT,
 			     "NUL byte in the commit object body");
 		if (err)
 			return err;
@@ -832,16 +833,16 @@ static int fsck_commit(const struct object_id *oid,
 	return 0;
 }
 
-static int fsck_tag(const struct object_id *oid, const char *buffer,
+static int fsck_tag(const struct object *obj, const char *buffer,
 		    unsigned long size, struct fsck_options *options)
 {
 	struct object_id tagged_oid;
 	int tagged_type;
-	return fsck_tag_standalone(oid, buffer, size, options, &tagged_oid,
+	return fsck_tag_standalone(obj, buffer, size, options, &tagged_oid,
 				   &tagged_type);
 }
 
-int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
+int fsck_tag_standalone(const struct object *obj, const char *buffer,
 			unsigned long size, struct fsck_options *options,
 			struct object_id *tagged_oid,
 			int *tagged_type)
@@ -851,49 +852,49 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
-	ret = verify_headers(buffer, size, oid, OBJ_TAG, options);
+	ret = verify_headers(buffer, size, obj, options);
 	if (ret)
 		goto done;
 
 	if (!skip_prefix(buffer, "object ", &buffer)) {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
 		goto done;
 	}
 	if (parse_oid_hex(buffer, tagged_oid, &p) || *p != '\n') {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
+		ret = report(options, obj, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
 		if (ret)
 			goto done;
 	}
 	buffer = p + 1;
 
 	if (!skip_prefix(buffer, "type ", &buffer)) {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	*tagged_type = type_from_string_gently(buffer, eol - buffer, 1);
 	if (*tagged_type < 0)
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
+		ret = report(options, obj, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
 		goto done;
 	buffer = eol + 1;
 
 	if (!skip_prefix(buffer, "tag ", &buffer)) {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
 	if (check_refname_format(sb.buf, 0)) {
-		ret = report(options, oid, OBJ_TAG,
+		ret = report(options, obj,
 			     FSCK_MSG_BAD_TAG_NAME,
 			     "invalid 'tag' name: %.*s",
 			     (int)(eol - buffer), buffer);
@@ -904,12 +905,12 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 
 	if (!skip_prefix(buffer, "tagger ", &buffer)) {
 		/* early tags do not contain 'tagger' lines; warn only */
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
+		ret = report(options, obj, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
 		if (ret)
 			goto done;
 	}
 	else
-		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
+		ret = fsck_ident(&buffer, obj, options);
 	if (!*buffer)
 		goto done;
 
@@ -921,7 +922,7 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 		 * garbage" could be a custom header. E.g. "mktag"
 		 * doesn't want any unknown headers.
 		 */
-		ret = report(options, oid, OBJ_TAG, FSCK_MSG_EXTRA_HEADER_ENTRY, "invalid format - extra header(s) after 'tagger'");
+		ret = report(options, obj, FSCK_MSG_EXTRA_HEADER_ENTRY, "invalid format - extra header(s) after 'tagger'");
 		if (ret)
 			goto done;
 	}
@@ -1075,7 +1076,7 @@ static int check_submodule_url(const char *url)
 }
 
 struct fsck_gitmodules_data {
-	const struct object_id *oid;
+	const struct object *obj;
 	struct fsck_options *options;
 	int ret;
 };
@@ -1094,27 +1095,27 @@ static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
 	name = xmemdupz(subsection, subsection_len);
 	if (check_submodule_name(name) < 0)
 		data->ret |= report(data->options,
-				    data->oid, OBJ_BLOB,
+				    data->obj,
 				    FSCK_MSG_GITMODULES_NAME,
 				    "disallowed submodule name: %s",
 				    name);
 	if (!strcmp(key, "url") && value &&
 	    check_submodule_url(value) < 0)
 		data->ret |= report(data->options,
-				    data->oid, OBJ_BLOB,
+				    data->obj,
 				    FSCK_MSG_GITMODULES_URL,
 				    "disallowed submodule url: %s",
 				    value);
 	if (!strcmp(key, "path") && value &&
 	    looks_like_command_line_option(value))
 		data->ret |= report(data->options,
-				    data->oid, OBJ_BLOB,
+				    data->obj,
 				    FSCK_MSG_GITMODULES_PATH,
 				    "disallowed submodule path: %s",
 				    value);
 	if (!strcmp(key, "update") && value &&
 	    parse_submodule_update_type(value) == SM_UPDATE_COMMAND)
-		data->ret |= report(data->options, data->oid, OBJ_BLOB,
+		data->ret |= report(data->options, data->obj,
 				    FSCK_MSG_GITMODULES_UPDATE,
 				    "disallowed submodule update setting: %s",
 				    value);
@@ -1123,9 +1124,10 @@ static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
 	return 0;
 }
 
-static int fsck_blob(const struct object_id *oid, const char *buf,
+static int fsck_blob(const struct object *obj, const char *buf,
 		     unsigned long size, struct fsck_options *options)
 {
+	const struct object_id *oid = &obj->oid;
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
@@ -1133,7 +1135,7 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		return 0;
 	oidset_insert(&options->gitmodules_done, oid);
 
-	if (object_on_skiplist(options, oid))
+	if (object_on_skiplist(options, obj))
 		return 0;
 
 	if (!buf) {
@@ -1142,18 +1144,18 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 		 * blob too gigantic to load into memory. Let's just consider
 		 * that an error.
 		 */
-		return report(options, oid, OBJ_BLOB,
+		return report(options, obj,
 			      FSCK_MSG_GITMODULES_LARGE,
 			      ".gitmodules too large to parse");
 	}
 
-	data.oid = oid;
+	data.obj = obj;
 	data.options = options;
 	data.ret = 0;
 	config_opts.error_action = CONFIG_ERROR_SILENT;
 	if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 				".gitmodules", buf, size, &data, &config_opts))
-		data.ret |= report(options, oid, OBJ_BLOB,
+		data.ret |= report(options, obj,
 				   FSCK_MSG_GITMODULES_PARSE,
 				   "could not parse gitmodules blob");
 
@@ -1164,35 +1166,34 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options)
 {
 	if (!obj)
-		return report(options, NULL, OBJ_NONE, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
+		return report(options, obj, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
 
 	if (obj->type == OBJ_BLOB)
-		return fsck_blob(&obj->oid, data, size, options);
+		return fsck_blob(obj, data, size, options);
 	if (obj->type == OBJ_TREE)
-		return fsck_tree(&obj->oid, data, size, options);
+		return fsck_tree(obj, data, size, options);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit(&obj->oid, data, size, options);
+		return fsck_commit(obj, data, size, options);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag(&obj->oid, data, size, options);
+		return fsck_tag(obj, data, size, options);
 
-	return report(options, &obj->oid, obj->type,
+	return report(options, obj,
 		      FSCK_MSG_UNKNOWN_TYPE,
 		      "unknown type '%d' (internal fsck error)",
 		      obj->type);
 }
 
 int fsck_error_function(struct fsck_options *o,
-			const struct object_id *oid,
-			enum object_type object_type,
+			const struct object *obj,
 			enum fsck_msg_type msg_type,
 			enum fsck_msg_id msg_id,
 			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
-		warning("object %s: %s", fsck_describe_object(o, oid), message);
+		warning("object %s: %s", fsck_describe_object(o, obj), message);
 		return 0;
 	}
-	error("object %s: %s", fsck_describe_object(o, oid), message);
+	error("object %s: %s", fsck_describe_object(o, obj), message);
 	return 1;
 }
 
@@ -1204,6 +1205,7 @@ int fsck_finish(struct fsck_options *options)
 
 	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
+		struct object *obj;
 		enum object_type type;
 		unsigned long size;
 		char *buf;
@@ -1215,20 +1217,24 @@ int fsck_finish(struct fsck_options *options)
 		if (!buf) {
 			if (is_promisor_object(oid))
 				continue;
+			obj = (struct object *)lookup_blob(the_repository, oid);
 			ret |= report(options,
-				      oid, OBJ_BLOB,
+				      obj,
 				      FSCK_MSG_GITMODULES_MISSING,
 				      "unable to read .gitmodules blob");
 			continue;
 		}
 
-		if (type == OBJ_BLOB)
-			ret |= fsck_blob(oid, buf, size, options);
-		else
+		if (type == OBJ_BLOB) {
+			obj = (struct object *)lookup_blob(the_repository, oid);
+			ret |= fsck_blob(obj, buf, size, options);
+		} else {
+			obj = lookup_object(the_repository, oid);
 			ret |= report(options,
-				      oid, type,
+				      obj,
 				      FSCK_MSG_GITMODULES_BLOB,
 				      "non-blob found at .gitmodules");
+		}
 		free(buf);
 	}
 
@@ -1267,15 +1273,14 @@ int git_fsck_config(const char *var, const char *value, void *cb)
  */
 
 int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
-					   const struct object_id *oid,
-					   enum object_type object_type,
+					   const struct object *obj,
 					   enum fsck_msg_type msg_type,
 					   enum fsck_msg_id msg_id,
 					   const char *message)
 {
 	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
-		puts(oid_to_hex(oid));
+		puts(oid_to_hex(&obj->oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+	return fsck_error_function(o, obj, msg_type, msg_id, message);
 }
diff --git a/fsck.h b/fsck.h
index 7202c3c87e8..864032709c4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -103,17 +103,16 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-			  const struct object_id *oid, enum object_type object_type,
+			  const struct object *obj,
 			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
 			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
-			const struct object_id *oid, enum object_type object_type,
+			const struct object *obj,
 			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
 			const char *message);
 int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
-					   const struct object_id *oid,
-					   enum object_type object_type,
+					   const struct object *obj,
 					   enum fsck_msg_type msg_type,
 					   enum fsck_msg_id msg_id,
 					   const char *message);
@@ -168,7 +167,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
  */
-int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
+int fsck_tag_standalone(const struct object *obj, const char *buffer,
 			unsigned long size, struct fsck_options *options,
 			struct object_id *tagged_oid,
 			int *tag_type);
@@ -203,7 +202,7 @@ void fsck_put_object_name(struct fsck_options *options,
 			  const struct object_id *oid,
 			  const char *fmt, ...);
 const char *fsck_describe_object(struct fsck_options *options,
-				 const struct object_id *oid);
+				 const struct object *obj);
 
 /*
  * git_config() callback for use by fsck-y tools that want to support

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] fsck_tree(): wrap some long lines
  2021-05-01 15:41 ` [PATCH 3/9] fsck_tree(): wrap some long lines Jeff King
@ 2021-05-03 11:22   ` Ævar Arnfjörð Bjarmason
  2021-05-03 20:23     ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-03 11:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Sat, May 01 2021, Jeff King wrote:

> Many calls to report() in fsck_tree() are kept on a single line and are
> quite long. Most were pretty big to begin with, but have gotten even
> longer over the years as we've added more parameters. Let's accept the
> churn of wrapping them in order to conform to our usual line limits.

If we're going to have the churn I'd say just wrap the rest of the file
as well, now it's mostly-consistent in having these long lines.

FWIW I think having the long lines makes things more readable in this
case, but the inconsistency is worse.

If you do take me up on refactoring it more generally, there's also the
one mis-alignment of an argument list to report() in verify_headers().

I wonder if this whole thing wouldn't be better by declaring the format
in the msg_id_info struct. I.e. add this to fsck.h, but that's an even
bigger change...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-01 19:03     ` Eric Sunshine
@ 2021-05-03 19:39       ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 19:39 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Sat, May 01, 2021 at 03:03:56PM -0400, Eric Sunshine wrote:

> On Sat, May 1, 2021 at 2:55 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > On Sat, May 1, 2021 at 11:42 AM Jeff King <peff@peff.net> wrote:
> > > +       tree=$(git -C symlink mktree <symlink/bad-tree)
> >
> > `tree` is an unusually generic name for this now-global variable. One
> > can easily imagine it being re-used by some unrelated test arbitrarily
> > inserted into this script, thus potentially breaking the following
> > tests which depend upon it. I wonder if a name such as `BAD_TREE`
> > would be more appropriate.
> 
> I see that all `$tree` references get encapsulated into a shell
> function by the next patch, so perhaps the generic name `tree` isn't a
> big deal after all.

Yeah. I'd like to think it is not that big a deal between even just
adjacent tests, because anybody adding tests in the middle of a script
would take care not to split related ones. But that may be too
optimistic. ;)

At any rate, it seems OK to assume the function will all be used
together.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] fsck_tree(): fix shadowed variable
  2021-05-03 11:15   ` Ævar Arnfjörð Bjarmason
@ 2021-05-03 20:13     ` Jeff King
  2021-05-04 10:10       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Mon, May 03, 2021 at 01:15:03PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
> > fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
> > fsck_tree(), and we pass it to the report() function when we find
> [...]
> 
> Have you considered just passing down the "obj" instead of the "oid"?
> It's a bit of a bigger change, but seems to be worth it in this case as
> the below diff (on top of master) shows. We spend quite a bit of effort
> peeling an "obj" just to pass oid/type further down the stack.

That would be undoing the point of the referenced commit above. More
discussion in this thread:

  https://lore.kernel.org/git/20191018044103.GA17625@sigill.intra.peff.net/

but the gist is:

  - there were bugs caused by looking at the object structs; not having
    it make sure we can't introduce similar bugs

  - using oids and buffers makes it possible for code to avoid having
    object structs at all. index-pack could take advantage of this (to
    reduce memory usage), but nobody has yet written that patch.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] fsck_tree(): wrap some long lines
  2021-05-03 11:22   ` Ævar Arnfjörð Bjarmason
@ 2021-05-03 20:23     ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Mon, May 03, 2021 at 01:22:13PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Sat, May 01 2021, Jeff King wrote:
> 
> > Many calls to report() in fsck_tree() are kept on a single line and are
> > quite long. Most were pretty big to begin with, but have gotten even
> > longer over the years as we've added more parameters. Let's accept the
> > churn of wrapping them in order to conform to our usual line limits.
> 
> If we're going to have the churn I'd say just wrap the rest of the file
> as well, now it's mostly-consistent in having these long lines.
> 
> FWIW I think having the long lines makes things more readable in this
> case, but the inconsistency is worse.

I'm not sure I agree. It depends on how big a chunk you consider for
consistency: a function, a file, or the whole project.

fsck_tree() was already inconsistent, so this is making that function
totally consistent. Since that was the function I was working in, that
seemed like the limit of "while I'm here", and I'd prefer to keep it
there for the series.

I certainly don't mind extra clean up on top, though.

As far as preferring the long lines, I don't mind lines a _little_ long,
but some of these are 120+ characters. They wrap awkwardly even on my
extra-wide terminals. ;) I guess we can have a discussion on whether
long lines are OK, but it should probably center on what we put into
CodingGuidelines, and not these particular lines.

> I wonder if this whole thing wouldn't be better by declaring the format
> in the msg_id_info struct. I.e. add this to fsck.h, but that's an even
> bigger change...

I think it gets tricky, as not all of the strings have the same number
and type of format specifiers (most don't have any, but verify_headers()
for example uses %ld).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] t7415: remove out-dated comment about translation
  2021-05-03  9:46   ` Ævar Arnfjörð Bjarmason
@ 2021-05-03 20:29     ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, SZEDER Gábor, Junio C Hamano

On Mon, May 03, 2021 at 11:46:52AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Since GETTEXT_POISON does not exist anymore, there is no point warning
> > people about whether we should use test_i18ngrep. This is doubly
> > confusing because the comment was describing why it was OK to use grep,
> > but it got caught up in the mass conversion of 674ba34038 (fsck: mark
> > strings for translation, 2018-11-10).
> >
> > Note there are other uses of test_i18ngrep in this script which are now
> > obsolete; I'll save those for a mass-cleanup. My goal here was just to
> > fix the confusing comment in code I'm about to refactor.
> 
> For what it's worth between [1] and [2] I'm not sure what to do about
> the test_i18ngrep cleanup. I think your patch below is fine, but the
> "test_i18ngrep" has mutated into a "grep with debugging", not just
> something needed for GETTEXT_POISON.

Yes, I can see some value in that. Though if that's what it's going to
be, IMHO we should use it consistently and give a better name (test_grep
or something).

> So that part of your patch right now is making it less friendly for
> debugging. I don't care, and think if we want that we'd be better of
> scraping the trace ouput for such common cases and/or use "verbose grep
> [...]" and teach the "verbose" wrapper about these common cases, but
> knowing of that objection + having other outstanding things has been why
> I haven't sent s/test_i18ngrep/grep/g patches.

I was the one who introduced "verbose" long ago, and I did have dreams
that people would do "verbose grep" everywhere. But in the end, we made
"-x" a lot nicer to use, and I think that largely obsoletes it. Using
"verbose" makes the non-x verbose output slightly nicer, perhaps. But
not having to remember to sprinkle "verbose" through the test code (nor
read it) seems like a bigger win to me.

For the same reason, I'm pretty ambivalent about having test_grep. I
suppose one could make a similar argument about "test_path_is_missing",
etc, though some of those helpers are also encapsulating technique
(e.g., test_dir_is_empty).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
@ 2021-05-03 20:32     ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Mon, May 03, 2021 at 12:12:32PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Sat, May 01 2021, Jeff King wrote:
> 
> > +test_expect_success 'refuse to load symlinked .gitmodules into index' '
> > +	test_must_fail git -C symlink read-tree $tree 2>err &&
> > +	test_i18ngrep "invalid path.*gitmodules" err &&
> > +	git -C symlink ls-files >out &&
> > +	test_must_be_empty out
> > +'
> > +
> 
> In 1/9 a test_i18ngrep is removed, but here's a new one.

Thanks. Most of these patches were written either in 2018 or 2020 and
pulled forward (and back then, this call would have been necessary). I
fixed up a few cases as I was rebasing, but obviously missed this one.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] docs: document symlink restrictions for dot-files
  2021-05-01 19:16   ` Eric Sunshine
@ 2021-05-03 20:33     ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:33 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Sat, May 01, 2021 at 03:16:09PM -0400, Eric Sunshine wrote:

> Just a really microscopic nit... feel free to ignore...
> 
> > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> > @@ -1247,6 +1247,13 @@ to:
> > +NOTES
> > +-----
> > +
> > +Note that Git does not follow symbolic links when accessing a
> > +`.gitattributes` file in the working tree. This keeps behavior
> > +consistent when the file is accessed from the index or a tree versus
> > +from the filesystem.
> 
> We're in the "NOTES" section, so it feels redundant to begin the
> sentence with "Note that".

Thanks. I think it's a good suggestion. I agree it's pretty minor, but
I'm going to re-roll anyway to drop the test_i18ngrep, so I'll reword
this, too.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 0/9] leftover bits from symlinked gitattributes, etc topics
  2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
                   ` (9 preceding siblings ...)
  2021-05-03  5:36 ` [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Junio C Hamano
@ 2021-05-03 20:42 ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 1/9] t7415: remove out-dated comment about translation Jeff King
                     ` (8 more replies)
  10 siblings, 9 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

On Sat, May 01, 2021 at 11:40:52AM -0400, Jeff King wrote:

> A while back, I had a topic[1] that treated symlinked .gitattributes
> (and .gitignore and .mailmap) the same as .gitmodules: forbidding them
> in the index, complaining about them via fsck, etc.
> 
> In the end, we decided not to do that[2], and instead just open the
> files with O_NOFOLLOW instead. As I said in that thread, we could
> salvage some of the cleanups, fsck checks, and docs from the original
> topic. So here that is. (The new topic is in master but not yet
> released; so while this is not strictly a bug-fix for an existing topic,
> it would be good to get especially the doc improvements into the same
> release).

Here's a re-roll with two small fixes: dropping the test_i18ngrep that
Ævar noticed, and taking Eric's wording suggestion for the docs.

I didn't take Ævar's suggestion to expand the line-wrapping fixes
further. I don't mind that happening, but I'd prefer doing it as a
separate series.

Range-diff (a little hard to read because the one-line change in the
tests percolates through several commits, and the two-word changes in
the docs caused rewrapping):

 1:  c91ce2ed34 =  1:  c91ce2ed34 t7415: remove out-dated comment about translation
 2:  99fe934110 =  2:  99fe934110 fsck_tree(): fix shadowed variable
 3:  9695e4370c =  3:  9695e4370c fsck_tree(): wrap some long lines
 4:  2cf9839145 =  4:  2cf9839145 t7415: rename to expand scope
 5:  ad18686096 !  5:  1664953e71 t7450: test verify_path() handling of gitmodules
    @@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'fsck detects symlinked .gitmod
      
     +test_expect_success 'refuse to load symlinked .gitmodules into index' '
     +	test_must_fail git -C symlink read-tree $tree 2>err &&
    -+	test_i18ngrep "invalid path.*gitmodules" err &&
    ++	grep "invalid path.*gitmodules" err &&
     +	git -C symlink ls-files >out &&
     +	test_must_be_empty out
     +'
 6:  9691fb8d5c !  6:  41000ce022 t7450: test .gitmodules symlink matching against obscured names
    @@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'index-pack --strict works for
     -
     -test_expect_success 'refuse to load symlinked .gitmodules into index' '
     -	test_must_fail git -C symlink read-tree $tree 2>err &&
    --	test_i18ngrep "invalid path.*gitmodules" err &&
    +-	grep "invalid path.*gitmodules" err &&
     -	git -C symlink ls-files >out &&
     -	test_must_be_empty out
     -'
    @@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'index-pack --strict works for
     +			    -c core.protectntfs \
     +			    -c core.protecthfs \
     +			    read-tree $tree 2>err &&
    -+		test_i18ngrep "invalid path.*$name" err &&
    ++		grep "invalid path.*$name" err &&
     +		git -C $dir ls-files -s >out &&
     +		test_must_be_empty out
     +	'
 7:  670705dca2 =  7:  58efbbbbb6 t0060: test ntfs/hfs-obscured dotfiles
 8:  422162a7ae =  8:  aeff66bf1e fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
 9:  f1b226ca4f !  9:  a8f9255d9b docs: document symlink restrictions for dot-files
    @@ Documentation/gitattributes.txt: to:
     +NOTES
     +-----
     +
    -+Note that Git does not follow symbolic links when accessing a
    -+`.gitattributes` file in the working tree. This keeps behavior
    -+consistent when the file is accessed from the index or a tree versus
    -+from the filesystem.
    ++Git does not follow symbolic links when accessing a `.gitattributes`
    ++file in the working tree. This keeps behavior consistent when the file
    ++is accessed from the index or a tree versus from the filesystem.
      
      EXAMPLES
      --------
    @@ Documentation/gitignore.txt: not tracked by Git remain untracked.
      To stop tracking a file that is currently tracked, use
      'git rm --cached'.
      
    -+Note that Git does not follow symbolic links when accessing a
    -+`.gitignore` file in the working tree. This keeps behavior consistent
    -+when the file is accessed from the index or a tree versus from the
    -+filesystem.
    ++Git does not follow symbolic links when accessing a `.gitignore` file in
    ++the working tree. This keeps behavior consistent when the file is
    ++accessed from the index or a tree versus from the filesystem.
     +
      EXAMPLES
      --------
    @@ Documentation/gitmailmap.txt: this would also match the 'Commit Name <commit&#64
     +NOTES
     +-----
     +
    -+Note that Git does not follow symbolic links when accessing a `.mailmap`
    -+file in the working tree. This keeps behavior consistent when the file
    -+is accessed from the index or a tree versus from the filesystem.
    ++Git does not follow symbolic links when accessing a `.mailmap` file in
    ++the working tree. This keeps behavior consistent when the file is
    ++accessed from the index or a tree versus from the filesystem.
     +
      EXAMPLES
      --------
    @@ Documentation/gitmodules.txt: submodule.<name>.shallow::
     +NOTES
     +-----
     +
    -+Note that Git does not allow the `.gitmodules` file within a working
    -+tree to be a symbolic link, and will refuse to check out such a tree
    -+entry. This keeps behavior consistent when the file is accessed from the
    -+index or a tree versus from the filesystem, and helps Git reliably
    -+enforce security checks of the file contents.
    ++Git does not allow the `.gitmodules` file within a working tree to be a
    ++symbolic link, and will refuse to check out such a tree entry. This
    ++keeps behavior consistent when the file is accessed from the index or a
    ++tree versus from the filesystem, and helps Git reliably enforce security
    ++checks of the file contents.
      
      EXAMPLES
      --------

  [1/9]: t7415: remove out-dated comment about translation
  [2/9]: fsck_tree(): fix shadowed variable
  [3/9]: fsck_tree(): wrap some long lines
  [4/9]: t7415: rename to expand scope
  [5/9]: t7450: test verify_path() handling of gitmodules
  [6/9]: t7450: test .gitmodules symlink matching against obscured names
  [7/9]: t0060: test ntfs/hfs-obscured dotfiles
  [8/9]: fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
  [9/9]: docs: document symlink restrictions for dot-files

 Documentation/gitattributes.txt               |   6 +
 Documentation/gitignore.txt                   |   4 +
 Documentation/gitmailmap.txt                  |   7 ++
 Documentation/gitmodules.txt                  |   8 ++
 cache.h                                       |   1 +
 fsck.c                                        |  84 ++++++++++---
 fsck.h                                        |   3 +
 path.c                                        |   5 +
 t/helper/test-path-utils.c                    |  46 +++++--
 t/t0060-path-utils.sh                         |  30 +++++
 ...ule-names.sh => t7450-bad-git-dotfiles.sh} | 116 +++++++++++++-----
 utf8.c                                        |   5 +
 utf8.h                                        |   1 +
 13 files changed, 255 insertions(+), 61 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-git-dotfiles.sh} (70%)

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/9] t7415: remove out-dated comment about translation
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 2/9] fsck_tree(): fix shadowed variable Jeff King
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Since GETTEXT_POISON does not exist anymore, there is no point warning
people about whether we should use test_i18ngrep. This is doubly
confusing because the comment was describing why it was OK to use grep,
but it got caught up in the mass conversion of 674ba34038 (fsck: mark
strings for translation, 2018-11-10).

Note there are other uses of test_i18ngrep in this script which are now
obsolete; I'll save those for a mass-cleanup. My goal here was just to
fix the confusing comment in code I'm about to refactor.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7415-submodule-names.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index f70368bc2e..fef6561d80 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -151,10 +151,9 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		} | git mktree &&
 
 		# Check not only that we fail, but that it is due to the
-		# symlink detector; this grep string comes from the config
-		# variable name and will not be translated.
+		# symlink detector
 		test_must_fail git fsck 2>output &&
-		test_i18ngrep gitmodulesSymlink output
+		grep gitmodulesSymlink output
 	)
 '
 
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/9] fsck_tree(): fix shadowed variable
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
  2021-05-03 20:43   ` [PATCH v2 1/9] t7415: remove out-dated comment about translation Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 3/9] fsck_tree(): wrap some long lines Jeff King
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
fsck_tree(), and we pass it to the report() function when we find
problems. However, that is shadowed within the tree-walking loop by the
existing "oid" variable which we use to store the oid of each tree
entry. As a result, we may report the wrong oid for some problems we
detect within the loop (the entry oid, instead of the tree oid).

Our tests didn't catch this because they checked only that we found the
expected fsck problem, not that it was attached to the correct object.

Let's rename both variables in the function to avoid confusion. This
makes the diff a little noisy (e.g., all of the report() calls outside
the loop were already correct but need to be touched), but makes sure we
catch all cases and will avoid similar confusion in the future.

And we can update the test to be a bit more specific and catch this
problem.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                     | 42 ++++++++++++++++++++------------------
 t/t7415-submodule-names.sh |  5 +++--
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index f5ed6a2635..dd31ed1413 100644
--- a/fsck.c
+++ b/fsck.c
@@ -558,7 +558,7 @@ static int verify_ordered(unsigned mode1, const char *name1,
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(const struct object_id *oid,
+static int fsck_tree(const struct object_id *tree_oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -579,7 +579,7 @@ static int fsck_tree(const struct object_id *oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -589,11 +589,11 @@ static int fsck_tree(const struct object_id *oid,
 	while (desc.size) {
 		unsigned short mode;
 		const char *name, *backslash;
-		const struct object_id *oid;
+		const struct object_id *entry_oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		entry_oid = tree_entry_extract(&desc, &name, &mode);
 
-		has_null_sha1 |= is_null_oid(oid);
+		has_null_sha1 |= is_null_oid(entry_oid);
 		has_full_path |= !!strchr(name, '/');
 		has_empty_name |= !*name;
 		has_dot |= !strcmp(name, ".");
@@ -603,10 +603,11 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&options->gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found,
+					      entry_oid);
 			else
 				retval += report(options,
-						 oid, OBJ_TREE,
+						 tree_oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
@@ -617,9 +618,10 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&options->gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found,
+							      entry_oid);
 					else
-						retval += report(options, oid, OBJ_TREE,
+						retval += report(options, tree_oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
 								 ".gitmodules is a symbolic link");
 				}
@@ -628,7 +630,7 @@ static int fsck_tree(const struct object_id *oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -676,25 +678,25 @@ static int fsck_tree(const struct object_id *oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index fef6561d80..6a8cf3f47b 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -148,12 +148,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} | git mktree &&
+		} >bad-tree &&
+		tree=$(git mktree <bad-tree) &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector
 		test_must_fail git fsck 2>output &&
-		grep gitmodulesSymlink output
+		grep "tree $tree: gitmodulesSymlink" output
 	)
 '
 
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 3/9] fsck_tree(): wrap some long lines
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
  2021-05-03 20:43   ` [PATCH v2 1/9] t7415: remove out-dated comment about translation Jeff King
  2021-05-03 20:43   ` [PATCH v2 2/9] fsck_tree(): fix shadowed variable Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 4/9] t7415: rename to expand scope Jeff King
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Many calls to report() in fsck_tree() are kept on a single line and are
quite long. Most were pretty big to begin with, but have gotten even
longer over the years as we've added more parameters. Let's accept the
churn of wrapping them in order to conform to our usual line limits.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index dd31ed1413..db94817898 100644
--- a/fsck.c
+++ b/fsck.c
@@ -579,7 +579,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 	struct name_stack df_dup_candidates = { NULL };
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_TREE,
+				 "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -630,7 +632,9 @@ static int fsck_tree(const struct object_id *tree_oid,
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, tree_oid, OBJ_TREE,
+					 FSCK_MSG_BAD_TREE,
+					 "cannot be parsed as a tree");
 			break;
 		}
 
@@ -678,25 +682,45 @@ static int fsck_tree(const struct object_id *tree_oid,
 	name_stack_clear(&df_dup_candidates);
 
 	if (has_null_sha1)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_NULL_SHA1,
+				 "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_FULL_PATHNAME,
+				 "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_EMPTY_NAME,
+				 "contains empty pathname");
 	if (has_dot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOT,
+				 "contains '.'");
 	if (has_dotdot)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTDOT,
+				 "contains '..'");
 	if (has_dotgit)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_HAS_DOTGIT,
+				 "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_ZERO_PADDED_FILEMODE,
+				 "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_BAD_FILEMODE,
+				 "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_DUPLICATE_ENTRIES,
+				 "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, tree_oid, OBJ_TREE,
+				 FSCK_MSG_TREE_NOT_SORTED,
+				 "not properly sorted");
 	return retval;
 }
 
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 4/9] t7415: rename to expand scope
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (2 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 3/9] fsck_tree(): wrap some long lines Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 5/9] t7450: test verify_path() handling of gitmodules Jeff King
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

This script has already expanded beyond its original intent of ".. in
submodule names" to include other malicious submodule bits. Let's update
the name and description to reflect that, as well as the fact that we'll
soon be adding similar tests for other dotfiles (.gitattributes, etc).
We'll also renumber it to move it out of the group of submodule-specific
tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 ...submodule-names.sh => t7450-bad-git-dotfiles.sh} | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
 rename t/{t7415-submodule-names.sh => t7450-bad-git-dotfiles.sh} (95%)

diff --git a/t/t7415-submodule-names.sh b/t/t7450-bad-git-dotfiles.sh
similarity index 95%
rename from t/t7415-submodule-names.sh
rename to t/t7450-bad-git-dotfiles.sh
index 6a8cf3f47b..34d4dc6def 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -1,9 +1,16 @@
 #!/bin/sh
 
-test_description='check handling of .. in submodule names
+test_description='check broken or malicious patterns in .git* files
 
-Exercise the name-checking function on a variety of names, and then give a
-real-world setup that confirms we catch this in practice.
+Such as:
+
+  - presence of .. in submodule names;
+    Exercise the name-checking function on a variety of names, and then give a
+    real-world setup that confirms we catch this in practice.
+
+  - nested submodule names
+
+  - symlinked .gitmodules, etc
 '
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-pack.sh
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 5/9] t7450: test verify_path() handling of gitmodules
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (3 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 4/9] t7415: rename to expand scope Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) made it impossible to load a symlink .gitmodules file into
the index. However, there are no tests of this behavior. Let's make sure
this case is covered. We can easily reuse the test setup created by
the matching b7b1fca175 (fsck: complain when .gitmodules is a symlink,
2018-05-04).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-git-dotfiles.sh | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 34d4dc6def..eace9a2216 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -139,7 +139,7 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'fsck detects symlinked .gitmodules file' '
+test_expect_success 'set up repo with symlinked .gitmodules file' '
 	git init symlink &&
 	(
 		cd symlink &&
@@ -155,8 +155,14 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 		{
 			printf "100644 blob $content\t$tricky\n" &&
 			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree &&
-		tree=$(git mktree <bad-tree) &&
+		} >bad-tree
+	) &&
+	tree=$(git -C symlink mktree <symlink/bad-tree)
+'
+
+test_expect_success 'fsck detects symlinked .gitmodules file' '
+	(
+		cd symlink &&
 
 		# Check not only that we fail, but that it is due to the
 		# symlink detector
@@ -165,6 +171,13 @@ test_expect_success 'fsck detects symlinked .gitmodules file' '
 	)
 '
 
+test_expect_success 'refuse to load symlinked .gitmodules into index' '
+	test_must_fail git -C symlink read-tree $tree 2>err &&
+	grep "invalid path.*gitmodules" err &&
+	git -C symlink ls-files >out &&
+	test_must_be_empty out
+'
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 6/9] t7450: test .gitmodules symlink matching against obscured names
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (4 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 5/9] t7450: test verify_path() handling of gitmodules Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

In t7450 we check that both verify_path() and fsck catch malformed
.gitmodules entries in trees. However, we don't check that we catch
filesystem-equivalent forms of these (e.g., ".GITMOD~1" on Windows).
Our name-matching functions are exercised well in t0060, but there's
nothing to test that we correctly call the matching functions from the
actual fsck and verify_path() code.

So instead of testing just .gitmodules, let's repeat our tests for a few
basic cases. We don't need to be exhaustive here (t0060 handles that),
but just make sure we hit one name of each type.

Besides pushing the tests into a function that takes the path as a
parameter, we'll need to do a few things:

  - adjust the directory name to accommodate the tests running multiple
    times

  - set core.protecthfs for index checks. Fsck always protects all types
    by default, but we want to be able to exercise the HFS routines on
    every system. Note that core.protectntfs is already the default
    these days, but it doesn't hurt to explicitly label our need for it.

  - we'll also take the filename ("gitmodules") as a parameter. All
    calls use the same name for now, but a future patch will extend this
    to handle other .gitfoo files. Note that our fake-content symlink
    destination is somewhat .gitmodules specific. But it isn't necessary
    for other files (which don't do a content check). And it happens to
    be a valid attribute and ignore file anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7450-bad-git-dotfiles.sh | 91 +++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 38 deletions(-)

diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index eace9a2216..b494d72976 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -139,44 +139,59 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 	grep gitmodulesName output
 '
 
-test_expect_success 'set up repo with symlinked .gitmodules file' '
-	git init symlink &&
-	(
-		cd symlink &&
-
-		# Make the tree directly to avoid index restrictions.
-		#
-		# Because symlinks store the target as a blob, choose
-		# a pathname that could be parsed as a .gitmodules file
-		# to trick naive non-symlink-aware checking.
-		tricky="[foo]bar=true" &&
-		content=$(git hash-object -w ../.gitmodules) &&
-		target=$(printf "$tricky" | git hash-object -w --stdin) &&
-		{
-			printf "100644 blob $content\t$tricky\n" &&
-			printf "120000 blob $target\t.gitmodules\n"
-		} >bad-tree
-	) &&
-	tree=$(git -C symlink mktree <symlink/bad-tree)
-'
-
-test_expect_success 'fsck detects symlinked .gitmodules file' '
-	(
-		cd symlink &&
-
-		# Check not only that we fail, but that it is due to the
-		# symlink detector
-		test_must_fail git fsck 2>output &&
-		grep "tree $tree: gitmodulesSymlink" output
-	)
-'
-
-test_expect_success 'refuse to load symlinked .gitmodules into index' '
-	test_must_fail git -C symlink read-tree $tree 2>err &&
-	grep "invalid path.*gitmodules" err &&
-	git -C symlink ls-files >out &&
-	test_must_be_empty out
-'
+check_dotx_symlink () {
+	name=$1
+	type=$2
+	path=$3
+	dir=symlink-$name-$type
+
+	test_expect_success "set up repo with symlinked $name ($type)" '
+		git init $dir &&
+		(
+			cd $dir &&
+
+			# Make the tree directly to avoid index restrictions.
+			#
+			# Because symlinks store the target as a blob, choose
+			# a pathname that could be parsed as a .gitmodules file
+			# to trick naive non-symlink-aware checking.
+			tricky="[foo]bar=true" &&
+			content=$(git hash-object -w ../.gitmodules) &&
+			target=$(printf "$tricky" | git hash-object -w --stdin) &&
+			{
+				printf "100644 blob $content\t$tricky\n" &&
+				printf "120000 blob $target\t$path\n"
+			} >bad-tree
+		) &&
+		tree=$(git -C $dir mktree <$dir/bad-tree)
+	'
+
+	test_expect_success "fsck detects symlinked $name ($type)" '
+		(
+			cd $dir &&
+
+			# Check not only that we fail, but that it is due to the
+			# symlink detector
+			test_must_fail git fsck 2>output &&
+			grep "tree $tree: ${name}Symlink" output
+		)
+	'
+
+	test_expect_success "refuse to load symlinked $name into index ($type)" '
+		test_must_fail \
+			git -C $dir \
+			    -c core.protectntfs \
+			    -c core.protecthfs \
+			    read-tree $tree 2>err &&
+		grep "invalid path.*$name" err &&
+		git -C $dir ls-files -s >out &&
+		test_must_be_empty out
+	'
+}
+
+check_dotx_symlink gitmodules vanilla .gitmodules
+check_dotx_symlink gitmodules ntfs ".gitmodules ."
+check_dotx_symlink gitmodules hfs ".${u200c}gitmodules"
 
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 7/9] t0060: test ntfs/hfs-obscured dotfiles
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (5 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
  2021-05-03 20:43   ` [PATCH v2 9/9] docs: document symlink restrictions for dot-files Jeff King
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                    |  1 +
 path.c                     |  5 +++++
 t/helper/test-path-utils.c | 46 +++++++++++++++++++++++++++-----------
 t/t0060-path-utils.sh      | 30 +++++++++++++++++++++++++
 utf8.c                     |  5 +++++
 utf8.h                     |  1 +
 6 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/cache.h b/cache.h
index b785ffb383..e6dda88fb0 100644
--- a/cache.h
+++ b/cache.h
@@ -1271,6 +1271,7 @@ int is_ntfs_dotgit(const char *name);
 int is_ntfs_dotgitmodules(const char *name);
 int is_ntfs_dotgitignore(const char *name);
 int is_ntfs_dotgitattributes(const char *name);
+int is_ntfs_dotmailmap(const char *name);
 
 /*
  * Returns true iff "str" could be confused as a command-line option when
diff --git a/path.c b/path.c
index 9e883eb524..7bccd830e9 100644
--- a/path.c
+++ b/path.c
@@ -1493,6 +1493,11 @@ int is_ntfs_dotgitattributes(const char *name)
 	return is_ntfs_dot_str(name, "gitattributes", "gi7d29");
 }
 
+int is_ntfs_dotmailmap(const char *name)
+{
+	return is_ntfs_dot_str(name, "mailmap", "maba30");
+}
+
 int looks_like_command_line_option(const char *str)
 {
 	return str && str[0] == '-';
diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 313a153209..229ed416b0 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -172,9 +172,22 @@ static struct test_data dirname_data[] = {
 	{ NULL,              NULL     }
 };
 
-static int is_dotgitmodules(const char *path)
+static int check_dotfile(const char *x, const char **argv,
+			 int (*is_hfs)(const char *),
+			 int (*is_ntfs)(const char *))
 {
-	return is_hfs_dotgitmodules(path) || is_ntfs_dotgitmodules(path);
+	int res = 0, expect = 1;
+	for (; *argv; argv++) {
+		if (!strcmp("--not", *argv))
+			expect = !expect;
+		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
+			res = error("'%s' is %s.git%s", *argv,
+				    expect ? "not " : "", x);
+		else
+			fprintf(stderr, "ok: '%s' is %s.git%s\n",
+				*argv, expect ? "" : "not ", x);
+	}
+	return !!res;
 }
 
 static int cmp_by_st_size(const void *a, const void *b)
@@ -382,17 +395,24 @@ int cmd__path_utils(int argc, const char **argv)
 		return test_function(dirname_data, posix_dirname, argv[1]);
 
 	if (argc > 2 && !strcmp(argv[1], "is_dotgitmodules")) {
-		int res = 0, expect = 1, i;
-		for (i = 2; i < argc; i++)
-			if (!strcmp("--not", argv[i]))
-				expect = !expect;
-			else if (expect != is_dotgitmodules(argv[i]))
-				res = error("'%s' is %s.gitmodules", argv[i],
-					    expect ? "not " : "");
-			else
-				fprintf(stderr, "ok: '%s' is %s.gitmodules\n",
-					argv[i], expect ? "" : "not ");
-		return !!res;
+		return check_dotfile("modules", argv + 2,
+				     is_hfs_dotgitmodules,
+				     is_ntfs_dotgitmodules);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitignore")) {
+		return check_dotfile("ignore", argv + 2,
+				     is_hfs_dotgitignore,
+				     is_ntfs_dotgitignore);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotgitattributes")) {
+		return check_dotfile("attributes", argv + 2,
+				     is_hfs_dotgitattributes,
+				     is_ntfs_dotgitattributes);
+	}
+	if (argc > 2 && !strcmp(argv[1], "is_dotmailmap")) {
+		return check_dotfile("mailmap", argv + 2,
+				     is_hfs_dotmailmap,
+				     is_ntfs_dotmailmap);
 	}
 
 	if (argc > 2 && !strcmp(argv[1], "file-size")) {
diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 0ff06b5d1b..de4960783f 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -468,6 +468,36 @@ test_expect_success 'match .gitmodules' '
 		.gitmodules,:\$DATA
 '
 
+test_expect_success 'match .gitattributes' '
+	test-tool path-utils is_dotgitattributes \
+		.gitattributes \
+		.git${u200c}attributes \
+		.Gitattributes \
+		.gitattributeS \
+		GITATT~1 \
+		GI7D29~1
+'
+
+test_expect_success 'match .gitignore' '
+	test-tool path-utils is_dotgitignore \
+		.gitignore \
+		.git${u200c}ignore \
+		.Gitignore \
+		.gitignorE \
+		GITIGN~1 \
+		GI250A~1
+'
+
+test_expect_success 'match .mailmap' '
+	test-tool path-utils is_dotmailmap \
+		.mailmap \
+		.mail${u200c}map \
+		.Mailmap \
+		.mailmaP \
+		MAILMA~1 \
+		MABA30~1
+'
+
 test_expect_success MINGW 'is_valid_path() on Windows' '
 	test-tool path-utils is_valid_path \
 		win32 \
diff --git a/utf8.c b/utf8.c
index 5b39361ada..de4ce5c0e6 100644
--- a/utf8.c
+++ b/utf8.c
@@ -777,6 +777,11 @@ int is_hfs_dotgitattributes(const char *path)
 	return is_hfs_dot_str(path, "gitattributes");
 }
 
+int is_hfs_dotmailmap(const char *path)
+{
+	return is_hfs_dot_str(path, "mailmap");
+}
+
 const char utf8_bom[] = "\357\273\277";
 
 int skip_utf8_bom(char **text, size_t len)
diff --git a/utf8.h b/utf8.h
index fcd5167baf..9a16c8679d 100644
--- a/utf8.h
+++ b/utf8.h
@@ -61,6 +61,7 @@ int is_hfs_dotgit(const char *path);
 int is_hfs_dotgitmodules(const char *path);
 int is_hfs_dotgitignore(const char *path);
 int is_hfs_dotgitattributes(const char *path);
+int is_hfs_dotmailmap(const char *path);
 
 typedef enum {
 	ALIGN_LEFT,
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (6 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
@ 2021-05-03 20:43   ` Jeff King
  2021-05-03 20:43   ` [PATCH v2 9/9] docs: document symlink restrictions for dot-files Jeff King
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

In the commits merged in via 204333b015 (Merge branch
'jk/open-dotgitx-with-nofollow', 2021-03-22), we stopped following
symbolic links for .gitattributes, .gitignore, and .mailmap files.

Let's teach fsck to warn that these symlinks are not going to do
anything. Note that this is just a warning, and won't block the objects
via transfer.fsckObjects, since there are reported to be cases of this
in the wild (and even once fixed, they will continue to exist in the
commit history of those projects, but are not particularly dangerous).

Note that we won't add these to the existing gitmodules block in the
fsck code. The logic for gitmodules is a bit more complicated, as we
also check the content of non-symlink instances we find. But for these
new files, there is no content check; we're just looking at the name and
mode of the tree entry (and we can avoid even the complicated name
checks in the common case that the mode doesn't indicate a symlink).

We can reuse the test helper function we defined for .gitmodules, though
(it needs some slight adjustments for the fsck error code, and because
we don't block these symlinks via verify_path()).

Note that I didn't explicitly test the transfer.fsckObjects case here
(nor does the existing .gitmodules test that it blocks a push). The
translation of fsck severities to outcomes is covered in general in
t5504.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                      | 18 ++++++++++++++++++
 fsck.h                      |  3 +++
 t/t7450-bad-git-dotfiles.sh | 29 +++++++++++++++++++++++++++--
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index db94817898..3ec500d707 100644
--- a/fsck.c
+++ b/fsck.c
@@ -614,6 +614,24 @@ static int fsck_tree(const struct object_id *tree_oid,
 						 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+			if (is_hfs_dotmailmap(name) ||
+			    is_ntfs_dotmailmap(name))
+				retval += report(options, tree_oid, OBJ_TREE,
+						 FSCK_MSG_MAILMAP_SYMLINK,
+						 ".mailmap is a symlink");
+		}
+
 		if ((backslash = strchr(name, '\\'))) {
 			while (backslash) {
 				backslash++;
diff --git a/fsck.h b/fsck.h
index 7202c3c87e..d07f7a2459 100644
--- a/fsck.h
+++ b/fsck.h
@@ -67,6 +67,9 @@ enum fsck_msg_type {
 	FUNC(NUL_IN_COMMIT, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(GITIGNORE_SYMLINK, INFO) \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
+	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	/* ignored (elevated when requested) */ \
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index b494d72976..e2773bb06d 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -140,6 +140,18 @@ test_expect_success 'index-pack --strict works for non-repo pack' '
 '
 
 check_dotx_symlink () {
+	fsck_must_fail=test_must_fail
+	fsck_prefix=error
+	refuse_index=t
+	case "$1" in
+	--warning)
+		fsck_must_fail=
+		fsck_prefix=warning
+		refuse_index=
+		shift
+		;;
+	esac
+
 	name=$1
 	type=$2
 	path=$3
@@ -172,11 +184,12 @@ check_dotx_symlink () {
 
 			# Check not only that we fail, but that it is due to the
 			# symlink detector
-			test_must_fail git fsck 2>output &&
-			grep "tree $tree: ${name}Symlink" output
+			$fsck_must_fail git fsck 2>output &&
+			grep "$fsck_prefix.*tree $tree: ${name}Symlink" output
 		)
 	'
 
+	test -n "$refuse_index" &&
 	test_expect_success "refuse to load symlinked $name into index ($type)" '
 		test_must_fail \
 			git -C $dir \
@@ -193,6 +206,18 @@ check_dotx_symlink gitmodules vanilla .gitmodules
 check_dotx_symlink gitmodules ntfs ".gitmodules ."
 check_dotx_symlink gitmodules hfs ".${u200c}gitmodules"
 
+check_dotx_symlink --warning gitattributes vanilla .gitattributes
+check_dotx_symlink --warning gitattributes ntfs ".gitattributes ."
+check_dotx_symlink --warning gitattributes hfs ".${u200c}gitattributes"
+
+check_dotx_symlink --warning gitignore vanilla .gitignore
+check_dotx_symlink --warning gitignore ntfs ".gitignore ."
+check_dotx_symlink --warning gitignore hfs ".${u200c}gitignore"
+
+check_dotx_symlink --warning mailmap vanilla .mailmap
+check_dotx_symlink --warning mailmap ntfs ".mailmap ."
+check_dotx_symlink --warning mailmap hfs ".${u200c}mailmap"
+
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&
 	(
-- 
2.31.1.926.gd12152deb6


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 9/9] docs: document symlink restrictions for dot-files
  2021-05-03 20:42 ` [PATCH v2 " Jeff King
                     ` (7 preceding siblings ...)
  2021-05-03 20:43   ` [PATCH v2 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
@ 2021-05-03 20:43   ` Jeff King
  8 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2021-05-03 20:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

We stopped allowing symlinks for .gitmodules files in 10ecfa7649
(verify_path: disallow symlinks in .gitmodules, 2018-05-04), and we
stopped following symlinks for .gitattributes, .gitignore, and .mailmap
in the commits from 204333b015 (Merge branch 'jk/open-dotgitx-with-nofollow',
2021-03-22). The reasons are discussed in detail there, but we never
adjusted the documentation to let users know.

This hasn't been a big deal since the point is that such setups were
mildly broken and thought to be unusual anyway. But it certainly doesn't
hurt to be clear and explicit about it.

Suggested-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/gitattributes.txt | 6 ++++++
 Documentation/gitignore.txt     | 4 ++++
 Documentation/gitmailmap.txt    | 7 +++++++
 Documentation/gitmodules.txt    | 8 ++++++++
 4 files changed, 25 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index cfcfa800c2..83fd4e19a4 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -1247,6 +1247,12 @@ to:
 [attr]binary -diff -merge -text
 ------------
 
+NOTES
+-----
+
+Git does not follow symbolic links when accessing a `.gitattributes`
+file in the working tree. This keeps behavior consistent when the file
+is accessed from the index or a tree versus from the filesystem.
 
 EXAMPLES
 --------
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index 5751603b13..53e7d5c914 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -149,6 +149,10 @@ not tracked by Git remain untracked.
 To stop tracking a file that is currently tracked, use
 'git rm --cached'.
 
+Git does not follow symbolic links when accessing a `.gitignore` file in
+the working tree. This keeps behavior consistent when the file is
+accessed from the index or a tree versus from the filesystem.
+
 EXAMPLES
 --------
 
diff --git a/Documentation/gitmailmap.txt b/Documentation/gitmailmap.txt
index 3fb39f801f..06f4af93fe 100644
--- a/Documentation/gitmailmap.txt
+++ b/Documentation/gitmailmap.txt
@@ -55,6 +55,13 @@ this would also match the 'Commit Name <commit&#64;email.xx>' above:
 	Proper Name <proper@email.xx> CoMmIt NaMe <CoMmIt@EmAiL.xX>
 --
 
+NOTES
+-----
+
+Git does not follow symbolic links when accessing a `.mailmap` file in
+the working tree. This keeps behavior consistent when the file is
+accessed from the index or a tree versus from the filesystem.
+
 EXAMPLES
 --------
 
diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
index 8e333dde1b..dcee09b500 100644
--- a/Documentation/gitmodules.txt
+++ b/Documentation/gitmodules.txt
@@ -98,6 +98,14 @@ submodule.<name>.shallow::
 	shallow clone (with a history depth of 1) unless the user explicitly
 	asks for a non-shallow clone.
 
+NOTES
+-----
+
+Git does not allow the `.gitmodules` file within a working tree to be a
+symbolic link, and will refuse to check out such a tree entry. This
+keeps behavior consistent when the file is accessed from the index or a
+tree versus from the filesystem, and helps Git reliably enforce security
+checks of the file contents.
 
 EXAMPLES
 --------
-- 
2.31.1.926.gd12152deb6

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] fsck_tree(): fix shadowed variable
  2021-05-03 20:13     ` Jeff King
@ 2021-05-04 10:10       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-04 10:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Mon, May 03 2021, Jeff King wrote:

> On Mon, May 03, 2021 at 01:15:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Commit b2f2039c2b (fsck: accept an oid instead of a "struct tree" for
>> > fsck_tree(), 2019-10-18) introduced a new "oid" parameter to
>> > fsck_tree(), and we pass it to the report() function when we find
>> [...]
>> 
>> Have you considered just passing down the "obj" instead of the "oid"?
>> It's a bit of a bigger change, but seems to be worth it in this case as
>> the below diff (on top of master) shows. We spend quite a bit of effort
>> peeling an "obj" just to pass oid/type further down the stack.
>
> That would be undoing the point of the referenced commit above. More
> discussion in this thread:
>
>   https://lore.kernel.org/git/20191018044103.GA17625@sigill.intra.peff.net/
>
> but the gist is:
>
>   - there were bugs caused by looking at the object structs; not having
>     it make sure we can't introduce similar bugs
>
>   - using oids and buffers makes it possible for code to avoid having
>     object structs at all. index-pack could take advantage of this (to
>     reduce memory usage), but nobody has yet written that patch.

Ah, I'd managed to read this series + done some previous fsck hacking
without noticing that recent-ish change.

FWIW I'd been meaning to slowly mutate some of the fsck code somewhat in
the opposite direction. I.e. we have N object parsers now between
{blob,commit,tree,tag,fsck,tree-walk}.c and elsewhere. Ideally we could
share all/most of the logic and piggy-back in fsck on the "main" parser
in some more exhaustive mode.

Unfortunately most of our fsck tests are rather non-exhaustive, so I see
e.g. in reverting some of that on our current test suite e.g. a test in
directory.t6102-rev-list-unexpected-objects was changed from:

    error in commit 536c6159e861ef1ac7967a68324a8e7614632970: missingParent: parent objects missing

To:

    error in commit 536c6159e861ef1ac7967a68324a8e7614632970: broken links

So by using the "main" parser in this case we lost the ability to
selectively ignore that error, looks like nobody cared (and the OID
could still be added to the skiplist).

Another thing that got worse (but we arguably never supported well) is
fsck_commit() and expecting to validate at least a single commit
object. Before we'd notice missing parents AFAICT, after we don't at
that entry point, but will of course if entering via fsck_walk().

I don't think any current caller cares, but when I rewrote
builtin/mktag.c recently there was a suggestion to do that "fsck the new
object" for the low-level commit/tree/blob APIs too. Before (and again,
AFACT) that would work to a level of 1, after you'll need to instrument
some limited walk.

Anyway. This inspired me to try re-arranging the "struct object" to be a
superset of "struct object_id", i.e. to make it start with the oid
instead of "parsed" etc. It means you can do this:

	static inline struct object_id *object_to_oid(const struct object *obj)
	{
	        return (struct object_id *)obj;
	}

And change common cases like oid_to_hex(&obj->oid) to
oid_to_hex(object_to_oid(obj)), i.e. a wrapper for a pure cast. Just
that seems to speed up fsck by 1-3%, but I'm not 100% certain yet. The
cachegrind numbers look unambiguously better. This is with gcc and -O3.

More generally we have various other structures that wrap an "oid" in
one way or the other, where the common access pattern is just grabbing
the OID out of it, but it's not the first entry. E.g. "name_entry",
"directory" etc. By re-arranging it like that we could also have a
"object_id_type" which is a 2-member touple of "oid/type", so by having
that "oid/type" first in "struct object" you can cast "struct object" to
"struct object_id_type" for fsck.c use, and further cast a "struct
object_id_type" to "struct object_id" for anything that needs that.

Anyway, right now I don't have anything conclusive. Just thought I'd
poke you/others with the above in case it's an appraoch that's been
tried/considered before.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-05-04 12:26 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-01 15:40 [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Jeff King
2021-05-01 15:41 ` [PATCH 1/9] t7415: remove out-dated comment about translation Jeff King
2021-05-03  9:46   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:29     ` Jeff King
2021-05-01 15:41 ` [PATCH 2/9] fsck_tree(): fix shadowed variable Jeff King
2021-05-03 11:15   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:13     ` Jeff King
2021-05-04 10:10       ` Ævar Arnfjörð Bjarmason
2021-05-01 15:41 ` [PATCH 3/9] fsck_tree(): wrap some long lines Jeff King
2021-05-03 11:22   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:23     ` Jeff King
2021-05-01 15:42 ` [PATCH 4/9] t7415: rename to expand scope Jeff King
2021-05-01 15:42 ` [PATCH 5/9] t7450: test verify_path() handling of gitmodules Jeff King
2021-05-01 18:55   ` Eric Sunshine
2021-05-01 19:03     ` Eric Sunshine
2021-05-03 19:39       ` Jeff King
2021-05-03 10:12   ` Ævar Arnfjörð Bjarmason
2021-05-03 20:32     ` Jeff King
2021-05-01 15:42 ` [PATCH 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
2021-05-01 15:42 ` [PATCH 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
2021-05-01 15:43 ` [PATCH 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
2021-05-01 15:43 ` [PATCH 9/9] docs: document symlink restrictions for dot-files Jeff King
2021-05-01 19:16   ` Eric Sunshine
2021-05-03 20:33     ` Jeff King
2021-05-03  5:36 ` [PATCH 0/9] leftover bits from symlinked gitattributes, etc topics Junio C Hamano
2021-05-03 20:42 ` [PATCH v2 " Jeff King
2021-05-03 20:43   ` [PATCH v2 1/9] t7415: remove out-dated comment about translation Jeff King
2021-05-03 20:43   ` [PATCH v2 2/9] fsck_tree(): fix shadowed variable Jeff King
2021-05-03 20:43   ` [PATCH v2 3/9] fsck_tree(): wrap some long lines Jeff King
2021-05-03 20:43   ` [PATCH v2 4/9] t7415: rename to expand scope Jeff King
2021-05-03 20:43   ` [PATCH v2 5/9] t7450: test verify_path() handling of gitmodules Jeff King
2021-05-03 20:43   ` [PATCH v2 6/9] t7450: test .gitmodules symlink matching against obscured names Jeff King
2021-05-03 20:43   ` [PATCH v2 7/9] t0060: test ntfs/hfs-obscured dotfiles Jeff King
2021-05-03 20:43   ` [PATCH v2 8/9] fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW Jeff King
2021-05-03 20:43   ` [PATCH v2 9/9] docs: document symlink restrictions for dot-files Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).